MLS-C01: AWS Certified Machine Learning - Specialty

Vendor: Amazon

AWS Certified Machine Learning - Specialty Exam Questions: 308

The AWS Certified Machine Learning - Specialty (MLS-C01) exam is a key certification for anyone aiming to advance their career in machine learning on AWS. This page collects MLS-C01 practice tests shared by individuals who have successfully passed the exam. These practice tests provide real-world scenarios and insights to help you prepare.

Why Use MLS-C01 Practice Test?

  • Real Exam Experience: Our practice test accurately replicates the format and difficulty of the actual AWS MLS-C01 exam, providing you with a realistic preparation experience.

  • Identify Knowledge Gaps: Practicing with these tests helps you identify areas where you need more study, allowing you to focus your efforts effectively.

  • Boost Confidence: Regular practice with exam-like questions builds your confidence and reduces test anxiety.

  • Track Your Progress: Monitor your performance over time to see your improvement and adjust your study plan accordingly.

Key Features of MLS-C01 Practice Test:

  • Up-to-Date Content: Our community ensures that the questions are regularly updated to reflect the latest exam objectives and technology trends.

  • Detailed Explanations: Each question comes with detailed explanations, helping you understand the correct answers and learn from any mistakes.

  • Comprehensive Coverage: The practice test covers all key topics of the AWS MLS-C01 exam, including machine learning models, data processing, and model deployment.

  • Customizable Practice: Create your own practice sessions based on specific topics or difficulty levels to tailor your study experience to your needs.

Exam number: MLS-C01

Exam name: AWS Certified Machine Learning – Specialty

Length of test: 180 minutes

Exam format: Multiple-choice and multiple-response questions.

Exam language: English

Number of questions in the actual exam: Maximum of 65 questions

Passing score: 750/1000

Use the member-shared AWS MLS-C01 Practice Test to ensure you’re fully prepared for your certification exam. Start practicing today and take a significant step towards achieving your certification goals!

Related questions

A Machine Learning Specialist is using Apache Spark for pre-processing training data. As part of the Spark pipeline, the Specialist wants to use Amazon SageMaker for training a model and hosting it. Which of the following would the Specialist do to integrate the Spark application with SageMaker? (Select THREE.)

A. Download the AWS SDK for the Spark environment.
B. Install the SageMaker Spark library in the Spark environment.
C. Use the appropriate estimator from the SageMaker Spark Library to train a model.
D. Compress the training data into a ZIP file and upload it to a pre-defined Amazon S3 bucket.
E. Use the SageMakerModel.transform method to get inferences from the model hosted in SageMaker.
F. Convert the DataFrame object to a CSV file, and use the CSV file as input for obtaining inferences from SageMaker.
Suggested answer: B, C, E

Explanation:

The SageMaker Spark library is a library that enables Apache Spark applications to integrate with Amazon SageMaker for training and hosting machine learning models. The library provides several features, such as:

Estimators: Classes that allow Spark users to train Amazon SageMaker models and host them on Amazon SageMaker endpoints using the Spark MLlib Pipelines API. The library supports various built-in algorithms, such as linear learner, XGBoost, K-means, etc., as well as custom algorithms using Docker containers.

Model classes: Classes that wrap Amazon SageMaker models in a Spark MLlib Model abstraction. This allows Spark users to use Amazon SageMaker endpoints for inference within Spark applications.

Data sources: Classes that allow Spark users to read data from Amazon S3 using the Spark Data Sources API. The library supports various data formats, such as CSV, LibSVM, RecordIO, etc.

To integrate the Spark application with SageMaker, the Machine Learning Specialist should do the following:

Install the SageMaker Spark library in the Spark environment. This can be done by using Maven, pip, or downloading the JAR file from GitHub.

Use the appropriate estimator from the SageMaker Spark Library to train a model. For example, to train a linear learner model, the Specialist can use the following code:
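A minimal PySpark sketch of this step, assuming the sagemaker_pyspark package is installed (for example via pip) and a SageMaker execution role ARN is available. The LinearLearnerRegressor class and its camelCase constructor arguments mirror the Scala SDK; verify the exact names against the installed library version:

    from sagemaker_pyspark import IAMRole
    from sagemaker_pyspark.algorithms import LinearLearnerRegressor  # assumed class name

    # Placeholder role ARN; replace with the IAM role that SageMaker should assume.
    role_arn = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"

    # The estimator launches a SageMaker training job on fit() and deploys the
    # trained model to a SageMaker endpoint, returning a SageMakerModel.
    estimator = LinearLearnerRegressor(
        sagemakerRole=IAMRole(role_arn),
        trainingInstanceType="ml.m4.xlarge",
        trainingInstanceCount=1,
        endpointInstanceType="ml.m4.xlarge",
        endpointInitialInstanceCount=1)

    # training_df is a Spark DataFrame with the "label" and "features" columns
    # expected by the SageMaker Spark estimators.
    model = estimator.fit(training_df)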

Use the SageMakerModel.transform method to get inferences from the model hosted in SageMaker. For example, to get predictions for a test DataFrame, the Specialist can use the following code:
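Continuing the sketch above (test_df is assumed to be a DataFrame with the same features column), inference uses the standard Spark Transformer API:

    # model is the SageMakerModel returned by estimator.fit(). transform() sends
    # the rows of test_df to the hosted SageMaker endpoint and returns a new
    # DataFrame with the prediction columns appended.
    predictions = model.transform(test_df)
    predictions.show()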

References:

[SageMaker Spark]: A documentation page that introduces the SageMaker Spark library and its features.

[SageMaker Spark GitHub Repository]: A GitHub repository that contains the source code, examples, and installation instructions for the SageMaker Spark library.


A data scientist receives a collection of insurance claim records. Each record includes a claim ID, the final outcome of the insurance claim, and the date of the final outcome.

The final outcome of each claim is a selection from among 200 outcome categories. Some claim records include only partial information. However, incomplete claim records include only 3 or 4 outcome categories from among the 200 available outcome categories. The collection includes hundreds of records for each outcome category. The records are from the previous 3 years.

The data scientist must create a solution to predict the number of claims that will be in each outcome category every month, several months in advance.

Which solution will meet these requirements?

Become a Premium Member for full access

A real estate company wants to create a machine learning model for predicting housing prices based on a historical dataset. The dataset contains 32 features.

Which model will meet the business requirement?

A. Logistic regression
B. Linear regression
C. K-means
D. Principal component analysis (PCA)
Suggested answer: B

Explanation:

The best model for predicting housing prices based on a historical dataset with 32 features is linear regression. Linear regression is a supervised learning algorithm that fits a linear relationship between a dependent variable (housing price) and one or more independent variables (features). Linear regression can handle multiple features and output a continuous value for the housing price. Linear regression can also return the coefficients of the features, which indicate how each feature affects the housing price. Linear regression is suitable for this problem because the outcome of interest is numerical and continuous, and the model needs to capture the linear relationship between the features and the outcome.
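As an illustration of the approach (the dataset file and column names below are hypothetical), a minimal scikit-learn sketch for fitting and evaluating a linear regression on a tabular housing dataset:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # Hypothetical dataset: 32 numeric feature columns plus a continuous "price" target.
    df = pd.read_csv("housing.csv")
    X = df.drop(columns=["price"])
    y = df["price"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)

    # R^2 on held-out data, plus the learned coefficient for each feature.
    print(model.score(X_test, y_test))
    print(dict(zip(X.columns, model.coef_)))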

References:

AWS Machine Learning Specialty Exam Guide

AWS Machine Learning Training - Regression vs Classification in Machine Learning

AWS Machine Learning Training - Linear Regression with Amazon SageMaker


IT leadership wants to transition a company's existing machine learning data storage environment to AWS as a temporary ad hoc solution. The company currently uses a custom software process that heavily leverages SQL as a query language and exclusively stores generated .csv documents for machine learning.

The ideal state for the company would be a solution that allows it to continue to use the current workforce of SQL experts. The solution must also support the storage of .csv and JSON files, and be able to query over semi-structured data. The following are high priorities for the company:

* Solution simplicity

* Fast development time

* Low cost

* High flexibility

What technologies meet the company's requirements?

A. Amazon S3 and Amazon Athena
B. Amazon Redshift and AWS Glue
C. Amazon DynamoDB and DynamoDB Accelerator (DAX)
D. Amazon RDS and Amazon ES
Suggested answer: A

Explanation:

Amazon S3 and Amazon Athena are technologies that meet the company's requirements for a temporary ad hoc solution for machine learning data storage and query. Amazon S3 and Amazon Athena have the following features and benefits:

Amazon S3 is a service that provides scalable, durable, and secure object storage for any type of data. Amazon S3 can store csv and JSON files, as well as other formats, and can handle large volumes of data with high availability and performance. Amazon S3 also integrates with other AWS services, such as Amazon Athena, for further processing and analysis of the data.

Amazon Athena is a service that allows querying data stored in Amazon S3 using standard SQL. Amazon Athena can query over semi-structured data, such as JSON, as well as structured data, such as csv, without requiring any loading or transformation. Amazon Athena is serverless, meaning that there is no infrastructure to manage and users only pay for the queries they run. Amazon Athena also supports the use of AWS Glue Data Catalog, which is a centralized metadata repository that can store and manage the schema and partition information of the data in Amazon S3.
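As a sketch of that workflow (the database, table, and S3 locations below are hypothetical), a SQL query can be submitted to Athena with boto3 once the files in Amazon S3 are registered in the AWS Glue Data Catalog:

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Run standard SQL directly over the CSV/JSON files stored in Amazon S3.
    response = athena.start_query_execution(
        QueryString="""
            SELECT label, COUNT(*) AS examples
            FROM ml_training_data
            GROUP BY label
        """,
        QueryExecutionContext={"Database": "ml_datasets"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    print(response["QueryExecutionId"])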

Using Amazon S3 and Amazon Athena, the company can achieve the following high priorities:

Solution simplicity: Amazon S3 and Amazon Athena are easy to use and require minimal configuration and maintenance. The company can simply upload the csv and JSON files to Amazon S3 and use Amazon Athena to query them using SQL. The company does not need to worry about provisioning, scaling, or managing any servers or clusters.

Fast development time: Amazon S3 and Amazon Athena can enable the company to quickly access and analyze the data without any data preparation or loading. The company can use the existing workforce of SQL experts to write and run queries on Amazon Athena and get results in seconds or minutes.

Low cost: Amazon S3 and Amazon Athena are cost-effective and offer pay-as-you-go pricing models. Amazon S3 charges based on the amount of storage used and the number of requests made. Amazon Athena charges based on the amount of data scanned by the queries. The company can also reduce the costs by using compression, encryption, and partitioning techniques to optimize the data storage and query performance.

High flexibility: Amazon S3 and Amazon Athena are flexible and can support various data types, formats, and sources. The company can store and query any type of data in Amazon S3, such as csv, JSON, Parquet, ORC, etc. The company can also query data from multiple sources in Amazon S3, such as data lakes, data warehouses, log files, etc.

The other options are not as suitable as option A for the company's requirements for the following reasons:

Option B: Amazon Redshift and AWS Glue are technologies that can be used for data warehousing and data integration, but they are not ideal for a temporary ad hoc solution. Amazon Redshift is a service that provides a fully managed, petabyte-scale data warehouse that can run complex analytical queries using SQL. AWS Glue is a service that provides a fully managed extract, transform, and load (ETL) service that can prepare and load data for analytics. However, using Amazon Redshift and AWS Glue would require more effort and cost than using Amazon S3 and Amazon Athena. The company would need to load the data from Amazon S3 to Amazon Redshift using AWS Glue, which can take time and incur additional charges. The company would also need to manage the capacity and performance of the Amazon Redshift cluster, which can be complex and expensive.

Option C: Amazon DynamoDB and DynamoDB Accelerator (DAX) are technologies that can be used for fast and scalable NoSQL database and caching, but they are not suitable for the company's data storage and query needs. Amazon DynamoDB is a service that provides a fully managed, key-value and document database that can deliver single-digit millisecond performance at any scale. DynamoDB Accelerator (DAX) is a service that provides a fully managed, in-memory cache for DynamoDB that can improve the read performance by up to 10 times. However, using Amazon DynamoDB and DAX would not allow the company to continue to use SQL as a query language, as Amazon DynamoDB does not support SQL. The company would need to use the DynamoDB API or the AWS SDKs to access and query the data, which can require more coding and learning effort. The company would also need to transform the csv and JSON files into DynamoDB items, which can involve additional processing and complexity.

Option D: Amazon RDS and Amazon ES are technologies that can be used for relational database and search and analytics, but they are not optimal for the company's data storage and query scenario. Amazon RDS is a service that provides a fully managed relational database that supports various database engines, such as MySQL, PostgreSQL, Oracle, etc. Amazon ES is a service that provides a fully managed Elasticsearch cluster, which is mainly used for search and analytics purposes. However, using Amazon RDS and Amazon ES would not be as simple and cost-effective as using Amazon S3 and Amazon Athena. The company would need to load the data from Amazon S3 to Amazon RDS, which can take time and incur additional charges. The company would also need to manage the capacity and performance of the Amazon RDS and Amazon ES clusters, which can be complex and expensive. Moreover, Amazon RDS and Amazon ES are not designed to handle semi-structured data, such as JSON, as well as Amazon S3 and Amazon Athena do.

References:

Amazon S3

Amazon Athena

Amazon Redshift

AWS Glue

Amazon DynamoDB

DynamoDB Accelerator (DAX)

Amazon RDS

Amazon ES


A data engineer needs to provide a team of data scientists with the appropriate dataset to run machine learning training jobs. The data will be stored in Amazon S3. The data engineer is obtaining the data from an Amazon Redshift database and is using join queries to extract a single tabular dataset. A portion of the schema is as follows:

TransactionTimestamp (Timestamp)

CardName (Varchar)

CardNo (Varchar)

The data engineer must provide the data so that any row with a CardNo value of NULL is removed. Also, the TransactionTimestamp column must be separated into a TransactionDate column and a TransactionTime column. Finally, the CardName column must be renamed to NameOnCard.

The data will be extracted on a monthly basis and will be loaded into an S3 bucket. The solution must minimize the effort that is needed to set up infrastructure for the ingestion and transformation. The solution must be automated and must minimize the load on the Amazon Redshift cluster.

Which solution meets these requirements?

Become a Premium Member for full access

An engraving company wants to automate its quality control process for plaques. The company performs the process before mailing each customized plaque to a customer. The company has created an Amazon S3 bucket that contains images of defects that should cause a plaque to be rejected. Low-confidence predictions must be sent to an internal team of reviewers who are using Amazon Augmented AI (Amazon A2I).

Which solution will meet these requirements?

Become a Premium Member for full access

A company wants to detect credit card fraud. The company has observed that an average of 2% of credit card transactions are fraudulent. A data scientist trains a classifier on a year's worth of credit card transaction data. The classifier needs to identify the fraudulent transactions. The company wants to accurately capture as many fraudulent transactions as possible.

Which metrics should the data scientist use to optimize the classifier? (Select TWO.)

Become a Premium Member for full access

An e-commerce company wants to launch a new cloud-based product recommendation feature for its web application. Due to data localization regulations, any sensitive data must not leave its on-premises data center, and the product recommendation model must be trained and tested using nonsensitive data only. Data transfer to the cloud must use IPsec. The web application is hosted on premises with a PostgreSQL database that contains all the data. The company wants the data to be uploaded securely to Amazon S3 each day for model retraining.

How should a machine learning specialist meet these requirements?

A. Create an AWS Glue job to connect to the PostgreSQL DB instance. Ingest tables without sensitive data through an AWS Site-to-Site VPN connection directly into Amazon S3.
B. Create an AWS Glue job to connect to the PostgreSQL DB instance. Ingest all data through an AWS Site-to-Site VPN connection into Amazon S3 while removing sensitive data using a PySpark job.
C. Use AWS Database Migration Service (AWS DMS) with table mapping to select PostgreSQL tables with no sensitive data through an SSL connection. Replicate data directly into Amazon S3.
D. Use PostgreSQL logical replication to replicate all data to PostgreSQL in Amazon EC2 through AWS Direct Connect with a VPN connection. Use AWS Glue to move data from Amazon EC2 to Amazon S3.
Suggested answer: C

Explanation:

The best option is to use AWS Database Migration Service (AWS DMS) with table mapping to select PostgreSQL tables with no sensitive data through an SSL connection. Replicate data directly into Amazon S3. This option meets the following requirements:

It ensures that only nonsensitive data is transferred to the cloud by using table mapping to filter out the tables that contain sensitive data.

It uses IPsec to secure the data transfer by enabling SSL encryption for the AWS DMS endpoint.

It uploads the data to Amazon S3 each day for model retraining by using the ongoing replication feature of AWS DMS.
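A sketch of how table mapping can be expressed when creating the replication task with boto3 (the ARNs, schema, and table names are placeholders; the point is the selection rule that includes only the nonsensitive tables):

    import json
    import boto3

    dms = boto3.client("dms", region_name="us-east-1")

    # Selection rules that include only the PostgreSQL tables without sensitive data.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-nonsensitive-tables",
                "object-locator": {"schema-name": "public", "table-name": "product_views"},
                "rule-action": "include",
            }
        ]
    }

    dms.create_replication_task(
        ReplicationTaskIdentifier="nonsensitive-to-s3",
        SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SOURCE",    # placeholder
        TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:S3TARGET",  # placeholder
        ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:INSTANCE",  # placeholder
        MigrationType="full-load-and-cdc",
        TableMappings=json.dumps(table_mappings),
    )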

The other options are not as effective or feasible as the option above. Creating an AWS Glue job to connect to the PostgreSQL DB instance and ingest data through an AWS Site-to-Site VPN connection directly into Amazon S3 is possible, but it requires more steps and resources than using AWS DMS. Also, it does not specify how to filter out the sensitive data from the tables. Creating an AWS Glue job to connect to the PostgreSQL DB instance and ingest all data through an AWS Site-to-Site VPN connection into Amazon S3 while removing sensitive data using a PySpark job is also possible, but it is more complex and error-prone than using AWS DMS. Also, it does not use IPsec as required. Using PostgreSQL logical replication to replicate all data to PostgreSQL in Amazon EC2 through AWS Direct Connect with a VPN connection, and then using AWS Glue to move data from Amazon EC2 to Amazon S3 is not feasible, because PostgreSQL logical replication does not support replicating only a subset of data. Also, it involves unnecessary data movement and additional costs.

References:

Table mapping - AWS Database Migration Service

Using SSL to encrypt a connection to a DB instance - AWS Database Migration Service

Ongoing replication - AWS Database Migration Service

Logical replication - PostgreSQL


A machine learning specialist needs to analyze comments on a news website with users across the globe. The specialist must find the most discussed topics in the comments that are in either English or Spanish.

What steps could be used to accomplish this task? (Choose two.)

A. Use an Amazon SageMaker BlazingText algorithm to find the topics independently from language. Proceed with the analysis.
B. Use an Amazon SageMaker seq2seq algorithm to translate from Spanish to English, if necessary. Use a SageMaker Latent Dirichlet Allocation (LDA) algorithm to find the topics.
C. Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon Comprehend topic modeling to find the topics.
D. Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon Lex to extract topics from the content.
E. Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon SageMaker Neural Topic Model (NTM) to find the topics.
Suggested answer: C, E

Explanation:

To find the most discussed topics in the comments that are in either English or Spanish, the machine learning specialist needs to perform two steps: first, translate the comments from Spanish to English if necessary, and second, apply a topic modeling algorithm to the comments. The following options are valid ways to accomplish these steps using AWS services:

Option C: Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon Comprehend topic modeling to find the topics. Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend topic modeling is a feature that automatically organizes a collection of text documents into topics that contain commonly used words and phrases.

Option E: Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon SageMaker Neural Topic Model (NTM) to find the topics. Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker Neural Topic Model (NTM) is an unsupervised learning algorithm that is used to organize a corpus of documents into topics that contain word groupings based on their statistical distribution.
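A boto3 sketch of option C (bucket names, the IAM role ARN, and the sample text are illustrative; the translated comments are assumed to already be staged in Amazon S3 before topic modeling):

    import boto3

    translate = boto3.client("translate")
    comprehend = boto3.client("comprehend")

    # Step 1: translate a Spanish comment to English.
    result = translate.translate_text(
        Text="Me encanta este artículo sobre energía renovable.",
        SourceLanguageCode="es",
        TargetLanguageCode="en",
    )
    english_comment = result["TranslatedText"]

    # Step 2: run Amazon Comprehend topic modeling over the translated comments,
    # stored one document per line under an S3 prefix.
    comprehend.start_topics_detection_job(
        InputDataConfig={
            "S3Uri": "s3://example-comments/translated/",
            "InputFormat": "ONE_DOC_PER_LINE",
        },
        OutputDataConfig={"S3Uri": "s3://example-comments/topics-output/"},
        DataAccessRoleArn="arn:aws:iam::111122223333:role/ComprehendS3Access",  # placeholder
        NumberOfTopics=20,
    )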

The other options are not valid because:

Option A: Amazon SageMaker BlazingText algorithm is not a topic modeling algorithm, but a text classification and word embedding algorithm. It cannot find the topics independently from language, as different languages have different word distributions and semantics.

Option B: Amazon SageMaker seq2seq algorithm is not a translation algorithm, but a sequence-to-sequence learning algorithm that can be used for tasks such as summarization, chatbot, and question answering. Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm is a topic modeling algorithm, but it requires the input documents to be in the same language and preprocessed into a bag-of-words format.

Option D: Amazon Lex is not a topic modeling algorithm, but a service for building conversational interfaces into any application using voice and text. It cannot extract topics from the content, but only intents and slots based on a predefined bot configuration.

References:

Amazon Translate

Amazon Comprehend

Amazon SageMaker

Amazon SageMaker Neural Topic Model (NTM) Algorithm

Amazon SageMaker BlazingText

Amazon SageMaker Seq2Seq

Amazon SageMaker Latent Dirichlet Allocation (LDA) Algorithm

Amazon Lex


A machine learning (ML) engineer has created a feature repository in Amazon SageMaker Feature Store for the company. The company has AWS accounts for development, integration, and production. The company hosts a feature store in the development account. The company uses Amazon S3 buckets to store feature values offline. The company wants to share features and to allow the integration account and the production account to reuse the features that are in the feature repository.

Which combination of steps will meet these requirements? (Select TWO.)

Become a Premium Member for full access