
Amazon MLS-C01 Practice Test - Questions Answers, Page 9


A Machine Learning Specialist prepared the following graph displaying the results of k-means for k = [1:10].

Considering the graph, what is a reasonable selection for the optimal choice of k?

A. 1
B. 4
C. 7
D. 10
Suggested answer: B

Explanation:

The elbow method is a technique used to determine the number of centroids (k) for a k-means clustering algorithm. In this method, the within-cluster sum of squares (WCSS) is plotted against the number of clusters (k), and the point where the curve bends sharply is identified. This point is called the elbow point, and it indicates that adding more clusters does not improve the model significantly. The graph in the question shows that the elbow point is at k = 4, which means that 4 is a reasonable choice for the optimal number of clusters.

References:

Elbow Method for optimal value of k in KMeans: A tutorial on how to use the elbow method with Amazon SageMaker.

K-Means Clustering: A video that explains the concept and benefits of k-means clustering.
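
As a rough illustration of the elbow method described above, the following scikit-learn sketch (not part of the original question) computes the WCSS, exposed as KMeans.inertia_, for k = 1 to 10; the synthetic data and cluster counts are placeholders.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 natural clusters (placeholder for the question's dataset).
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

wcss = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(model.inertia_)  # within-cluster sum of squares for this k

# Plotting wcss against k would show a sharp bend (the "elbow") at k = 4.
for k, score in zip(range(1, 11), wcss):
    print(k, round(score, 1))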

A company is using Amazon Polly to translate plaintext documents to speech for automated company announcements. However, company acronyms are being mispronounced in the current documents. How should a Machine Learning Specialist address this issue for future documents?

A. Convert current documents to SSML with pronunciation tags.
B. Create an appropriate pronunciation lexicon.
C. Output speech marks to guide in pronunciation.
D. Use Amazon Lex to preprocess the text files for pronunciation.
Suggested answer: B

Explanation:

A pronunciation lexicon is a file that defines how words or phrases should be pronounced by Amazon Polly. A lexicon can help customize the speech output for words that are uncommon, foreign, or have multiple pronunciations. A lexicon must conform to the Pronunciation Lexicon Specification (PLS) standard and can be stored in an AWS Region using the Amazon Polly API. To use a lexicon for synthesizing speech, the lexicon is uploaded to the Region and then referenced by name in each synthesis request. For example, the following lexicon defines how to pronounce the acronym W3C:

<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>

To use this lexicon, it is uploaded to Amazon Polly (for example, with the PutLexicon API) and then referenced by name in the SynthesizeSpeech request, rather than embedded in the input text. The input itself can remain plain text, for example:

The W3C is an international community that develops open standards to ensure the long-term growth of the Web.

With the lexicon applied, Amazon Polly pronounces W3C as "World Wide Web Consortium" wherever the grapheme appears.
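
A minimal boto3 sketch of this flow is given below. It is an assumed illustration rather than part of the original explanation; the lexicon name, voice, and output file name are placeholders.

import boto3

polly = boto3.client("polly")

# The PLS document shown above, inlined as a string.
lexicon_xml = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme><grapheme>W3C</grapheme><alias>World Wide Web Consortium</alias></lexeme>
</lexicon>"""

# Store the lexicon in the current AWS Region.
polly.put_lexicon(Name="w3cLexicon", Content=lexicon_xml)

# Reference the lexicon by name when synthesizing speech; Polly expands W3C
# to "World Wide Web Consortium" before rendering the audio.
response = polly.synthesize_speech(
    Text="The W3C is an international community that develops open standards.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    LexiconNames=["w3cLexicon"],
)

with open("announcement.mp3", "wb") as f:
    f.write(response["AudioStream"].read())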

References:

Customize pronunciation using lexicons in Amazon Polly: A blog post that explains how to use lexicons for creating custom pronunciations.

Managing Lexicons: A documentation page that describes how to store and retrieve lexicons using the Amazon Polly API.

A Machine Learning Specialist is using Apache Spark for pre-processing training data. As part of the Spark pipeline, the Specialist wants to use Amazon SageMaker for training a model and hosting it. Which of the following would the Specialist do to integrate the Spark application with SageMaker? (Select THREE.)

A. Download the AWS SDK for the Spark environment.
B. Install the SageMaker Spark library in the Spark environment.
C. Use the appropriate estimator from the SageMaker Spark library to train a model.
D. Compress the training data into a ZIP file and upload it to a pre-defined Amazon S3 bucket.
E. Use the SageMakerModel.transform method to get inferences from the model hosted in SageMaker.
F. Convert the DataFrame object to a CSV file, and use the CSV file as input for obtaining inferences from SageMaker.
Suggested answer: B, C, E

Explanation:

The SageMaker Spark library is a library that enables Apache Spark applications to integrate with Amazon SageMaker for training and hosting machine learning models. The library provides several features, such as:

Estimators: Classes that allow Spark users to train Amazon SageMaker models and host them on Amazon SageMaker endpoints using the Spark MLlib Pipelines API. The library supports various built-in algorithms, such as linear learner, XGBoost, K-means, etc., as well as custom algorithms using Docker containers.

Model classes: Classes that wrap Amazon SageMaker models in a Spark MLlib Model abstraction. This allows Spark users to use Amazon SageMaker endpoints for inference within Spark applications.

Data sources: Classes that allow Spark users to read data from Amazon S3 using the Spark Data Sources API. The library supports various data formats, such as CSV, LibSVM, RecordIO, etc.

To integrate the Spark application with SageMaker, the Machine Learning Specialist should do the following:

Install the SageMaker Spark library in the Spark environment. This can be done by using Maven, pip, or downloading the JAR file from GitHub.

Use the appropriate estimator from the SageMaker Spark library to train a model, for example one of the built-in algorithm estimators such as the linear learner or K-means estimator (see the sketch after this list).

Use the SageMakerModel.transform method to get inferences from the model hosted in SageMaker, for example to get predictions for a test DataFrame (also shown in the sketch below).
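
The two steps above can be illustrated with a short PySpark sketch. This is an assumed example based on the sagemaker_pyspark package, using the K-means estimator to show the pattern (the same fit/transform flow applies to the linear learner estimator); the role ARN, S3 path, instance types, and hyperparameters are placeholders, and exact class names may vary by library version.

from pyspark.sql import SparkSession
from sagemaker_pyspark import IAMRole, classpath_jars
from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

# Make the SageMaker Spark JARs visible to Spark (assumes the library was
# installed, e.g. with pip install sagemaker_pyspark).
spark = (SparkSession.builder
         .config("spark.driver.extraClassPath", ":".join(classpath_jars()))
         .getOrCreate())

# Placeholder training data in LibSVM format on Amazon S3.
training_df = spark.read.format("libsvm").load("s3://example-bucket/train/")

estimator = KMeansSageMakerEstimator(
    sagemakerRole=IAMRole("arn:aws:iam::123456789012:role/SageMakerRole"),  # placeholder
    trainingInstanceType="ml.m5.xlarge",
    trainingInstanceCount=1,
    endpointInstanceType="ml.m5.xlarge",
    endpointInitialInstanceCount=1)
estimator.setK(10)
estimator.setFeatureDim(50)

# fit() launches a SageMaker training job and hosts the model on an endpoint,
# returning a SageMakerModel that wraps that endpoint.
model = estimator.fit(training_df)

# transform() sends the DataFrame to the hosted endpoint and appends the
# model's predictions as new columns.
predictions = model.transform(training_df)
predictions.show()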

References:

SageMaker Spark: A documentation page that introduces the SageMaker Spark library and its features.

SageMaker Spark GitHub Repository: A GitHub repository that contains the source code, examples, and installation instructions for the SageMaker Spark library.

A Machine Learning Specialist is working with a large cybersecurity company that manages security events in real time for companies around the world. The cybersecurity company wants to design a solution that will allow it to use machine learning to score malicious events as anomalies on the data as it is being ingested. The company also wants to be able to save the results in its data lake for later processing and analysis.

What is the MOST efficient way to accomplish these tasks?

A. Ingest the data using Amazon Kinesis Data Firehose, and use Amazon Kinesis Data Analytics Random Cut Forest (RCF) for anomaly detection. Then use Kinesis Data Firehose to stream the results to Amazon S3.
B. Ingest the data into Apache Spark Streaming using Amazon EMR, and use Spark MLlib with k-means to perform anomaly detection. Then store the results in an Apache Hadoop Distributed File System (HDFS) using Amazon EMR with a replication factor of three as the data lake.
C. Ingest the data and store it in Amazon S3. Use AWS Batch along with the AWS Deep Learning AMIs to train a k-means model using TensorFlow on the data in Amazon S3.
D. Ingest the data and store it in Amazon S3. Have an AWS Glue job that is triggered on demand transform the new data. Then use the built-in Random Cut Forest (RCF) model within Amazon SageMaker to detect anomalies in the data.
Suggested answer: A

Explanation:

Amazon Kinesis Data Firehose is a fully managed service that can capture, transform, and load streaming data into AWS data stores, such as Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk. It can also invoke AWS Lambda functions to perform custom transformations on the data. Amazon Kinesis Data Analytics is a service that can analyze streaming data in real time using SQL or Apache Flink applications. It can also use machine learning algorithms, such as Random Cut Forest (RCF), to perform anomaly detection on streaming data. RCF is an unsupervised learning algorithm that assigns an anomaly score to each data point based on how different it is from the rest of the data. By using Kinesis Data Firehose and Kinesis Data Analytics, the cybersecurity company can ingest the data in real time, score the malicious events as anomalies, and stream the results to Amazon S3, which can serve as a data lake for later processing and analysis. This is the most efficient way to accomplish these tasks, as it does not require any additional infrastructure, coding, or training.
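
For the ingestion leg of this architecture, the boto3 sketch below (an assumed illustration, not part of the original answer) writes a security event to a Kinesis Data Firehose delivery stream; the stream name and event fields are placeholders. The RCF anomaly scoring itself runs in a Kinesis Data Analytics application attached to the stream.

import json
import boto3

firehose = boto3.client("firehose")

event = {"source_ip": "203.0.113.7", "bytes_sent": 123456, "event_type": "login_attempt"}

firehose.put_record(
    DeliveryStreamName="security-events-stream",   # placeholder stream name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)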

References:

Amazon Kinesis Data Firehose - Amazon Web Services

Amazon Kinesis Data Analytics - Amazon Web Services

Anomaly Detection with Amazon Kinesis Data Analytics - Amazon Web Services

AWS Certified Machine Learning - Specialty Sample Questions

A Machine Learning Specialist works for a credit card processing company and needs to predict which transactions may be fraudulent in near-real time. Specifically, the Specialist must train a model that returns the probability that a given transaction may be fraudulent.

How should the Specialist frame this business problem?

A. Streaming classification
B. Binary classification
C. Multi-category classification
D. Regression classification
Suggested answer: B

Explanation:

Binary classification is a type of supervised learning problem where the goal is to predict a categorical label that has only two possible values, such as Yes or No, True or False, Positive or Negative. In this case, the label is whether a transaction is fraudulent or not, which is a binary outcome. Binary classification can be used to estimate the probability of an observation belonging to a certain class, such as the probability of a transaction being fraudulent. This can help the business to make decisions based on the risk level of each transaction.

References:

Binary Classification - Amazon Machine Learning

AWS Certified Machine Learning - Specialty Sample Questions
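
To make the framing concrete, the following scikit-learn sketch (an illustrative assumption, not the exam's reference solution) trains a binary classifier on synthetic transaction features and returns a per-transaction fraud probability via predict_proba.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                              # placeholder features, e.g. amount, merchant risk
y = (X[:, 0] + rng.normal(size=1000) > 1.5).astype(int)    # 1 = fraudulent, 0 = legitimate

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# predict_proba returns P(class 0) and P(class 1); column 1 is the fraud probability.
fraud_probability = clf.predict_proba(X_test)[:, 1]
print(fraud_probability[:5])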

Amazon Connect has recently been rolled out across a company as a contact call center. The solution has been configured to store voice call recordings on Amazon S3.

The content of the voice calls is being analyzed for the incidents being discussed by the call operators. Amazon Transcribe is being used to convert the audio to text, and the output is stored on Amazon S3.

Which approach will provide the information required for further analysis?

A. Use Amazon Comprehend with the transcribed files to build the key topics.
B. Use Amazon Translate with the transcribed files to train and build a model for the key topics.
C. Use the AWS Deep Learning AMI with Gluon Semantic Segmentation on the transcribed files to train and build a model for the key topics.
D. Use the Amazon SageMaker k-Nearest-Neighbors (kNN) algorithm on the transcribed files to generate a word embeddings dictionary for the key topics.
Suggested answer: A

Explanation:

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. It can analyze text documents and identify the key topics, entities, sentiments, languages, and more. In this case, Amazon Comprehend can be used with the transcribed files from Amazon Transcribe to extract the main topics that are being discussed by the call operators. This can help to understand the common issues and concerns of the customers, and provide insights for further analysis and improvement.

References:

Amazon Comprehend - Amazon Web Services

AWS Certified Machine Learning - Specialty Sample Questions
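
As an assumed illustration of this approach, the boto3 sketch below starts an Amazon Comprehend topic detection job over transcripts stored in Amazon S3; the bucket paths, role ARN, job name, and number of topics are placeholders.

import boto3

comprehend = boto3.client("comprehend")

response = comprehend.start_topics_detection_job(
    JobName="call-topics",                                   # hypothetical job name
    InputDataConfig={
        "S3Uri": "s3://example-transcripts-bucket/transcribe-output/",
        "InputFormat": "ONE_DOC_PER_FILE",
    },
    OutputDataConfig={"S3Uri": "s3://example-transcripts-bucket/comprehend-output/"},
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendS3AccessRole",  # placeholder
    NumberOfTopics=10,
)
print(response["JobId"])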

A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as linear regression and logistic regression. During exploratory data analysis, the Specialist observes that many features are highly correlated with each other. This may make the model unstable.

What should be done to reduce the impact of having such a large number of features?

A. Perform one-hot encoding on highly correlated features.
B. Use matrix multiplication on highly correlated features.
C. Create a new feature space using principal component analysis (PCA).
D. Apply the Pearson correlation coefficient.
Suggested answer: C

Explanation:

Principal component analysis (PCA) is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible. This is done by finding a new set of features called components, which are composites of the original features that are uncorrelated with one another. They are also constrained so that the first component accounts for the largest possible variability in the data, the second component the second most variability, and so on. By using PCA, the impact of having a large number of features that are highly correlated with each other can be reduced, as the new feature space will have fewer dimensions and less redundancy. This can make the linear models more stable and less prone to overfitting.

References:

Principal Component Analysis (PCA) Algorithm - Amazon SageMaker

Perform a large-scale principal component analysis faster using Amazon SageMaker | AWS Machine Learning Blog

Machine Learning - Principal Component Analysis | i2tutorials
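
A brief scikit-learn sketch of this idea is given below (an assumed workflow, not part of the original answer): correlated features are projected onto principal components before fitting a linear model. The synthetic data and the 95% variance threshold are placeholders.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
# Build 9 highly correlated features from 3 underlying signals.
X = np.hstack([base + 0.05 * rng.normal(size=(500, 3)) for _ in range(3)])
y = (base[:, 0] > 0).astype(int)

# Keep enough components to explain 95% of the variance, then fit the linear model.
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), LogisticRegression())
model.fit(X, y)
print(model.named_steps["pca"].n_components_)   # effective dimensionality after PCA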

A Machine Learning Specialist wants to determine the appropriate SageMaker Variant Invocations Per Instance setting for an endpoint automatic scaling configuration. The Specialist has performed a load test on a single instance and determined that peak requests per second (RPS) without service degradation is about 20 RPS. As this is the first deployment, the Specialist intends to set the invocation safety factor to 0.5.

Based on the stated parameters and given that the invocations per instance setting is measured on a per-minute basis, what should the Specialist set as the SageMaker Variant Invocations Per Instance setting?

A. 10
B. 30
C. 600
D. 2,400
Suggested answer: C

Explanation:

The SageMaker Variant Invocations Per Instance setting is the target value for the average number of invocations per instance per minute for the model variant. It is used by the automatic scaling policy to add or remove instances to keep the metric close to the specified value. To determine this value, the following equation can be used in combination with load testing:

SageMakerVariantInvocationsPerInstance = (MAX_RPS * SAFETY_FACTOR) * 60

Where MAX_RPS is the maximum requests per second that the model variant can handle without service degradation, SAFETY_FACTOR is a factor that ensures that the clients do not exceed the maximum RPS, and 60 is the conversion factor from seconds to minutes. In this case, the given parameters are:

MAX_RPS = 20
SAFETY_FACTOR = 0.5

Plugging these values into the equation, we get:

SageMakerVariantInvocationsPerInstance = (20 * 0.5) * 60
SageMakerVariantInvocationsPerInstance = 600

Therefore, the Specialist should set the SageMaker Variant Invocations Per Instance setting to 600.
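
A quick check of this calculation in Python, using the values stated in the question:

max_rps = 20          # peak requests per second measured in the load test
safety_factor = 0.5   # first-deployment safety factor
invocations_per_instance = max_rps * safety_factor * 60  # convert to a per-minute target
print(invocations_per_instance)  # 600.0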

References:

Load testing your auto scaling configuration - Amazon SageMaker

Configure model auto scaling with the console - Amazon SageMaker

A Machine Learning Specialist deployed a model that provides product recommendations on a company's website. Initially, the model was performing very well and resulted in customers buying more products on average. However, within the past few months the Specialist has noticed that the effect of product recommendations has diminished, and customers are starting to return to their original habits of spending less. The Specialist is unsure of what happened, as the model has not changed from its initial deployment over a year ago.

Which method should the Specialist try to improve model performance?

A. The model needs to be completely re-engineered because it is unable to handle product inventory changes.
B. The model's hyperparameters should be periodically updated to prevent drift.
C. The model should be periodically retrained from scratch using the original data while adding a regularization term to handle product inventory changes.
D. The model should be periodically retrained using the original training data plus new data as product inventory changes.
Suggested answer: D

Explanation:

The problem that the Machine Learning Specialist is facing is likely due to concept drift, which is a phenomenon where the statistical properties of the target variable change over time, making the model less accurate and relevant. Concept drift can occur due to various reasons, such as changes in customer preferences, market trends, product inventory, seasonality, etc. In this case, the product recommendations model may have become outdated as the product inventory changed over time, making the recommendations less appealing to the customers. To address this issue, the model should be periodically retrained using the original training data plus new data as product inventory changes. This way, the model can learn from the latest data and adapt to the changing customer behavior and preferences. Retraining the model from scratch using the original data while adding a regularization term may not be sufficient, as it does not account for the new data. Updating the model's hyperparameters may not help either, as it does not address the underlying data distribution change. Re-engineering the model completely may not be necessary, as the model may still be valid and useful with periodic retraining.

References:

Concept Drift - Amazon SageMaker

Detecting and Handling Concept Drift - Amazon SageMaker

Machine Learning Concepts - Amazon Machine Learning

A manufacturer of car engines collects data from cars as they are being driven. The data collected includes timestamp, engine temperature, rotations per minute (RPM), and other sensor readings. The company wants to predict when an engine is going to have a problem, so it can notify drivers in advance to get engine maintenance. The engine data is loaded into a data lake for training.

Which is the MOST suitable predictive model that can be deployed into production?

A. Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem. Use a recurrent neural network (RNN) to train the model to recognize when an engine might need maintenance for a certain fault.
B. This data requires an unsupervised learning algorithm. Use Amazon SageMaker k-means to cluster the data.
C. Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem. Use a convolutional neural network (CNN) to train the model to recognize when an engine might need maintenance for a certain fault.
D. This data is already formulated as a time series. Use Amazon SageMaker seq2seq to model the time series.
Suggested answer: A

Explanation:

A recurrent neural network (RNN) is a type of neural network that can process sequential data, such as time series, by maintaining a hidden state that captures the temporal dependencies between the inputs. RNNs are well suited for predicting future events based on past observations, such as forecasting engine failures based on sensor readings. To train an RNN model, the data needs to be labeled with the target variable, which in this case is the type and time of the engine fault. This makes the problem a supervised learning problem, where the goal is to learn a mapping from the input sequence (sensor readings) to the output sequence (engine faults). By using an RNN model, the manufacturer can leverage the temporal information in the data and detect patterns that indicate when an engine might need maintenance for a certain fault.
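
As an assumed illustration (not from the original material), the PyTorch sketch below defines a small LSTM-based classifier that maps a sequence of engine sensor readings to a fault/no-fault prediction; the layer sizes, sequence length, and number of sensors are arbitrary placeholders.

import torch
import torch.nn as nn

class EngineFaultRNN(nn.Module):
    def __init__(self, n_features: int = 8, hidden_size: int = 64, n_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):            # x: (batch, time_steps, n_features)
        _, (h_n, _) = self.lstm(x)   # h_n: final hidden state, shape (1, batch, hidden)
        return self.head(h_n[-1])    # class logits per engine

model = EngineFaultRNN()
batch = torch.randn(32, 120, 8)      # 32 engines, 120 timesteps, 8 sensor channels
logits = model(batch)
print(logits.shape)                  # torch.Size([32, 2])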

References:

Recurrent Neural Networks - Amazon SageMaker

Use Amazon SageMaker Built-in Algorithms or Pre-trained Models

Recurrent Neural Network Definition | DeepAI

What are Recurrent Neural Networks? An Ultimate Guide for Newbies!

Lee and Carter go Machine Learning: Recurrent Neural Networks - SSRN
