Amazon MLS-C01 Practice Test - Questions Answers, Page 8

A manufacturing company has a large set of labeled historical sales data. The manufacturer would like to predict how many units of a particular part should be produced each quarter. Which machine learning approach should be used to solve this problem?

A. Logistic regression
B. Random Cut Forest (RCF)
C. Principal component analysis (PCA)
D. Linear regression
Suggested answer: D

Explanation:

Linear regression is a machine learning approach that can be used to solve this problem. Linear regression is a supervised learning technique that models the relationship between one or more input variables (features) and an output variable (target). In this case, the input variables could be the historical sales data of the part, such as the quarter, the demand, the price, and the inventory. The output variable could be the number of units to be produced for the part. Linear regression learns the coefficients (weights) of the input variables that best fit the output variable, and then uses them to make predictions for new data. Linear regression is suitable for problems that involve continuous, numeric output variables, such as predicting house prices, stock prices, or sales volumes.

References:

AWS Machine Learning Specialty Exam Guide

Linear Regression
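
For illustration, here is a minimal scikit-learn sketch of this approach. The feature columns and numbers below are assumptions invented for the example, not data from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical historical data: each row is one quarter of features
# [quarter_index, average_price, units_in_inventory]; target is units sold.
X = np.array([
    [1, 19.99, 500],
    [2, 18.49, 420],
    [3, 21.00, 610],
    [4, 20.50, 300],
])
y = np.array([1200, 1350, 1100, 1500])

model = LinearRegression().fit(X, y)

# Predict the production volume for the next quarter (assumed feature values).
next_quarter = np.array([[5, 19.75, 450]])
print(model.predict(next_quarter))
```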

A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data?

A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries.
B. Use AWS Glue to catalogue the data and Amazon Athena to run queries.
C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries.
D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries.
Suggested answer: B

Explanation:

AWS Glue is a serverless data integration service that can catalogue, clean, enrich, and move data between various data stores. Amazon Athena is an interactive query service that can run SQL queries on data stored in Amazon S3. By using AWS Glue to catalogue the data and Amazon Athena to run queries, the Machine Learning Specialist can leverage the existing data in Amazon S3 without any additional data transformation or loading. This solution requires the least effort compared to the other options, which involve more complex and costly data processing and storage services.

References: AWS Glue, Amazon Athena
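
As a sketch of what querying looks like once the data is catalogued, the boto3 call below starts an Athena query. The database, table, and output bucket names are placeholders, and valid AWS credentials are assumed.

```python
import boto3

athena = boto3.client("athena")

# Run a SQL query against a table that AWS Glue has catalogued from S3.
# Database, table, and output location are hypothetical placeholders.
response = athena.start_query_execution(
    QueryString="SELECT product_id, COUNT(*) AS events "
                "FROM sales_events GROUP BY product_id",
    QueryExecutionContext={"Database": "manufacturing_catalog"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

query_id = response["QueryExecutionId"]
print(f"Started Athena query {query_id}")
```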

A Machine Learning Specialist is packaging a custom ResNet model into a Docker container so the company can leverage Amazon SageMaker for training. The Specialist is using Amazon EC2 P3 instances to train the model and needs to properly configure the Docker container to leverage the NVIDIA GPUs.

What does the Specialist need to do?

A. Bundle the NVIDIA drivers with the Docker image.
B. Build the Docker container to be NVIDIA-Docker compatible.
C. Organize the Docker container's file structure to execute on GPU instances.
D. Set the GPU flag in the Amazon SageMaker CreateTrainingJob request body.
Suggested answer: B

Explanation:

To leverage the NVIDIA GPUs on Amazon EC2 P3 instances, the Machine Learning Specialist needs to build the Docker container to be NVIDIA-Docker compatible. NVIDIA-Docker is a tool that enables GPU-accelerated containers to run on Docker. It automatically configures the container to access the NVIDIA drivers and libraries on the host system. The Specialist does not need to bundle the NVIDIA drivers with the Docker image, as they are already installed on the EC2 P3 instances. The Specialist does not need to organize the Docker container's file structure to execute on GPU instances, as this is not relevant for GPU compatibility. The Specialist does not need to set a GPU flag in the Amazon SageMaker CreateTrainingJob request body, as no such flag exists; GPU availability is determined by the chosen training instance type.

References: NVIDIA-Docker, Using GPU-Accelerated Containers, Using Elastic Inference in Amazon SageMaker
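
To make the last point concrete, here is a hedged SageMaker Python SDK sketch of launching a training job with a custom, NVIDIA-Docker-compatible image on a P3 instance. The image URI, bucket paths, and execution role are assumptions; note there is no GPU flag to set.

```python
import sagemaker
from sagemaker.estimator import Estimator

# Hypothetical custom training image built to be NVIDIA-Docker compatible
# (for example, on top of an NVIDIA CUDA base image).
image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/custom-resnet:latest"

estimator = Estimator(
    image_uri=image_uri,
    role=sagemaker.get_execution_role(),  # assumes a SageMaker execution role
    instance_count=1,
    instance_type="ml.p3.2xlarge",        # GPU instance type; no GPU flag needed
    output_path="s3://example-bucket/resnet-output/",
)

# On GPU instance types, SageMaker runs the container with the NVIDIA runtime,
# so a compatible container can use the GPUs without bundling drivers.
estimator.fit({"training": "s3://example-bucket/resnet-training-data/"})
```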

A large JSON dataset for a project has been uploaded to a private Amazon S3 bucket. The Machine Learning Specialist wants to securely access and explore the data from an Amazon SageMaker notebook instance. A new VPC was created and assigned to the Specialist.

How can the privacy and integrity of the data stored in Amazon S3 be maintained while granting access to the Specialist for analysis?

A. Launch the SageMaker notebook instance within the VPC with SageMaker-provided internet access enabled. Use an S3 ACL to open read privileges to the everyone group.
B. Launch the SageMaker notebook instance within the VPC and create an S3 VPC endpoint for the notebook to access the data. Copy the JSON dataset from Amazon S3 into the ML storage volume on the SageMaker notebook instance and work against the local dataset.
C. Launch the SageMaker notebook instance within the VPC and create an S3 VPC endpoint for the notebook to access the data. Define a custom S3 bucket policy to only allow requests from your VPC to access the S3 bucket.
D. Launch the SageMaker notebook instance within the VPC with SageMaker-provided internet access enabled. Generate an S3 pre-signed URL for access to data in the bucket.
Suggested answer: C

Explanation:

The best way to maintain the privacy and integrity of the data stored in Amazon S3 is to use a combination of VPC endpoints and S3 bucket policies. A VPC endpoint allows the SageMaker notebook instance to access the S3 bucket without going through the public internet. A bucket policy allows the S3 bucket owner to specify which VPCs or VPC endpoints can access the bucket. This way, the data is protected from unauthorized access and tampering. The other options are either insecure (A and D) or inefficient (B).

References: Using Amazon S3 VPC Endpoints, Using Bucket Policies and User Policies
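
For illustration only, the boto3 sketch below applies a bucket policy that denies access unless the request arrives through a specific S3 VPC endpoint. The bucket name and endpoint ID are placeholders, and a broad Deny-unless policy like this can also block other legitimate principals, so treat it as a sketch rather than a production policy.

```python
import json
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name and VPC endpoint ID; replace with real values.
bucket = "example-ml-dataset-bucket"
vpce_id = "vpce-0123456789abcdef0"

# Deny bucket access unless the request comes through the S3 VPC endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```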

Given the following confusion matrix for a movie classification model, what is the true class frequency for Romance and the predicted class frequency for Adventure?

A. The true class frequency for Romance is 77.56% and the predicted class frequency for Adventure is 20.85%.
B. The true class frequency for Romance is 57.92% and the predicted class frequency for Adventure is 13.12%.
C. The true class frequency for Romance is 0.78 and the predicted class frequency for Adventure is (0.47 - 0.32).
D. The true class frequency for Romance is 77.56% * 0.78 and the predicted class frequency for Adventure is 20.85% * 0.32.
Suggested answer: B

Explanation:

The true class frequency for Romance is the percentage of movies that are actually Romance out of all the movies. This can be calculated by dividing the sum of the confusion matrix counts whose true label is Romance by the total number of movies. The predicted class frequency for Adventure is the percentage of movies that are predicted to be Adventure out of all the movies. This can be calculated by dividing the sum of the counts predicted as Adventure by the total number of movies. Based on the confusion matrix, the true class frequency for Romance is 57.92% and the predicted class frequency for Adventure is 13.12%.

References: Confusion Matrix, Classification Metrics
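
The calculation can be sketched in a few lines of NumPy. The matrix below is a small hypothetical example (the actual confusion matrix from the question is not reproduced here); rows are assumed to hold true labels and columns predicted labels.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = true class, columns = predicted class).
classes = ["Romance", "Adventure", "Drama"]
cm = np.array([
    [60, 10, 30],   # true Romance
    [ 5, 20, 10],   # true Adventure
    [15,  8, 42],   # true Drama
])

total = cm.sum()
true_freq = cm.sum(axis=1) / total   # share of samples whose true label is each class
pred_freq = cm.sum(axis=0) / total   # share of samples predicted as each class

for name, t, p in zip(classes, true_freq, pred_freq):
    print(f"{name}: true frequency {t:.2%}, predicted frequency {p:.2%}")
```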

A Machine Learning Specialist is building a supervised model that will evaluate customers' satisfaction with their mobile phone service based on recent usage. The model's output should infer whether or not a customer is likely to switch to a competitor in the next 30 days.

Which of the following modeling techniques should the Specialist use?

A. Time-series prediction
B. Anomaly detection
C. Binary classification
D. Regression
Suggested answer: C

Explanation:

The modeling technique that the Machine Learning Specialist should use is binary classification. Binary classification is a type of supervised learning that predicts whether an input belongs to one of two possible classes. In this case, the input is the customer's recent usage data and the output is whether or not the customer is likely to switch to a competitor in the next 30 days. This is a binary outcome, either yes or no, so binary classification is suitable for this problem. The other options are not appropriate for this problem. Time-series prediction is a type of supervised learning that forecasts future values based on past and present data. Anomaly detection is a type of unsupervised learning that identifies outliers or abnormal patterns in the data. Regression is a type of supervised learning that estimates a continuous numerical value based on the input features.

References: Binary Classification, Time Series Prediction, Anomaly Detection, Regression
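
As one common example of a binary classifier (logistic regression is used here only for illustration; the question does not prescribe an algorithm), a minimal scikit-learn sketch with invented usage features might look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical usage features per customer:
# [minutes_used, dropped_calls, support_tickets]
X = np.array([
    [320, 1, 0],
    [110, 6, 3],
    [450, 0, 1],
    [ 90, 8, 5],
    [280, 2, 0],
    [130, 5, 4],
])
# Binary label: 1 = switched to a competitor within 30 days, 0 = stayed.
y = np.array([0, 1, 0, 1, 0, 1])

clf = LogisticRegression().fit(X, y)

# Probability that a new customer (assumed feature values) will switch.
print(clf.predict_proba([[150, 4, 2]])[:, 1])
```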

A web-based company wants to improve its conversion rate on its landing page. Using a large historical dataset of customer visits, the company has repeatedly trained a multi-class deep learning network algorithm on Amazon SageMaker. However, there is an overfitting problem: training data shows 90% accuracy in predictions, while test data shows only 70% accuracy.

The company needs to boost the generalization of its model before deploying it into production to maximize conversions of visits to purchases.

Which action is recommended to provide the HIGHEST accuracy model for the company's test and validation data?

A. Increase the randomization of training data in the mini-batches used in training.
B. Allocate a higher proportion of the overall data to the training dataset.
C. Apply L1 or L2 regularization and dropouts to the training.
D. Reduce the number of layers and units (or neurons) in the deep learning network.
Suggested answer: C

Explanation:

Regularization and dropouts are techniques that can help reduce overfitting in deep learning models. Overfitting occurs when the model learns too much from the training data and fails to generalize well to new data. Regularization adds a penalty term to the loss function that penalizes the model for having large or complex weights. This prevents the model from memorizing the noise or irrelevant features in the training data. L1 and L2 are two types of regularization that differ in how they calculate the penalty term. L1 regularization uses the absolute value of the weights, while L2 regularization uses the square of the weights. Dropouts are another technique that randomly drops out some units or neurons from the network during training. This creates a thinner network that is less prone to overfitting. Dropouts also act as a form of ensemble learning, where multiple sub-models are combined to produce a better prediction. By applying regularization and dropouts to the training, the web-based company can improve the generalization and accuracy of its deep learning model on the test and validation data.

References:

Regularization: A video that explains the concept and benefits of regularization in deep learning.

Dropout: A video that demonstrates how dropout works and why it helps reduce overfitting.
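
A minimal Keras sketch of combining L2 weight penalties with dropout layers is shown below. The layer sizes, regularization factor, and the assumed 10 output classes are illustrative choices, not values from the question.

```python
import tensorflow as tf

# L2 penalty applied to the weights of each dense layer.
l2 = tf.keras.regularizers.l2(1e-4)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dropout(0.5),   # randomly drop 50% of units during training
    tf.keras.layers.Dense(64, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 hypothetical classes
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```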

A Machine Learning Specialist was given a dataset consisting of unlabeled data. The Specialist must create a model that can help the team classify the data into different buckets. What model should be used to complete this work?

A. K-means clustering
B. Random Cut Forest (RCF)
C. XGBoost
D. BlazingText
Suggested answer: A

Explanation:

K-means clustering is a machine learning technique that can be used to classify unlabeled data into different groups based on their similarity. It is an unsupervised learning method, which means it does not require any prior knowledge or labels for the data. K-means clustering works by randomly assigning data points to a number of clusters, then iteratively updating the cluster centers and reassigning the data points until the clusters are stable. The result is a partition of the data into distinct and homogeneous groups. K-means clustering can be useful for exploratory data analysis, data compression, anomaly detection, and feature extraction.

References:

K-Means Clustering: A tutorial on how to use K-means clustering with Amazon SageMaker.

Unsupervised Learning: A video that explains the concept and applications of unsupervised learning.
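
A minimal scikit-learn sketch of K-means on unlabeled data follows; the points and the choice of two clusters are invented for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled feature vectors.
X = np.array([
    [1.0, 2.0], [1.2, 1.9], [0.8, 2.1],   # one natural group
    [8.0, 8.5], [8.2, 8.1], [7.9, 8.4],   # another natural group
])

# Partition the data into 2 clusters (the number of buckets is an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # coordinates of the learned cluster centers
```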

A retail company intends to use machine learning to categorize new products. A labeled dataset of current products was provided to the Data Science team. The dataset includes 1,200 products. The labeled dataset has 15 features for each product, such as title, dimensions, weight, and price. Each product is labeled as belonging to one of six categories, such as books, games, electronics, and movies.

Which model should be used for categorizing new products using the provided dataset for training?

A. An XGBoost model where the objective parameter is set to multi:softmax
B. A deep convolutional neural network (CNN) with a softmax activation function for the last layer
C. A regression forest where the number of trees is set equal to the number of product categories
D. A DeepAR forecasting model based on a recurrent neural network (RNN)
Suggested answer: A

Explanation:

XGBoost is a machine learning framework that can be used for classification, regression, ranking, and other tasks. It is based on the gradient boosting algorithm, which builds an ensemble of weak learners (usually decision trees) to produce a strong learner. XGBoost has several advantages over other algorithms, such as scalability, parallelization, regularization, and sparsity handling. For categorizing new products using the provided dataset, an XGBoost model would be a suitable choice, because it can handle multiple features and multiple classes efficiently and accurately. To train an XGBoost model for multi-class classification, the objective parameter should be set to multi:softmax, which means that the model will output a probability distribution over the classes and predict the class with the highest probability. Alternatively, the objective parameter can be set to multi:softprob, which means that the model will output the raw probability of each class instead of the predicted class label. This can be useful for evaluating the model performance or for post-processing the predictions.

References:

XGBoost: A tutorial on how to use XGBoost with Amazon SageMaker.

XGBoost Parameters: A reference guide for the parameters of XGBoost.
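
A short sketch of training with the multi:softmax objective using the open-source xgboost package is shown below. The features and labels are randomly generated stand-ins for the 1,200-product dataset described in the question.

```python
import numpy as np
import xgboost as xgb

# Hypothetical data: 15 numeric features per product, 6 category labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 15))
y = rng.integers(0, 6, size=1200)

dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "multi:softmax",  # predict the most likely class label directly
    "num_class": 6,                # required for multi-class objectives
    "max_depth": 6,
    "eta": 0.3,
}

booster = xgb.train(params, dtrain, num_boost_round=50)
print(booster.predict(dtrain)[:10])  # predicted class labels for the first 10 products
```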

A Machine Learning Specialist is building a model to predict future employment rates based on a wide range of economic factors. While exploring the data, the Specialist notices that the magnitude of the input features varies greatly. The Specialist does not want variables with a larger magnitude to dominate the model.

What should the Specialist do to prepare the data for model training?

A. Apply quantile binning to group the data into categorical bins to keep any relationships in the data by replacing the magnitude with distribution.
B. Apply the Cartesian product transformation to create new combinations of fields that are independent of the magnitude.
C. Apply normalization to ensure each field will have a mean of 0 and a variance of 1 to remove any significant magnitude.
D. Apply the orthogonal sparse bigram (OSB) transformation to apply a fixed-size sliding window to generate new features of a similar magnitude.
Suggested answer: C

Explanation:

Normalization is a data preprocessing technique that can be used to scale the input features to a common range, such as [-1, 1] or [0, 1]. Normalization can help reduce the effect of outliers, improve the convergence of gradient-based algorithms, and prevent variables with a larger magnitude from dominating the model. One common method of normalization is standardization, which transforms each feature to have a mean of 0 and a variance of 1. This can be done by subtracting the mean and dividing by the standard deviation of each feature. Standardization can be useful for models that assume the input features are normally distributed, such as linear regression, logistic regression, and support vector machines.

References:

Data normalization and standardization: A video that explains the concept and benefits of data normalization and standardization.

Standardize or Normalize?: A blog post that compares different methods of scaling the input features.
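
A minimal standardization sketch with scikit-learn is shown below; the economic features and their values are invented to illustrate features of very different magnitudes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical economic features with very different magnitudes:
# [GDP in billions, unemployment rate in %, interest rate in %]
X = np.array([
    [21000.0, 3.5, 0.25],
    [19500.0, 4.1, 0.50],
    [22500.0, 3.0, 1.00],
    [18000.0, 5.2, 1.75],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each column now has (approximately) mean 0 and variance 1.
print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```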
