Amazon MLS-C01 Practice Test - Questions Answers, Page 17

A credit card company wants to build a credit scoring model to help predict whether a new credit card applicant will default on a credit card payment. The company has collected data from a large number of sources with thousands of raw attributes. Early experiments to train a classification model revealed that many attributes are highly correlated, that the large number of features slows down training significantly, and that there are some overfitting issues.

The Data Scientist on this project would like to speed up the model training time without losing a lot of information from the original dataset.

Which feature engineering technique should the Data Scientist use to meet the objectives?

A. Run self-correlation on all features and remove highly correlated features
B. Normalize all numerical values to be between 0 and 1
C. Use an autoencoder or principal component analysis (PCA) to replace original features with new features
D. Cluster raw data using k-means and use sample data from each cluster to build a new dataset

Suggested answer: C

Explanation:

The best feature engineering technique to speed up the model training time without losing a lot of information from the original dataset is to use an autoencoder or principal component analysis (PCA) to replace original features with new features. An autoencoder is a type of neural network that learns a compressed representation of the input data, called the latent space, by minimizing the reconstruction error between the input and the output. PCA is a statistical technique that reduces the dimensionality of the data by finding a set of orthogonal axes, called the principal components, that capture the maximum variance of the data. Both techniques can help reduce the number of features and remove the noise and redundancy in the data, which can improve the model performance and speed up the training process.
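For illustration only, the sketch below shows how PCA (option C) might be applied with scikit-learn to shrink a wide, correlated attribute matrix before training; the synthetic data and the 95% explained-variance threshold are assumptions, not values from the question.

```python
# Minimal sketch: replace correlated raw attributes with a smaller set of PCA components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
latent = rng.normal(size=(5_000, 20))                       # a few underlying factors
mixing = rng.normal(size=(20, 500))
X = latent @ mixing + 0.1 * rng.normal(size=(5_000, 500))   # 500 highly correlated raw attributes

X_scaled = StandardScaler().fit_transform(X)                # PCA is sensitive to feature scale
pca = PCA(n_components=0.95)                                # keep components explaining ~95% of variance
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)                       # far fewer features go into training
```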

References:

AWS Machine Learning Specialty Exam Guide

AWS Machine Learning Training - Dimensionality Reduction for Machine Learning

AWS Machine Learning Training - Deep Learning with Amazon SageMaker

A Data Scientist is training a multilayer perceptron (MLP) on a dataset with multiple classes. The target class of interest is unique compared to the other classes within the dataset, but it does not achieve an acceptable recall metric. The Data Scientist has already tried varying the number and size of the MLP's hidden layers, which has not significantly improved the results. A solution to improve recall must be implemented as quickly as possible.

Which techniques should be used to meet these requirements?

A. Gather more data using Amazon Mechanical Turk and then retrain
B. Train an anomaly detection model instead of an MLP
C. Train an XGBoost model instead of an MLP
D. Add class weights to the MLP's loss function and then retrain

Suggested answer: D

Explanation:

The best technique to improve the recall of the MLP for the target class of interest is to add class weights to the MLP's loss function and then retrain. Class weights are a way of assigning different importance to each class in the dataset, such that the model will pay more attention to the classes with higher weights. This can help mitigate the class imbalance problem, where the model tends to favor the majority class and ignore the minority class. By increasing the weight of the target class of interest, the model will try to reduce the false negatives and increase the true positives, which will improve the recall metric. Adding class weights to the loss function is also a quick and easy solution, as it does not require gathering more data, changing the model architecture, or switching to a different algorithm.
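As a rough sketch of option D, the Keras example below assigns a higher weight to the rare positive class through the class_weight argument, so misclassifying it contributes more to the loss; the network shape, the synthetic data, and the 1:10 weight ratio are assumptions made for illustration.

```python
# Minimal sketch: weight the minority class more heavily in the MLP's loss.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 20)).astype("float32")
y = (rng.random(5_000) < 0.05).astype(int)        # rare positive class (placeholder data)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Recall()])

# class_weight scales each sample's contribution to the loss by its class label (assumed 1:10).
model.fit(X, y, epochs=5, batch_size=128,
          class_weight={0: 1.0, 1: 10.0}, verbose=0)
```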

References:

AWS Machine Learning Specialty Exam Guide

AWS Machine Learning Training - Deep Learning with Amazon SageMaker

AWS Machine Learning Training - Class Imbalance and Weighted Loss Functions

A Machine Learning Specialist works for a credit card processing company and needs to predict which transactions may be fraudulent in near-real time. Specifically, the Specialist must train a model that returns the probability that a given transaction may be fraudulent.

How should the Specialist frame this business problem?

A. Streaming classification
B. Binary classification
C. Multi-category classification
D. Regression classification

Suggested answer: B

Explanation:

The business problem of predicting, in near-real time, whether a credit card transaction is fraudulent can be framed as a binary classification problem. Binary classification is the task of predicting a discrete class label for an example, where the label can take only one of two possible values. In this case, the label can be either ''fraudulent'' or ''not fraudulent'', indicating whether the transaction is or is not fraudulent. A binary classification model can return the probability that a given transaction belongs to each class, and then assign the transaction to the class with the highest probability. For example, if the model predicts that a transaction has a 0.8 probability of being fraudulent and a 0.2 probability of being legitimate, then the model will classify the transaction as ''fraudulent''. Binary classification is suitable for this problem because the outcome of interest is categorical and binary, and the model needs to return the probability of each outcome.
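A minimal scikit-learn sketch of this framing is shown below: a binary classifier trained on placeholder transaction features returns, via predict_proba, the probability that a new transaction is fraudulent. The logistic regression model and the synthetic data are illustrative assumptions.

```python
# Minimal sketch: a binary classifier that returns the probability a transaction is fraudulent.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 8))                  # placeholder transaction features
y = (rng.random(1_000) < 0.02).astype(int)       # 1 = fraudulent, 0 = legitimate (placeholder)

clf = LogisticRegression(max_iter=1000).fit(X, y)

new_transaction = rng.normal(size=(1, 8))
fraud_probability = clf.predict_proba(new_transaction)[0, 1]   # P(class == 1)
print(f"Probability of fraud: {fraud_probability:.3f}")
```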

References:

AWS Machine Learning Specialty Exam Guide

AWS Machine Learning Training - Classification vs Regression in Machine Learning

A real estate company wants to create a machine learning model for predicting housing prices based on a historical dataset. The dataset contains 32 features.

Which model will meet the business requirement?

A. Logistic regression
B. Linear regression
C. K-means
D. Principal component analysis (PCA)

Suggested answer: B

Explanation:

The best model for predicting housing prices based on a historical dataset with 32 features is linear regression. Linear regression is a supervised learning algorithm that fits a linear relationship between a dependent variable (housing price) and one or more independent variables (features). Linear regression can handle multiple features and output a continuous value for the housing price. Linear regression can also return the coefficients of the features, which indicate how each feature affects the housing price. Linear regression is suitable for this problem because the outcome of interest is numerical and continuous, and the model needs to capture the linear relationship between the features and the outcome.
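The sketch below illustrates this with scikit-learn's LinearRegression on a synthetic 32-feature dataset; the data, coefficients, and train/test split are placeholder assumptions, not values from the question.

```python
# Minimal sketch: multi-variable linear regression on 32 housing features (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 32))                            # 32 housing features (placeholder)
true_coefs = rng.normal(size=32)
y = X @ true_coefs * 10_000 + 250_000 + rng.normal(scale=5_000, size=2_000)  # synthetic sale price

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))
print("First few coefficients:", model.coef_[:3])           # how each feature shifts the price
```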

References:

AWS Machine Learning Specialty Exam Guide

AWS Machine Learning Training - Regression vs Classification in Machine Learning

AWS Machine Learning Training - Linear Regression with Amazon SageMaker

A Machine Learning Specialist wants to bring a custom algorithm to Amazon SageMaker. The Specialist implements the algorithm in a Docker container supported by Amazon SageMaker.

How should the Specialist package the Docker container so that Amazon SageMaker can launch the training correctly?

A. Modify the bash_profile file in the container and add a bash command to start the training program
B. Use CMD config in the Dockerfile to add the training program as a CMD of the image
C. Configure the training program as an ENTRYPOINT named train
D. Copy the training program to directory /opt/ml/train

Suggested answer: C

Explanation:

To use a custom algorithm in Amazon SageMaker, the Docker container image must have an executable file named train that acts as the ENTRYPOINT for the container. This file is responsible for running the training code and communicating with the Amazon SageMaker service. The train file must be on the PATH of the container and have execute permissions. The other options are not valid ways to package the Docker container for Amazon SageMaker.
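For context, the sketch below shows what a hypothetical Python train executable inside such a container might look like. It follows the SageMaker container convention of reading input data and hyperparameters under /opt/ml/input and writing the model artifact to /opt/ml/model; the channel name, CSV layout, and choice of algorithm are assumptions made for illustration.

```python
#!/usr/bin/env python
# Hypothetical `train` executable for a bring-your-own-algorithm SageMaker container.
import json
import pathlib
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression

PREFIX = pathlib.Path("/opt/ml")
TRAIN_CHANNEL = PREFIX / "input" / "data" / "training"    # channel name "training" is an assumption
MODEL_DIR = PREFIX / "model"                              # artifacts written here are uploaded to S3
PARAM_FILE = PREFIX / "input" / "config" / "hyperparameters.json"

def main():
    hyperparams = json.loads(PARAM_FILE.read_text()) if PARAM_FILE.exists() else {}
    c = float(hyperparams.get("C", 1.0))                  # hyperparameters arrive as strings

    # Assume one or more CSV files with the label in the first column (layout is an assumption).
    frames = [pd.read_csv(f, header=None) for f in TRAIN_CHANNEL.glob("*.csv")]
    data = pd.concat(frames)
    y, X = data.iloc[:, 0], data.iloc[:, 1:]

    model = LogisticRegression(C=c, max_iter=1000).fit(X, y)
    with open(MODEL_DIR / "model.pkl", "wb") as f:
        pickle.dump(model, f)

if __name__ == "__main__":
    main()
```

In the Dockerfile, this file would typically be copied to a directory on the PATH (for example /usr/bin/train) and marked executable so that SageMaker can launch it as the training entry point.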

References:

Use Docker containers to build models - Amazon SageMaker

Create a container with your own algorithms and models - Amazon SageMaker

A Data Scientist needs to analyze employment data. The dataset contains approximately 10 million observations on people across 10 different features. During the preliminary analysis, the Data Scientist notices that the income and age distributions are not normal. While income levels show a right skew as expected, with fewer individuals having a higher income, the age distribution also shows a right skew, with fewer older individuals participating in the workforce.

Which feature transformations can the Data Scientist apply to fix the incorrectly skewed data? (Choose two.)

A. Cross-validation
B. Numerical value binning
C. High-degree polynomial transformation
D. Logarithmic transformation
E. One hot encoding

Suggested answer: B, D

Explanation:

To fix the incorrectly skewed data, the Data Scientist can apply two feature transformations: numerical value binning and logarithmic transformation. Numerical value binning is a technique that groups continuous values into discrete bins or categories. This can help reduce the skewness of the data by creating more balanced frequency distributions. Logarithmic transformation is a technique that applies the natural logarithm function to each value in the data. This can help reduce the right skewness of the data by compressing the large values and expanding the small values. Both of these transformations can make the data more suitable for machine learning algorithms that assume normality of the data.
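The pandas/NumPy sketch below applies both transformations to synthetic right-skewed income and age columns; the distributions and the choice of 10 quantile bins are placeholder assumptions.

```python
# Minimal sketch: reduce right skew with a log transform and with numerical binning.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.lognormal(mean=10.5, sigma=0.8, size=10_000),  # right-skewed placeholder data
    "age": 18 + rng.gamma(shape=2.0, scale=8.0, size=10_000),
})

# Logarithmic transformation: log1p compresses large values and handles zeros safely.
df["log_income"] = np.log1p(df["income"])

# Numerical value binning: group ages into equal-frequency buckets (10 quantile bins assumed).
df["age_bin"] = pd.qcut(df["age"], q=10, labels=False)

print(df[["income", "log_income", "age", "age_bin"]].describe().round(2))
```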

References:

Data Transformation - Amazon SageMaker

Transforming Skewed Data for Machine Learning

A Machine Learning Specialist is given a structured dataset on the shopping habits of a company's customer base. The dataset contains thousands of columns of data and hundreds of numerical columns for each customer. The Specialist wants to identify whether there are natural groupings for these columns across all customers and visualize the results as quickly as possible.

What approach should the Specialist take to accomplish these tasks?

A. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a scatter plot.
B. Run k-means using the Euclidean distance measure for different values of k and create an elbow plot.
C. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a line graph.
D. Run k-means using the Euclidean distance measure for different values of k and create box plots for each numerical column within each cluster.

Suggested answer: A

Explanation:

The best approach to identify and visualize the natural groupings for the numerical columns across all customers is to embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a scatter plot. t-SNE is a dimensionality reduction technique that can project high-dimensional data into a lower-dimensional space, while preserving the local structure and distances of the data points. A scatter plot can then show the clusters of data points in the reduced space, where each point represents a customer and the color indicates the cluster membership. This approach can help the Specialist quickly explore the patterns and similarities among the customers based on their numerical features.

The other options are not as effective or efficient as the t-SNE approach. Running k-means for different values of k and creating an elbow plot can help determine the optimal number of clusters, but it does not provide a visual representation of the clusters or the customers. Embedding the numerical features using t-SNE and creating a line graph does not make sense, as a line graph is used to show the change of a variable over time, not the distribution of data points in a space. Running k-means for different values of k and creating box plots for each numerical column within each cluster can provide some insights into the statistics of each cluster, but it is very time-consuming and cumbersome to create and compare thousands of box plots.
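A short sketch of this approach with scikit-learn and matplotlib is shown below; the synthetic customer matrix, the number of hidden groups, and the perplexity value are assumptions made for illustration.

```python
# Minimal sketch: embed high-dimensional numerical columns with t-SNE and plot the 2-D result.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Placeholder for the customer matrix: three hidden groups across 100 numerical columns.
centers = rng.normal(scale=5.0, size=(3, 100))
X = np.vstack([c + rng.normal(size=(300, 100)) for c in centers])

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(embedding[:, 0], embedding[:, 1], s=5)   # each point is one customer
plt.title("t-SNE projection of customer features")
plt.show()
```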

References:

Dimensionality Reduction - Amazon SageMaker

Visualize high dimensional data using t-SNE - Amazon SageMaker

A Machine Learning Specialist is planning to create a long-running Amazon EMR cluster. The EMR cluster will have 1 master node, 10 core nodes, and 20 task nodes. To save on costs, the Specialist will use Spot Instances in the EMR cluster.

Which nodes should the Specialist launch on Spot Instances?

A. Master node
B. Any of the core nodes
C. Any of the task nodes
D. Both core and task nodes

Suggested answer: C

Explanation:

The best option for using Spot Instances in a long-running Amazon EMR cluster is to use them for the task nodes. Task nodes are optional nodes that are used to increase the processing power of the cluster. They do not store any data and can be added or removed without affecting the cluster's operation, so the cluster can tolerate interruptions caused by Spot Instance termination of task nodes. Using Spot Instances for the master node or the core nodes is not recommended, as those nodes store important data and metadata for the cluster. If they are terminated, the cluster may fail or lose data.
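For illustration, the boto3 sketch below requests On-Demand capacity for the master and core instance groups and Spot capacity for the task group; the release label, instance types, region, and IAM role names are assumptions and would need to match an actual environment.

```python
# Minimal boto3 sketch: On-Demand master and core nodes, Spot task nodes.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="long-running-analytics-cluster",
    ReleaseLabel="emr-6.15.0",                 # illustrative release label
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
             "InstanceType": "m5.xlarge", "InstanceCount": 10},
            {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "m5.xlarge", "InstanceCount": 20},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,   # long-running cluster
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```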

References:

Amazon EMR on EC2 Spot Instances

Instance purchasing options - Amazon EMR

A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company's dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices.

Which step should a machine learning specialist take to remove features that are irrelevant for the analysis and reduce the model's complexity?

A. Plot a histogram of the features and compute their standard deviation. Remove features with high variance.
B. Plot a histogram of the features and compute their standard deviation. Remove features with low variance.
C. Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual correlation scores.
D. Run a correlation check of all features against the target variable. Remove features with low target variable correlation scores.

Suggested answer: D

Explanation:

Feature selection is the process of reducing the number of input variables to those that are most relevant for predicting the target variable. One way to do this is to run a correlation check of all features against the target variable and remove features with low target variable correlation scores. This means that these features have little or no linear relationship with the target variable and are not useful for the prediction. This can reduce the model's complexity and improve its performance.
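A small pandas sketch of this step is shown below: features are ranked by their absolute correlation with the sale price and those below an assumed threshold are dropped. The column names, synthetic data, and 0.1 cutoff are illustrative assumptions.

```python
# Minimal sketch: rank features by absolute correlation with the target and drop weak ones.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1_000, 5)),
                  columns=["lot_size", "living_area", "bedrooms", "bathrooms", "year_built"])
df["sale_price"] = 300 * df["living_area"] + 50 * df["bedrooms"] + rng.normal(size=1_000)

correlations = df.corr()["sale_price"].drop("sale_price").abs().sort_values(ascending=False)
print(correlations)

threshold = 0.1                                             # assumed cutoff for "low" correlation
weak_features = correlations[correlations < threshold].index.tolist()
reduced_df = df.drop(columns=weak_features)
print("Dropped:", weak_features)
```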

References:

Feature engineering - Machine Learning Lens

Feature Selection For Machine Learning in Python

A health care company is planning to use neural networks to classify their X-ray images into normal and abnormal classes. The labeled data is divided into a training set of 1,000 images and a test set of 200 images. The initial training of a neural network model with 50 hidden layers yielded 99% accuracy on the training set, but only 55% accuracy on the test set.

What changes should the Specialist consider to solve this issue? (Choose three.)

A. Choose a higher number of layers
B. Choose a lower number of layers
C. Choose a smaller learning rate
D. Enable dropout
E. Include all the images from the test set in the training set
F. Enable early stopping

Suggested answer: B, D, F

Explanation:

The problem described in the question is a case of overfitting, where the neural network model performs well on the training data but poorly on the test data. This means that the model has learned the noise and specific patterns of the training data, but cannot generalize to new and unseen data. To solve this issue, the Specialist should consider the following changes:

Choose a lower number of layers: Reducing the number of layers can reduce the complexity and capacity of the neural network model, making it less prone to overfitting. A model with 50 hidden layers is likely too deep for the given data size and task. A simpler model with fewer layers can learn the essential features of the data without memorizing the noise.

Enable dropout: Dropout is a regularization technique that randomly drops out some units in the neural network during training. This prevents the units from co-adapting too much and forces the model to learn more robust features. Dropout can improve the generalization and test performance of the model by reducing overfitting.

Enable early stopping: Early stopping is another regularization technique that monitors the validation error during training and stops the training process when the validation error stops decreasing or starts increasing. This prevents the model from overtraining on the training data and reduces overfitting.
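A minimal Keras sketch combining these three changes (a shallower network, dropout, and early stopping) is shown below; the layer sizes, dropout rate, patience, and synthetic data are assumptions made for illustration.

```python
# Minimal sketch: a shallower network with dropout and early stopping to reduce overfitting.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 4096)).astype("float32")   # placeholder for flattened X-ray features
y = rng.integers(0, 2, size=1_000)                     # 1 = abnormal, 0 = normal (placeholder)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4096,)),
    tf.keras.layers.Dense(256, activation="relu"),     # far fewer layers than the original 50
    tf.keras.layers.Dropout(0.5),                      # randomly drop units during training
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                              restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=50, callbacks=[early_stop], verbose=0)
```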

References:

Deep Learning - Machine Learning Lens

How to Avoid Overfitting in Deep Learning Neural Networks

How to Identify Overfitting Machine Learning Models in Scikit-Learn
