Amazon MLS-C01 Practice Test - Questions Answers, Page 28

A company that manufactures mobile devices wants to determine and calibrate the appropriate sales price for its devices. The company is collecting the relevant data and is determining data features that it can use to train machine learning (ML) models. There are more than 1,000 features, and the company wants to determine the primary features that contribute to the sales price.

Which techniques should the company use for feature selection? (Choose three.)

A. Data scaling with standardization and normalization

B. Correlation plot with heat maps

C. Data binning

D. Univariate selection

E. Feature importance with a tree-based classifier

F. Data augmentation
Suggested answer: B, D, E

Explanation:

Feature selection is the process of selecting a subset of extracted features that are relevant and contribute to minimizing the error rate of a trained model. Some techniques for feature selection are:

Correlation plot with heat maps: This technique visualizes the correlation between features using a color-coded matrix. Features that are highly correlated with each other or with the target variable can be identified and removed to reduce redundancy and noise.

Univariate selection: This technique evaluates each feature individually based on a statistical test, such as chi-square, ANOVA, or mutual information, and selects the features that have the highest scores (or lowest p-values). This technique is simple and fast, but it does not consider the interactions between features.

Feature importance with a tree-based classifier: This technique uses a tree-based classifier, such as random forest or gradient boosting, to rank the features based on their importance in splitting the nodes. Features that have low importance scores can be dropped from the model. This technique can capture the non-linear relationships and interactions between features.
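As an illustration of the last two techniques, the following scikit-learn sketch applies univariate selection and a tree-based feature-importance ranking. It assumes a hypothetical CSV file, a numeric feature matrix, and a target column named sales_price; a regressor is used here because the target is a price.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical dataset: every column except the target is a candidate feature
# (features assumed numeric for this illustration).
df = pd.read_csv("device_features.csv")      # assumed file name
X = df.drop(columns=["sales_price"])         # assumed target column
y = df["sales_price"]

# Univariate selection: score each feature independently against the target.
selector = SelectKBest(score_func=f_regression, k=50)
selector.fit(X, y)
univariate_top = X.columns[selector.get_support()]

# Feature importance from a tree-based model: ranks features by how much
# they reduce error when used for splits.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)
importances = (
    pd.Series(model.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
)
print(importances.head(20))
```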

The other options are not techniques for feature selection, but rather for feature engineering, which is the process of creating, transforming, or extracting features from the original data. Feature engineering can improve the performance and interpretability of the model, but it does not reduce the number of features.

Data scaling with standardization and normalization: This technique transforms the features to have a common scale, such as zero mean and unit variance, or a range between 0 and 1. This technique can help some algorithms, such as k-means or logistic regression, to converge faster and avoid numerical instability, but it does not change the number of features.

Data binning: This technique groups the continuous features into discrete bins or categories based on some criteria, such as equal width, equal frequency, or clustering. This technique can reduce the noise and outliers in the data, and also create ordinal or nominal features that can be used for some algorithms, such as decision trees or naive Bayes, but it does not reduce the number of features.

Data augmentation: This technique generates new data from the existing data by applying some transformations, such as rotation, flipping, cropping, or noise addition. This technique can increase the size and diversity of the data, and help prevent overfitting, but it does not reduce the number of features.

References:

Feature engineering - Machine Learning Lens

Amazon SageMaker Autopilot now provides feature selection and the ability to change data types while creating an AutoML experiment

Feature Selection in Machine Learning | Baeldung on Computer Science

Feature Selection in Machine Learning: An easy Introduction

A company uses sensors on devices such as motor engines and factory machines to measure parameters such as temperature and pressure. The company wants to use the sensor data to predict equipment malfunctions and reduce service outages.

A machine learning (ML) specialist needs to gather the sensor data to train a model to predict device malfunctions. The ML specialist must ensure that the data does not contain outliers before training the model.

How can the ML specialist meet these requirements with the LEAST operational overhead?

A. Load the data into an Amazon SageMaker Studio notebook. Calculate the first and third quartiles. Use a SageMaker Data Wrangler data flow to remove only values that are outside of those quartiles.

B. Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset. Use a Data Wrangler data flow to remove outliers based on the bias report.

C. Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.

D. Use Amazon Lookout for Equipment to find and remove outliers from the dataset.
Suggested answer: C

Explanation:

Amazon SageMaker Data Wrangler is a tool that helps data scientists and ML developers to prepare data for ML. One of the features of Data Wrangler is the anomaly detection visualization, which uses an unsupervised ML algorithm to identify outliers in the dataset based on statistical properties. The ML specialist can use this feature to quickly explore the sensor data and find any anomalous values that may affect the model performance. The ML specialist can then add a transformation to a Data Wrangler data flow to remove the outliers from the dataset. The data flow can be exported as a script or a pipeline to automate the data preparation process. This option requires the least operational overhead compared to the other options.
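Conceptually, the outlier-removal step that a Data Wrangler transform performs is similar to the quantile-based filter sketched below. This is a plain pandas illustration, not the Data Wrangler API; the file and column names are hypothetical.

```python
import pandas as pd

df = pd.read_csv("sensor_readings.csv")  # assumed input file

# Interquartile-range (IQR) rule: flag values far outside the middle 50%.
q1 = df["sensor_value"].quantile(0.25)
q3 = df["sensor_value"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only rows whose sensor reading falls inside the fences.
cleaned = df[df["sensor_value"].between(lower, upper)]
```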

References:

Amazon SageMaker Data Wrangler - Amazon Web Services (AWS)

Anomaly Detection Visualization - Amazon SageMaker

Transform Data - Amazon SageMaker

A data engineer needs to provide a team of data scientists with the appropriate dataset to run machine learning training jobs. The data will be stored in Amazon S3. The data engineer is obtaining the data from an Amazon Redshift database and is using join queries to extract a single tabular dataset. A portion of the schema is as follows:

TransactionTimestamp (Timestamp)

CardName (Varchar)

CardNo (Varchar)

The data engineer must provide the data so that any row with a CardNo value of NULL is removed. Also, the TransactionTimestamp column must be separated into a TransactionDate column and a TransactionTime column. Finally, the CardName column must be renamed to NameOnCard.

The data will be extracted on a monthly basis and will be loaded into an S3 bucket. The solution must minimize the effort that is needed to set up infrastructure for the ingestion and transformation. The solution must be automated and must minimize the load on the Amazon Redshift cluster.

Which solution meets these requirements?

A. Set up an Amazon EMR cluster. Create an Apache Spark job to read the data from the Amazon Redshift cluster and transform the data. Load the data into the S3 bucket. Schedule the job to run monthly.

B. Set up an Amazon EC2 instance with a SQL client tool, such as SQL Workbench/J, to query the data from the Amazon Redshift cluster directly. Export the resulting dataset into a CSV file. Upload the file into the S3 bucket. Perform these tasks monthly.

C. Set up an AWS Glue job that has the Amazon Redshift cluster as the source and the S3 bucket as the destination. Use the built-in transforms Filter, Map, and RenameField to perform the required transformations. Schedule the job to run monthly.

D. Use Amazon Redshift Spectrum to run a query that writes the data directly to the S3 bucket. Create an AWS Lambda function to run the query monthly.
Suggested answer: C

Explanation:

The best solution for this scenario is to set up an AWS Glue job that has the Amazon Redshift cluster as the source and the S3 bucket as the destination, and use the built-in transforms Filter, Map, and RenameField to perform the required transformations. This solution has the following advantages:

It minimizes the effort that is needed to set up infrastructure for the ingestion and transformation, as AWS Glue is a fully managed service that provides a serverless Apache Spark environment, a graphical interface to define data sources and targets, and a code generation feature to create and edit scripts [1].

It automates the extraction and transformation process, as AWS Glue can schedule the job to run monthly, and handle the connection, authentication, and configuration of the Amazon Redshift cluster and the S3 bucket [2].

It minimizes the load on the Amazon Redshift cluster, as AWS Glue can read the data from the cluster in parallel and use a JDBC connection that supports SSL encryption [3].

It performs the required transformations, as AWS Glue can use the built-in transforms Filter, Map, and RenameField to remove the rows with NULL values, split the timestamp column into date and time columns, and rename the card name column, respectively [4].
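A minimal sketch of what such a Glue job script could look like, assuming a Glue Data Catalog database and table that point at the Redshift source; the database, table, bucket names, and temporary directory are hypothetical.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import Filter, Map, RenameField
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the joined dataset from the Redshift source registered in the Data Catalog.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",                                # hypothetical catalog database
    table_name="transactions",                          # hypothetical catalog table
    redshift_tmp_dir="s3://my-temp-bucket/redshift/",   # hypothetical temp location
)

# Filter: drop rows where CardNo is NULL.
dyf = Filter.apply(frame=dyf, f=lambda row: row["CardNo"] is not None)

# Map: split TransactionTimestamp into TransactionDate and TransactionTime
# (assuming an ISO-8601 timestamp string).
def split_timestamp(row):
    ts = str(row["TransactionTimestamp"])
    row["TransactionDate"], row["TransactionTime"] = ts[:10], ts[11:]
    return row

dyf = Map.apply(frame=dyf, f=split_timestamp)

# RenameField: rename CardName to NameOnCard.
dyf = RenameField.apply(frame=dyf, old_name="CardName", new_name="NameOnCard")

# Write the result to the S3 bucket as CSV.
glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-output-bucket/monthly/"},  # hypothetical bucket
    format="csv",
)
job.commit()
```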

The other solutions are not optimal or suitable, because they have the following drawbacks:

A: Setting up an Amazon EMR cluster and creating an Apache Spark job to read the data from the Amazon Redshift cluster and transform the data is not the most efficient or convenient solution, as it requires more effort and resources to provision, configure, and manage the EMR cluster, and to write and maintain the Spark code [5].

B: Setting up an Amazon EC2 instance with a SQL client tool to query the data from the Amazon Redshift cluster directly and export the resulting dataset into a CSV file is not a scalable or reliable solution, as it depends on the availability and performance of the EC2 instance, and the manual execution and upload of the SQL queries and the CSV file [6].

D: Using Amazon Redshift Spectrum to run a query that writes the data directly to the S3 bucket and creating an AWS Lambda function to run the query monthly is not a feasible solution, as Amazon Redshift Spectrum is designed for querying external data that is already stored in S3 rather than for transforming and exporting data, and the query would still place load on the Amazon Redshift cluster [7].

References:

1: What Is AWS Glue? - AWS Glue

2: Populating the Data Catalog - AWS Glue

3: Best Practices When Using AWS Glue with Amazon Redshift - AWS Glue

4: Built-In Transforms - AWS Glue

5: What Is Amazon EMR? - Amazon EMR

6: Amazon EC2 - Amazon Web Services (AWS)

7: Using Amazon Redshift Spectrum to Query External Data - Amazon Redshift

A manufacturing company needs to identify returned smartphones that have been damaged by moisture. The company has an automated process that produces 2,000 diagnostic values for each phone. The database contains more than five million phone evaluations. The evaluation process is consistent, and there are no missing values in the data. A machine learning (ML) specialist has trained an Amazon SageMaker linear learner ML model to classify phones as moisture damaged or not moisture damaged by using all available features. The model's F1 score is 0.6.

What changes in model training would MOST likely improve the model's F1 score? (Select TWO.)

A. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the SageMaker principal component analysis (PCA) algorithm.

B. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the scikit-learn multi-dimensional scaling (MDS) algorithm.

C. Continue to use the SageMaker linear learner algorithm. Set the predictor type to regressor.

D. Use the SageMaker k-means algorithm with k of less than 1,000 to train the model.

E. Use the SageMaker k-nearest neighbors (k-NN) algorithm. Set a dimension reduction target of less than 1,000 to train the model.
Suggested answer: A, E

Explanation:

Option A is correct because reducing the number of features with the SageMaker PCA algorithm can help remove noise and redundancy from the data, and improve the model's performance. PCA is a dimensionality reduction technique that transforms the original features into a smaller set of linearly uncorrelated features called principal components. SageMaker provides PCA as a built-in algorithm that can be run as a preprocessing step before training the linear learner model.

Option E is correct because using the SageMaker k-NN algorithm with a dimension reduction target of less than 1,000 can help the model learn from the similarity of the data points, and improve the model's performance. k-NN is a non-parametric algorithm that classifies an input based on the majority vote of its k nearest neighbors in the feature space. The SageMaker k-NN algorithm supports dimension reduction as a built-in feature transformation option.
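A minimal sketch of configuring the SageMaker k-NN estimator with a dimension reduction target via the SageMaker Python SDK; the role ARN, instance choices, and record-set preparation are assumptions for illustration and may need adjusting to your environment.

```python
import sagemaker
from sagemaker import KNN

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# k-NN estimator with built-in dimension reduction applied to the ~2,000 features.
knn = KNN(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    k=10,
    sample_size=500000,
    predictor_type="classifier",
    dimension_reduction_type="sign",   # random-projection based reduction
    dimension_reduction_target=500,    # reduce to fewer than 1,000 dimensions
    sagemaker_session=session,
)

# train_records would be a RecordSet built from the labeled feature matrix, e.g.:
# train_records = knn.record_set(train_features, labels=train_labels)
# knn.fit(train_records)
```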

Option B is incorrect because using the scikit-learn MDS algorithm to reduce the number of features is not a feasible option, as MDS is a computationally expensive technique that does not scale well to large datasets. MDS is a dimensionality reduction technique that tries to preserve the pairwise distances between the original data points in a lower-dimensional space.

Option C is incorrect because setting the predictor type to regressor would change the model's objective from classification to regression, which is not suitable for the given problem. A regressor model would output a continuous value instead of a binary label for each phone.

Option D is incorrect because using the SageMaker k-means algorithm with k of less than 1,000 would not help the model classify the phones, as k-means is a clustering algorithm that groups the data points into k clusters based on their similarity, without using any labels. A clustering model would not output a binary label for each phone.

References:

Amazon SageMaker Linear Learner Algorithm

Amazon SageMaker K-Nearest Neighbors (k-NN) Algorithm

Principal Component Analysis - Scikit-learn

Multidimensional Scaling - Scikit-learn

A company deployed a machine learning (ML) model on the company website to predict real estate prices. Several months after deployment, an ML engineer notices that the accuracy of the model has gradually decreased.

The ML engineer needs to improve the accuracy of the model. The engineer also needs to receive notifications for any future performance issues.

Which solution will meet these requirements?

A. Perform incremental training to update the model. Activate Amazon SageMaker Model Monitor to detect model performance issues and to send notifications.

B. Use Amazon SageMaker Model Governance. Configure Model Governance to automatically adjust model hyperparameters. Create a performance threshold alarm in Amazon CloudWatch to send notifications.

C. Use Amazon SageMaker Debugger with appropriate thresholds. Configure Debugger to send Amazon CloudWatch alarms to alert the team. Retrain the model by using only data from the previous several months.

D. Use only data from the previous several months to perform incremental training to update the model. Use Amazon SageMaker Model Monitor to detect model performance issues and to send notifications.
Suggested answer: A

Explanation:

The best solution to improve the accuracy of the model and receive notifications for any future performance issues is to perform incremental training to update the model and activate Amazon SageMaker Model Monitor to detect model performance issues and to send notifications. Incremental training is a technique that allows you to update an existing model with new data without retraining the entire model from scratch. This can save time and resources, and help the model adapt to changing data patterns. Amazon SageMaker Model Monitor is a feature that continuously monitors the quality of machine learning models in production and notifies you when there are deviations in the model quality, such as data drift and anomalies. You can set up alerts that trigger actions, such as sending notifications to Amazon Simple Notification Service (Amazon SNS) topics, when certain conditions are met.
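A minimal sketch of scheduling data-quality monitoring for a deployed endpoint with the SageMaker Python SDK; the endpoint name, role ARN, and S3 paths are hypothetical.

```python
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Baseline the training data so Model Monitor can detect drift against it.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/baseline/training-data.csv",  # hypothetical path
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/baseline/results",
)

# Check the live endpoint hourly; violations surface as CloudWatch metrics,
# which can feed an alarm that notifies an Amazon SNS topic.
monitor.create_monitoring_schedule(
    monitor_schedule_name="real-estate-price-monitor",
    endpoint_input="real-estate-price-endpoint",        # hypothetical endpoint
    output_s3_uri="s3://my-bucket/monitoring/results",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```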

Option B is incorrect because Amazon SageMaker Model Governance is a set of tools that help you implement ML responsibly by simplifying access control and enhancing transparency. It does not provide a mechanism to automatically adjust model hyperparameters or improve model accuracy.

Option C is incorrect because Amazon SageMaker Debugger is a feature that helps you debug and optimize your model training process by capturing relevant data and providing real-time analysis. However, using Debugger alone does not update the model or monitor its performance in production. Also, retraining the model by using only data from the previous several months may not capture the full range of data variability and may introduce bias or overfitting.

Option D is incorrect because using only data from the previous several months to perform incremental training may not be sufficient to improve the model accuracy, as explained above. Moreover, this option does not specify how to activate Amazon SageMaker Model Monitor or configure the alerts and notifications.

References:

Incremental training

Amazon SageMaker Model Monitor

Amazon SageMaker Model Governance

Amazon SageMaker Debugger

A university wants to develop a targeted recruitment strategy to increase new student enrollment. A data scientist gathers information about the academic performance history of students. The data scientist wants to use the data to build student profiles. The university will use the profiles to direct resources to recruit students who are likely to enroll in the university.

Which combination of steps should the data scientist take to predict whether a particular student applicant is likely to enroll in the university? (Select TWO)

A. Use Amazon SageMaker Ground Truth to sort the data into two groups named 'enrolled' or 'not enrolled.'

B. Use a forecasting algorithm to run predictions.

C. Use a regression algorithm to run predictions.

D. Use a classification algorithm to run predictions.

E. Use the built-in Amazon SageMaker k-means algorithm to cluster the data into two groups named 'enrolled' or 'not enrolled.'
Suggested answer: A, D

Explanation:

The data scientist should use Amazon SageMaker Ground Truth to sort the data into two groups named 'enrolled' or 'not enrolled.' This will create a labeled dataset that can be used for supervised learning. The data scientist should then use a classification algorithm to run predictions on the test data. A classification algorithm is a suitable choice for predicting a binary outcome, such as enrollment status, based on the input features, such as academic performance. A classification algorithm will output a probability for each class label and assign the most likely label to each observation.
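For example, once Ground Truth has produced the labeled dataset, a simple binary classifier could be trained along these lines. This is a scikit-learn sketch under assumed, hypothetical file and column names, with numeric features assumed.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Labeled output from the Ground Truth job: academic features plus a binary label.
df = pd.read_csv("student_profiles_labeled.csv")      # assumed file
X = df.drop(columns=["enrolled"])                     # assumed label column
y = df["enrolled"]                                    # 1 = enrolled, 0 = not enrolled

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predicted probability of enrollment for each applicant in the test split.
probs = clf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, probs))
```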

References:

Use Amazon SageMaker Ground Truth to Label Data

Classification Algorithm in Machine Learning

A real-estate company is launching a new product that predicts the prices of new houses. The historical data for the properties and prices is stored in .csv format in an Amazon S3 bucket. The data has a header, some categorical fields, and some missing values. The company's data scientists have used Python with a common open-source library to fill the missing values with zeros. The data scientists have dropped all of the categorical fields and have trained a model by using the open-source linear regression algorithm with the default parameters.

The accuracy of the predictions with the current model is below 50%. The company wants to improve the model performance and launch the new product as soon as possible.

Which solution will meet these requirements with the LEAST operational overhead?

A. Create a service-linked role for Amazon Elastic Container Service (Amazon ECS) with access to the S3 bucket. Create an ECS cluster that is based on an AWS Deep Learning Containers image. Write the code to perform the feature engineering. Train a logistic regression model for predicting the price, pointing to the bucket with the dataset. Wait for the training job to complete. Perform the inferences.

B. Create an Amazon SageMaker notebook with a new IAM role that is associated with the notebook. Pull the dataset from the S3 bucket. Explore different combinations of feature engineering transformations, regression algorithms, and hyperparameters. Compare all the results in the notebook, and deploy the most accurate configuration in an endpoint for predictions.

C. Create an IAM role with access to Amazon S3, Amazon SageMaker, and AWS Lambda. Create a training job with the SageMaker built-in XGBoost model pointing to the bucket with the dataset. Specify the price as the target feature. Wait for the job to complete. Load the model artifact to a Lambda function for inference on prices of new houses.

D. Create an IAM role for Amazon SageMaker with access to the S3 bucket. Create a SageMaker AutoML job with SageMaker Autopilot pointing to the bucket with the dataset. Specify the price as the target attribute. Wait for the job to complete. Deploy the best model for predictions.
Suggested answer: D

Explanation:

The solution D meets the requirements with the least operational overhead because it uses Amazon SageMaker Autopilot, which is a fully managed service that automates the end-to-end process of building, training, and deploying machine learning models. Amazon SageMaker Autopilot can handle data preprocessing, feature engineering, algorithm selection, hyperparameter tuning, and model deployment. The company only needs to create an IAM role for Amazon SageMaker with access to the S3 bucket, create a SageMaker AutoML job pointing to the bucket with the dataset, specify the price as the target attribute, and wait for the job to complete. Amazon SageMaker Autopilot will generate a list of candidate models with different configurations and performance metrics, and the company can deploy the best model for predictions [1].
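A minimal sketch of launching an Autopilot job with the SageMaker Python SDK; the role ARN, bucket paths, job name, and instance type are hypothetical.

```python
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

automl = AutoML(
    role=role,
    target_attribute_name="price",                     # column to predict
    output_path="s3://my-bucket/autopilot-output/",    # hypothetical output location
    max_candidates=20,
    sagemaker_session=session,
)

# Autopilot handles preprocessing (missing values, categorical encoding),
# algorithm selection, and hyperparameter tuning automatically.
automl.fit(
    inputs="s3://my-bucket/housing/historical_prices.csv",  # hypothetical dataset
    job_name="housing-price-autopilot",
    wait=True,
)

# Deploy the best candidate behind a real-time endpoint.
predictor = automl.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```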

The other options are not suitable because:

Option A: Creating a service-linked role for Amazon Elastic Container Service (Amazon ECS) with access to the S3 bucket, creating an ECS cluster based on an AWS Deep Learning Containers image, writing the code to perform the feature engineering, training a logistic regression model for predicting the price, and performing the inferences will incur more operational overhead than using Amazon SageMaker Autopilot. The company will have to manage the ECS cluster, the container image, the code, the model, and the inference endpoint. Moreover, logistic regression may not be the best algorithm for predicting the price, as it is more suitable for binary classification tasks [2].

Option B: Creating an Amazon SageMaker notebook with a new IAM role that is associated with the notebook, pulling the dataset from the S3 bucket, exploring different combinations of feature engineering transformations, regression algorithms, and hyperparameters, comparing all the results in the notebook, and deploying the most accurate configuration in an endpoint for predictions will incur more operational overhead than using Amazon SageMaker Autopilot. The company will have to write the code for the feature engineering, the model training, the model evaluation, and the model deployment. The company will also have to manually compare the results and select the best configuration [3].

Option C: Creating an IAM role with access to Amazon S3, Amazon SageMaker, and AWS Lambda, creating a training job with the SageMaker built-in XGBoost model pointing to the bucket with the dataset, specifying the price as the target feature, and loading the model artifact to a Lambda function for inference on prices of new houses will incur more operational overhead than using Amazon SageMaker Autopilot. The company will have to create and manage the Lambda function, the model artifact, and the inference code. Moreover, the company would still have to handle the missing values and categorical fields itself, because a plain XGBoost training job does not automate feature engineering the way Autopilot does [4].

References:

1: Amazon SageMaker Autopilot

2: Amazon Elastic Container Service

3: Amazon SageMaker Notebook Instances

4: Amazon SageMaker XGBoost Algorithm

A data scientist is training a large PyTorch model by using Amazon SageMaker. It takes 10 hours on average to train the model on GPU instances. The data scientist suspects that training is not converging and that resource utilization is not optimal.

What should the data scientist do to identify and address training issues with the LEAST development effort?

A. Use CPU utilization metrics that are captured in Amazon CloudWatch. Configure a CloudWatch alarm to stop the training job early if low CPU utilization occurs.

B. Use high-resolution custom metrics that are captured in Amazon CloudWatch. Configure an AWS Lambda function to analyze the metrics and to stop the training job early if issues are detected.

C. Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.

D. Use the SageMaker Debugger confusion and feature_importance_overweight built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.
Suggested answer: C

Explanation:

The solution C is the best option to identify and address training issues with the least development effort. The solution C involves the following steps:

Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues. SageMaker Debugger is a feature of Amazon SageMaker that allows data scientists to monitor, analyze, and debug machine learning models during training. SageMaker Debugger provides a set of built-in rules that can automatically detect common issues and anomalies in model training, such as vanishing or exploding gradients, overfitting, underfitting, low GPU utilization, and more [1]. The data scientist can use the vanishing_gradient rule to check if the gradients are becoming too small and causing the training to not converge. The data scientist can also use the LowGPUUtilization rule to check if the GPU resources are underutilized and causing the training to be inefficient [2].

Launch the StopTrainingJob action if issues are detected. SageMaker Debugger can also take actions based on the status of the rules. One of the actions is StopTrainingJob, which can terminate the training job if a rule is in an error state. This can help the data scientist to save time and money by stopping the training early if issues are detected [3].
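A sketch of how these rules can be attached to a SageMaker PyTorch estimator with the Python SDK; the training script, role ARN, framework versions, and dataset location are assumptions, and the exact rule_configs usage may need adjusting to the SDK version in use.

```python
from sagemaker.debugger import ProfilerRule, Rule, rule_configs
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# Automatically stop the training job when a rule fires.
stop_on_issue = rule_configs.ActionList(rule_configs.StopTraining())

rules = [
    # Debugger rule: watches tensors for vanishing gradients.
    Rule.sagemaker(rule_configs.vanishing_gradient(), actions=stop_on_issue),
    # Profiler rule: flags sustained low GPU utilization.
    ProfilerRule.sagemaker(rule_configs.LowGPUUtilization()),
]

estimator = PyTorch(
    entry_point="train.py",            # hypothetical training script
    role=role,
    framework_version="1.13",
    py_version="py39",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    rules=rules,
)
estimator.fit("s3://my-bucket/training-data/")  # hypothetical dataset location
```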

The other options are not suitable because:

Option A: Using CPU utilization metrics that are captured in Amazon CloudWatch and configuring a CloudWatch alarm to stop the training job early if low CPU utilization occurs will not identify and address training issues effectively. CPU utilization is not a good indicator of model training performance, especially for GPU instances. Moreover, CloudWatch alarms can only trigger actions based on simple thresholds, not complex rules or conditions [4].

Option B: Using high-resolution custom metrics that are captured in Amazon CloudWatch and configuring an AWS Lambda function to analyze the metrics and to stop the training job early if issues are detected will incur more development effort than using SageMaker Debugger. The data scientist will have to write the code for capturing, sending, and analyzing the custom metrics, as well as for invoking the Lambda function and stopping the training job. Moreover, this solution may not be able to detect all the issues that SageMaker Debugger can [5].

Option D: Using the SageMaker Debugger confusion and feature_importance_overweight built-in rules and launching the StopTrainingJob action if issues are detected will not identify and address training issues effectively. The confusion rule monitors the confusion matrix of a classification model, and the feature_importance_overweight rule checks whether a few features carry too much weight in the model; neither rule addresses the convergence or resource utilization issues described here [2].

References:

1: Amazon SageMaker Debugger

2: Built-in Rules for Amazon SageMaker Debugger

3: Actions for Amazon SageMaker Debugger

4: Amazon CloudWatch Alarms

5: Amazon CloudWatch Custom Metrics

A company needs to deploy a chatbot to answer common questions from customers. The chatbot must base its answers on company documentation.

Which solution will meet these requirements with the LEAST development effort?

A. Index company documents by using Amazon Kendra. Integrate the chatbot with Amazon Kendra by using the Amazon Kendra Query API operation to answer customer questions.

B. Train a Bidirectional Attention Flow (BiDAF) network based on past customer questions and company documents. Deploy the model as a real-time Amazon SageMaker endpoint. Integrate the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation to answer customer questions.

C. Train an Amazon SageMaker BlazingText model based on past customer questions and company documents. Deploy the model as a real-time SageMaker endpoint. Integrate the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation to answer customer questions.

D. Index company documents by using Amazon OpenSearch Service. Integrate the chatbot with OpenSearch Service by using the OpenSearch Service k-nearest neighbors (k-NN) Query API operation to answer customer questions.
Suggested answer: A

Explanation:

The solution A will meet the requirements with the least development effort because it uses Amazon Kendra, which is a highly accurate and easy-to-use intelligent search service powered by machine learning. Amazon Kendra can index company documents from various sources and formats, such as PDF, HTML, Word, and more. Amazon Kendra can also integrate with chatbots by using the Amazon Kendra Query API operation, which can understand natural language questions and provide relevant answers from the indexed documents. Amazon Kendra can also provide additional information, such as document excerpts, links, and FAQs, to enhance the chatbot experience [1].
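A minimal sketch of how the chatbot back end could call the Kendra Query API with boto3; the index ID, region, and fallback message are placeholders.

```python
import boto3

kendra = boto3.client("kendra", region_name="us-east-1")

def answer_question(question: str) -> str:
    """Return the best document excerpt Kendra finds for a customer question."""
    response = kendra.query(
        IndexId="0123456789ab-cdef-0123-4567-89abcdef0123",  # placeholder index ID
        QueryText=question,
    )
    for item in response["ResultItems"]:
        # ANSWER results come from Kendra's reading-comprehension extraction;
        # fall back to the top document excerpt otherwise.
        if item["Type"] in ("ANSWER", "DOCUMENT"):
            return item["DocumentExcerpt"]["Text"]
    return "Sorry, I could not find an answer in the documentation."

print(answer_question("How do I reset my device?"))
```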

The other options are not suitable because:

Option B: Training a Bidirectional Attention Flow (BiDAF) network based on past customer questions and company documents, deploying the model as a real-time Amazon SageMaker endpoint, and integrating the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation will incur more development effort than using Amazon Kendra. The company will have to write the code for the BiDAF network, which is a complex deep learning model for question answering. The company will also have to manage the SageMaker endpoint, the model artifact, and the inference logic [2].

Option C: Training an Amazon SageMaker BlazingText model based on past customer questions and company documents, deploying the model as a real-time SageMaker endpoint, and integrating the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation will incur more development effort than using Amazon Kendra. The company will have to write the code for the BlazingText model, which is a fast and scalable text classification and word embedding algorithm. The company will also have to manage the SageMaker endpoint, the model artifact, and the inference logic [3].

Option D: Indexing company documents by using Amazon OpenSearch Service and integrating the chatbot with OpenSearch Service by using the OpenSearch Service k-nearest neighbors (k-NN) Query API operation will not meet the requirements effectively. Amazon OpenSearch Service is a fully managed service that provides fast and scalable search and analytics capabilities. However, it is not designed for natural language question answering, and it may not provide accurate or relevant answers for the chatbot. Moreover, the k-NN Query API operation is used to find the most similar documents or vectors based on a distance function, not to find the best answers based on a natural language query [4].

References:

1: Amazon Kendra

2: Bidirectional Attention Flow for Machine Comprehension

3: Amazon SageMaker BlazingText

4: Amazon OpenSearch Service

A company ingests machine learning (ML) data from web advertising clicks into an Amazon S3 data lake. Click data is added to an Amazon Kinesis data stream by using the Kinesis Producer Library (KPL). The data is loaded into the S3 data lake from the data stream by using an Amazon Kinesis Data Firehose delivery stream. As the data volume increases, an ML specialist notices that the rate of data ingested into Amazon S3 is relatively constant. There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest.

Which next step is MOST likely to improve the data ingestion rate into Amazon S3?

A. Increase the number of S3 prefixes for the delivery stream to write to.

B. Decrease the retention period for the data stream.

C. Increase the number of shards for the data stream.

D. Add more consumers using the Kinesis Client Library (KCL).
Suggested answer: C

Explanation:

The solution C is the most likely to improve the data ingestion rate into Amazon S3 because it increases the number of shards for the data stream. The number of shards determines the throughput capacity of the data stream, which affects the rate of data ingestion. Each shard can support up to 1 MB per second of data input and 2 MB per second of data output. By increasing the number of shards, the company can increase the data ingestion rate proportionally. The company can use the UpdateShardCount API operation to modify the number of shards in the data stream [1].
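A minimal sketch of resharding the stream with the UpdateShardCount operation in boto3; the stream name and target shard count are placeholders.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Double the throughput capacity of the stream by doubling its shard count.
# Each shard adds 1 MB/s of write capacity and 2 MB/s of read capacity.
kinesis.update_shard_count(
    StreamName="ad-click-stream",      # placeholder stream name
    TargetShardCount=8,                # placeholder target; e.g., up from 4 shards
    ScalingType="UNIFORM_SCALING",
)
```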

The other options are not likely to improve the data ingestion rate into Amazon S3 because:

Option A: Increasing the number of S3 prefixes for the delivery stream to write to will not affect the data ingestion rate, as it only changes the way the data is organized in the S3 bucket. The number of S3 prefixes can help to optimize the performance of downstream applications that read the data from S3, but it does not impact the performance of Kinesis Data Firehose [2].

Option B: Decreasing the retention period for the data stream will not affect the data ingestion rate, as it only changes the amount of time the data is stored in the data stream. The retention period can help to manage the data availability and durability, but it does not impact the throughput capacity of the data stream [3].

Option D: Adding more consumers using the Kinesis Client Library (KCL) will not affect the data ingestion rate, as it only changes the way the data is processed by downstream applications. The consumers can help to scale the data processing and handle failures, but they do not impact the data ingestion into S3 by Kinesis Data Firehose [4].

References:

1: Resharding - Amazon Kinesis Data Streams

2: Amazon S3 Prefixes - Amazon Kinesis Data Firehose

3: Data Retention - Amazon Kinesis Data Streams

4: Developing Consumers Using the Kinesis Client Library - Amazon Kinesis Data Streams
