Amazon MLS-C01 Practice Test - Questions Answers, Page 30


A data scientist is designing a repository that will contain many images of vehicles. The repository must scale automatically in size to store new images every day. The repository must support versioning of the images. The data scientist must implement a solution that maintains multiple immediately accessible copies of the data in different AWS Regions.

Which solution will meet these requirements?

A. Amazon S3 with S3 Cross-Region Replication (CRR)
B. Amazon Elastic Block Store (Amazon EBS) with snapshots that are shared in a secondary Region
C. Amazon Elastic File System (Amazon EFS) Standard storage that is configured with Regional availability
D. AWS Storage Gateway Volume Gateway

Suggested answer: A

Explanation:

For a repository containing a large and dynamically scaling collection of images, Amazon S3 is ideal due to its scalability and versioning capabilities. Amazon S3 natively supports automatic scaling to accommodate increasing storage needs and allows versioning, which enables tracking and managing different versions of objects.

To meet the requirement of maintaining multiple, immediately accessible copies of data across AWS Regions, S3 Cross-Region Replication (CRR) can be enabled. CRR automatically replicates new or updated objects to a specified destination bucket in another AWS Region, ensuring low-latency access and disaster recovery. By setting up CRR with versioning enabled, the data scientist can achieve a multi-Region, scalable, and version-controlled repository in Amazon S3.
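As a minimal boto3 sketch of this setup (the bucket names and the replication IAM role ARN are placeholders, and the destination bucket is assumed to already exist in the second Region), the repository owner would enable versioning on both buckets and then attach a replication rule to the source bucket:

```python
import boto3

s3 = boto3.client("s3")

# Versioning must be enabled on both the source and destination buckets
# before Cross-Region Replication (CRR) can be configured.
# Bucket names and the IAM role ARN below are placeholders.
for bucket in ["vehicle-images-us-east-1", "vehicle-images-eu-west-1"]:
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# Replicate every new or updated object to the bucket in the second Region.
s3.put_bucket_replication(
    Bucket="vehicle-images-us-east-1",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
        "Rules": [
            {
                "ID": "replicate-all-images",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::vehicle-images-eu-west-1",
                },
            }
        ],
    },
)
```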

An insurance company is creating an application to automate car insurance claims. A machine learning (ML) specialist used an Amazon SageMaker Object Detection - TensorFlow built-in algorithm to train a model to detect scratches and dents in images of cars. After the model was trained, the ML specialist noticed that the model performed better on the training dataset than on the testing dataset.

Which approach should the ML specialist use to improve the performance of the model on the testing data?

A. Increase the value of the momentum hyperparameter.
B. Reduce the value of the dropout_rate hyperparameter.
C. Reduce the value of the learning_rate hyperparameter.
D. Increase the value of the L2 hyperparameter.

Suggested answer: D

Explanation:

The machine learning model in this scenario shows signs of overfitting, as evidenced by better performance on the training dataset than on the testing dataset. Overfitting indicates that the model is capturing noise or details specific to the training data rather than general patterns.

One common approach to reduce overfitting is L2 regularization, which adds a penalty to the loss function for large weights and helps the model generalize better by smoothing out the weight distribution. By increasing the value of the L2 hyperparameter, the ML specialist can increase this penalty, helping to mitigate overfitting and improve performance on the testing dataset.

Increasing momentum or reducing the dropout_rate would not address overfitting; reducing dropout in fact removes regularization and tends to make overfitting worse, and lowering the learning_rate slows convergence without constraining model complexity.
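The exact hyperparameter name is specific to the SageMaker built-in algorithm, but the effect of a larger L2 penalty can be illustrated with a small Keras sketch (the layer sizes and the coefficient value are illustrative only): a higher coefficient adds a stronger penalty on large weights to the loss, which discourages the network from memorizing training-set noise.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# A larger L2 coefficient penalizes large weights more strongly.
l2_strength = 1e-2  # illustrative value; tune on a validation set

model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu",
                  kernel_regularizer=regularizers.l2(l2_strength),
                  input_shape=(224, 224, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax",
                 kernel_regularizer=regularizers.l2(l2_strength)),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```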

A machine learning (ML) specialist is training a linear regression model. The specialist notices that the model is overfitting. The specialist applies an L1 regularization parameter and runs the model again. This change results in all features having zero weights.

What should the ML specialist do to improve the model results?

A. Increase the L1 regularization parameter. Do not change any other training parameters.
B. Decrease the L1 regularization parameter. Do not change any other training parameters.
C. Introduce a large L2 regularization parameter. Do not change the current L1 regularization value.
D. Introduce a small L2 regularization parameter. Do not change the current L1 regularization value.

Suggested answer: B

Explanation:

Applying L1 regularization encourages sparsity by penalizing weights directly, often driving many weights to zero. In this case, the ML specialist observes that all weights become zero, which suggests that the L1 regularization parameter is set too high. This high value overly penalizes non-zero weights, effectively removing all features from the model.

To improve the model, the ML specialist should reduce the L1 regularization parameter, allowing some features to retain non-zero weights. This adjustment will make the model less prone to excessive sparsity, allowing it to better capture essential patterns in the data without dropping all features. Introducing L2 regularization is another approach but may not directly resolve this specific issue of all-zero weights as effectively as reducing L1.
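A quick scikit-learn sketch of the same effect (the synthetic data and alpha values are illustrative): with an overly large L1 penalty, the Lasso zeroes out every coefficient, and lowering the penalty restores non-zero weights for the informative features.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.7, 0.0]) + rng.normal(scale=0.1, size=200)

# An overly strong L1 penalty drives every coefficient to zero.
strong_l1 = Lasso(alpha=10.0).fit(X, y)
print(strong_l1.coef_)   # all zeros

# Reducing the penalty lets informative features keep non-zero weights.
weaker_l1 = Lasso(alpha=0.01).fit(X, y)
print(weaker_l1.coef_)   # non-zero weights for the informative features
```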

A machine learning (ML) specialist at a retail company must build a system to forecast the daily sales for one of the company's stores. The company provided the ML specialist with sales data for this store from the past 10 years. The historical dataset includes the total amount of sales on each day for the store. Approximately 10% of the days in the historical dataset are missing sales data.

The ML specialist builds a forecasting model based on the historical dataset. The specialist discovers that the model does not meet the performance standards that the company requires.

Which action will MOST likely improve the performance for the forecasting model?

A. Aggregate sales from stores in the same geographic area.
B. Apply smoothing to correct for seasonal variation.
C. Change the forecast frequency from daily to weekly.
D. Replace missing values in the dataset by using linear interpolation.

Suggested answer: D

Explanation:

When forecasting sales data, missing values can significantly impact model accuracy, especially for time series models. Approximately 10% of the days in this dataset lack sales data, which may cause gaps in patterns and disrupt seasonal trends. Linear interpolation is an effective technique for estimating and filling in missing data points based on adjacent known values, thus preserving the continuity of the time series.

By interpolating the missing values, the ML specialist can provide the model with a more complete and consistent dataset, potentially enhancing performance. This approach maintains the daily data granularity, which is important for accurately capturing trends at that frequency.
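A minimal pandas sketch of this step (the dates and values are illustrative): Series.interpolate fills each missing day from its neighboring known values while keeping the daily index intact.

```python
import pandas as pd

# Daily sales with gaps (illustrative data; the real dataset spans 10 years).
sales = pd.Series(
    [120.0, None, 131.0, None, None, 150.0],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Linear interpolation estimates each missing day from its neighbors,
# preserving the daily granularity of the time series.
filled = sales.interpolate(method="linear")
print(filled)
```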

A company plans to build a custom natural language processing (NLP) model to classify and prioritize user feedback. The company hosts the data and all machine learning (ML) infrastructure in the AWS Cloud. The ML team works from the company's office, which has an IPsec VPN connection to one VPC in the AWS Cloud.

The company has set both the enableDnsHostnames attribute and the enableDnsSupport attribute of the VPC to true. The company's DNS resolvers point to the VPC DNS. The company does not allow the ML team to access Amazon SageMaker notebooks through connections that use the public internet. The connection must stay within a private network and within the AWS internal network.

Which solution will meet these requirements with the LEAST development effort?

A. Create a VPC interface endpoint for the SageMaker notebook in the VPC. Access the notebook through a VPN connection and the VPC endpoint.
B. Create a bastion host by using Amazon EC2 in a public subnet within the VPC. Log in to the bastion host through a VPN connection. Access the SageMaker notebook from the bastion host.
C. Create a bastion host by using Amazon EC2 in a private subnet within the VPC with a NAT gateway. Log in to the bastion host through a VPN connection. Access the SageMaker notebook from the bastion host.
D. Create a NAT gateway in the VPC. Access the SageMaker notebook HTTPS endpoint through a VPN connection and the NAT gateway.

Suggested answer: A

Explanation:

In this scenario, the company requires that access to the Amazon SageMaker notebook remain within the AWS internal network, avoiding the public internet. By creating a VPC interface endpoint for SageMaker, the company can ensure that traffic to the SageMaker notebook remains internal to the VPC and is accessible over a private connection. The VPC interface endpoint allows private network access to AWS services, and it operates over AWS's internal network, respecting the security and connectivity policies the company requires.

This solution requires minimal development effort compared to options involving bastion hosts or NAT gateways, as it directly provides private network access to the SageMaker notebook.
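A minimal boto3 sketch of creating such an interface endpoint (the VPC, subnet, and security group IDs are placeholders; the service name shown follows the pattern documented for SageMaker notebook interface endpoints and should be confirmed for the target Region):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Interface endpoint that keeps notebook traffic on the AWS internal network.
# IDs below are placeholders.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0abc1234def567890",
    ServiceName="aws.sagemaker.us-east-1.notebook",  # confirm for your Region
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,  # lets the VPC DNS resolve the notebook URL privately
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```

With PrivateDnsEnabled set to true and the VPC DNS attributes already enabled, the notebook URL resolves to private IP addresses inside the VPC, so traffic from the office arrives over the VPN and never leaves the AWS network.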

A bank has collected customer data for 10 years in CSV format. The bank stores the data in an on-premises server. A data science team wants to use Amazon SageMaker to build and train a machine learning (ML) model to predict churn probability. The team will use the historical data. The data scientists want to perform data transformations quickly and to generate data insights before the team builds a model for production.

Which solution will meet these requirements with the LEAST development effort?

A. Upload the data into the SageMaker Data Wrangler console directly. Perform data transformations and generate insights within Data Wrangler.
B. Upload the data into an Amazon S3 bucket. Allow SageMaker to access the data that is in the bucket. Import the data from the S3 bucket into SageMaker Data Wrangler. Perform data transformations and generate insights within Data Wrangler.
C. Upload the data into the SageMaker Data Wrangler console directly. Allow SageMaker and Amazon QuickSight to access the data that is in an Amazon S3 bucket. Perform data transformations in Data Wrangler and save the transformed data into a second S3 bucket. Use QuickSight to generate data insights.
D. Upload the data into an Amazon S3 bucket. Allow SageMaker to access the data that is in the bucket. Import the data from the bucket into SageMaker Data Wrangler. Perform data transformations in Data Wrangler. Save the data into a second S3 bucket. Use a SageMaker Studio notebook to generate data insights.

Suggested answer: B

Explanation:

To prepare and transform historical data efficiently with minimal setup, Amazon SageMaker Data Wrangler is the optimal tool. Data Wrangler simplifies data preprocessing and exploratory data analysis (EDA) by providing a graphical interface for transformations and insights. Uploading the CSV data to Amazon S3 first makes it easily accessible to SageMaker, and the data can then be imported directly into Data Wrangler.

Once in Data Wrangler, the team can perform required data transformations and generate insights in a single workflow, avoiding the need for additional tools like Amazon QuickSight or further notebook configuration. This approach provides the simplest and most integrated solution for the data science team.
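Only the first step involves code; the import, transformations, and insight reports then happen in the Data Wrangler console. A minimal boto3 sketch of the upload (the file path, bucket, and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Upload the historical CSV export from the on-premises server to S3 so that
# SageMaker Data Wrangler can import it. Names below are placeholders.
s3.upload_file(
    Filename="/data/exports/customer_history.csv",
    Bucket="bank-churn-raw-data",
    Key="raw/customer_history.csv",
)
```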

A machine learning (ML) engineer is preparing a dataset for a classification model. The ML engineer notices that some continuous numeric features have a significantly greater value than most other features. A business expert explains that the features are independently informative and that the dataset is representative of the target distribution.

After training, the model's inference accuracy is lower than expected.

Which preprocessing technique will result in the GREATEST increase of the model's inference accuracy?

A. Normalize the problematic features.
B. Bootstrap the problematic features.
C. Remove the problematic features.
D. Extrapolate synthetic features.

Suggested answer: A

Explanation:

In a classification model, features with significantly larger scales can dominate the model training process, leading to poor performance. Normalization scales the values of continuous features to a uniform range, such as [0, 1], which prevents large-value features from disproportionately influencing the model. This is particularly beneficial for algorithms sensitive to the scale of input data, such as neural networks or distance-based algorithms.

Given that the problematic features are informative and representative of the target distribution, removing or bootstrapping these features is not advisable. Normalization will bring all features to a similar scale and improve the model's inference accuracy without losing important information.
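A small scikit-learn sketch of min-max normalization (the feature values are illustrative): each column is rescaled to [0, 1] so the large-valued feature no longer dominates.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales (illustrative values).
X = np.array([
    [0.5, 120000.0],
    [0.7,  95000.0],
    [0.2, 300000.0],
])

# Min-max normalization rescales each feature to [0, 1], so the large-valued
# column no longer dominates distance- or gradient-based training.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```

In practice the scaler is fit on the training data only and applied to the test data with transform, so no information leaks from the test set.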

A data scientist needs to create a model for predictive maintenance. The model will be based on historical data to identify rare anomalies in the data.

The historical data is stored in an Amazon S3 bucket. The data scientist needs to use Amazon SageMaker Data Wrangler to ingest the data. The data scientist also needs to perform exploratory data analysis (EDA) to understand the statistical properties of the data.

Which solution will meet these requirements with the LEAST amount of compute resources?

A. Import the data by using the None option.
B. Import the data by using the Stratified option.
C. Import the data by using the First K option. Infer the value of K from domain knowledge.
D. Import the data by using the Randomized option. Infer the random size from domain knowledge.

Suggested answer: C

Explanation:

To perform efficient exploratory data analysis (EDA) on a large dataset for anomaly detection, using the First K option in SageMaker Data Wrangler is an optimal choice. This option allows the data scientist to select the first K rows, limiting the data loaded into memory, which conserves compute resources.

Because the data scientist can choose K from domain knowledge, this approach yields a useful sample without requiring extensive compute resources: unlike the Randomized or Stratified options, First K does not need to scan or shuffle the entire dataset before drawing the sample. Randomized sampling may also be less useful for the initial analysis of time-series or sequential data, where the order of rows carries information.
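Data Wrangler's sampling choice is made in the import dialog rather than in code, but a local pandas analogue (the file path and K are placeholders; reading an s3:// path assumes the s3fs package is installed) shows why first-K sampling is cheap: the scan stops after K rows instead of touching the whole dataset.

```python
import pandas as pd

K = 50_000  # chosen from domain knowledge of how much history is needed

# Reading only the first K rows stops the scan early, which is the same reason
# the "First K" import option uses the least compute: it never has to read,
# shuffle, or stratify the rest of the dataset.
preview = pd.read_csv("s3://maintenance-telemetry/history.csv", nrows=K)
print(preview.describe())
```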

A company wants to use machine learning (ML) to improve its customer churn prediction model. The company stores data in an Amazon Redshift data warehouse.

A data science team wants to use Amazon Redshift machine learning (Amazon Redshift ML) to build a model and run predictions for new data directly within the data warehouse.

Which combination of steps should the company take to use Amazon Redshift ML to meet these requirements? (Select THREE.)

A. Define the feature variables and target variable for the churn prediction model.
B. Use the SQL EXPLAIN_MODEL function to run predictions.
C. Write a CREATE MODEL SQL statement to create a model.
D. Use Amazon Redshift Spectrum to train the model.
E. Manually export the training data to Amazon S3.
F. Use the SQL prediction function to run predictions.

Suggested answer: A, C, F

Explanation:

Amazon Redshift ML enables in-database machine learning model creation and predictions, allowing data scientists to leverage Redshift for model training without needing to export data.

To create and run a model for customer churn prediction in Amazon Redshift ML:

Define the feature variables and target variable: Identify the columns to use as features (predictors) and the target variable (outcome) for the churn prediction model.

Create the model: Write a CREATE MODEL SQL statement, which trains the model using Amazon Redshift's integration with Amazon SageMaker and stores the model directly in Redshift.

Run predictions: Call the prediction function that the CREATE MODEL statement names in its FUNCTION clause from ordinary SQL to generate predictions on new data directly within Redshift.

Options B, D, and E are not required as Redshift ML handles model creation and prediction without manual data export to Amazon S3 or additional Spectrum integration.
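A hedged sketch of these steps, run through the Redshift Data API from Python (the cluster, database, table, column, IAM role, and bucket names are placeholders; the CREATE MODEL statement follows the documented Redshift ML syntax, and training is handed off to SageMaker behind the scenes):

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Step 1 and 2: features, target, and a CREATE MODEL statement.
# All identifiers below are placeholders.
create_model_sql = """
CREATE MODEL customer_churn_model
FROM (SELECT age, tenure_months, monthly_spend, support_calls, churned
      FROM customer_activity)
TARGET churned
FUNCTION predict_customer_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
SETTINGS (S3_BUCKET 'redshift-ml-artifacts');
"""

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="sales",
    DbUser="awsuser",
    Sql=create_model_sql,
)

# Step 3: once training finishes, the prediction function defined above
# scores new rows with ordinary SQL inside the warehouse.
predict_sql = """
SELECT customer_id,
       predict_customer_churn(age, tenure_months, monthly_spend, support_calls)
FROM new_customer_activity;
"""

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="sales",
    DbUser="awsuser",
    Sql=predict_sql,
)
```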

A business-to-business (B2B) ecommerce company wants to develop a fair and equitable risk mitigation strategy to reject potentially fraudulent transactions. The company wants to reject fraudulent transactions despite the possibility of losing some profitable transactions or customers.

Which solution will meet these requirements with the LEAST operational effort?

A. Use Amazon SageMaker to approve transactions only for products the company has sold in the past.
B. Use Amazon SageMaker to train a custom fraud detection model based on customer data.
C. Use the Amazon Fraud Detector prediction API to approve or deny any activities that Fraud Detector identifies as fraudulent.
D. Use the Amazon Fraud Detector prediction API to identify potentially fraudulent activities so the company can review the activities and reject fraudulent transactions.

Suggested answer: C

Explanation:

Amazon Fraud Detector is a managed service designed to detect potentially fraudulent online activities, such as transactions. It uses machine learning and business rules to classify activities as fraudulent or legitimate, minimizing the need for custom model training. By using the Amazon Fraud Detector prediction API, the company can automatically approve or reject transactions flagged as fraudulent, implementing an efficient risk mitigation strategy without extensive operational effort.

This approach requires minimal setup and effectively allows the company to block fraudulent transactions with high confidence, addressing the business's need to balance risk mitigation and customer impact.
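A minimal boto3 sketch of calling the prediction API (the detector, event type, entity, and variable names are placeholders and must match a detector that has already been configured in Amazon Fraud Detector):

```python
from datetime import datetime, timezone

import boto3

frauddetector = boto3.client("frauddetector")

# Score a single transaction event. Names below are placeholders that must
# match an existing detector and event type configuration.
response = frauddetector.get_event_prediction(
    detectorId="transaction_fraud_detector",
    eventId="order-0001",
    eventTypeName="transaction_event",
    eventTimestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    entities=[{"entityType": "customer", "entityId": "customer-123"}],
    eventVariables={
        "order_amount": "249.99",
        "payment_method": "credit_card",
    },
)

# Each matched rule returns an outcome (for example, "approve" or "block"),
# which the application can act on automatically.
for result in response["ruleResults"]:
    print(result["ruleId"], result["outcomes"])
```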
