
Amazon MLS-C01 Practice Test - Questions Answers, Page 31

A company has a podcast platform that has thousands of users. The company implemented an algorithm to detect low podcast engagement based on a 10-minute running window of user events such as listening to, pausing, and closing the podcast. A machine learning (ML) specialist is designing the ingestion process for these events. The ML specialist needs to transform the data to prepare it for inference.

How should the ML specialist design the transformation step to meet these requirements with the LEAST operational effort?

A. Use an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster to ingest event data. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to transform the most recent 10 minutes of data before inference.

B. Use Amazon Kinesis Data Streams to ingest event data. Store the data in Amazon S3 by using Amazon Data Firehose. Use AWS Lambda to transform the most recent 10 minutes of data before inference.

C. Use Amazon Kinesis Data Streams to ingest event data. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to transform the most recent 10 minutes of data before inference.

D. Use an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster to ingest event data. Use AWS Lambda to transform the most recent 10 minutes of data before inference.
Suggested answer: C

Explanation:

In this scenario, Kinesis Data Streams efficiently ingests real-time event data, while Amazon Managed Service for Apache Flink (formerly Amazon Kinesis Data Analytics) is ideal for transforming and analyzing data in a continuous stream. Apache Flink allows processing of time-based windows, such as the 10-minute sliding window required here, with low operational overhead.

This combination provides an effective solution for low-latency data processing and transformation, meeting the requirements for preparing data for inference with minimal setup and serverless scalability.
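
As an illustration of the windowing logic, the following PyFlink sketch computes per-user event counts over a 10-minute sliding (HOP) window that advances every minute. It is a minimal, self-contained example that uses the built-in datagen and print connectors in place of a real Kinesis Data Streams source; the table names, fields, and one-minute slide are illustrative assumptions, not details from the question.

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Synthetic event source standing in for the Kinesis Data Streams input
t_env.execute_sql("""
    CREATE TABLE user_events (
        user_id STRING,
        event_type STRING,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '10',
        'fields.user_id.length' = '4',
        'fields.event_type.length' = '8'
    )
""")

# Sink that prints the windowed features; in practice this would feed the inference step
t_env.execute_sql("""
    CREATE TABLE engagement_features (
        user_id STRING,
        window_end TIMESTAMP(3),
        event_count BIGINT
    ) WITH ('connector' = 'print')
""")

# 10-minute sliding window that advances every minute: one row per user per window
t_env.execute_sql("""
    INSERT INTO engagement_features
    SELECT
        user_id,
        HOP_END(event_time, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE) AS window_end,
        COUNT(*) AS event_count
    FROM user_events
    GROUP BY user_id, HOP(event_time, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE)
""").wait()

In the managed Flink service, the same query would read from a Kinesis source table and write to a destination that the inference workflow consumes, with no servers to manage.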

A finance company has collected stock return data for 5,000 publicly traded companies. A financial analyst has a dataset that contains 2,000 attributes for each company. The financial analyst wants to use Amazon SageMaker to identify the top 15 attributes that are most valuable to predict future stock returns.

Which solution will meet these requirements with the LEAST operational overhead?

A. Use the linear learner algorithm in SageMaker to train a linear regression model to predict the stock returns. Identify the most predictive features by ranking absolute coefficient values.

B. Use random forest regression in SageMaker to train a model to predict the stock returns. Identify the most predictive features based on Gini importance scores.

C. Use an Amazon SageMaker Data Wrangler quick model visualization to predict the stock returns. Identify the most predictive features based on the quick model's feature importance scores.

D. Use Amazon SageMaker Autopilot to build a regression model to predict the stock returns. Identify the most predictive features based on an Amazon SageMaker Clarify report.
Suggested answer: D

Explanation:

Amazon SageMaker Autopilot is a fully managed solution that automatically explores different ML models and selects the most effective ones for a given prediction task. After model training, Amazon SageMaker Clarify can generate feature importance scores, identifying the top features in a straightforward, automated manner with minimal manual intervention.

By using SageMaker Autopilot, the financial analyst can obtain the desired feature importance ranking for predictive attributes with minimal setup and low operational overhead, as opposed to manually configuring models in SageMaker.
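
A rough sketch of how the Autopilot job could be launched with the SageMaker Python SDK is shown below; the S3 path, target column name, IAM role, and candidate count are hypothetical placeholders, not values from the question.

import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role ARN

automl = AutoML(
    role=role,
    target_attribute_name="future_return",   # hypothetical target column
    problem_type="Regression",
    job_objective={"MetricName": "MSE"},
    max_candidates=10,
    sagemaker_session=session,
)

# Tabular training data (~2,000 attributes per company) in a hypothetical S3 location
automl.fit(inputs="s3://example-bucket/stock-returns/train.csv", wait=False)

# Autopilot also produces an explainability report (generated with SageMaker Clarify)
# for the best candidate; its feature importance values can be used to rank the
# top 15 attributes without any additional model configuration.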

A company's machine learning (ML) specialist is designing a scalable data storage solution for Amazon SageMaker. The company has an existing TensorFlow-based model that uses a train.py script. The model relies on static training data that is currently stored in TFRecord format.

What should the ML specialist do to provide the training data to SageMaker with the LEAST development overhead?

A. Put the TFRecord data into an Amazon S3 bucket. Use AWS Glue or AWS Lambda to reformat the data to protobuf format and store the data in a second S3 bucket. Point the SageMaker training invocation to the second S3 bucket.

B. Rewrite the train.py script to add a section that converts TFRecord data to protobuf format. Point the SageMaker training invocation to the local path of the data. Ingest the protobuf data instead of the TFRecord data.

C. Use SageMaker script mode, and use train.py unchanged. Point the SageMaker training invocation to the local path of the data without reformatting the training data.

D. Use SageMaker script mode, and use train.py unchanged. Put the TFRecord data into an Amazon S3 bucket. Point the SageMaker training invocation to the S3 bucket without reformatting the training data.
Suggested answer: D

Explanation:

Amazon SageMaker script mode allows users to bring custom training scripts (such as train.py) without needing extensive modifications for specific data formats like TFRecord. By storing the TFRecord data in an Amazon S3 bucket and pointing the SageMaker training job to this bucket, the model can directly access the data, allowing the ML specialist to train the model without additional reformatting or data processing steps.

This approach minimizes development overhead and leverages SageMaker's built-in support for custom training scripts and S3 integration, making it the most efficient choice.
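
For example, the existing train.py could be launched unchanged in script mode with the SageMaker TensorFlow estimator, pointing the training channel at the S3 location of the TFRecord files. The bucket path, role ARN, instance type, and framework versions below are placeholder assumptions.

from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train.py",   # existing training script, used without modification
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # hypothetical
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.11",
    py_version="py39",
)

# SageMaker copies the TFRecord objects from this channel into the training container,
# where train.py reads them (e.g., via the SM_CHANNEL_TRAINING path) with no format conversion.
estimator.fit({"training": "s3://example-bucket/tfrecords/train/"})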

An ecommerce company wants to train a large image classification model with 10,000 classes. The company runs multiple model training iterations and needs to minimize operational overhead and cost. The company also needs to avoid loss of work and model retraining.

Which solution will meet these requirements?

A. Create the training jobs as AWS Batch jobs that use Amazon EC2 Spot Instances in a managed compute environment.

B. Use Amazon EC2 Spot Instances to run the training jobs. Use a Spot Instance interruption notice to save a snapshot of the model to Amazon S3 before an instance is terminated.

C. Use AWS Lambda to run the training jobs. Save model weights to Amazon S3.

D. Use managed spot training in Amazon SageMaker. Launch the training jobs with checkpointing enabled.
Suggested answer: D

Explanation:

Amazon SageMaker managed spot training allows for cost-effective training by utilizing Spot Instances, which are lower-cost EC2 instances that can be interrupted when demand is high. By enabling checkpointing in SageMaker, the company can save intermediate model states to Amazon S3, allowing training to resume from the last checkpoint if interrupted. This solution minimizes operational overhead by automating the checkpointing process and resuming work after interruptions, reducing the need for retraining from scratch.

This setup provides a reliable and cost-efficient approach to training large models with minimal operational overhead and risk of data loss.
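
A minimal sketch of a managed spot training configuration with the SageMaker Python SDK is shown below, using the built-in image classification algorithm as an example; the role ARN, S3 locations, timeouts, and hyperparameter values are assumptions for illustration.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

estimator = Estimator(
    image_uri=image_uris.retrieve("image-classification", session.boto_region_name),
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    sagemaker_session=session,
    use_spot_instances=True,            # managed spot training
    max_run=36000,                      # maximum training time, in seconds
    max_wait=72000,                     # total time allowed, including waiting for Spot capacity
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",   # checkpoints persist across interruptions
    checkpoint_local_path="/opt/ml/checkpoints",
)

# Placeholder values for the 10,000-class problem
estimator.set_hyperparameters(num_classes=10000, num_training_samples=1000000, epochs=30)

estimator.fit({
    "train": TrainingInput("s3://example-bucket/images/train/", content_type="application/x-recordio"),
    "validation": TrainingInput("s3://example-bucket/images/validation/", content_type="application/x-recordio"),
})

If a Spot interruption occurs, SageMaker restarts the job when capacity returns and resumes from the latest checkpoint in the checkpoint S3 location rather than retraining from scratch.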

A company stores its documents in Amazon S3 with no predefined product categories. A data scientist needs to build a machine learning model to categorize the documents for all the company's products.

Which solution will meet these requirements with the MOST operational efficiency?

A. Build a custom clustering model. Create a Dockerfile and build a Docker image. Register the Docker image in Amazon Elastic Container Registry (Amazon ECR). Use the custom image in Amazon SageMaker to generate a trained model.

B. Tokenize the data and transform the data into tabular data. Train an Amazon SageMaker k-means model to generate the product categories.

C. Train an Amazon SageMaker Neural Topic Model (NTM) model to generate the product categories.

D. Train an Amazon SageMaker BlazingText model to generate the product categories.
Suggested answer: C

Explanation:

Amazon SageMaker's Neural Topic Model (NTM) is designed to uncover underlying topics within text data by clustering documents based on topic similarity. For document categorization, NTM can identify product categories by analyzing and grouping the documents, making it an efficient choice for unsupervised learning where predefined categories do not exist.
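
As a sketch, the built-in NTM algorithm could be trained roughly as follows; the instance type, hyperparameter values, and S3 paths are illustrative assumptions, and the documents are assumed to have already been converted to bag-of-words vectors in RecordIO-protobuf format.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

# Built-in Neural Topic Model container for the current region
ntm_image = image_uris.retrieve("ntm", session.boto_region_name)

ntm = Estimator(
    image_uri=ntm_image,
    role=role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    sagemaker_session=session,
)

# num_topics approximates the number of product categories; feature_dim is the
# vocabulary size of the bag-of-words representation (both values are assumptions)
ntm.set_hyperparameters(num_topics=20, feature_dim=50000)

ntm.fit({
    "train": TrainingInput("s3://example-bucket/ntm/train/",
                           content_type="application/x-recordio-protobuf"),
})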

A machine learning (ML) specialist is building a credit score model for a financial institution. The ML specialist has collected data for the previous 3 years of transactions and third-party metadata that is related to the transactions.

After the ML specialist builds the initial model, the ML specialist discovers that the model has low accuracy for both the training data and the test data. The ML specialist needs to improve the accuracy of the model.

Which solutions will meet this requirement? (Select TWO.)

A. Increase the number of passes on the existing training data. Perform more hyperparameter tuning.

B. Increase the amount of regularization. Use fewer feature combinations.

C. Add new domain-specific features. Use more complex models.

D. Use fewer feature combinations. Decrease the number of numeric attribute bins.

E. Decrease the amount of training data examples. Reduce the number of passes on the existing training data.
Suggested answer: A, C

Explanation:

For a model with low accuracy on both training and testing datasets, the following two strategies are effective:

Increase the number of passes and perform hyperparameter tuning: This approach allows the model to better learn from the existing data and improve performance through optimized hyperparameters.

Add domain-specific features and use more complex models: Adding relevant features that capture additional information from domain knowledge and using more complex model architectures can help the model capture patterns better, potentially improving accuracy.

Options B, D, and E would either reduce feature complexity or training data volume, which is less likely to improve performance when accuracy is low on both training and testing sets.
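
To illustrate the hyperparameter tuning part of option A, the sketch below wraps a built-in XGBoost estimator in a SageMaker hyperparameter tuning job. The metric, parameter ranges, role ARN, and S3 paths are assumptions for a generic credit-scoring setup, not details from the question.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

xgb = Estimator(
    image_uri=image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    sagemaker_session=session,
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=300)

tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
        "min_child_weight": ContinuousParameter(1, 10),
    },
    max_jobs=20,
    max_parallel_jobs=4,
)

tuner.fit({
    "train": TrainingInput("s3://example-bucket/credit/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://example-bucket/credit/validation/", content_type="text/csv"),
})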

A data scientist uses Amazon SageMaker Data Wrangler to analyze and visualize data. The data scientist wants to refine a training dataset by selecting predictor variables that are strongly predictive of the target variable. The target variable correlates with other predictor variables.

The data scientist wants to understand the variance in the data along various directions in the feature space.

Which solution will meet these requirements?

A. Use the SageMaker Data Wrangler multicollinearity measurement features with a variance inflation factor (VIF) score. Use the VIF score as a measurement of how closely the variables are related to each other.

B. Use the SageMaker Data Wrangler Data Quality and Insights Report quick model visualization to estimate the expected quality of a model that is trained on the data.

C. Use the SageMaker Data Wrangler multicollinearity measurement features with the principal component analysis (PCA) algorithm to provide a feature space that includes all of the predictor variables.

D. Use the SageMaker Data Wrangler Data Quality and Insights Report feature to review features by their predictive power.
Suggested answer: C

Explanation:

Principal Component Analysis (PCA) is a dimensionality reduction technique that captures the variance within the feature space, helping to understand the directions in which data varies most. In SageMaker Data Wrangler, the multicollinearity measurement and PCA features allow the data scientist to analyze interdependencies between predictor variables while reducing redundancy. PCA transforms correlated features into a set of uncorrelated components, helping to simplify the dataset without significant loss of information, making it ideal for refining features based on variance.

Options A and D offer methods to understand feature relevance but are less effective for managing multicollinearity and variance representation in the data.
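
The idea of "variance along directions in the feature space" can be illustrated outside Data Wrangler with a small scikit-learn PCA example; the data here is synthetic and correlated by construction, standing in for the real training dataset.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative only: five predictors that all share a common underlying signal
rng = np.random.default_rng(42)
base = rng.normal(size=(500, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(500, 1)) for _ in range(5)])

# Standardize so variance comparisons across features are meaningful
X_scaled = StandardScaler().fit_transform(X)

pca = PCA()
pca.fit(X_scaled)

# Each ratio is the share of total variance captured along one principal direction;
# highly correlated predictors concentrate most of the variance in the first component.
print(pca.explained_variance_ratio_)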

A cybersecurity company is collecting on-premises server logs, mobile app logs, and IoT sensor data. The company backs up the ingested data in an Amazon S3 bucket and sends the ingested data to Amazon OpenSearch Service for further analysis. Currently, the company has a custom ingestion pipeline that is running on Amazon EC2 instances. The company needs to implement a new serverless ingestion pipeline that can automatically scale to handle sudden changes in the data flow.

Which solution will meet these requirements MOST cost-effectively?

A. Create two Amazon Data Firehose delivery streams to send data to the S3 bucket and OpenSearch Service. Configure the data sources to send data to the delivery streams.

B. Create one Amazon Kinesis data stream. Create two Amazon Data Firehose delivery streams to send data to the S3 bucket and OpenSearch Service. Connect the delivery streams to the data stream. Configure the data sources to send data to the data stream.

C. Create one Amazon Data Firehose delivery stream to send data to OpenSearch Service. Configure the delivery stream to back up the raw data to the S3 bucket. Configure the data sources to send data to the delivery stream.

D. Create one Amazon Kinesis data stream. Create one Amazon Data Firehose delivery stream to send data to OpenSearch Service. Configure the delivery stream to back up the data to the S3 bucket. Connect the delivery stream to the data stream. Configure the data sources to send data to the data stream.
Suggested answer: B

Explanation:

To build a scalable, serverless, and cost-effective data ingestion pipeline, this solution uses a Kinesis data stream to handle fluctuations in data flow, buffering and distributing incoming data in real time. By connecting two Amazon Kinesis Data Firehose delivery streams to the Kinesis data stream, the company can simultaneously route data to Amazon S3 for backup and Amazon OpenSearch Service for analysis.

This approach meets all requirements by providing automatic scaling, reducing operational overhead, and ensuring data storage and analysis without duplicating efforts or needing additional infrastructure.
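
A partial boto3 sketch of this layout is shown below: one on-demand Kinesis data stream as the shared source, plus one Firehose delivery stream writing to S3. The second delivery stream, targeting OpenSearch Service, would be created the same way with an OpenSearch destination configuration. All names, ARNs, and roles are hypothetical placeholders.

import boto3

kinesis = boto3.client("kinesis")
firehose = boto3.client("firehose")

# Hypothetical identifiers
stream_name = "ingest-logs"
stream_arn = "arn:aws:kinesis:us-east-1:123456789012:stream/ingest-logs"
role_arn = "arn:aws:iam::123456789012:role/FirehoseDeliveryRole"

# 1) Shared ingestion stream (on-demand mode scales automatically with traffic)
kinesis.create_stream(StreamName=stream_name,
                      StreamModeDetails={"StreamMode": "ON_DEMAND"})

# 2) Firehose delivery stream that reads from the Kinesis stream and writes to S3
firehose.create_delivery_stream(
    DeliveryStreamName="logs-to-s3",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": stream_arn,
        "RoleARN": role_arn,
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": role_arn,
        "BucketARN": "arn:aws:s3:::example-backup-bucket",
    },
)

# A second delivery stream, attached to the same Kinesis data stream but configured
# with an OpenSearch Service destination, handles the analytics path.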
