Amazon MLS-C01 Practice Test - Questions Answers, Page 23

A medical device company is building a machine learning (ML) model to predict the likelihood of device recall based on customer data that the company collects from a plain text survey. One of the survey questions asks which medications the customer is taking. The data for this field contains the names of medications that customers enter manually. Customers misspell some of the medication names. The column that contains the medication name data is therefore a categorical feature with high cardinality and considerable redundancy.

What is the MOST effective way to encode this categorical feature into a numeric feature?

A.
Spell check the column. Use Amazon SageMaker one-hot encoding on the column to transform a categorical feature to a numerical feature.
B.
Fix the spelling in the column by using char-RNN. Use Amazon SageMaker Data Wrangler one-hot encoding to transform a categorical feature to a numerical feature.
C.
Use Amazon SageMaker Data Wrangler similarity encoding on the column to create embeddings of vectors of real numbers.
D.
Use Amazon SageMaker Data Wrangler ordinal encoding on the column to encode categories into an integer between 0 and the total number of categories in the column.
Suggested answer: C

Explanation:

The most effective way to encode this categorical feature into a numeric feature is to use Amazon SageMaker Data Wrangler similarity encoding on the column to create embeddings of vectors of real numbers. Similarity encoding transforms categorical features into numerical features by computing the similarity between the categories. Similarity encoding can handle high cardinality and redundancy in categorical features, because it groups similar categories together based on their string similarity. For example, if the column contains the values "aspirin", "asprin", and "ibuprofen", similarity encoding will assign a high similarity score to "aspirin" and "asprin", and a low similarity score to "ibuprofen". Similarity encoding also creates embeddings of vectors of real numbers, which can be used as input for machine learning models.

Amazon SageMaker Data Wrangler is a feature of Amazon SageMaker that enables you to prepare data for machine learning quickly and easily. You can use SageMaker Data Wrangler to apply similarity encoding to a column of categorical data and generate embeddings of vectors of real numbers that capture the similarity between the categories1.

The other options are either less effective or more complex to implement. Spell checking the column and using one-hot encoding would require additional steps and resources, and may not capture all the misspellings or redundancies. One-hot encoding would also create a large number of features, which could increase the dimensionality and sparsity of the data. Ordinal encoding would assign an arbitrary order to the categories, which could introduce bias or noise into the data.
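
In Data Wrangler the similarity encoder is applied through the data flow UI rather than hand-written code, but the idea behind it can be sketched in plain Python. The snippet below is an illustrative approximation (not Data Wrangler's actual implementation) that uses character n-gram TF-IDF vectors and cosine similarity, so misspelled medication names land near their correctly spelled counterparts; the medication list is invented for the example.

# Illustrative sketch only: approximates string-similarity encoding with
# character n-gram TF-IDF vectors (not the exact Data Wrangler implementation).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

medications = ["aspirin", "asprin", "ibuprofen", "ibuprofin", "acetaminophen"]

# Character 3-grams tolerate misspellings such as "asprin" vs. "aspirin".
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 3))
ngram_vectors = vectorizer.fit_transform(medications)

# Each row becomes a vector of real numbers: its similarity to every other
# category. Misspelled variants end up close to the correct spelling.
embeddings = cosine_similarity(ngram_vectors)
print(embeddings.round(2))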

References:

1: Amazon SageMaker Data Wrangler -- Amazon Web Services

A manufacturing company wants to create a machine learning (ML) model to predict when equipment is likely to fail. A data science team already constructed a deep learning model by using TensorFlow and a custom Python script in a local environment. The company wants to use Amazon SageMaker to train the model.

Which TensorFlow estimator configuration will train the model MOST cost-effectively?

A.
Turn on SageMaker Training Compiler by adding compiler_config=TrainingCompilerConfig() as a parameter. Pass the script to the estimator in the call to the TensorFlow fit() method.
B.
Turn on SageMaker Training Compiler by adding compiler_config=TrainingCompilerConfig() as a parameter. Turn on managed spot training by setting the use_spot_instances parameter to True. Pass the script to the estimator in the call to the TensorFlow fit() method.
C.
Adjust the training script to use distributed data parallelism. Specify appropriate values for the distribution parameter. Pass the script to the estimator in the call to the TensorFlow fit() method.
D.
Turn on SageMaker Training Compiler by adding compiler_config=TrainingCompilerConfig() as a parameter. Set the MaxWaitTimeInSeconds parameter to be equal to the MaxRuntimeInSeconds parameter. Pass the script to the estimator in the call to the TensorFlow fit() method.
Suggested answer: B

Explanation:

The TensorFlow estimator configuration that will train the model most cost-effectively is to turn on SageMaker Training Compiler by adding compiler_config=TrainingCompilerConfig() as a parameter, turn on managed spot training by setting the use_spot_instances parameter to True, and pass the script to the estimator in the call to the TensorFlow fit() method. This configuration accelerates training on GPU instances, reduces the training cost by using Amazon EC2 Spot Instances, and reuses the team's custom Python script without modification.

SageMaker Training Compiler is a feature of Amazon SageMaker that accelerates the training of deep learning models, such as TensorFlow and PyTorch models, on GPU instances. SageMaker Training Compiler compiles the training code into hardware-optimized instructions by applying graph-level and kernel-level optimizations, such as operator fusion, which makes more efficient use of GPU memory and compute and therefore reduces training time and cost. You can enable SageMaker Training Compiler by adding compiler_config=TrainingCompilerConfig() as a parameter to the TensorFlow estimator constructor1.

Managed spot training is another feature of Amazon SageMaker that enables you to use Amazon EC2 Spot Instances for training your machine learning models. Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS Cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices. You can use Spot Instances for various fault-tolerant and flexible applications. You can enable managed spot training by setting the use_spot_instances parameter to True and specifying the max_wait and max_run parameters in the TensorFlow estimator constructor2.

The TensorFlow estimator is a class in the SageMaker Python SDK that allows you to train and deploy TensorFlow models on SageMaker. You can use the TensorFlow estimator to run your own Python script on SageMaker without modification: the script is supplied as the estimator's entry point, and the call to the fit() method, together with the location of your input data, starts a SageMaker training job that runs the script inside the training container3.
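
As a rough sketch of how these pieces fit together in the SageMaker Python SDK (version 2.x assumed), the estimator below enables both SageMaker Training Compiler and managed spot training. The role ARN, S3 paths, instance type, and framework versions are placeholders, and the exact import location of TrainingCompilerConfig can vary by SDK version.

# Minimal sketch (SageMaker Python SDK v2). Role ARN, S3 paths, versions,
# and instance type are placeholders, not values from the question.
from sagemaker.tensorflow import TensorFlow, TrainingCompilerConfig

estimator = TensorFlow(
    entry_point="train.py",              # the team's existing training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.11",
    py_version="py39",
    compiler_config=TrainingCompilerConfig(),  # SageMaker Training Compiler
    use_spot_instances=True,             # managed spot training
    max_run=3600,                        # max training time in seconds
    max_wait=7200,                       # must be >= max_run when using spot
)

# fit() starts the training job with the S3 location of the training data.
estimator.fit({"training": "s3://example-bucket/equipment-telemetry/"})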

The other options are either less cost-effective or more complex to implement. Adjusting the training script to use distributed data parallelism would require modifying the script and specifying appropriate values for the distribution parameter, which could increase development time and complexity without reducing cost. Setting the MaxWaitTimeInSeconds parameter to be equal to the MaxRuntimeInSeconds parameter would not reduce cost by itself, because those parameters only bound the duration of the training job; without use_spot_instances set to True, the job still runs on On-Demand instances.

References:

1: Optimize TensorFlow, PyTorch, and MXNet models for deployment using Amazon SageMaker Training Compiler | AWS Machine Learning Blog

2: Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs | AWS Machine Learning Blog

3: sagemaker.tensorflow --- sagemaker 2.66.0 documentation

A company is creating an application to identify, count, and classify animal images that are uploaded to the company's website. The company is using the Amazon SageMaker image classification algorithm with an ImageNetV2 convolutional neural network (CNN). The solution works well for most animal images but does not recognize many animal species that are less common.

The company obtains 10,000 labeled images of less common animal species and stores the images in Amazon S3. A machine learning (ML) engineer needs to incorporate the images into the model by using Pipe mode in SageMaker.

Which combination of steps should the ML engineer take to train the model? (Choose two.)

A.
Use a ResNet model. Initiate full training mode by initializing the network with random weights.
B.
Use an Inception model that is available with the SageMaker image classification algorithm.
C.
Create a .lst file that contains a list of image files and corresponding class labels. Upload the .lst file to Amazon S3.
D.
Initiate transfer learning. Train the model by using the images of less common species.
E.
Use an augmented manifest file in JSON Lines format.
Suggested answer: C, D

Explanation:

The combination of steps that the ML engineer should take to train the model is to create a .lst file that contains a list of image files and corresponding class labels, upload the .lst file to Amazon S3, and initiate transfer learning by training the model with the images of less common species. This approach allows the ML engineer to leverage the existing ImageNetV2 CNN model and fine-tune it with the new data by using Pipe mode in SageMaker.

A .lst file is a tab-separated text file in which each line contains an image index, the class label, and the relative path of the image file. The SageMaker image classification algorithm uses .lst files to associate individual image files in Amazon S3 with their labels. Pipe mode is a feature of SageMaker that streams data directly from Amazon S3 to the training instances, without downloading the data first. Pipe mode can reduce the startup time, improve the I/O throughput, and enable training on large datasets that exceed the disk size limit. To use this approach, the ML engineer uploads the .lst file to Amazon S3 and specifies the S3 path as an input data channel for the training job1.
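
A minimal sketch of building and uploading such a .lst file is shown below; the image paths, labels, and bucket name are illustrative only.

# Illustrative sketch: build a .lst file (index <TAB> class-label <TAB> relative
# path) for the new images and upload it to S3. Bucket and paths are placeholders.
import boto3

labeled_images = [
    ("images/axolotl_001.jpg", 0),
    ("images/quokka_014.jpg", 1),
    ("images/pangolin_203.jpg", 2),
]

with open("train.lst", "w") as f:
    for index, (path, label) in enumerate(labeled_images):
        f.write(f"{index}\t{label}\t{path}\n")

boto3.client("s3").upload_file("train.lst", "example-bucket", "train_lst/train.lst")

The training job then typically references the uploaded list through a train_lst channel alongside the channel that holds the image files.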

Transfer learning is a technique that reuses a pre-trained model for a new task by fine-tuning the model parameters with new data. Transfer learning can save time and computational resources, as well as improve the performance of the model, especially when the new task is similar to the original task. The SageMaker image classification algorithm supports transfer learning through the use_pretrained_model hyperparameter, together with the number of output classes for the new dataset. The ML engineer can use the existing ImageNetV2 CNN model, which is trained on 1,000 classes of common objects, and fine-tune it with the new data of less common animal species, which is a similar task2.

The other options are either less effective or not supported by the SageMaker image classification algorithm. Using a ResNet model and initiating full training mode would require training the model from scratch, which would take more time and resources than transfer learning. Using an Inception model is not possible, because the built-in SageMaker image classification algorithm is based on a ResNet architecture and does not provide an Inception network. An augmented manifest file in JSON Lines format is primarily intended for data labeled with Amazon SageMaker Ground Truth; because the company already has labeled images and can list them in a .lst file, the augmented manifest would add unnecessary preparation work1.

References:

1: Using Pipe input mode for Amazon SageMaker algorithms | AWS Machine Learning Blog

2: Image Classification Algorithm - Amazon SageMaker

A credit card company wants to identify fraudulent transactions in real time. A data scientist builds a machine learning model for this purpose. The transactional data is captured and stored in Amazon S3. The historic data is already labeled with two classes: fraud (positive) and fair transactions (negative). The data scientist removes all the missing data and builds a classifier by using the XGBoost algorithm in Amazon SageMaker. The model produces the following results:

* True positive rate (TPR): 0.700

* False negative rate (FNR): 0.300

* True negative rate (TNR): 0.977

* False positive rate (FPR): 0.023

* Overall accuracy: 0.949

Which solution should the data scientist use to improve the performance of the model?

A.
Apply the Synthetic Minority Oversampling Technique (SMOTE) on the minority class in the training dataset. Retrain the model with the updated training data.
B.
Apply the Synthetic Minority Oversampling Technique (SMOTE) on the majority class in the training dataset. Retrain the model with the updated training data.
C.
Undersample the minority class.
D.
Oversample the majority class.
Suggested answer: A

Explanation:

The solution that the data scientist should use to improve the performance of the model is to apply the Synthetic Minority Oversampling Technique (SMOTE) on the minority class in the training dataset, and retrain the model with the updated training data. This solution can address the problem of class imbalance in the dataset, which can affect the model's ability to learn from the rare but important positive class (fraud).

Class imbalance is a common issue in machine learning, especially for classification tasks. It occurs when one class (usually the positive or target class) is significantly underrepresented in the dataset compared to the other class (usually the negative or non-target class). For example, in the credit card fraud detection problem, the positive class (fraud) is much less frequent than the negative class (fair transactions). This can cause the model to be biased towards the majority class, and fail to capture the characteristics and patterns of the minority class. As a result, the model may have a high overall accuracy, but a low recall or true positive rate for the minority class, which means it misses many fraudulent transactions.

SMOTE is a technique that can help mitigate the class imbalance problem by generating synthetic samples for the minority class. SMOTE works by finding the k-nearest neighbors of each minority class instance, and randomly creating new instances along the line segments connecting them. This way, SMOTE can increase the number and diversity of the minority class instances, without duplicating or losing any information. By applying SMOTE on the minority class in the training dataset, the data scientist can balance the classes and improve the model's performance on the positive class1.
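
As an illustration, the open-source imbalanced-learn library provides a SMOTE implementation; the synthetic X_train and y_train arrays below merely stand in for the real transaction features and labels.

# Minimal sketch using the imbalanced-learn library (assumed to be installed);
# X_train / y_train stand in for the labeled transaction features and classes.
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(42)
X_train = rng.normal(size=(1000, 10))                 # synthetic feature matrix
y_train = np.array([1] * 50 + [0] * 950)              # 5% fraud, 95% fair

print("before:", Counter(y_train))                    # {0: 950, 1: 50}

# SMOTE synthesizes new minority-class samples along lines between neighbors.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X_train, y_train)

print("after:", Counter(y_resampled))                 # balanced classes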

The other options are either ineffective or counterproductive. Applying SMOTE on the majority class would not balance the classes, but increase the imbalance and the size of the dataset. Undersampling the minority class would reduce the number of instances available for the model to learn from, and potentially lose some important information. Oversampling the majority class would also increase the imbalance and the size of the dataset, and introduce redundancy and overfitting.

References:

1: SMOTE for Imbalanced Classification with Python - Machine Learning Mastery

A company's data scientist has trained a new machine learning model that performs better on test data than the company's existing model performs in the production environment. The data scientist wants to replace the existing model that runs on an Amazon SageMaker endpoint in the production environment. However, the company is concerned that the new model might not work well on the production environment data.

The data scientist needs to perform A/B testing in the production environment to evaluate whether the new model performs well on production environment data.

Which combination of steps must the data scientist take to perform the A/B testing? (Choose two.)

A.
Create a new endpoint configuration that includes a production variant for each of the two models.
B.
Create a new endpoint configuration that includes two target variants that point to different endpoints.
C.
Deploy the new model to the existing endpoint.
D.
Update the existing endpoint to activate the new model.
E.
Update the existing endpoint to use the new endpoint configuration.
Suggested answer: A, E

Explanation:

The combination of steps that the data scientist must take to perform the A/B testing is to create a new endpoint configuration that includes a production variant for each of the two models, and then update the existing endpoint to use the new endpoint configuration. This approach allows the data scientist to deploy both models on the same endpoint and split the inference traffic between them based on a specified distribution.

Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to quickly build, train, and deploy machine learning models. Amazon SageMaker supports A/B testing on machine learning models by allowing the data scientist to run multiple production variants on an endpoint. A production variant is a version of a model that is deployed on an endpoint. Each production variant has a name, a machine learning model, an instance type, an initial instance count, and an initial weight. The initial weight determines the percentage of inference requests that the variant will handle. For example, if there are two variants with weights of 0.5 and 0.5, each variant will handle 50% of the requests. The data scientist can use production variants to test models that have been trained using different training datasets, algorithms, and machine learning frameworks; test how they perform on different instance types; or a combination of all of the above1.

To perform A/B testing on machine learning models, the data scientist needs to create a new endpoint configuration that includes a production variant for each of the two models. An endpoint configuration is a collection of settings that define the properties of an endpoint, such as the name, the production variants, and the data capture configuration. The data scientist can use the Amazon SageMaker console, the AWS CLI, or the AWS SDKs to create a new endpoint configuration. The data scientist needs to specify the name, model name, instance type, initial instance count, and initial variant weight for each production variant in the endpoint configuration2.

After creating the new endpoint configuration, the data scientist needs to update the existing endpoint to use the new endpoint configuration. Updating an endpoint is the process of deploying a new endpoint configuration to an existing endpoint. Updating an endpoint does not affect the availability or scalability of the endpoint, because Amazon SageMaker provisions the new configuration behind the scenes and shifts traffic to it when it is ready. The data scientist can use the Amazon SageMaker console, the AWS CLI, or the AWS SDKs to update an endpoint. The data scientist needs to specify the name of the endpoint and the name of the new endpoint configuration to update the endpoint3.
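
A condensed boto3 sketch of these two steps follows; the model names, endpoint name, instance type, and traffic weights are placeholders rather than values given in the question.

# Minimal boto3 sketch; names, instance type, and weights are placeholders.
import boto3

sm = boto3.client("sagemaker")

# One production variant per model; equal weights split traffic 50/50 for the A/B test.
sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "existing-model",
            "ModelName": "existing-model",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.5,
        },
        {
            "VariantName": "new-model",
            "ModelName": "new-model",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.5,
        },
    ],
)

# Point the existing endpoint at the new configuration; SageMaker swaps it in
# without taking the endpoint offline.
sm.update_endpoint(
    EndpointName="production-endpoint",
    EndpointConfigName="ab-test-config",
)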

The other options are either incorrect or unnecessary. Creating a new endpoint configuration that includes two target variants that point to different endpoints is not possible, as target variants are only used to invoke a specific variant on an endpoint, not to define an endpoint configuration. Deploying the new model to the existing endpoint would replace the existing model, not run it side-by-side with the new model. Updating the existing endpoint to activate the new model is not a valid operation, as there is no activation parameter for an endpoint.

References:

1: A/B Testing ML models in production using Amazon SageMaker | AWS Machine Learning Blog

2: Create an Endpoint Configuration - Amazon SageMaker

3: Update an Endpoint - Amazon SageMaker

An online store is predicting future book sales by using a linear regression model that is based on past sales data. The data includes duration, a numerical feature that represents the number of days that a book has been listed in the online store. A data scientist performs an exploratory data analysis and discovers that the relationship between book sales and duration is skewed and non-linear.

Which data transformation step should the data scientist take to improve the predictions of the model?

A.
One-hot encoding
B.
Cartesian product transformation
C.
Quantile binning
D.
Normalization
Suggested answer: C

Explanation:

Quantile binning is a data transformation technique that can be used to handle skewed and non-linear numerical features. It divides the range of a feature into bins that each contain approximately the same number of observations, based on the percentiles of the data, and each value is then represented by the bin it falls into. This transforms the feature into a more uniform distribution that can improve the performance of linear models. Quantile binning can also reduce the impact of outliers and noise in the data.
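
For illustration, scikit-learn's KBinsDiscretizer with the quantile strategy performs this kind of binning; the synthetic duration values below simply stand in for the skewed listing-age feature from the question.

# Minimal sketch using scikit-learn's quantile strategy; the 'duration' values
# are synthetic stand-ins for the listing-age feature in the question.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
duration = rng.exponential(scale=60, size=(1000, 1))   # right-skewed, like the data

# 10 bins with (roughly) equal numbers of observations per bin.
binner = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")
duration_binned = binner.fit_transform(duration)

print(np.unique(duration_binned, return_counts=True))  # ~100 rows per bin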

One-hot encoding, Cartesian product transformation, and normalization are not suitable for this scenario. One-hot encoding is used to transform categorical features into binary features. Cartesian product transformation is used to create new features by combining existing features. Normalization is used to scale numerical features to a standard range, but it does not change the shape of the distribution.

References:

Data Transformations for Machine Learning

Quantile Binning Transformation

A beauty supply store wants to understand some characteristics of visitors to the store. The store has security video recordings from the past several years. The store wants to generate a report of hourly visitors from the recordings. The report should group visitors by hair style and hair color.

Which solution will meet these requirements with the LEAST amount of effort?

A.
Use an object detection algorithm to identify a visitor's hair in video frames. Pass the identified hair to a ResNet-50 algorithm to determine hair style and hair color.
B.
Use an object detection algorithm to identify a visitor's hair in video frames. Pass the identified hair to an XGBoost algorithm to determine hair style and hair color.
C.
Use a semantic segmentation algorithm to identify a visitor's hair in video frames. Pass the identified hair to a ResNet-50 algorithm to determine hair style and hair color.
D.
Use a semantic segmentation algorithm to identify a visitor's hair in video frames. Pass the identified hair to an XGBoost algorithm to determine hair style and hair color.
Suggested answer: C

Explanation:

The solution that will meet the requirements with the least amount of effort is to use a semantic segmentation algorithm to identify a visitor's hair in video frames and pass the identified hair to a ResNet-50 algorithm to determine hair style and hair color. This solution can leverage the existing Amazon SageMaker algorithms and frameworks to perform the tasks of hair segmentation and classification.

Semantic segmentation is a computer vision technique that assigns a class label to every pixel in an image, such that pixels with the same label share certain characteristics. Semantic segmentation can be used to identify and isolate different objects or regions in an image, such as a visitor's hair in a video frame. Amazon SageMaker provides a built-in semantic segmentation algorithm that can train and deploy models for semantic segmentation tasks. The algorithm supports three state-of-the-art network architectures: Fully Convolutional Network (FCN), Pyramid Scene Parsing Network (PSP), and DeepLab v3. The algorithm can also use pre-trained or randomly initialized ResNet-50 or ResNet-101 as the backbone network. The algorithm can be trained using P2/P3 type Amazon EC2 instances in single machine configurations1.

ResNet-50 is a convolutional neural network that is 50 layers deep and can classify images into 1,000 object categories. ResNet-50 is trained on more than a million images from the ImageNet database and can achieve high accuracy on various image recognition tasks. ResNet-50 can be used to determine hair style and hair color from the segmented hair regions in the video frames. Amazon SageMaker provides a built-in image classification algorithm that can use ResNet-50 as the network architecture. The algorithm can also perform transfer learning by fine-tuning the pre-trained ResNet-50 model with new data. The algorithm can be trained using P2/P3 type Amazon EC2 instances in single or multiple machine configurations2.
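
A rough sketch of launching the built-in semantic segmentation algorithm with a ResNet-50 backbone through the SageMaker Python SDK is shown below; the role ARN, bucket paths, instance type, and hyperparameter values are placeholders, not settings taken from the scenario, and the channel layout follows the algorithm's documented train/annotation convention.

# Rough sketch of the built-in semantic segmentation algorithm with a ResNet-50
# backbone; role, bucket, and hyperparameter values are placeholders.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
image_uri = sagemaker.image_uris.retrieve("semantic-segmentation", session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://example-bucket/hair-segmentation/output/",
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    backbone="resnet-50",        # ResNet-50 feature extractor
    algorithm="fcn",             # one of the supported architectures
    use_pretrained_model=1,      # start from pre-trained weights
    num_classes=2,               # hair vs. background (placeholder)
    num_training_samples=1000,   # placeholder count of training images
    epochs=10,
)

estimator.fit({
    "train": "s3://example-bucket/hair-segmentation/train/",
    "train_annotation": "s3://example-bucket/hair-segmentation/train_annotation/",
    "validation": "s3://example-bucket/hair-segmentation/validation/",
    "validation_annotation": "s3://example-bucket/hair-segmentation/validation_annotation/",
})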

The other options are either less effective or more complex to implement. Using an object detection algorithm to identify a visitor's hair in video frames would not segment the hair at the pixel level, but only draw bounding boxes around the hair regions. This could result in inaccurate or incomplete hair segmentation, especially if the hair is occluded or has irregular shapes. Using an XGBoost algorithm to determine hair style and hair color would require transforming the segmented hair images into numerical features, which could lose some information or introduce noise. XGBoost is also not designed for image classification tasks, and may not achieve high accuracy or performance.

References:

1: Semantic Segmentation Algorithm - Amazon SageMaker

2: Image Classification Algorithm - Amazon SageMaker

A company wants to predict stock market price trends. The company stores stock market data each business day in Amazon S3 in Apache Parquet format. The company stores 20 GB of data each day for each stock code.

A data engineer must use Apache Spark to perform batch preprocessing data transformations quickly so the company can complete prediction jobs before the stock market opens the next day. The company plans to track more stock market codes and needs a way to scale the preprocessing data transformations.

Which AWS service or feature will meet these requirements with the LEAST development effort over time?

A.
AWS Glue jobs
B.
Amazon EMR cluster
C.
Amazon Athena
D.
AWS Lambda
Suggested answer: A

Explanation:

AWS Glue jobs will meet the requirements with the least development effort over time. AWS Glue is a fully managed service that enables data engineers to run Apache Spark applications in a serverless Spark environment. AWS Glue jobs can perform batch preprocessing data transformations on large datasets stored in Amazon S3, such as converting data formats, filtering data, joining data, and aggregating data. AWS Glue jobs can also scale the Spark environment automatically based on the data volume and processing needs, without requiring any infrastructure provisioning or management. AWS Glue jobs can reduce development effort and time by providing a graphical interface to create and monitor Spark applications, as well as a code generation feature that can generate Scala or Python code based on the data sources and targets. AWS Glue jobs can also integrate with other AWS services, such as Amazon Athena, Amazon EMR, and Amazon SageMaker, to enable further data analysis and machine learning tasks1.
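
A stripped-down example of what such a Glue Spark job script might look like follows; the bucket names, column names, and aggregation logic are illustrative assumptions, and the awsglue library is available only inside the Glue job environment.

# Minimal sketch of a Glue Spark job script (runs inside AWS Glue, where the
# awsglue library is provided); bucket names and columns are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the day's Parquet data, apply batch transformations, and write results.
df = spark.read.parquet("s3://example-bucket/stock-data/raw/")
daily_features = (
    df.groupBy("stock_code", "trade_date")
      .agg({"price": "avg", "volume": "sum"})
      .withColumnRenamed("avg(price)", "avg_price")
      .withColumnRenamed("sum(volume)", "total_volume")
)
daily_features.write.mode("overwrite").parquet("s3://example-bucket/stock-data/prepared/")

job.commit()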

The other options are either more complex or less suitable than AWS Glue jobs. An Amazon EMR cluster is a managed service that enables data engineers to run Apache Spark applications on a cluster of Amazon EC2 instances. However, Amazon EMR requires more development and operational effort than AWS Glue jobs, because it involves setting up, configuring, sizing, and managing the cluster in addition to writing and deploying the Spark code; automatic scaling is possible but must be configured and tuned2. Amazon Athena is a serverless interactive query service that enables data engineers to analyze data stored in Amazon S3 by using standard SQL. However, Athena is oriented toward interactive SQL analysis rather than recurring large-scale batch preprocessing, and it cannot run the company's existing Apache Spark transformation code3. AWS Lambda is a serverless compute service that enables data engineers to run code without provisioning or managing servers. However, AWS Lambda is not designed for running Spark applications, and its limits on execution time, memory, and payload size make it a poor fit for transforming 20 GB of data per stock code each day.

References:

1: AWS Glue - Fully Managed ETL Service - Amazon Web Services

2: Amazon EMR - Amazon Web Services

3: Amazon Athena -- Interactive SQL Queries for Data in Amazon S3

4: AWS Lambda -- Serverless Compute - Amazon Web Services

A company wants to enhance audits for its machine learning (ML) systems. The auditing system must be able to perform metadata analysis on the features that the ML models use. The audit solution must generate a report that analyzes the metadata. The solution also must be able to set the data sensitivity and authorship of features.

Which solution will meet these requirements with the LEAST development effort?

A.
Use Amazon SageMaker Feature Store to select the features. Create a data flow to perform feature-level metadata analysis. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata.
B.
Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use SageMaker Studio to analyze the metadata.
C.
Use Amazon SageMaker Feature Store to apply custom algorithms to analyze the feature-level metadata that the company requires. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata.
D.
Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use Amazon QuickSight to analyze the metadata.
Suggested answer: D

Explanation:

The solution that will meet the requirements with the least development effort is to use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use, assign the required metadata for each feature, and use Amazon QuickSight to analyze the metadata. This solution can leverage the existing AWS services and features to perform feature-level metadata analysis and reporting.

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, update, search, and share machine learning (ML) features. The service provides feature management capabilities such as easy feature reuse, low-latency serving, time travel, and consistency between features used in training and inference workflows. A feature group is a logical grouping of ML features whose organization and structure are defined by a feature group schema. A feature group schema consists of a list of feature definitions, each of which specifies the name, type, and metadata of a feature. The metadata can include information such as data sensitivity, authorship, description, and parameters. The metadata can help make features discoverable, understandable, and traceable. Amazon SageMaker Feature Store allows users to set feature groups for the current features that the ML models use, and to assign the required metadata for each feature by using the AWS SDK for Python (Boto3), the AWS Command Line Interface (AWS CLI), or Amazon SageMaker Studio1.
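
As a brief sketch, the UpdateFeatureMetadata API can attach a description and key-value parameters such as sensitivity and authorship to an existing feature; the feature group name, feature name, and metadata values below are placeholders.

# Minimal boto3 sketch; the feature group, feature name, and metadata values
# are placeholders. UpdateFeatureMetadata attaches a description and key-value
# parameters (e.g. sensitivity, author) to an existing feature.
import boto3

sm = boto3.client("sagemaker")

sm.update_feature_metadata(
    FeatureGroupName="customer-features",
    FeatureName="annual_income",
    Description="Customer annual income in USD, sourced from the CRM export",
    ParameterAdditions=[
        {"Key": "sensitivity", "Value": "confidential"},
        {"Key": "author", "Value": "data-engineering-team"},
    ],
)

# The assigned metadata can then be inspected with describe_feature_metadata.
print(sm.describe_feature_metadata(
    FeatureGroupName="customer-features",
    FeatureName="annual_income",
))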

Amazon QuickSight is a fully managed, serverless business intelligence service that makes it easy to create and publish interactive dashboards that include ML insights. Amazon QuickSight can connect to various data sources, such as Amazon S3, Amazon Athena, and Amazon Redshift, and it can query the Feature Store offline store in Amazon S3 through Amazon Athena. Amazon QuickSight can also create rich visualizations and reports that can be accessed from any device and securely shared with anyone inside or outside an organization. Amazon QuickSight can be used to analyze the metadata of the features stored in Amazon SageMaker Feature Store and to generate a report that summarizes the metadata analysis2.

The other options are either more complex or less effective than the proposed solution. Selecting the features and creating a data flow to perform feature-level metadata analysis would require additional steps and resources, and may not capture all the metadata attributes that the company requires. Creating an Amazon DynamoDB table to store feature-level metadata would introduce redundancy and inconsistency, because the metadata is already stored in Amazon SageMaker Feature Store. Using SageMaker Studio to analyze the metadata would not generate a report that can be easily shared and accessed across the company.

References:

1: Amazon SageMaker Feature Store -- Amazon Web Services

2: Amazon QuickSight -- Business Intelligence Service - Amazon Web Services

A machine learning (ML) engineer has created a feature repository in Amazon SageMaker Feature Store for the company. The company has AWS accounts for development, integration, and production. The company hosts a feature store in the development account. The company uses Amazon S3 buckets to store feature values offline. The company wants to share features and to allow the integration account and the production account to reuse the features that are in the feature repository.

Which combination of steps will meet these requirements? (Select TWO.)

A.
Create an IAM role in the development account that the integration account and production account can assume. Attach IAM policies to the role that allow access to the feature repository and the S3 buckets.
B.
Share the feature repository that is associated with the S3 buckets from the development account to the integration account and the production account by using AWS Resource Access Manager (AWS RAM).
C.
Use AWS Security Token Service (AWS STS) from the integration account and the production account to retrieve credentials for the development account.
D.
Set up S3 replication between the development S3 buckets and the integration and production S3 buckets.
E.
Create an AWS PrivateLink endpoint in the development account for SageMaker.
Suggested answer: A, B

Explanation:

The combination of steps that will meet the requirements is to create an IAM role in the development account that the integration account and the production account can assume, attach IAM policies to the role that allow access to the feature repository and the S3 buckets, and share the feature repository that is associated with the S3 buckets from the development account to the integration account and the production account by using AWS Resource Access Manager (AWS RAM). This approach enables cross-account access and sharing of the features stored in Amazon SageMaker Feature Store and Amazon S3.

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, update, search, and share curated data used in training and prediction workflows. The service provides feature management capabilities such as easy feature reuse, low-latency serving, time travel, and consistency between features used in training and inference workflows. A feature group is a logical grouping of ML features whose organization and structure are defined by a feature group schema. A feature group schema consists of a list of feature definitions, each of which specifies the name, type, and metadata of a feature. Amazon SageMaker Feature Store stores the features in both an online store and an offline store. The online store is a low-latency, high-throughput store that is optimized for real-time inference. The offline store is a historical store that is backed by an Amazon S3 bucket and is optimized for batch processing and model training1.

AWS Identity and Access Management (IAM) is a web service that helps you securely control access to AWS resources for your users. You use IAM to control who can use your AWS resources (authentication) and what resources they can use and in what ways (authorization). An IAM role is an IAM identity that you can create in your account that has specific permissions. You can use an IAM role to delegate access to users, applications, or services that don't normally have access to your AWS resources. For example, you can create an IAM role in your development account that allows the integration account and the production account to assume the role and access the resources in the development account. You can attach IAM policies to the role that specify the permissions for the feature repository and the S3 buckets. You can also use IAM conditions to restrict the access based on the source account, IP address, or other factors2.
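
A simplified sketch of creating such a cross-account role with boto3 follows; the account IDs, role name, region, and resource ARNs are placeholders, and a production policy would typically be scoped more tightly.

# Minimal sketch; the account IDs, role name, and resource ARNs are placeholders.
import json

import boto3

iam = boto3.client("iam")

# Trust policy: let the integration and production accounts assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": [
            "arn:aws:iam::222222222222:root",   # integration account
            "arn:aws:iam::333333333333:root",   # production account
        ]},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="CrossAccountFeatureStoreAccess",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Permissions policy: read access to the feature repository and its offline store.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sagemaker:DescribeFeatureGroup", "sagemaker:GetRecord"],
            "Resource": "arn:aws:sagemaker:us-east-1:111111111111:feature-group/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-offline-store-bucket",
                "arn:aws:s3:::example-offline-store-bucket/*",
            ],
        },
    ],
}

iam.put_role_policy(
    RoleName="CrossAccountFeatureStoreAccess",
    PolicyName="FeatureRepositoryAccess",
    PolicyDocument=json.dumps(permissions_policy),
)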

AWS Resource Access Manager (AWS RAM) is a service that enables you to easily and securely share AWS resources with any AWS account or within your AWS Organization. You can share AWS resources that you own with other accounts by using resource shares. A resource share is an entity that defines the resources that you want to share and the principals that you want to share with. For example, you can share the feature repository that is associated with the S3 buckets from the development account to the integration account and the production account by creating a resource share in AWS RAM. You can specify the feature group ARN and the S3 bucket ARN as the resources, and the integration account ID and the production account ID as the principals. You can also use IAM policies to further control the access to the shared resources3.

The other options are either incorrect or unnecessary. Using AWS Security Token Service (AWS STS) from the integration account and the production account to retrieve credentials for the development account is not required, as the IAM role in the development account can provide temporary security credentials for the cross-account access. Setting up S3 replication between the development S3 buckets and the integration and production S3 buckets would introduce redundancy and inconsistency, as the S3 buckets are already shared through AWS RAM. Creating an AWS PrivateLink endpoint in the development account for SageMaker is not relevant, as it is used to securely connect to SageMaker services from a VPC, not from another account.

References:

1: Amazon SageMaker Feature Store -- Amazon Web Services

2: What Is IAM? - AWS Identity and Access Management

3: What Is AWS Resource Access Manager? - AWS Resource Access Manager
