ExamGecko

Question 47 - MLS-C01 discussion


A machine learning (ML) specialist must develop a classification model for a financial services company. A domain expert provides the dataset, which is tabular with 10,000 rows and 1,020 features. During exploratory data analysis, the specialist finds no missing values and a small percentage of duplicate rows. There are correlation scores of > 0.9 for 200 feature pairs. The mean value of each feature is similar to its 50th percentile.

Which feature engineering strategy should the ML specialist use with Amazon SageMaker?

A. Apply dimensionality reduction by using the principal component analysis (PCA) algorithm.
B. Drop the features with low correlation scores by using a Jupyter notebook.
C. Apply anomaly detection by using the Random Cut Forest (RCF) algorithm.
D. Concatenate the features with high correlation scores by using a Jupyter notebook.
Suggested answer: A

Explanation:

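The best feature engineering strategy for this scenario is dimensionality reduction with the principal component analysis (PCA) algorithm. The dataset has 1,020 features for only 10,000 rows, and 200 feature pairs have correlation scores above 0.9, so many features carry redundant information. PCA transforms a large set of correlated features into a smaller set of uncorrelated features called principal components. This reduces redundancy and noise in the data, can shorten training time, and lowers the risk of overfitting, although the resulting components are harder to interpret than the original features.

The other options do not address the redundancy. Dropping the features with low correlation scores (B) keeps the redundant features and discards the ones most likely to carry independent information. Random Cut Forest (C) detects anomalies, but the mean of each feature is close to its 50th percentile (the median), which suggests roughly symmetric distributions without heavy outliers. Concatenating the highly correlated features (D) does not reduce the number of dimensions.

Amazon SageMaker provides a built-in PCA algorithm that can perform dimensionality reduction on tabular data. The ML specialist can use Amazon SageMaker to train and deploy the PCA model, and then use the output of the PCA model as the input for the classification model.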
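The effect of PCA on highly correlated features can be illustrated locally with a minimal sketch. This is not the SageMaker built-in algorithm: it uses plain NumPy (PCA via SVD) as a stand-in, and the synthetic dataset, the function names `count_high_corr_pairs` and `pca_fit_transform`, and the 95% variance threshold are all assumptions made for illustration.

```python
import numpy as np


def count_high_corr_pairs(X, threshold=0.9):
    """Count feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)  # each pair once, no diagonal
    return int(np.sum(np.abs(corr[upper]) > threshold))


def pca_fit_transform(X, variance_to_keep=0.95):
    """Project X onto the fewest principal components that retain
    at least `variance_to_keep` of the total variance."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    n_components = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    return X_centered @ Vt[:n_components].T, n_components


# Hypothetical data: 10 underlying factors, each observed through
# 6 noisy copies, giving 60 strongly redundant features.
rng = np.random.default_rng(0)
n_rows, n_factors, copies = 1000, 10, 6
latent = rng.normal(size=(n_rows, n_factors))
factor_index = np.repeat(np.arange(n_factors), copies)
X = latent[:, factor_index] + 0.1 * rng.normal(size=(n_rows, n_factors * copies))

X_reduced, k = pca_fit_transform(X)
print("pairs with |r| > 0.9:", count_high_corr_pairs(X))
print("shape before/after PCA:", X.shape, "->", X_reduced.shape)
print("pairs with |r| > 0.9 after PCA:", count_high_corr_pairs(X_reduced))
```

In this toy setting the number of retained components should roughly match the number of underlying factors, and the projected features are uncorrelated with each other. On SageMaker, the same role would be played by the built-in PCA algorithm, whose output feeds the downstream classifier.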

asked 16/09/2024
Vijay Kumar