ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 307 - MLS-C01 discussion

Report
Export

A data scientist uses Amazon SageMaker Data Wrangler to analyze and visualize data. The data scientist wants to refine a training dataset by selecting predictor variables that are strongly predictive of the target variable. The target variable correlates with other predictor variables.

The data scientist wants to understand the variance in the data along various directions in the feature space.

Which solution will meet these requirements?

A.

Use the SageMaker Data Wrangler multicollinearity measurement features with a variance inflation factor (VIF) score. Use the VIF score as a measurement of how closely the variables are related to each other.

Answers
A.

Use the SageMaker Data Wrangler multicollinearity measurement features with a variance inflation factor (VIF) score. Use the VIF score as a measurement of how closely the variables are related to each other.

B.

Use the SageMaker Data Wrangler Data Quality and Insights Report quick model visualization to estimate the expected quality of a model that is trained on the data.

Answers
B.

Use the SageMaker Data Wrangler Data Quality and Insights Report quick model visualization to estimate the expected quality of a model that is trained on the data.

C.

Use the SageMaker Data Wrangler multicollinearity measurement features with the principal component analysis (PCA) algorithm to provide a feature space that includes all of the predictor variables.

Answers
C.

Use the SageMaker Data Wrangler multicollinearity measurement features with the principal component analysis (PCA) algorithm to provide a feature space that includes all of the predictor variables.

D.

Use the SageMaker Data Wrangler Data Quality and Insights Report feature to review features by their predictive power.

Answers
D.

Use the SageMaker Data Wrangler Data Quality and Insights Report feature to review features by their predictive power.

Suggested answer: C

Explanation:

Principal Component Analysis (PCA) is a dimensionality reduction technique that captures the variance within the feature space, helping to understand the directions in which data varies most. In SageMaker Data Wrangler, the multicollinearity measurement and PCA features allow the data scientist to analyze interdependencies between predictor variables while reducing redundancy. PCA transforms correlated features into a set of uncorrelated components, helping to simplify the dataset without significant loss of information, making it ideal for refining features based on variance.

Options A and D offer methods to understand feature relevance but are less effective for managing multicollinearity and variance representation in the data.

asked 31/10/2024
Hendrik Woldhuis
50 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first