ExamGecko
Question 166 - MLS-C01 discussion


A Data Scientist needs to analyze employment data. The dataset contains approximately 10 million observations on people across 10 different features. During the preliminary analysis, the Data Scientist notices that the income and age distributions are not normal. While income levels show a right skew as expected, with fewer individuals having a higher income, the age distribution also shows a right skew, with fewer older individuals participating in the workforce.

Which feature transformations can the Data Scientist apply to fix the incorrectly skewed data? (Choose two.)

A. Cross-validation
B. Numerical value binning
C. High-degree polynomial transformation
D. Logarithmic transformation
E. One-hot encoding
Suggested answer: B, D

Explanation:

The two transformations that address the skew are numerical value binning and logarithmic transformation. Numerical value binning groups continuous values into discrete bins or categories. When the bin edges are chosen so that each bin holds a similar number of observations (for example, quantile bins), the resulting feature has a much more balanced frequency distribution. Logarithmic transformation applies the natural logarithm to each value, which requires strictly positive inputs. It reduces right skew by compressing large values and spreading out small ones. Both transformations make the data more suitable for machine learning algorithms that assume approximately normal inputs.

The remaining options do not reshape a numeric feature's distribution in the needed way: cross-validation is a model evaluation technique, a high-degree polynomial transformation tends to stretch the right tail of positive values even further, and one-hot encoding applies to categorical features.

References:

Data Transformation - Amazon SageMaker

Transforming Skewed Data for Machine Learning
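
To make the explanation above concrete, here is a minimal sketch of both transformations in Python with NumPy. Everything in it is an assumption for illustration: the synthetic `income` and `age` arrays stand in for the real dataset, the `skewness` helper is a hand-written sample skewness, the choice of 10 quantile bins is arbitrary, and `np.log1p` (the log of 1 + x) is used instead of a plain logarithm so that zero values would not fail.

```python
import numpy as np


def skewness(x):
    """Sample skewness: the third standardized moment of x."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


rng = np.random.default_rng(0)

# Synthetic right-skewed stand-ins for the real features (assumed, not from the question).
income = rng.lognormal(mean=10.5, sigma=0.8, size=100_000)
age = 18 + rng.gamma(shape=2.0, scale=8.0, size=100_000)

# Logarithmic transformation: compresses large values, spreads out small ones.
log_income = np.log1p(income)

# Numerical value binning with quantile edges: the 9 interior deciles
# split the data into 10 bins of roughly equal size.
edges = np.quantile(age, np.linspace(0, 1, 11)[1:-1])
age_bin = np.digitize(age, edges)  # integer bin labels 0..9

print("income skew before/after log:", skewness(income), skewness(log_income))
print("age skew before/after binning:", skewness(age), skewness(age_bin))
```

In this sketch both raw features have clearly positive skewness, while the transformed versions come out close to zero. Equal-width bins would keep more of the original shape than quantile bins do, so the binning strategy is a design choice worth checking against the data.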

asked 16/09/2024
shafinaaz hossenny