ExamGecko

Question 226 - MLS-C01 discussion


An online store is predicting future book sales by using a linear regression model that is based on past sales data. The data includes duration, a numerical feature that represents the number of days that a book has been listed in the online store. A data scientist performs an exploratory data analysis and discovers that the relationship between book sales and duration is skewed and non-linear.

Which data transformation step should the data scientist take to improve the predictions of the model?

A. One-hot encoding
B. Cartesian product transformation
C. Quantile binning
D. Normalization

Suggested answer: C

Explanation:

Quantile binning is a data transformation that can handle skewed, non-linear numerical features. It sorts the values of a feature and splits them at percentile boundaries, so each bin holds roughly the same number of observations (the bins are equal in frequency, not equal in width). The numerical feature is replaced by a categorical one that records which bin each value falls into. A linear model can then learn a separate weight for each bin, which lets it approximate a non-linear relationship between duration and sales with a step function. Because extreme values simply land in the first or last bin, quantile binning also reduces the impact of outliers and noise.
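The idea can be sketched in a few lines of plain Python. This is only an illustration of equal-frequency binning followed by one-hot encoding, not the implementation used by any AWS service; the `duration` values and the helper names (`quantile_bin_edges`, `assign_bin`, `one_hot`) are hypothetical and chosen for the example.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def quantile_bin_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points that split `values`
    into bins holding roughly equal numbers of observations."""
    return quantiles(values, n=n_bins, method="inclusive")


def assign_bin(value, edges):
    """Map a value to a bin index in the range 0..len(edges)."""
    return bisect_right(edges, value)


def one_hot(index, n_bins):
    """Encode a bin index as a binary indicator vector."""
    return [1 if i == index else 0 for i in range(n_bins)]


# Hypothetical, right-skewed "duration" values (days a book has been listed).
duration = [1, 2, 3, 4, 5, 7, 9, 12, 20, 35, 90, 400]
n_bins = 4

edges = quantile_bin_edges(duration, n_bins)
bins = [assign_bin(d, edges) for d in duration]
features = [one_hot(b, n_bins) for b in bins]

# Each bin receives the same number of rows even though the raw values
# span very different widths (1..3 days versus 35..400 days).
counts = Counter(bins)
print(edges)
print(sorted(counts.items()))
```

In this sketch the outlier of 400 days ends up in the same bin as 35 and 90, so it no longer dominates the fit, and a linear model trained on `features` would learn one weight per bin instead of a single slope for duration.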

One-hot encoding, Cartesian product transformation, and normalization are not suitable for this scenario. One-hot encoding converts categorical features into binary indicator features, and duration is numerical. Cartesian product transformation creates new features by combining two or more existing features, so it does not address the shape of a single numerical feature. Normalization rescales numerical features to a standard range, but it does not change the shape of the distribution, so the skew and non-linearity remain.

References:

Data Transformations for Machine Learning

Quantile Binning Transformation

asked 16/09/2024
Firew Abebe