Question 266 - MLS-C01 discussion


A company is building a new supervised classification model in an AWS environment. The company's data science team notices that the dataset has a large number of variables. All the variables are numeric. The model accuracy for training and validation is low, and the model's processing time suffers from high latency. The data science team needs to increase the accuracy of the model and decrease the processing time.

What should the data science team do to meet these requirements?

A. Create new features and interaction variables.
B. Use a principal component analysis (PCA) model.
C. Apply normalization on the feature set.
D. Use a multiple correspondence analysis (MCA) model.
Suggested answer: B

Explanation:

The best way to meet the requirements is to use a principal component analysis (PCA) model, which is a technique that reduces the dimensionality of the dataset by transforming the original variables into a smaller set of new variables, called principal components, that capture most of the variance and information in the data [1]. This technique has the following advantages:

It can increase the accuracy of the model by removing noise, redundancy, and multicollinearity from the data, and by enhancing the interpretability and generalization of the model [2][3].

It can decrease the processing time of the model by reducing the number of features and the computational complexity of the model, and by improving the convergence and stability of the model [4][5].

It is suitable for numeric variables, as it relies on the covariance or correlation matrix of the data, and it can handle a large quantity of variables, as it can extract the most relevant ones [1][6].
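The advantages above can be illustrated with a minimal sketch, assuming scikit-learn is available (the dataset and sizes here are synthetic and purely illustrative): a wide numeric feature matrix with underlying low-rank structure collapses to a much smaller set of principal components while retaining 95% of the variance.

```python
# Minimal PCA sketch (illustrative synthetic data, assuming scikit-learn).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic dataset: 1000 rows, 200 numeric features driven by only
# 10 latent factors plus a small amount of noise.
latent = rng.normal(size=(1000, 10))
weights = rng.normal(size=(10, 200))
X = latent @ weights + 0.1 * rng.normal(size=(1000, 200))

# Standardize first so no single feature dominates the covariance.
X_scaled = StandardScaler().fit_transform(X)

# Keep just enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)  # far fewer than the original 200 columns
```

Passing a float to `n_components` asks scikit-learn to choose the smallest number of components whose cumulative explained variance reaches that fraction, which is how PCA trades a large quantity of correlated variables for a compact feature set.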

The other options are not effective or appropriate, because they have the following drawbacks:

A: Creating new features and interaction variables can increase the accuracy of the model by capturing more complex and nonlinear relationships in the data, but it can also increase the processing time of the model by adding more features and increasing the computational complexity of the model [7]. Moreover, it can introduce more noise, redundancy, and multicollinearity into the data, which can degrade the performance and interpretability of the model [8].

C: Applying normalization on the feature set can increase the accuracy of the model by scaling the features to a common range and avoiding the dominance of some features over others, and it can decrease the processing time of the model by reducing numerical instability and improving the convergence of the model. However, normalization alone is not enough to address the high dimensionality and high latency issues of the dataset, as it does not reduce the number of features or the variance in the data.

D: Using a multiple correspondence analysis (MCA) model is not suitable for numeric variables, as it is a technique that reduces the dimensionality of the dataset by transforming the original categorical variables into a smaller set of new variables, called factors, that capture most of the inertia and information in the data. MCA is similar to PCA, but it is designed for nominal or ordinal variables, not for continuous or interval variables.

References:

1: Principal Component Analysis - Amazon SageMaker

2: How to Use PCA for Data Visualization and Improved Performance in Machine Learning | by Pratik Shukla | Towards Data Science

3: Principal Component Analysis (PCA) for Feature Selection and some of its Pitfalls | by Nagesh Singh Chauhan | Towards Data Science

4: How to Reduce Dimensionality with PCA and Train a Support Vector Machine in Python | by James Briggs | Towards Data Science

5: Dimensionality Reduction and Its Applications | by Aniruddha Bhandari | Towards Data Science

6: Principal Component Analysis (PCA) in Python | by Susan Li | Towards Data Science

7: Feature Engineering for Machine Learning | by Dipanjan (DJ) Sarkar | Towards Data Science

8: Feature Engineering --- How to Engineer Features and How to Get Good at It | by Parul Pandey | Towards Data Science

: [Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization | by Benjamin Obi Tayo Ph.D. | Towards Data Science]

: [Why, How and When to Scale your Features | by George Seif | Towards Data Science]

: [Normalization vs Dimensionality Reduction | by Saurabh Annadate | Towards Data Science]

: [Multiple Correspondence Analysis - Amazon SageMaker]

: [Multiple Correspondence Analysis (MCA) | by Raul Eulogio | Towards Data Science]

asked 16/09/2024
Nestor Maitin