Question 161 - MLS-C01 discussion

A credit card company wants to build a credit scoring model to help predict whether a new credit card applicant will default on a credit card payment. The company has collected data from a large number of sources with thousands of raw attributes. Early experiments to train a classification model revealed that many attributes are highly correlated, that the large number of features significantly slows down training, and that there are some overfitting issues.

The Data Scientist on this project would like to speed up the model training time without losing a lot of information from the original dataset.

Which feature engineering technique should the Data Scientist use to meet the objectives?

A. Run self-correlation on all features and remove highly correlated features
B. Normalize all numerical values to be between 0 and 1
C. Use an autoencoder or principal component analysis (PCA) to replace original features with new features
D. Cluster raw data using k-means and use sample data from each cluster to build a new dataset
Suggested answer: C

Explanation:

The best way to speed up model training without losing much information from the original dataset is to use an autoencoder or principal component analysis (PCA) to replace the original features with new features.

An autoencoder is a type of neural network that learns a compressed representation of the input data, called the latent space, by minimizing the reconstruction error between the input and the output. PCA is a statistical technique that reduces the dimensionality of the data by finding a set of orthogonal axes, called the principal components, that capture the maximum variance in the data.

Both techniques reduce the number of features and remove noise and redundancy, which can improve model performance and speed up training.
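As a concrete sketch (not part of the original explanation), the following Python example shows PCA-based dimensionality reduction with scikit-learn. The synthetic feature matrix, the number of underlying factors, and the 95% variance threshold are all illustrative assumptions:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 500 highly correlated raw attributes driven by
# 20 underlying factors, mirroring the scenario in the question.
rng = np.random.default_rng(42)
latent = rng.normal(size=(1000, 20))
mixing = rng.normal(size=(20, 500))
X = latent @ mixing + 0.1 * rng.normal(size=(1000, 500))

# Standardize first: PCA is sensitive to the scale of the features.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the smallest number of principal
# components that together explain at least 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print("Original features:", X.shape[1])
print("Reduced features:", X_reduced.shape[1])
print("Variance retained:", round(pca.explained_variance_ratio_.sum(), 3))

An autoencoder achieves a similar compression by training an encoder/decoder network on the raw attributes and keeping the encoder's latent output as the new feature set.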

References:

AWS Machine Learning Specialty Exam Guide

AWS Machine Learning Training - Dimensionality Reduction for Machine Learning

AWS Machine Learning Training - Deep Learning with Amazon SageMaker
