Question 155 - MLS-C01 discussion

A Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical features. The Marketing team has not provided any insight about which features are relevant for churn prediction. The Marketing team wants to interpret the model and see the direct impact of relevant features on the model outcome. While training a logistic regression model, the Data Scientist observes that there is a wide gap between the training and validation set accuracy.

Which methods can the Data Scientist use to improve the model performance and satisfy the Marketing team's needs? (Choose two.)

A. Add L1 regularization to the classifier
B. Add features to the dataset
C. Perform recursive feature elimination
D. Perform t-distributed stochastic neighbor embedding (t-SNE)
E. Perform linear discriminant analysis
Suggested answer: A, C

Explanation:

The Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical features. The Marketing team wants to interpret the model and see the direct impact of relevant features on the model outcome. However, the Data Scientist observes a wide gap between the training and validation set accuracy, which indicates that the model is overfitting the training data and generalizing poorly to unseen data.

To improve the model performance and satisfy the Marketing team's needs, the Data Scientist can use the following methods:

Add L1 regularization to the classifier: L1 regularization is a technique that adds a penalty term to the loss function of the logistic regression model, proportional to the sum of the absolute values of the coefficients. L1 regularization can help reduce overfitting by shrinking the coefficients of the less important features to zero, effectively performing feature selection. This can simplify the model and make it more interpretable, as well as improve the validation accuracy.
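The snippet below is a minimal sketch of this approach using scikit-learn; the synthetic dataset, the solver choice, and the regularization strength C are illustrative assumptions, not part of the original question.

```python
# Minimal sketch: L1-regularized logistic regression with scikit-learn.
# The synthetic dataset stands in for the 100 continuous churn features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=100,
                           n_informative=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)

# penalty="l1" requires a solver that supports it (liblinear or saga);
# a smaller C means a stronger penalty and more coefficients shrunk to zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X_train, y_train)

print("Training accuracy:  ", clf.score(X_train, y_train))
print("Validation accuracy:", clf.score(X_val, y_val))
print("Non-zero coefficients:", np.count_nonzero(clf.coef_))
```

Comparing the training and validation scores before and after adding the penalty shows whether the gap has narrowed, and the surviving non-zero coefficients are the features the Marketing team can interpret directly.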

Perform recursive feature elimination: Recursive feature elimination (RFE) is a feature selection technique that starts by training the model on all features, ranks the features by the magnitude of their coefficients or importance scores, removes the least important ones, and repeats the process on the reduced feature set until the desired number of features remains. RFE can improve model performance by eliminating irrelevant or redundant features that introduce noise or multicollinearity into the data. It also helps the Marketing team understand the direct impact of the relevant features on the model outcome, because only the features that survive elimination, those with the strongest influence on the predictions, remain in the final model.
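A minimal sketch of RFE with scikit-learn follows; the wrapped estimator, the step size, and the target feature count are assumptions chosen for illustration.

```python
# Minimal sketch: recursive feature elimination wrapped around a
# logistic regression estimator, using a synthetic 100-feature dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=100,
                           n_informative=10, random_state=42)

# At each round, RFE fits the estimator, ranks features by the magnitude
# of their coefficients, and drops the `step` weakest until 10 remain.
selector = RFE(LogisticRegression(max_iter=1000),
               n_features_to_select=10, step=5)
selector.fit(X, y)

print("Selected feature indices:",
      [i for i, kept in enumerate(selector.support_) if kept])
print("Ranking of first 20 features (1 = selected):", selector.ranking_[:20])
```

The retained feature indices can then be reported back to the Marketing team as the inputs that most directly drive the churn prediction.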

References:

Regularization for Logistic Regression

Recursive Feature Elimination
