Question 254 - MLS-C01 discussion


A global bank requires a solution to predict whether customers will leave the bank and choose another bank. The bank is using a dataset to train a model to predict customer loss. The training dataset has 1,000 rows. The training dataset includes 100 instances of customers who left the bank.

A machine learning (ML) specialist is using Amazon SageMaker Data Wrangler to prepare the data and a SageMaker training job to train a churn prediction model. After training, the ML specialist notices that the model predicts only negative (no-churn) results. The ML specialist must correct the model so that it returns more accurate predictions.

Which solution will meet these requirements?

A. Apply anomaly detection to remove outliers from the training dataset before training.
B. Apply Synthetic Minority Oversampling Technique (SMOTE) to the training dataset before training.
C. Apply normalization to the features of the training dataset before training.
D. Apply undersampling to the training dataset before training.
Suggested answer: B

Explanation:

The best solution is to apply Synthetic Minority Oversampling Technique (SMOTE) to the training dataset before training. With only 100 churners among 1,000 rows, the classes are heavily imbalanced: a model can achieve 90% accuracy simply by predicting "no churn" for every customer, which is exactly the behavior observed. SMOTE generates synthetic samples for the minority class by interpolating between existing minority samples and their nearest neighbors. This balances the class distribution and provides more signal for the class of interest in churn prediction, improving the model's performance on the minority class. SMOTE can be applied in SageMaker Data Wrangler, which provides a built-in analysis for oversampling the minority class [1].
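To make the interpolation idea concrete, here is a minimal NumPy sketch of SMOTE-style oversampling applied to a synthetic 900/100 split like the one in the question. This is an illustration of the technique, not Data Wrangler's implementation; the `smote_oversample` helper and the random dataset are invented for this example.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each base sample and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances within the minority class (self-distance masked out).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]       # k nearest neighbors per sample
    base = rng.integers(0, n, size=n_new)          # random base sample for each new point
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]  # random neighbor of that base
    gap = rng.random((n_new, 1))                   # interpolation factor in [0, 1)
    # New point lies on the line segment between the base and its neighbor.
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Rebalance a 900/100 dataset (class 0 = stays, class 1 = churns).
rng = np.random.default_rng(0)
X_majority = rng.normal(0.0, 1.0, size=(900, 4))
X_minority = rng.normal(3.0, 1.0, size=(100, 4))
X_synth = smote_oversample(X_minority, n_new=800, rng=1)

X_bal = np.vstack([X_majority, X_minority, X_synth])
y_bal = np.array([0] * 900 + [1] * 900)
print(X_bal.shape, y_bal.mean())  # (1800, 4) 0.5
```

Because the synthetic points are interpolated between real minority samples rather than duplicated, the model sees a denser, more varied minority region instead of the same 100 rows repeated.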

The other options are not effective solutions for the problem. Applying anomaly detection to remove outliers from the training dataset before training may not improve the model's accuracy, as outliers are not the main cause of the single-class predictions; moreover, removing outliers may reduce the diversity of the data and make the model less robust. Applying normalization to the features before training may improve the model's convergence and stability, but it does not address the class imbalance. Normalization can also be applied in SageMaker Data Wrangler, which provides a built-in transformation for scaling the features [2]. Applying undersampling to the training dataset may reduce the class imbalance, but it discards potentially useful information from the majority class and can result in underfitting and high bias.
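For contrast, a sketch of random undersampling (option D) on the same kind of 900/100 dataset shows the information loss directly: balancing the classes this way keeps only 200 of the 1,000 rows, discarding 800 majority-class examples. The variable names and random data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.array([0] * 900 + [1] * 100)  # 900 "stays", 100 "churns"

# Shrink the majority class to the minority-class size.
minority_idx = np.flatnonzero(y == 1)
majority_idx = rng.choice(np.flatnonzero(y == 0),
                          size=len(minority_idx), replace=False)
keep = np.concatenate([majority_idx, minority_idx])

X_under, y_under = X[keep], y[keep]
print(X_under.shape, y_under.mean())  # (200, 4) 0.5
```

The classes end up balanced, but 80% of the training data is gone, which is why oversampling the minority class is usually preferred when the dataset is already small.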

References:

1. Analyze and Visualize (Amazon SageMaker Data Wrangler)
2. Transform and Export (Amazon SageMaker Data Wrangler)
3. SMOTE for Imbalanced Classification with Python
4. Churn prediction using Amazon SageMaker built-in tabular algorithms LightGBM, CatBoost, TabTransformer, and AutoGluon-Tabular

asked 16/09/2024 by ILLIA VELIASEVICH