Question 196 - MLS-C01 discussion


A Data Scientist is developing a machine learning model to classify whether a financial transaction is fraudulent. The labeled data available for training consists of 100,000 non-fraudulent observations and 1,000 fraudulent observations.

The Data Scientist applies the XGBoost algorithm to the data, resulting in the following confusion matrix when the trained model is applied to a previously unseen validation dataset. The accuracy of the model is 99.1%, but the Data Scientist needs to reduce the number of false negatives.

Which combination of steps should the Data Scientist take to reduce the number of false negative predictions by the model? (Choose two.)

A. Change the XGBoost eval_metric parameter to optimize based on Root Mean Square Error (RMSE).
B. Increase the XGBoost scale_pos_weight parameter to adjust the balance of positive and negative weights.
C. Increase the XGBoost max_depth parameter because the model is currently underfitting the data.
D. Change the XGBoost eval_metric parameter to optimize based on Area Under the ROC Curve (AUC).
E. Decrease the XGBoost max_depth parameter because the model is currently overfitting the data.
Suggested answer: B, D

Explanation:

To reduce the number of false negative predictions on this imbalanced dataset, the Data Scientist should increase the XGBoost scale_pos_weight parameter to give more weight to the positive (fraudulent) class, and change the eval_metric parameter to optimize based on the Area Under the ROC Curve (AUC).

The scale_pos_weight parameter controls the balance of positive and negative weights in the XGBoost algorithm. It is useful for imbalanced classification problems, such as fraud detection, where the number of positive examples (fraudulent transactions) is much smaller than the number of negative examples (non-fraudulent transactions). By increasing the scale_pos_weight parameter, the Data Scientist can assign more weight to the positive class and make the model more sensitive to detecting fraudulent transactions.
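A minimal sketch of this idea, using the class counts from the question (100,000 non-fraudulent vs. 1,000 fraudulent observations) and the common heuristic scale_pos_weight = count(negative) / count(positive). The parameter dictionary is illustrative; in practice these values would be passed to xgboost.train or set as hyperparameters on the SageMaker XGBoost estimator:

```python
# Class counts from the question's training data.
n_negative = 100_000  # non-fraudulent (majority class)
n_positive = 1_000    # fraudulent (minority class)

# Common heuristic: weight the positive class by the imbalance ratio.
scale_pos_weight = n_negative / n_positive  # 100.0

# Illustrative XGBoost parameters reflecting the suggested answers (B and D).
params = {
    "objective": "binary:logistic",
    "scale_pos_weight": scale_pos_weight,
    "eval_metric": "auc",
}
print(params)
```

With scale_pos_weight set to 100, each fraudulent training example contributes as much to the loss as 100 non-fraudulent ones, pushing the model toward fewer false negatives at the cost of more false positives.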

The eval_metric parameter specifies the metric used to measure the performance of the model during training and validation. For binary classification, XGBoost has traditionally defaulted to the classification error rate, the fraction of incorrect predictions. The error rate is a poor metric for imbalanced classification problems because it does not account for the cost of different types of errors. In fraud detection, a false negative (failing to detect a fraudulent transaction) is more costly than a false positive (flagging a non-fraudulent transaction as fraudulent).

Therefore, the Data Scientist should use a metric that reflects the trade-off between the true positive rate (TPR) and the false positive rate (FPR), such as the Area Under the ROC Curve (AUC). The AUC measures how well the model can distinguish between the positive and negative classes, regardless of the classification threshold. A higher AUC means the model can achieve a higher TPR at a lower FPR, which is desirable for fraud detection.

References:

XGBoost Parameters - Amazon Machine Learning

Using XGBoost with Amazon SageMaker - AWS Machine Learning Blog

asked 16/09/2024
Sean Frenette