Question 66 - Professional Machine Learning Engineer discussion

You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables. You need to prioritize detection of fraudulent transactions while minimizing false positives. Which optimization objective should you use when training the model?

A. An optimization objective that minimizes Log loss
B. An optimization objective that maximizes the Precision at a Recall value of 0.50
C. An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value
D. An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value
Suggested answer: C

Explanation:

In this scenario, the goal is to build a custom fraud detection model with AutoML Tables. Fraud detection is a binary classification problem: the model must predict whether each transaction is fraudulent. The optimization objective is the metric that training optimizes and the model is evaluated against, and AutoML Tables lets you choose among several objectives for binary classification, including Log loss, Precision at a given Recall value, AUC PR, and AUC ROC.
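
For concreteness, here is a minimal sketch of how such an objective can be selected at training time, assuming the Vertex AI Python SDK (google-cloud-aiplatform), which has absorbed AutoML Tables; the project ID, dataset resource name, and column name below are placeholders:

```python
# Minimal sketch, assuming the Vertex AI Python SDK (google-cloud-aiplatform).
# Project, location, dataset resource name, and column name are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Tabular dataset previously created from the historical transaction data.
dataset = aiplatform.TabularDataset(
    "projects/your-project-id/locations/us-central1/datasets/1234567890"
)

# Train an AutoML tabular classification model, optimizing for AUC PR.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="fraud-detection-automl",
    optimization_prediction_type="classification",
    optimization_objective="maximize-au-prc",  # area under the precision-recall curve
)

model = job.run(
    dataset=dataset,
    target_column="is_fraud",        # placeholder label column
    budget_milli_node_hours=1000,    # 1 node hour of training budget
)
```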

To choose the best optimization objective, consider the characteristics of the problem and the data. In fraud detection the positive class (fraudulent transactions) is very rare compared to the negative class (legitimate transactions), so the data is highly imbalanced and the model must be sensitive to the minority class. The cost of a false negative (missing a fraudulent transaction) is typically much higher than the cost of a false positive (flagging a legitimate transaction), but false positives still carry real costs in customer friction and manual review, which is exactly what the question asks to minimize. The model therefore needs high recall (catching fraudulent transactions) while maintaining high precision (avoiding false alarms).

Given these considerations, the best optimization objective is the one that maximizes the area under the precision-recall curve (AUC PR). AUC PR summarizes the trade-off between precision and recall across all probability thresholds, so a higher value means the model can achieve high precision and high recall at the same time. It is also better suited to imbalanced data than AUC ROC, which measures the trade-off between the true positive rate and the false positive rate: when legitimate transactions vastly outnumber fraudulent ones, even a large number of false positives yields a small false positive rate, so AUC ROC can look high while precision is actually poor.
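
To see the difference concretely, here is a short scikit-learn sketch on synthetic, heavily imbalanced data (illustrative only, not the AutoML Tables internals):

```python
# Sketch comparing AUC ROC and AUC PR on imbalanced synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# Roughly 1% positive class, mimicking rare fraudulent transactions.
X, y = make_classification(
    n_samples=50_000, n_features=20, weights=[0.99, 0.01], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

# ROC AUC is dominated by the abundant negatives; average precision (AUC PR)
# reflects how well the rare positives are ranked and retrieved.
print("AUC ROC:", roc_auc_score(y_test, scores))
print("AUC PR :", average_precision_score(y_test, scores))
```

On data like this, AUC ROC typically looks close to perfect while average precision (AUC PR) is noticeably lower; that gap is exactly why AUC PR is the more informative objective for rare-event detection.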

Therefore, option C is the correct answer. Option A is not suitable: Log loss measures how well the predicted probabilities match the actual labels across all examples, is dominated by the abundant negative class on imbalanced data, and does not directly target the precision-recall trade-off. Option B is not suitable: maximizing precision at a fixed recall of 0.50 tunes the model around catching only about half of the fraudulent transactions, a single operating point rather than the full trade-off across thresholds. Option D is not suitable: AUC ROC can be misleading for highly imbalanced data, as explained above.
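
Continuing the same sketch, the snippet below computes what option B would optimize, the best precision achievable at recall >= 0.50, alongside the full-curve summary that option C optimizes (it assumes y_test and scores from the previous snippet):

```python
# Option B targets a single point on the precision-recall curve;
# AUC PR summarizes the whole curve.
from sklearn.metrics import auc, precision_recall_curve

precision, recall, _ = precision_recall_curve(y_test, scores)

# Best precision among thresholds that still reach at least 50% recall.
precision_at_recall_050 = precision[recall >= 0.5].max()
auc_pr = auc(recall, precision)

print("Precision at recall 0.50:", precision_at_recall_050)
print("AUC PR (full curve)     :", auc_pr)
```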

References:
- AutoML Tables documentation
- Optimization objectives for binary classification
- Precision-Recall Curves: How to Easily Evaluate Machine Learning Models in No Time
- ROC Curves and Area Under the Curve Explained (video)
