Question 198 - MLS-C01 discussion


A financial company is trying to detect credit card fraud. The company observed that, on average, 2% of credit card transactions were fraudulent. A data scientist trained a classifier on a year's worth of credit card transaction data. The model needs to identify the fraudulent transactions (positives) from the regular ones (negatives). The company's goal is to accurately capture as many positives as possible.

Which metrics should the data scientist use to optimize the model? (Choose two.)

A. Specificity
B. False positive rate
C. Accuracy
D. Area under the precision-recall curve
E. True positive rate
Suggested answer: D, E

Explanation:

The data scientist should use the area under the precision-recall curve and the true positive rate to optimize the model. These metrics are suitable for imbalanced classification problems, such as credit card fraud detection, where the positive class (fraudulent transactions) is much rarer than the negative class (non-fraudulent transactions).
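
To see why accuracy (option C) is a poor target at a 2% fraud rate, consider the following minimal sketch. The data and the "model" are hypothetical: 1,000 made-up transactions and a classifier that never flags fraud. It is only an illustration of the imbalance problem, not the company's actual pipeline.

```python
# Hypothetical toy data: 1000 transactions, 2% fraudulent (label 1).
y_true = [1] * 20 + [0] * 980

# A trivial "model" that never flags a transaction as fraud.
y_pred = [0] * 1000


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


print(accuracy(y_true, y_pred))            # 0.98
print(true_positive_rate(y_true, y_pred))  # 0.0
```

The do-nothing model scores 98% accuracy while catching no fraud at all, which is why a metric that ignores the dominant negative class is preferred here.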

The area under the precision-recall curve (AUPRC) summarizes the trade-off between precision and recall across all decision thresholds. Precision is the fraction of predicted positives that are actually positive, and recall is the fraction of actual positives that are correctly predicted. A higher AUPRC means that the model can maintain higher precision as recall increases, which is desirable for fraud detection. Unlike accuracy, AUPRC is not inflated by the large number of true negatives, because neither precision nor recall counts them.
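
One common way to estimate the AUPRC is average precision: rank the transactions by model score and average the precision observed at each rank where a true positive appears. The sketch below is a plain-Python illustration under stated assumptions: the scores and labels are invented, the function name `average_precision` is hypothetical, and tied scores are not handled specially. In practice a library routine such as scikit-learn's `average_precision_score` would normally be used instead.

```python
def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k that hold a positive."""
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_positives


# Hypothetical model scores (higher means more likely fraud).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]

perfect_labels = [1, 1, 1, 0, 0, 0]  # all frauds ranked first
mixed_labels = [1, 0, 1, 0, 0, 1]    # frauds interleaved with normal traffic

print(average_precision(perfect_labels, scores))  # 1.0
print(average_precision(mixed_labels, scores))    # lower than the perfect ranking
```

A ranking that places every fraudulent transaction ahead of every legitimate one reaches the maximum of 1.0; any legitimate transaction ranked above a fraud pulls the value down.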

The true positive rate (TPR) is another name for recall. It is also known as sensitivity or hit rate. It measures the proportion of actual positives that are correctly identified by the model. A higher TPR means that the model can capture more positives, which is the company's goal.
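
The TPR depends on the decision threshold applied to the model's scores. The following sketch, using invented scores and a hypothetical helper `confusion_counts`, shows how lowering the threshold captures more fraud (higher TPR) at the cost of more false alarms (higher false positive rate, which equals 1 minus specificity).

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are flagged as fraud."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical labels (1 = fraud) and model scores for ten transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

for threshold in (0.5, 0.25):
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    tpr = tp / (tp + fn)          # recall / sensitivity
    fpr = fp / (fp + tn)          # false positive rate
    specificity = tn / (tn + fp)  # equals 1 - fpr
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}, "
          f"specificity={specificity:.2f}")
```

With these made-up numbers, moving the threshold from 0.5 to 0.25 raises the count of captured frauds from two of three to all three, while the number of false positives grows from one to two.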

References:

Metrics for Imbalanced Classification in Python - Machine Learning Mastery

Precision-Recall - scikit-learn

asked 16/09/2024
Oren Dahan