Question 102 - Professional Machine Learning Engineer discussion

You work for a large social network service provider whose users post articles and discuss news. Millions of comments are posted online each day, and more than 200 human moderators constantly review comments and flag those that are inappropriate. Your team is building an ML model to help human moderators check content on the platform. The model scores each comment and flags suspicious comments to be reviewed by a human. Which metric(s) should you use to monitor the model's performance?

A. Number of messages flagged by the model per minute
B. Number of messages flagged by the model per minute confirmed as being inappropriate by humans
C. Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review
D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute
Suggested answer: D

Explanation:

Precision measures the fraction of messages flagged by the model that are actually inappropriate, while recall measures the fraction of inappropriate messages that are flagged by the model. These metrics are useful for evaluating how well the model can identify and filter out inappropriate comments.
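
As a minimal sketch of how these metrics could be estimated in practice (the function name, labels, and sample batch below are illustrative assumptions, not part of the exam question), precision and recall can be computed from a batch of messages that human moderators have reviewed:

def precision_recall(reviews):
    """reviews: iterable of (model_flagged: bool, is_inappropriate: bool) pairs."""
    tp = sum(1 for flagged, bad in reviews if flagged and bad)        # flagged and confirmed inappropriate
    fp = sum(1 for flagged, bad in reviews if flagged and not bad)    # flagged but actually fine
    fn = sum(1 for flagged, bad in reviews if not flagged and bad)    # missed inappropriate messages
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall

# Hypothetical human-reviewed batch: (model_flagged, human_says_inappropriate)
batch = [(True, True), (True, False), (True, True), (False, True), (False, False)]
p, r = precision_recall(batch)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67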

Option A is not a good metric because it does not account for the accuracy of the model. The model might flag many messages that are not inappropriate, or miss many messages that are inappropriate.

Option B is better than option A, but it still does not account for the recall of the model. The model might flag only a few messages that are highly likely to be inappropriate, but miss many other messages that are less obvious but still inappropriate.

Option C is not a good metric because it does not focus on the messages that are flagged by the model. The random sample of 0.1% of raw messages might contain very few inappropriate messages, making the precision and recall estimates unreliable.
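
To make that concrete, a rough back-of-the-envelope calculation under assumed numbers (one million comments per minute and a 0.5% rate of inappropriate content, both hypothetical, not stated in the question) shows how few positives a 0.1% random sample contains:

messages_per_minute = 1_000_000   # assumed traffic, not given in the question
inappropriate_rate = 0.005        # assumed 0.5% base rate of inappropriate comments

random_sample = int(messages_per_minute * 0.001)          # option C: 0.1% random sample of raw messages
expected_positives = random_sample * inappropriate_rate   # inappropriate messages expected in that sample

print(random_sample)       # 1000 messages sent to humans every minute
print(expected_positives)  # only ~5.0 inappropriate messages to base the estimates on

With only a handful of positives per minute, the precision and recall estimates would fluctuate heavily, whereas sampling from the messages the model actually flagged concentrates reviewer effort where the labels are most informative.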
