Question 142 - Professional Machine Learning Engineer discussion

Your organization manages an online message board. A few months ago, you discovered an increase in toxic language and bullying on the message board. You deployed an automated text classifier that flags certain comments as toxic or harmful. Now some users are reporting that benign comments referencing their religion are being misclassified as abusive. Upon further inspection, you find that your classifier's false positive rate is higher for comments that reference certain underrepresented religious groups. Your team has a limited budget and is already overextended. What should you do?

A.
Add synthetic training data where those phrases are used in non-toxic ways.
B.
Remove the model and replace it with human moderation.
C.
Replace your model with a different text classifier.
D.
Raise the threshold for comments to be considered toxic or harmful.
Suggested answer: A

Explanation:

The classifier has a high false positive rate for comments that reference certain underrepresented religious groups: it cannot reliably distinguish toxic from non-toxic language when those groups are mentioned. A likely cause is that the training data contains too few examples of non-toxic comments referencing those groups, producing a biased model. Adding synthetic training data in which those phrases appear in non-toxic contexts helps the model generalize better and lowers the false positive rate. Synthetic data is artificially generated data that mimics the characteristics of real data; it is commonly used to augment a dataset when real examples are scarce or imbalanced. This option also fits the constraint of a limited, overextended team: generating targeted synthetic examples is far cheaper than replacing the model (C), moving to full human moderation (B), or raising the toxicity threshold (D), which would let more genuinely toxic comments through without addressing the underlying bias.
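For illustration only, here is a minimal sketch of this kind of targeted augmentation, assuming a simple scikit-learn text pipeline; the identity terms, templates, and toy dataset are hypothetical and not part of the exam material:

```python
# Minimal sketch: augment a toy toxicity dataset with synthetic non-toxic
# comments that mention over-flagged identity terms, then retrain a simple
# classifier. All terms, templates, and examples below are hypothetical.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Existing (imbalanced) training data: label 1 = toxic, 0 = non-toxic.
texts = [
    "you are an idiot",                # toxic
    "have a great day everyone",       # non-toxic
    "members of group X are awful",    # toxic mention of a group
]
labels = [1, 0, 1]

# Hypothetical identity terms the classifier currently over-flags.
identity_terms = ["group X", "group Y"]

# Templates that use those terms in clearly benign contexts.
benign_templates = [
    "I am proud to be a member of {}.",
    "Our {} community center hosts a food drive every week.",
    "Happy holidays to everyone celebrating in the {} community!",
]

# Generate synthetic non-toxic examples (label 0) for each term.
synthetic_texts = [t.format(term) for term in identity_terms
                   for t in benign_templates]
texts += synthetic_texts
labels += [0] * len(synthetic_texts)

# Retrain on the augmented data.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# A benign mention should now lean non-toxic (0).
print(model.predict(["I am proud to be a member of group X."]))
```

After retraining, re-measuring the false positive rate separately for each group is the cheapest way to confirm that the fairness gap has actually narrowed.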

References:

Preparing for Google Cloud Certification: Machine Learning Engineer, Course 5: Responsible AI, Week 3: Fairness

Google Cloud Professional Machine Learning Engineer Exam Guide, Section 4: Ensuring Solution Quality, 4.4: Evaluating fairness and bias in ML models

Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 9: Responsible AI, Section 9.3: Fairness and Bias
