Question 120 - Professional Machine Learning Engineer discussion

You work for a gaming company that has millions of customers around the world. All games offer a chat feature that allows players to communicate with each other in real time. Messages can be typed in more than 20 languages and are translated in real time using the Cloud Translation API. You have been asked to build an ML system that moderates the chat in real time, performs uniformly across the various languages, and does not require changes to the serving infrastructure.

You trained your first model using an in-house word2vec model for embedding the chat messages translated by the Cloud Translation API. However, the model has significant differences in performance across the different languages. How should you improve it?

A. Add a regularization term such as the Min-Diff algorithm to the loss function.
B. Train a classifier using the chat messages in their original language.
C. Replace the in-house word2vec with GPT-3 or T5.
D. Remove moderation for languages for which the false positive rate is too high.
Suggested answer: B

Explanation:

The current approach translates every chat message into a common language with the Cloud Translation API and then embeds the translation with the in-house word2vec model. This stacks two sources of error: translation quality and embedding quality. Translation quality varies by language, depending on how much training data is available for that language and how complex its grammar and vocabulary are. Embedding quality depends on the size and diversity of the corpus used to train word2vec. Both errors propagate into the moderation classifier, and because the translation error differs from language to language, the classifier's performance differs across languages as well.
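To make "significant differences in performance across the different languages" concrete, the following is a minimal sketch of how one might measure the disparity, here using the false positive rate per language. The function names and the evaluation records are hypothetical and only illustrate the calculation; they are not taken from the question.

```python
from collections import defaultdict


def false_positive_rate_by_language(records):
    """Compute the false positive rate per language.

    records: iterable of (language, true_label, predicted_label),
    where 1 means "should be moderated" and 0 means "benign".
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for language, y_true, y_pred in records:
        if y_true == 0:  # only benign messages can produce false positives
            negatives[language] += 1
            if y_pred == 1:
                false_positives[language] += 1
    return {lang: false_positives[lang] / n for lang, n in negatives.items()}


def fpr_gap(rates):
    """Largest difference in false positive rate between any two languages."""
    return max(rates.values()) - min(rates.values())


# Hypothetical evaluation records: (language, true label, predicted label).
records = [
    ("en", 0, 0), ("en", 0, 0), ("en", 0, 1), ("en", 0, 0), ("en", 1, 1),
    ("tr", 0, 1), ("tr", 0, 1), ("tr", 0, 0), ("tr", 0, 0), ("tr", 1, 1),
]

rates = false_positive_rate_by_language(records)
gap = fpr_gap(rates)
print(rates, gap)
```

A large gap on a held-out set that is balanced across languages is the kind of signal the question describes. The same pattern applies to recall or any other per-language metric.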

A better approach is to train the classifier on the chat messages in their original language, so that it depends on neither the Cloud Translation API output nor the in-house word2vec embeddings. The classifier can then learn the idioms and nuances of each language directly instead of inheriting translation and embedding errors. It may also reduce the latency and cost of the moderation system, since moderation no longer depends on a Cloud Translation API call for each message it scores. One way to do this is to fine-tune a multilingual pre-trained model such as mBERT or XLM-R, which covers many languages with a single model. Alternatively, one could train a separate classifier for each language, using a monolingual pre-trained model such as BERT or a custom model tailored to the specific language and task.
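As a rough illustration of classifying messages in their original language, here is a small self-contained sketch. It deliberately swaps in a character n-gram Naive Bayes classifier as a lightweight stand-in for a multilingual model such as mBERT or XLM-R, so it runs without any external libraries. The class name, labels, and training messages are all hypothetical, and a production system would fine-tune a pre-trained multilingual model on far more data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (no tokenizer needed)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.label_counts = Counter()             # label -> number of messages
        self.vocab = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.label_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_messages = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, message_count in self.label_counts.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(message_count / total_messages)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: untranslated messages in three languages.
messages = [
    "you are an idiot", "eres un idiota", "du bist ein idiot",
    "good game everyone", "buen juego a todos", "gutes spiel an alle",
]
labels = ["toxic", "toxic", "toxic", "ok", "ok", "ok"]

model = NgramNaiveBayes(n=3).fit(messages, labels)
print(model.predict("idiota"), model.predict("buen juego"))
```

Character n-grams are used here because they need no language-specific tokenization, which loosely mirrors how subword vocabularies let one multilingual model serve many languages. The sketch only shows that the classifier consumes the original text rather than a translation.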

Professional ML Engineer Exam Guide

Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate

Google Cloud launches machine learning engineer certification

mBERT: Bidirectional Encoder Representations from Transformers

XLM-R: Unsupervised Cross-lingual Representation Learning at Scale

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

asked 18/09/2024
sangilipandy Arumugam