ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 101 - Professional Machine Learning Engineer discussion

Report
Export

Your company manages an application that aggregates news articles from many different online sources and sends them to users. You need to build a recommendation model that will suggest articles to readers that are similar to the articles they are currently reading. Which approach should you use?

A.
Create a collaborative filtering system that recommends articles to a user based on the user's past behavior.
Answers
A.
Create a collaborative filtering system that recommends articles to a user based on the user's past behavior.
B.
Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.
Answers
B.
Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.
C.
Build a logistic regression model for each user that predicts whether an article should be recommended to a user.
Answers
C.
Build a logistic regression model for each user that predicts whether an article should be recommended to a user.
D.
Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories.
Answers
D.
Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories.
Suggested answer: B

Explanation:

Option A is incorrect because creating a collaborative filtering system that recommends articles to a user based on the user's past behavior is not the best approach to suggest articles that are similar to the articles they are currently reading.Collaborative filtering is a method of recommendation that uses the ratings or preferences of other users to predict the preferences of a target user1. However, this method does not consider the content or features of the articles, and may not be able to find articles that are similar in terms of topic, style, or sentiment.

Option B is correct because encoding all articles into vectors using word2vec, and building a model that returns articles based on vector similarity is a suitable approach to suggest articles that are similar to the articles they are currently reading.Word2vec is a technique that learns low-dimensional and dense representations of words from a large corpus of text, such that words that are semantically similar have similar vectors2. By applying word2vec to the articles, we can obtain vector representations of the articles that capture their meaning and usage.Then, we can use a similarity measure, such as cosine similarity, to find articles that have similar vectors to the current article3.

Option C is incorrect because building a logistic regression model for each user that predicts whether an article should be recommended to a user is not a feasible approach to suggest articles that are similar to the articles they are currently reading.Logistic regression is a supervised learning method that models the probability of a binary outcome (such as recommend or not) based on some input features (such as user profile or article content)4. However, this method requires a large amount of labeled data for each user, which may not be available or scalable. Moreover, this method does not directly measure the similarity between articles, but rather the likelihood of a user's preference.

Option D is incorrect because manually labeling a few hundred articles, and then training an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories is not an effective approach to suggest articles that are similar to the articles they are currently reading.SVM (support vector machine) is a supervised learning method that finds a hyperplane that separates the data into different classes (such as news categories) with the maximum margin5. However, this method also requires a large amount of labeled data, which may be costly and time-consuming to obtain. Moreover, this method does not account for the fine-grained similarity between articles within the same category, or the cross-category similarity between articles from different categories.

Collaborative filtering

Word2vec

Cosine similarity

Logistic regression

SVM

asked 18/09/2024
Antonio Pombo
36 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first