ExamGecko
Question 9 - MLS-C01 discussion

A data scientist must build a custom recommendation model in Amazon SageMaker for an online retail company. Due to the nature of the company's products, customers buy only 4-5 products every 5-10 years. As a result, the company relies on a steady stream of new customers. When a new customer signs up, the company collects data on the customer's preferences. Below is a sample of the data available to the data scientist.

How should the data scientist split the dataset into a training and test set for this use case?

A.
Shuffle all interaction data. Split off the last 10% of the interaction data for the test set.
B.
Identify the most recent 10% of interactions for each user. Split off these interactions for the test set.
C.
Identify the 10% of users with the least interaction data. Split off all interaction data from these users for the test set.
D.
Randomly select 10% of the users. Split off all interaction data from these users for the test set.
Suggested answer: D

Explanation:

The best way to split the dataset into a training and test set for this use case is to randomly select 10% of the users and split off all interaction data from these users for the test set. Because the company relies on a steady stream of new customers, the model will mostly make recommendations for people it has never seen, so the test set should consist of users who do not appear in the training set at all. The other options are not suitable. A and B place interactions from the same users in both the training and test sets, so the evaluation measures performance on already-known customers rather than on new ones. C biases the test set toward the users with the least interaction data instead of drawing a representative sample of users.

References:

Amazon SageMaker Developer Guide: Train and Test Datasets

Amazon Personalize Developer Guide: Preparing and Importing Data
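The difference between option D and options A/B can be checked on toy data. The sketch below is a minimal illustration, assuming interactions are stored as hypothetical `(user_id, item_id, timestamp)` tuples; the function names are illustrative and are not part of SageMaker or Amazon Personalize. It splits by user (option D) and, for contrast, holds out each user's most recent interactions (option B), then counts the users that appear in both sets.

```python
import random
from collections import defaultdict


def split_by_user(interactions, test_fraction=0.1, seed=42):
    """Option D: hold out ALL interactions of a random subset of users.

    Assumes each record is a hypothetical (user_id, item_id, timestamp) tuple.
    """
    users = sorted({user for user, _, _ in interactions})
    rng = random.Random(seed)
    n_test = max(1, round(len(users) * test_fraction))
    test_users = set(rng.sample(users, n_test))
    train = [row for row in interactions if row[0] not in test_users]
    test = [row for row in interactions if row[0] in test_users]
    return train, test


def split_recent_per_user(interactions, test_fraction=0.1):
    """Option B, for contrast: hold out each user's most recent interactions."""
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)
    train, test = [], []
    for rows in by_user.values():
        rows.sort(key=lambda r: r[2])  # oldest first, by timestamp
        n_test = max(1, round(len(rows) * test_fraction))
        train.extend(rows[:-n_test])
        test.extend(rows[-n_test:])
    return train, test


def shared_users(train, test):
    """Users present in both sets, i.e. not 'new' customers at test time."""
    return {r[0] for r in train} & {r[0] for r in test}


# Toy data: 20 hypothetical users with 5 interactions each.
data = [(f"u{u}", f"item{(u * 7 + t) % 13}", t) for u in range(20) for t in range(5)]
train_d, test_d = split_by_user(data)
train_b, test_b = split_recent_per_user(data)
print(len(shared_users(train_d, test_d)))  # 0: test users are unseen in training
print(len(shared_users(train_b, test_b)))  # 20: every test user also appears in training
```

Grouping by user is what makes the test set simulate new customers. If scikit-learn is available, `sklearn.model_selection.GroupShuffleSplit` with the user ID as the `groups` argument implements the same idea.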

asked 16/09/2024
adir tamam