ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 117 - MLS-C01 discussion

Report
Export

Example Corp has an annual sale event from October to December. The company has sequential sales data from the past 15 years and wants to use Amazon ML to predict the sales for this year's upcoming event. Which method should Example Corp use to split the data into a training dataset and evaluation dataset?

A.
Pre-split the data before uploading to Amazon S3
Answers
A.
Pre-split the data before uploading to Amazon S3
B.
Have Amazon ML split the data randomly.
Answers
B.
Have Amazon ML split the data randomly.
C.
Have Amazon ML split the data sequentially.
Answers
C.
Have Amazon ML split the data sequentially.
D.
Perform custom cross-validation on the data
Answers
D.
Perform custom cross-validation on the data
Suggested answer: C

Explanation:

A sequential split is a method of splitting data into training and evaluation datasets while preserving the order of the data records. This method is useful when the data has a temporal or sequential structure, and the order of the data matters for the prediction task. For example, if the data contains sales data for different months or years, and the goal is to predict the sales for the next month or year, a sequential split can ensure that the training data comes from the earlier period and the evaluation data comes from the later period. This can help avoid data leakage, which occurs when the training data contains information from the future that is not available at the time of prediction. A sequential split can also help evaluate the model performance on the most recent data, which may be more relevant and representative of the future data.

In this question, Example Corp has sequential sales data from the past 15 years and wants to use Amazon ML to predict the sales for this year's upcoming annual sale event. A sequential split is the most appropriate method for splitting the data, as it can preserve the order of the data and prevent data leakage. For example, Example Corp can use the data from the first 14 years as the training dataset, and the data from the last year as the evaluation dataset. This way, the model can learn from the historical data and be tested on the most recent data.

Amazon ML provides an option to split the data sequentially when creating the training and evaluation datasources. To use this option, Example Corp can specify the percentage of the data to use for training and evaluation, and Amazon ML will use the first part of the data for training and the remaining part of the data for evaluation. For more information, seeSplitting Your Data - Amazon Machine Learning.

asked 16/09/2024
Selim OZIS
32 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first