Question 29 - Professional Machine Learning Engineer discussion


You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets. Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to production, the model's accuracy dropped to 66%. How can you make your production model more accurate?

A. Normalize the data for the training and test datasets as two separate steps.
B. Split the training and test data based on time rather than a random split to avoid leakage.
C. Add more data to your test set to ensure that you have a fair distribution and sample for testing.
D. Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and test sets.
Suggested answer: B

Explanation:

The model predicts daily temperatures, but the training data arrives hourly, so the readings have strong temporal dependencies: consecutive hours are highly correlated, and the series also carries seasonality, trends, and cycles. A random split scatters readings from the same day, and from neighboring days, across both the training and test sets. This is data leakage: the model is trained on information from the same period it is tested on, including readings that come after some of the test points. It can then score well on the test set by interpolating between near-identical neighbors, which is consistent with the 97% test accuracy, yet it fails in production, where it must forecast from past data only, which is consistent with the drop to 66%.

Splitting by time, with the older data as the training set and the most recent data as the test set, removes this leakage. The test score then measures how well the model forecasts future temperatures from past data, which is the realistic production scenario, so model selection and tuning are guided by an honest estimate instead of an inflated one. The other options do not address the leak: normalizing the two sets separately (A) and enlarging the test set (C) both leave the random split in place, and applying transformations before splitting (D) lets statistics from the test data influence training, which adds leakage. Therefore, B is the best answer.
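The difference between the two splitting strategies can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the exam question: the synthetic hourly readings, the 80/20 ratio, and the helper names `time_based_split` and `shared_days` are all assumptions made for the example. It counts how many calendar days appear in both the training and test sets under each strategy.

```python
import random
from datetime import datetime, timedelta


def time_based_split(records, test_fraction=0.2):
    """Split (timestamp, value) records so every test record is later than every training record."""
    ordered = sorted(records, key=lambda r: r[0])
    # Hold out the most recent records; round() avoids float truncation surprises.
    n_test = round(len(ordered) * test_fraction)
    cutoff = len(ordered) - n_test
    return ordered[:cutoff], ordered[cutoff:]


def shared_days(train, test):
    """Return the calendar days present in both splits (a symptom of leakage)."""
    train_days = {ts.date() for ts, _ in train}
    test_days = {ts.date() for ts, _ in test}
    return train_days & test_days


# Hypothetical data: 100 days of hourly readings (values are made up).
start = datetime(2024, 1, 1)
readings = [
    (start + timedelta(hours=h), 10.0 + (h % 24) * 0.5)
    for h in range(100 * 24)
]

# Random split: hours from the same day land on both sides.
rng = random.Random(0)
shuffled = readings[:]
rng.shuffle(shuffled)
cut = int(len(shuffled) * 0.8)
random_train, random_test = shuffled[:cut], shuffled[cut:]

# Time-based split: the last 20% of the timeline is held out.
time_train, time_test = time_based_split(readings, test_fraction=0.2)

print("days shared by random split:", len(shared_days(random_train, random_test)))
print("days shared by time split:", len(shared_days(time_train, time_test)))
```

Under the random split, hours from the same day end up on both sides, so the model is tested on days it has partly seen. Under the time-based split in this example the cutoff falls on a day boundary, so no day is shared and every test reading is later than every training reading. With real data, the cutoff would normally be chosen as an explicit date rather than a fraction.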

asked 18/09/2024
massamba gaye