ExamGecko

Question 69 - Professional Data Engineer discussion

Why do you need to split a machine learning dataset into training data and test data?

A. So you can try two different sets of features
B. To make sure your model is generalized for more than just the training data
C. To allow you to create unit tests in your code
D. So you can use one dataset for a wide model and one for a deep model
Suggested answer: B

Explanation:

The flaw with evaluating a predictive model on its training data is that this does not tell you how well the model generalizes to new, unseen data. A model selected for its accuracy on the training dataset, rather than for its accuracy on an unseen test dataset, is very likely to be less accurate on unseen data. The reason is that, instead of generalizing, it has specialized to the structure of the training dataset, including its noise. This is called overfitting. Holding out a test set that the model never sees during training gives an estimate of how it will perform on new data.

Reference: https://machinelearningmastery.com/a-simple-intuition-for-overfitting/
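To make the idea concrete, here is a minimal, stdlib-only sketch (not taken from the reference above). The toy dataset, the 20% label-noise rate, and the helper names `make_dataset`, `train_test_split`, `predict_1nn` and `accuracy` are all hypothetical choices for illustration. A 1-nearest-neighbour classifier is used deliberately because it memorizes its training data: it scores perfectly on the data it was trained on, and only the held-out test set reveals how well it actually generalizes.

```python
import random

def make_dataset(n, noise, seed=0):
    """Hypothetical toy data: label is 1 when x > 0.5, with a fraction `noise` of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise that a well-generalized model should not learn
        data.append((x, y))
    return data

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold out the last `test_fraction` of it for testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def predict_1nn(train, x):
    """1-nearest-neighbour: copy the label of the closest training point (memorizes the training set)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, eval_data):
    """Fraction of points in eval_data whose label the 1-NN model (fit on train) predicts correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in eval_data)
    return correct / len(eval_data)

data = make_dataset(400, noise=0.2)
train, test = train_test_split(data)
train_acc = accuracy(train, train)  # each training point's nearest neighbour is itself
test_acc = accuracy(train, test)    # unseen points expose the memorized noise
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Training accuracy comes out at 1.00 because every training point is its own nearest neighbour, while test accuracy is noticeably lower. Judged on training data alone, the model would look perfect. In practice, libraries such as scikit-learn provide `sklearn.model_selection.train_test_split` for the splitting step.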

Asked 18/09/2024 by Jean Presume