Question 251 - MLS-C01 discussion


A machine learning engineer is building a bird classification model. The engineer randomly separates a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model does not generalize well when it is evaluated on the validation dataset. The engineer realizes that the original dataset was imbalanced.

What should the engineer do to improve the validation accuracy of the model?

A. Perform stratified sampling on the original dataset.
B. Acquire additional data about the majority classes in the original dataset.
C. Use a smaller, randomly sampled version of the training dataset.
D. Perform systematic sampling on the original dataset.
Suggested answer: A

Explanation:

Stratified sampling preserves the class distribution of the original dataset when the data is split or subsampled: each class makes up the same proportion of the training and validation datasets as it does of the original dataset. With a purely random split of an imbalanced dataset, a minority class can end up over-represented in one split and nearly absent from the other, so the model may see too few minority examples during training, or the validation score may be computed on a class mix that does not match the data. Stratifying the split makes both datasets representative of the original, which gives a more reliable validation estimate and helps the model generalize across all classes. Stratified sampling does not rebalance the classes by itself. Oversampling the minority classes or undersampling the majority classes are separate techniques that can be combined with a stratified split.
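As a rough illustration of the idea, the sketch below splits a list of labels class by class so that each class keeps about the same share in both parts. The function name `stratified_split`, the `val_fraction` parameter, and the bird counts are hypothetical and chosen only for this example. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would normally be used instead.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with roughly the same class mix in both parts."""
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        # Keep at least one example on each side when the class has 2+ examples.
        if len(indices) > 1:
            n_val = min(max(n_val, 1), len(indices) - 1)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Hypothetical imbalanced dataset: 900 sparrows, 90 robins, 10 kingfishers.
labels = ["sparrow"] * 900 + ["robin"] * 90 + ["kingfisher"] * 10

train_idx, val_idx = stratified_split(labels, val_fraction=0.2, seed=42)
train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)

print("train:", dict(train_counts))
print("validation:", dict(val_counts))
```

Because the split is done within each class, the 90% / 9% / 1% class mix of the full list carries over to both the training and the validation indices, whatever seed is used.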

The other options do not address the problem. Acquiring additional data about the majority classes increases the imbalance and biases the model further towards those classes. Using a smaller, randomly sampled version of the training dataset does not guarantee that the class distribution is preserved and may discard some of the few minority-class examples that are available. Performing systematic sampling on the original dataset (selecting every k-th record) also does not guarantee that the class distribution is preserved, and it can introduce sampling bias if the original dataset is ordered or grouped by class.
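One way the systematic sampling bias can show up is sketched below, assuming a hypothetical dataset whose records repeat in a fixed pattern of nine sparrows followed by one kingfisher. The helper `systematic_sample` and the ordering are illustrative assumptions, not part of the original question. When the sampling interval matches the period of the ordering, the sample contains a single class.

```python
from collections import Counter


def systematic_sample(items, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return items[start::step]


# Hypothetical ordering: blocks of nine sparrows followed by one kingfisher.
labels = (["sparrow"] * 9 + ["kingfisher"]) * 100

# The interval (10) equals the period of the ordering, so the starting
# offset alone decides which class ends up in the sample.
sample_a = Counter(systematic_sample(labels, step=10, start=0))
sample_b = Counter(systematic_sample(labels, step=10, start=9))

print(dict(sample_a))
print(dict(sample_b))
```

The first sample contains only sparrows and the second only kingfishers, even though kingfishers make up 10% of the records. A stratified sample would keep that 10% share in either case.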


asked 16/09/2024
Georgios Kavvalakis