ExamGecko

Question 174 - MLS-C01 discussion

A Data Scientist is developing a binary classifier to predict whether a patient has a particular disease based on a series of test results. The Data Scientist has data on 400 patients randomly selected from the population. The disease is seen in 3% of the population.
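The scale of the imbalance can be checked directly from the numbers in the scenario. The sketch below is illustrative only. It assumes the sample contains exactly the expected number of positive patients (400 × 3% = 12), although a real random sample would vary around that figure. All variable names are hypothetical. It computes the average number of positives per fold for k=5 and the hypergeometric probability that a single fold of 80 patients, drawn at random without stratification, contains no positive case at all.

```python
from math import comb

n_patients = 400
prevalence = 0.03
k = 5

# Assumption: the sample holds exactly the expected number of positives.
expected_positives = round(n_patients * prevalence)   # 12
fold_size = n_patients // k                            # 80 patients per fold
mean_positives_per_fold = expected_positives / k       # 2.4 on average

# Hypergeometric probability that one randomly drawn fold of 80 patients
# contains none of the 12 positives: C(388, 80) / C(400, 80).
negatives = n_patients - expected_positives
p_empty_fold = comb(negatives, fold_size) / comb(n_patients, fold_size)

print(expected_positives, fold_size, mean_positives_per_fold)
print(f"P(a given fold has no positives) = {p_empty_fold:.3f}")
```

Under these assumptions, the probability for any single fold is around 6 to 7 percent. Across all five folds, the chance that at least one of them has no positives is correspondingly higher.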

Which cross-validation strategy should the Data Scientist adopt?

A.
A k-fold cross-validation strategy with k=5
B.
A stratified k-fold cross-validation strategy with k=5
C.
A k-fold cross-validation strategy with k=5 and 3 repeats
D.
An 80/20 stratified split between training and validation
Suggested answer: B

Explanation:

A stratified k-fold cross-validation strategy preserves the class distribution in each fold. This matters for imbalanced datasets such as this one, where the disease is seen in only 3% of the population. Among 400 randomly selected patients, only about 12 are expected to be positive, which is roughly 2 or 3 per fold when k=5. With plain (unstratified) k-fold cross-validation, the folds are formed at random, so some folds may contain very few positive cases or none at all. Metrics such as recall cannot even be computed on a fold with no positives, and estimates from folds with one or two positives are very noisy. Stratified k-fold cross-validation ensures that each fold has approximately the same proportion of positive and negative cases as the whole dataset, which makes the performance estimate more reliable.

A k-fold cross-validation strategy with k=5 and 3 repeats averages over more random partitions, which reduces the variance of the estimate. However, because it is not stratified, it still allows folds with few or no positive cases, and it triples the computational cost. An 80/20 stratified split between training and validation preserves the class ratio, but it is a single hold-out split. The model is evaluated on only 80 patients (about 2 or 3 positive cases), and the other 320 patients are never used for validation. The resulting estimate therefore has higher variance than one averaged across all five folds of a stratified k-fold cross-validation.

References:

AWS Machine Learning Specialty Certification Exam Guide

AWS Machine Learning Training: Model Evaluation

How to Fix k-Fold Cross-Validation for Imbalanced Classification
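The difference between plain and stratified folds can be sketched in plain Python without any ML library. This is a minimal illustration, not the scikit-learn implementation. The helper names are hypothetical, and the sketch again assumes exactly 12 positives among the 400 patients.

The stratified version works in three steps:
1. It shuffles the indices of each class separately.
2. It lines the classes up one after another.
3. It deals the indices round-robin into k folds, so every fold receives either 2 or 3 of the 12 positives.

The plain version shuffles all indices together and cuts them into k equal chunks, so the positive count per fold depends on the shuffle.

```python
import random

def stratified_k_fold(labels, k, seed=0):
    """Hypothetical helper: return k lists of validation indices with
    (near-)equal class proportions, by shuffling each class separately,
    concatenating the classes, and dealing indices round-robin into k folds."""
    rng = random.Random(seed)
    ordered = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        ordered.extend(idx)
    folds = [[] for _ in range(k)]
    for position, i in enumerate(ordered):
        folds[position % k].append(i)
    return folds

def plain_k_fold(labels, k, seed=0):
    """Hypothetical helper: shuffle all indices and cut them into k
    contiguous, (near-)equal-sized chunks, ignoring the class labels."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    size, extra = divmod(len(idx), k)
    folds, start = [], 0
    for f in range(k):
        end = start + size + (1 if f < extra else 0)
        folds.append(idx[start:end])
        start = end
    return folds

def train_val_splits(folds):
    """Yield (train, validation) index lists, using each fold once for validation."""
    for v, val in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != v for i in fold]
        yield train, val

def positives_per_fold(folds, labels):
    """Count the positive labels (1s) in each fold."""
    return [sum(labels[i] for i in fold) for fold in folds]

# Assumption: exactly 12 of the 400 sampled patients (3%) are positive.
labels = [1] * 12 + [0] * 388

strat = stratified_k_fold(labels, k=5)
plain = plain_k_fold(labels, k=5)
print("stratified:", positives_per_fold(strat, labels))  # [2, 2, 2, 3, 3]
print("plain:     ", positives_per_fold(plain, labels))  # varies with the seed
```

With these assumptions, the stratified folds always contain 2 or 3 positives each. The plain folds' counts change with the seed and can include a fold with none. In practice, scikit-learn's `StratifiedKFold` class provides stratified splitting, although its exact fold assignment may differ from this sketch.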

asked 16/09/2024
Rio Ordonez