ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 124 - MLS-C01 discussion

Report
Export

An online reseller has a large, multi-column dataset with one column missing 30% of its data A Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct the missing data.

Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?

A.
Listwise deletion
Answers
A.
Listwise deletion
B.
Last observation carried forward
Answers
B.
Last observation carried forward
C.
Multiple imputation
Answers
C.
Multiple imputation
D.
Mean substitution
Answers
D.
Mean substitution
Suggested answer: C

Explanation:

Multiple imputation is a technique that uses machine learning to generate multiple plausible values for each missing value in a dataset, based on the observed data and the relationships among the variables. Multiple imputation preserves the integrity of the dataset by accounting for the uncertainty and variability of the missing data, and avoids the bias and loss of information that may result from other methods, such as listwise deletion, last observation carried forward, or mean substitution. Multiple imputation can improve the accuracy and validity of statistical analysis and machine learning models that use the imputed dataset.References:

Managing missing values in your target and related datasets with automated imputation support in Amazon Forecast

Imputation by feature importance (IBFI): A methodology to impute missing data in large datasets

Multiple Imputation by Chained Equations (MICE) Explained

asked 16/09/2024
Mark Anthony Simon
38 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first