Question 124 - Professional Machine Learning Engineer discussion

While performing exploratory data analysis on a dataset, you find that an important categorical feature has 5% null values. You want to minimize the bias that could result from the missing values. How should you handle the missing values?

A. Remove the rows with missing values, and upsample your dataset by 5%.
B. Replace the missing values with the feature's mean.
C. Replace the missing values with a placeholder category indicating a missing value.
D. Move the rows with missing values to your validation dataset.

Suggested answer: C
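
As a rough illustration of option C, the sketch below replaces missing entries in a categorical column with an explicit placeholder label. The toy column, the label "MISSING", and the helper name fill_with_placeholder are all assumptions made for this example; they do not come from the question.

```python
from collections import Counter

# Hypothetical toy column; None marks a missing value.
colors = ["red", "blue", None, "green", "red", None, "blue", "red"]

# Assumed placeholder name; any label not already used as a category works.
MISSING_LABEL = "MISSING"


def fill_with_placeholder(values, placeholder=MISSING_LABEL):
    """Return a copy of `values` with None replaced by `placeholder`."""
    if placeholder in values:
        raise ValueError(f"{placeholder!r} already appears as a real category")
    return [placeholder if v is None else v for v in values]


filled = fill_with_placeholder(colors)

# Every row is kept, and missingness is now a category a model can learn from.
category_counts = Counter(filled)
```

Guarding against a placeholder that collides with a real category keeps "missing" distinguishable from observed values, which is the point of this approach.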

Explanation:

The best option for handling missing values in a categorical feature is to replace them with a placeholder category indicating a missing value. This is a simple form of imputation: rather than estimating what each missing value might have been, it assigns all of them to one explicit "missing" category. Doing so keeps every row, preserves the information that the value was absent, and leaves the relative frequencies of the observed categories unchanged. It also lets the machine learning model learn from the missingness pattern and potentially use it as a predictor of the target variable. The other options are not suitable for handling missing values in a categorical feature, because:

Removing the rows with missing values shrinks the dataset and discards the information held in those rows' other features. If the values are not missing completely at random, it also introduces sampling bias. Upsampling by 5% only restores the row count: it duplicates or synthesizes observations from the remaining rows, which does not recover the lost information and can encourage overfitting.
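
To make the point about dropping rows concrete, here is a minimal sketch on made-up data in which missingness is deliberately tied to the label. The rows, the 50/45/5 split, and the helper positive_rate are assumptions chosen for illustration only.

```python
import random

# Hypothetical rows of (category, label); 5% have a missing category,
# and all of those happen to carry label 1.
rows = [("a", 0)] * 50 + [("b", 1)] * 45 + [(None, 1)] * 5


def positive_rate(data):
    """Fraction of rows whose label is 1."""
    return sum(label for _, label in data) / len(data)


# Dropping rows with a missing category shifts the label distribution.
complete = [r for r in rows if r[0] is not None]

# Upsampling back to the original size can only duplicate surviving rows.
rng = random.Random(0)
n_lost = len(rows) - len(complete)
upsampled = complete + rng.choices(complete, k=n_lost)

full_rate = positive_rate(rows)         # 50 / 100
dropped_rate = positive_rate(complete)  # 45 / 95
```

The upsampled list has the original length again, but none of the removed rows come back, so whatever skew the deletion introduced is still present.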

Replacing the missing values with the feature's mean does not make sense for a categorical feature, because the mean is a numerical measure: it is undefined for unordered categories and says nothing about their mode or frequency. Even if the categories were integer-encoded, the mean of the codes would generally correspond to no real category, which effectively creates a value that does not exist in the original data and might confuse the machine learning model.
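
A small sketch of why the mean fails here, using Python's standard statistics module. The toy column and the integer encoding are arbitrary assumptions for the example.

```python
from statistics import mean

# Hypothetical observed (non-missing) values of a categorical feature.
colors = ["red", "blue", "green", "red", "red"]

# The mean is not defined for string categories.
try:
    mean(colors)
    mean_error = None
except TypeError as exc:
    mean_error = exc

# Integer-encoding first does not help: the encoding below is arbitrary,
# and the mean of the codes matches no category at all.
codes = {"red": 0, "blue": 1, "green": 2}
encoded = [codes[c] for c in colors]
code_mean = mean(encoded)  # (0 + 1 + 2 + 0 + 0) / 5 = 0.6
```

A different arbitrary encoding would give a different mean, which is another sign that the number carries no meaning for the feature.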

Moving the rows with missing values to the validation dataset would compromise the validity and reliability of the model evaluation, because the validation dataset would no longer be representative of the test or production data. It would also reduce the amount of data available for training, and the model would never see missing values during training even though it is evaluated on them, creating an inconsistency between the training and validation datasets.

Reference:

Imputation of missing values

Effective Strategies to Handle Missing Values in Data Analysis

How to Handle Missing Values of Categorical Variables?

Google Cloud launches machine learning engineer certification

Google Professional Machine Learning Engineer Certification

Professional ML Engineer Exam Guide

Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate

asked 18/09/2024
DHANANJAY TIWARI