Question 84 - Professional Machine Learning Engineer discussion


You are developing an ML model to predict house prices. While preparing the data, you discover that an important predictor variable, distance from the closest school, is often missing and does not have high variance. Every instance (row) in your data is important. How should you handle the missing data?

A. Delete the rows that have missing values.
B. Apply feature crossing with another column that does not have missing values.
C. Predict the missing values using linear regression.
D. Replace the missing values with zeros.
Suggested answer: C

Explanation:

The best option here is to predict the missing values using linear regression. Linear regression is a supervised learning technique that estimates the relationship between a continuous target variable and one or more predictor variables. For imputation, the target is the feature with gaps, distance from the closest school, and the predictors are the other features in the dataset, such as house size, location, and number of rooms. The model is fitted on the rows where the distance is present and is then used to predict the distance for the rows where it is missing. This keeps every instance in the dataset, which the question requires, and fills each gap with a value informed by the rest of the row rather than with an arbitrary constant. The other options are not suitable because:

Deleting the rows that have missing values shrinks the dataset and discards the information held in every other column of those rows. Because the value is often missing, many rows would be lost, and the question states that every instance is important.

Applying feature crossing creates a new feature that combines the values of two existing features. It does not fill in anything: a cross that involves the distance feature is still missing wherever the distance is missing, and a cross between other columns does not recover the distance at all. It can also increase model complexity and introduce noise or multicollinearity.

Replacing the missing values with zeros distorts the distribution of the feature and introduces bias. It treats every house with a missing value as if it sat at distance zero from a school, which is almost never true. Because the feature does not have high variance, zero is also likely to lie well outside the range of the observed values.
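The reasoning above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `impute_with_linear_regression`, the two predictor columns (house size and number of rooms), and all the numbers are invented for the example. It fits ordinary least squares on the rows where the distance is present, predicts the missing rows, and then compares the result with dropping rows (option A) and zero filling (option D).

```python
import numpy as np


def impute_with_linear_regression(X, y):
    """Fill NaN entries of y with predictions from a least-squares fit.

    X: (n_rows, n_features) predictors with no missing values.
    y: (n_rows,) feature to impute; NaN marks a missing value.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)

    # Prepend a column of ones so the model includes an intercept.
    design = np.column_stack([np.ones(len(X)), X])

    # Fit only on the rows where the feature is observed.
    coef, *_ = np.linalg.lstsq(design[~missing], y[~missing], rcond=None)

    # Predict the missing rows; observed values are left untouched.
    filled = y.copy()
    filled[missing] = design[missing] @ coef
    return filled


# Hypothetical toy data (assumed for illustration only):
# columns are house size in square metres and number of rooms.
features = np.array(
    [[50, 2], [80, 3], [120, 4], [65, 2], [100, 4], [90, 3], [75, 3], [55, 2]],
    dtype=float,
)
# Distance to the closest school in km; NaN marks a missing value.
distance_km = np.array([1.2, 1.5, np.nan, 1.3, np.nan, 1.6, 1.4, np.nan])

observed = distance_km[~np.isnan(distance_km)]
imputed = impute_with_linear_regression(features, distance_km)

# Option A: dropping incomplete rows keeps fewer instances.
rows_after_drop = observed.size

# Option D: zero filling pulls the mean down and inflates the spread.
zero_filled = np.where(np.isnan(distance_km), 0.0, distance_km)

print("rows kept after dropping:", rows_after_drop, "of", distance_km.size)
print("observed    mean/std:", observed.mean(), observed.std())
print("zero-filled mean/std:", zero_filled.mean(), zero_filled.std())
print("regression-imputed  :", imputed)
```

With so few rows the fitted coefficients are not meaningful in themselves; the point is only that regression imputation keeps all eight rows and leaves the observed values unchanged, while zero filling shifts the feature's mean and spread.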

asked 18/09/2024
Musaddiq Shorunke