Question 203 - MLS-C01 discussion


A machine learning specialist is developing a regression model to predict rental rates from rental listings. A variable named Wall_Color represents the most prominent exterior wall color of the property. The following is the sample data, excluding all other variables:

The specialist chose a model that needs numerical input data.

Which feature engineering approaches should the specialist use to allow the regression model to learn from the Wall_Color data? (Choose two.)

A. Apply integer transformation and set Red = 1, White = 5, and Green = 10.
B. Add new columns that store one-hot representation of colors.
C. Replace the color name string by its length.
D. Create three columns to encode the color in RGB format.
E. Replace each color name by its training set frequency.

Suggested answer: B, D

Explanation:

In this scenario, the specialist should use one-hot encoding and RGB encoding so that the regression model can learn from the Wall_Color data. One-hot encoding converts a categorical variable into numerical data by adding one new binary column per category. For example, suppose a variable named color has three categories: red, green, and blue. After one-hot encoding, there are three new columns, one per color, and each row holds a 1 in the column for its color and a 0 in the other two.
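
As a rough illustration of one-hot encoding, here is a minimal sketch in plain Python. The sample values are hypothetical (the original data table is not available), and the helper name `one_hot_encode` is made up for this example.

```python
# Hypothetical sample values; the original data table is not shown in the source.
wall_colors = ["Red", "White", "Green", "White"]


def one_hot_encode(values):
    """Return (categories, rows); each row has a single 1 marking its category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


categories, encoded = one_hot_encode(wall_colors)
for color, row in zip(wall_colors, encoded):
    # Pair each indicator with its column name for readability.
    print(color, dict(zip(categories, row)))
```

In practice a library routine such as pandas `get_dummies` or scikit-learn `OneHotEncoder` would normally do this; the hand-written version only makes the resulting columns explicit.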

One-hot encoding captures the presence or absence of a color, but not its intensity or hue. RGB encoding, the scheme used to represent colors in digital images, creates three columns that hold the red, green, and blue components of the color. For the same color variable with categories red, green, and blue, and assuming pure 8-bit values, the three columns hold (255, 0, 0) for red, (0, 255, 0) for green, and (0, 0, 255) for blue.
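
The RGB approach can be sketched the same way. The name-to-RGB mapping below is an assumption made for illustration (pure 8-bit values for each color name), and `rgb_encode` is a hypothetical helper, not something from the source.

```python
# Assumed 8-bit RGB values for each color name; the mapping is illustrative.
RGB_BY_COLOR = {
    "Red": (255, 0, 0),
    "White": (255, 255, 255),
    "Green": (0, 255, 0),
}


def rgb_encode(values, scale=255):
    """Map each color name to three numeric columns (R, G, B) scaled to [0, 1]."""
    return [
        tuple(channel / scale for channel in RGB_BY_COLOR[value])
        for value in values
    ]


wall_colors = ["Red", "White", "Green"]
rgb_columns = rgb_encode(wall_colors)
for color, (r, g, b) in zip(wall_colors, rgb_columns):
    print(color, r, g, b)
```

Scaling the channels to the range 0 to 1 is a design choice for this sketch; the raw 0 to 255 values would also be valid numerical input.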

RGB encoding captures the intensity and hue of a color, but its three columns may be correlated with one another. Because each encoding captures something the other does not, using both can give the regression model more information than using either one alone.
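
To show what using both encodings together might look like, the following sketch concatenates the one-hot indicators and the scaled RGB channels into a single six-value feature row. The color list, the RGB values, and the function name `encode_wall_color` are all assumptions made for this example.

```python
# Illustrative only: color names and RGB values are assumptions, not source data.
CATEGORIES = ["Red", "White", "Green"]
RGB_BY_COLOR = {"Red": (255, 0, 0), "White": (255, 255, 255), "Green": (0, 255, 0)}


def encode_wall_color(color):
    """Return six features: three one-hot indicators followed by scaled R, G, B."""
    one_hot = [1.0 if color == category else 0.0 for category in CATEGORIES]
    rgb = [channel / 255 for channel in RGB_BY_COLOR[color]]
    return one_hot + rgb


features = {color: encode_wall_color(color) for color in CATEGORIES}
for color, row in features.items():
    print(color, row)
```

In this sketch White has a nonzero red channel just as Red does, so the RGB columns overlap across categories, whereas the one-hot columns never do.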


asked 16/09/2024
IOSSIF ZINGUER