Question 31 - Professional Machine Learning Engineer discussion

You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?

A. Use the class distribution to generate 10% positive examples
B. Use a convolutional neural network with max pooling and softmax activation
C. Downsample the data with upweighting to create a sample with 10% positive examples
D. Remove negative examples until the numbers of positive and negative examples are equal
Suggested answer: C

Explanation:

The class imbalance problem is a common challenge in machine learning, especially in classification tasks. It occurs when the distribution of the target classes is highly skewed, so that one class (the majority class) has many more examples than the other (the minority class). The minority class is often the more interesting or important one, such as failure incidents, fraud cases, or rare diseases. However, most machine learning algorithms are trained to optimize overall accuracy or an aggregate loss, which is dominated by the majority class. With fewer than 1% positive examples, a model can reach a low loss while predicting almost everything as negative, which leads to poor convergence and poor predictive performance, especially for the minority class.

There are different techniques for dealing with the class imbalance problem, including data-level methods, algorithm-level methods, and evaluation-level methods [1]. Data-level methods resample the original dataset to create a more balanced class distribution and come in two main types: oversampling and undersampling. Oversampling increases the number of examples in the minority class, either by duplicating existing examples or by generating synthetic ones. Undersampling reduces the number of examples in the majority class, either by randomly removing examples or by using clustering or other criteria to select representative examples. Both approaches can be combined with upweighting or downweighting, which assign different weights to examples according to their class frequency, to further balance how much each class contributes to the loss.
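As a minimal, illustrative sketch of the two basic data-level methods, the snippet below assumes a pandas DataFrame named df with a binary label column named failure; both names are hypothetical and chosen only for this example.

import pandas as pd

def random_oversample(df: pd.DataFrame, label: str = "failure") -> pd.DataFrame:
    # Duplicate minority-class rows (sampling with replacement) until both
    # classes are the same size, then shuffle the result.
    pos = df[df[label] == 1]
    neg = df[df[label] == 0]
    pos_upsampled = pos.sample(n=len(neg), replace=True, random_state=42)
    return pd.concat([neg, pos_upsampled]).sample(frac=1, random_state=42)

def random_undersample(df: pd.DataFrame, label: str = "failure") -> pd.DataFrame:
    # Randomly drop majority-class rows until both classes are the same
    # size, then shuffle the result.
    pos = df[df[label] == 1]
    neg = df[df[label] == 0]
    neg_downsampled = neg.sample(n=len(pos), replace=False, random_state=42)
    return pd.concat([neg_downsampled, pos]).sample(frac=1, random_state=42)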

For the use case of investigating failures of a production line component based on sensor readings, the best option is to downsample the data with upweighting to create a sample with 10% positive examples. This means randomly removing negative examples (the majority class) until the ratio of positive to negative examples is 1:9, and then assigning each remaining negative example a weight equal to the factor by which the negatives were downsampled. The rebalanced sample gives the model a much stronger signal from the rare failure class in every batch, which helps training converge, while the upweighting keeps the loss consistent with the original class distribution, so the model's predicted probabilities remain reasonably calibrated to the true failure rate. Because the training set is smaller, this option also reduces computation time and memory usage. Therefore, downsampling the data with upweighting to create a sample with 10% positive examples is the best option for this use case.
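The following is a rough sketch of this approach under the same assumptions as above: it downsamples negatives until positives make up about 10% of the sample and upweights the remaining negatives by the downsampling factor. The DataFrame and column names (df, failure) and the scikit-learn estimator in the usage comment are assumptions made for illustration, not part of the original question.

import numpy as np
import pandas as pd

def downsample_with_upweighting(df, label="failure", target_pos_ratio=0.10, seed=42):
    pos = df[df[label] == 1]
    neg = df[df[label] == 0]

    # Keep only enough negatives for positives to reach ~target_pos_ratio
    # of the new sample (for 10%, that is 9 negatives per positive).
    n_neg_keep = int(len(pos) * (1 - target_pos_ratio) / target_pos_ratio)
    neg_down = neg.sample(n=min(n_neg_keep, len(neg)), random_state=seed)

    # Upweight the downsampled negatives by the factor that was removed,
    # so the loss still reflects the original class distribution.
    downsampling_factor = len(neg) / len(neg_down)
    sample = pd.concat([pos, neg_down]).sample(frac=1, random_state=seed)
    weights = np.where(sample[label] == 0, downsampling_factor, 1.0)
    return sample, weights

# Usage with any estimator that accepts per-example weights, for example:
#   from sklearn.linear_model import LogisticRegression
#   sample, weights = downsample_with_upweighting(df)
#   X, y = sample.drop(columns=["failure"]), sample["failure"]
#   model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)

Passing the downsampling factor as a per-example weight, rather than simply training on the rebalanced sample, is what keeps the model's output probabilities aligned with the original data.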

[1] M. Buda, A. Maki, and M. A. Mazurowski, "A Systematic Study of the Class Imbalance Problem in Convolutional Neural Networks," Neural Networks, 2018.
