ExamGecko

Question 167 - MLS-C01 discussion

A Machine Learning Specialist is given a structured dataset on the shopping habits of a company's customer base. The dataset contains thousands of columns of data and hundreds of numerical columns for each customer. The Specialist wants to identify whether there are natural groupings for these columns across all customers and visualize the results as quickly as possible.

What approach should the Specialist take to accomplish these tasks?

A.
Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a scatter plot.
B.
Run k-means using the Euclidean distance measure for different values of k and create an elbow plot.
C.
Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a line graph.
D.
Run k-means using the Euclidean distance measure for different values of k and create box plots for each numerical column within each cluster.
Suggested answer: A

Explanation:

The best approach to identify and visualize natural groupings across all customers is to embed the numerical features with the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a scatter plot. t-SNE is a nonlinear dimensionality reduction technique that projects high-dimensional data into a low-dimensional space (typically two dimensions) while preserving local neighborhood structure: customers that are similar across the hundreds of numerical columns end up close together in the embedding. Global distances and the relative sizes of groups are not reliably preserved, so the plot is a tool for exploration rather than measurement. In the scatter plot, each point represents a customer, and natural groupings appear as visually separated clouds of points. t-SNE does not assign cluster labels itself, so any coloring by cluster has to come from a separate step such as a clustering run. Because it needs no choice of k and produces a single plot, this approach lets the Specialist explore patterns and similarities among customers quickly.
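A minimal sketch of the t-SNE approach, assuming scikit-learn (for `sklearn.manifold.TSNE`) and NumPy are installed; matplotlib is needed only for the optional plotting helper. The hypothetical `make_customer_table` helper generates a synthetic stand-in for the real customer data (three planted groups, 50 numerical columns), and `embed_2d` and `plot_embedding` are likewise illustrative names, not part of any AWS or SageMaker API. Columns are standardized first so that no single large-valued column dominates the distances.

```python
import numpy as np
from sklearn.manifold import TSNE


def make_customer_table(n_per_group=100, n_features=50, n_groups=3, seed=0):
    """Hypothetical stand-in for the customer table: one row per customer,
    numerical columns only, with n_groups planted groupings."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(n_groups, n_features))
    x = np.vstack([c + rng.normal(size=(n_per_group, n_features)) for c in centers])
    labels = np.repeat(np.arange(n_groups), n_per_group)
    return x, labels


def embed_2d(x, perplexity=30.0, seed=0):
    """Standardize each column, then project every customer to 2-D with t-SNE."""
    x_std = (x - x.mean(axis=0)) / x.std(axis=0)
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(x_std)


def plot_embedding(embedding, path="tsne_scatter.png"):
    """Scatter plot with one point per customer; matplotlib is imported lazily."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(embedding[:, 0], embedding[:, 1], s=8)  # single color: no labels
    ax.set_xlabel("t-SNE 1")
    ax.set_ylabel("t-SNE 2")
    fig.savefig(path, dpi=120)
    plt.close(fig)


# true_groups is kept only to check the result; t-SNE never sees it.
x, true_groups = make_customer_table()
embedding = embed_2d(x)
print(embedding.shape)  # (300, 2)
```

Calling `plot_embedding(embedding)` writes the scatter plot to a PNG file. The points are drawn in a single color on purpose: t-SNE only positions the customers, so the separated clouds in the plot are the groupings. Perplexity (roughly, the effective number of neighbors each point considers) is the main parameter; trying a few values is a common sanity check before reading too much into the shape of the plot.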

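For comparison, option B can be sketched in the same way. This is an illustrative NumPy-only implementation of Lloyd's k-means with k-means++ seeding, not the SageMaker built-in k-means algorithm; the three-group synthetic data and the `kmeans_inertia` helper are assumptions made for the example. It computes the within-cluster sum of squared Euclidean distances (inertia) for each k, which is the data an elbow plot draws.

```python
import numpy as np


def squared_distances(x, centers):
    """Squared Euclidean distance from every row of x to every center."""
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)


def kmeans_inertia(x, k, n_init=5, max_iter=100, seed=0):
    """Hypothetical helper: run Lloyd's k-means (k-means++ seeding) n_init
    times and return the lowest within-cluster sum of squares for this k."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # k-means++ seeding: pick new centers far from the existing ones.
        centers = x[[rng.integers(len(x))]]
        while len(centers) < k:
            d2 = squared_distances(x, centers).min(axis=1)
            centers = np.vstack([centers, x[rng.choice(len(x), p=d2 / d2.sum())]])
        # Lloyd iterations: assign to nearest center, then move centers to means.
        for _ in range(max_iter):
            assign = squared_distances(x, centers).argmin(axis=1)
            new_centers = np.array([
                x[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = squared_distances(x, centers).min(axis=1).sum()
        best = min(best, float(inertia))
    return best


# Synthetic stand-in data: 3 groups of 100 customers, 50 numerical columns.
rng = np.random.default_rng(42)
group_centers = rng.normal(scale=5.0, size=(3, 50))
x = np.vstack([c + rng.normal(size=(100, 50)) for c in group_centers])

ks = list(range(1, 7))
inertias = [kmeans_inertia(x, k) for k in ks]
for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:,.0f}")
```

Plotting `inertias` against `ks` (for example with matplotlib's `plt.plot(ks, inertias, marker="o")`) gives the elbow plot; on this synthetic data the curve should flatten after k=3. The output is six numbers, one per k: it suggests how many groups exist but shows nothing about individual customers.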
The other options are less effective or less efficient than the t-SNE approach. Running k-means for different values of k and creating an elbow plot (option B) can help choose the number of clusters, but the plot shows only one summary value per k, not the clusters or the customers themselves. Embedding the numerical features with t-SNE and creating a line graph (option C) does not make sense: a line graph connects points in a meaningful order, such as time, whereas t-SNE embedding coordinates have no natural ordering, so the graph would show the distribution of customers poorly. Running k-means and creating box plots for each numerical column within each cluster (option D) provides per-cluster statistics, but with hundreds of numerical columns, several clusters, and several values of k, the Specialist would have to create and compare thousands of box plots, which is slow and cumbersome.

References:

Dimensionality Reduction - Amazon SageMaker

Visualize high dimensional data using t-SNE - Amazon SageMaker

asked 16/09/2024
Ilya Shadrin