Question 91 - MLS-C01 discussion


A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor, and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset.

Which tool should be used to improve the validation accuracy?

A. Amazon Comprehend syntax analysis and entity detection
B. Amazon SageMaker BlazingText cbow mode
C. Natural Language Toolkit (NLTK) stemming and stop word removal
D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers
Suggested answer: D

Explanation:

Term frequency-inverse document frequency (TF-IDF) is a technique that assigns a weight to each word in a document based on how important it is to the meaning of that document. The term frequency (TF) measures how often a word appears in a document, while the inverse document frequency (IDF) measures how rare a word is across a collection of documents. The TF-IDF weight is the product of the TF and IDF values, so it is high for words that are frequent in a specific document but rare in the overall corpus.

TF-IDF can help improve the validation accuracy of a sentiment analysis model by reducing the impact of common words that carry little or no sentiment, such as "the", "a", and "and". Scikit-learn is a popular Python library for machine learning that provides a TF-IDF vectorizer class, which transforms a collection of text documents into a matrix of TF-IDF features. By using this tool, the Data Scientist can create a more informative and discriminative feature representation for the sentiment analysis task.
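To make the weighting concrete, here is a minimal pure-Python sketch of the TF x IDF product described above. It is an illustration, not the scikit-learn implementation: the helper name `tfidf`, the whitespace tokenizer, and the three sample reviews are assumptions made for this example. The IDF term uses the smoothed form ln((1 + n) / (1 + df)) + 1, which is the variant scikit-learn's TfidfVectorizer applies by default.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({
            # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights


# Made-up sample reviews, chosen so every term occurs once per document.
reviews = [
    "the plot was dull",
    "the soundtrack was superb",
    "the ending was dull",
]

scores = tfidf(reviews)
for term in ("the", "dull", "superb"):
    print(term, [round(s.get(term, 0.0), 3) for s in scores])
```

In this toy corpus "the" occurs in every review and so receives the lowest weight, "dull" occurs in two reviews and scores higher, and "superb" occurs in only one and scores highest. TfidfVectorizer additionally L2-normalizes each document vector by default, which this sketch omits.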


asked 16/09/2024
Adrien Gallais