Amazon MLS-C01 Practice Test - Questions and Answers, Page 33

List of questions
Question 321

A machine learning (ML) specialist needs to solve a binary classification problem for a marketing dataset. The ML specialist must maximize the Area Under the ROC Curve (AUC) of the algorithm by training an XGBoost algorithm. The ML specialist must find values for the eta, alpha, min_child_weight, and max_depth hyperparameters that will generate the most accurate model.
Which approach will meet these requirements with the LEAST operational overhead?
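
For readers who want to see what is being optimized in Question 321, the following is a minimal sketch, not an answer to the question. It computes AUC from scratch and runs a plain random search over the four named hyperparameters. The search ranges, the `evaluate_auc` callback, and all function names are hypothetical. In practice the callback would train an XGBoost model with the sampled values and return its validation AUC.

```python
import random

# Hypothetical search ranges; the question does not specify any.
SEARCH_SPACE = {
    "eta": (0.01, 0.5),
    "alpha": (0.0, 2.0),
    "min_child_weight": (1.0, 10.0),
    "max_depth": (2, 10),  # integer range
}


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sample_params(rng):
    """Draw one candidate configuration from SEARCH_SPACE."""
    params = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(evaluate_auc, n_trials=20, seed=0):
    """Keep the configuration with the highest AUC reported by evaluate_auc."""
    rng = random.Random(seed)
    best_params, best_auc = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        auc = evaluate_auc(params)
        if auc > best_auc:
            best_params, best_auc = params, auc
    return best_params, best_auc
```

A managed tuning job performs the same loop (sample, train, score, keep the best) without the user having to write or operate it, which is the operational-overhead trade-off the question asks about.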
Question 322

An agriculture company wants to improve crop yield forecasting for the upcoming season by using crop yields from the last three seasons. The company wants to compare the performance of its new scikit-learn model to the benchmark.
A data scientist needs to package the code into a container that computes both the new model forecast and the benchmark.
The data scientist wants AWS to be responsible for the operational maintenance of the container.
Which solution will meet these requirements?
Question 323

A data scientist is building a new model for an ecommerce company. The model will predict how many minutes it will take to deliver a package.
During model training, the data scientist needs to evaluate model performance.
Which metrics should the data scientist use to meet this requirement? (Select TWO.)
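
Because the target in Question 323 is a continuous quantity (minutes), error metrics for regression apply. The sketch below shows two commonly used ones, mean absolute error (MAE) and root mean squared error (RMSE). It is an illustration under that assumption, not a statement of which options the exam lists. The function names and the sample values in the usage note are made up.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the same unit as the target (minutes)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared difference; penalizes large misses more."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)
```

For example, with actual delivery times `[30, 45, 60]` and predictions `[32, 40, 66]`, the errors are 2, 5, and 6 minutes, so MAE is 13/3 and RMSE is the square root of 65/3.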
Question 324

A company needs to develop a machine learning (ML) model for risk analysis. An ML engineer needs to evaluate the contribution each feature of a training dataset makes to the prediction of the target variable before the ML engineer selects features.
How should the ML engineer predict the contribution of each feature?
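
To make "contribution of each feature" in Question 324 concrete, here is a rough, model-free sketch that ranks features by the absolute Pearson correlation with the target. This is a swapped-in proxy chosen for illustration, not the method the question expects and not a full feature-importance analysis (it only captures linear, one-feature-at-a-time relationships). The function names and the sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (name, |correlation|) pairs, strongest linear relationship first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example data: 'income' tracks the target, 'constant' carries no signal.
example = rank_features(
    {"income": [1, 2, 3, 4], "constant": [5, 5, 5, 5]},
    target=[2, 4, 6, 8],
)
```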
Question 325

A data scientist receives a new dataset in .csv format and stores the dataset in Amazon S3. The data scientist will use this dataset to train a machine learning (ML) model.
The data scientist first needs to identify any potential data quality issues in the dataset. The data scientist must identify values that are missing or values that are not valid. The data scientist must also identify the number of outliers in the dataset.
Which solution will meet these requirements with the LEAST operational effort?
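
The three checks named in Question 325 (missing values, values that are not valid, and outliers) can be sketched by hand as follows. This is only an illustration of what such a data quality report counts, not the low-effort solution the question asks for. The column names, the sample CSV, and the 1.5 × IQR outlier rule are assumptions made for the example.

```python
import csv
import io
import statistics

# Made-up sample data: one missing value, one non-numeric value, one extreme value.
SAMPLE_CSV = "id,minutes\n1,10\n2,12\n3,11\n4,\n5,abc\n6,13\n7,12\n8,500\n"


def profile_numeric_column(rows, column):
    """Count missing, non-numeric, and outlying values in one column."""
    missing = 0
    invalid = 0
    numbers = []
    for row in rows:
        raw = (row.get(column) or "").strip()
        if raw == "":
            missing += 1
            continue
        try:
            numbers.append(float(raw))
        except ValueError:
            invalid += 1

    outliers = 0
    if len(numbers) >= 2:
        # Assumed rule: flag values more than 1.5 * IQR outside the quartiles.
        q1, _, q3 = statistics.quantiles(numbers, n=4)
        spread = 1.5 * (q3 - q1)
        outliers = sum(1 for v in numbers if v < q1 - spread or v > q3 + spread)

    return {"missing": missing, "invalid": invalid, "outliers": outliers}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_numeric_column(rows, "minutes")
```

Writing and maintaining checks like these for every column is the operational effort that a built-in data quality report is meant to remove.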