ExamGecko

Microsoft DP-100 Practice Test - Questions Answers, Page 13

You plan to run a script as an experiment using a Script Run Configuration. The script uses modules from the scipy library as well as several Python packages that are not typically installed in a default conda environment.

You plan to run the experiment on your local workstation for small datasets and scale out the experiment by running it on more powerful remote compute clusters for larger datasets.

You need to ensure that the experiment runs successfully on local and remote compute with the least administrative effort.

What should you do?

A. Do not specify an environment in the run configuration for the experiment. Run the experiment by using the default environment.
B. Create a virtual machine (VM) with the required Python configuration and attach the VM as a compute target. Use this compute target for all experiment runs.
C. Create and register an Environment that includes the required packages. Use this Environment for all experiment runs.
D. Create a config.yaml file defining the conda packages that are required and save the file in the experiment folder.
E. Always run the experiment with an Estimator by using the default packages.
Suggested answer: C

Explanation:

If you have an existing Conda environment on your local computer, then you can use the service to create an environment object. By using this strategy, you can reuse your local interactive environment on remote runs.

Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments
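A minimal configuration sketch of the suggested approach, assuming Azure ML SDK v1 (azureml-core); the environment name, script name, and package list are illustrative:

```python
from azureml.core import Environment, ScriptRunConfig, Workspace
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

# Define an environment with the packages the script needs.
env = Environment(name='experiment-env')
env.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['scipy'],           # plus any other required packages
    pip_packages=['azureml-defaults'],
)

# Register it once; the same environment is then reused on both
# local compute and remote compute clusters.
env.register(workspace=ws)

src = ScriptRunConfig(source_directory='.', script='train.py', environment=env)
```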

You write a Python script that processes data in a comma-separated values (CSV) file.

You plan to run this script as an Azure Machine Learning experiment.

The script loads the data and determines the number of rows it contains, storing the count in a variable named rows.

You need to record the row count as a metric named row_count that can be returned using the get_metrics method of the Run object after the experiment run completes.

Which code should you use?

A. run.upload_file('row_count', './data.csv')
B. run.log('row_count', rows)
C. run.tag('row_count', rows)
D. run.log_table('row_count', rows)
E. run.log_row('row_count', rows)
Suggested answer: B

Explanation:

Log a numerical or string value to the run with the given name using log(name, value, description=''). Logging a metric to a run causes that metric to be stored in the run record in the experiment. You can log the same metric multiple times within a run, the result being considered a vector of that metric.

Example: run.log("accuracy", 0.95)

Incorrect Answers:

E: Using log_row(name, description=None, **kwargs) creates a metric with multiple columns as described in kwargs. Each named parameter generates a column with the value specified. log_row can be called once to log an arbitrary tuple, or multiple times in a loop to generate a complete table.

Example: run.log_row("Y over X", x=1, y=0.4)

Reference: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run
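As a self-contained illustration, a sketch of counting CSV rows and logging the count; the sample file contents are made up, and the Run call is shown in comments because it only works inside an Azure ML run:

```python
import csv

# Create a small hypothetical CSV file for illustration.
with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'value'])                 # header row
    writer.writerows([[1, 'a'], [2, 'b'], [3, 'c']])

# Count data rows (excluding the header), as the experiment script would.
with open('data.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip header
    rows = sum(1 for _ in reader)

print(rows)  # 3

# Inside an Azure ML experiment script, the metric would then be recorded:
# from azureml.core import Run
# run = Run.get_context()
# run.log('row_count', rows)
```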

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You are creating a new experiment in Azure Machine Learning Studio.

One class has a much smaller number of observations than the other classes in the training set.

You need to select an appropriate data sampling strategy to compensate for the class imbalance.

Solution: You use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.

Does the solution meet the goal?

A. Yes
B. No
Suggested answer: A

Explanation:

SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
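The core idea of SMOTE is to synthesize new minority-class points by interpolating between existing ones. True SMOTE interpolates toward one of a point's k nearest minority neighbours; the simplified sketch below interpolates between random minority pairs just to show the mechanism (the sample points are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like(minority, n_new):
    # Simplified SMOTE: each synthetic sample is an interpolation between
    # two existing minority-class points at a random fraction.
    # (Real SMOTE picks one of a point's k nearest minority neighbours.)
    synthetic = []
    for _ in range(n_new):
        i, j = rng.choice(len(minority), size=2, replace=False)
        frac = rng.random()
        synthetic.append(minority[i] + frac * (minority[j] - minority[i]))
    return np.array(synthetic)

minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]])
new_samples = smote_like(minority, 5)
print(new_samples.shape)  # (5, 2)
```

Because each new point is an interpolation, synthetic samples stay inside the region spanned by the existing minority points rather than being exact duplicates.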

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You are creating a new experiment in Azure Machine Learning Studio.

One class has a much smaller number of observations than the other classes in the training set.

You need to select an appropriate data sampling strategy to compensate for the class imbalance.

Solution: You use the Stratified split for the sampling mode.

Does the solution meet the goal?

A. Yes
B. No
Suggested answer: B

Explanation:

Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.

Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.

Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote

You are creating a machine learning model.

You need to identify outliers in the data.

Which two visualizations can you use? Each correct answer presents a complete solution.

NOTE: Each correct selection is worth one point.

A. Venn diagram
B. Box plot
C. ROC curve
D. Random forest diagram
E. Scatter plot
Suggested answer: B, E

Explanation:

A box plot can be used to display outliers.

Another way to quickly identify outliers visually is to create scatter plots.

Reference:

https://blogs.msdn.microsoft.com/azuredev/2017/05/27/data-cleansing-tools-in-azure-machine-learning/
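A box plot flags points that fall beyond its whiskers, conventionally 1.5 × IQR past the quartiles. A sketch of that rule (the data values are made up; 25.0 is the planted outlier):

```python
import numpy as np

data = np.array([9.0, 10.0, 10.5, 11.0, 9.5, 10.2, 25.0])

# Box-plot whisker rule: anything beyond 1.5 * IQR from the quartiles
# is treated as an outlier.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
print(outliers)  # [25.]
```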

You are evaluating a completed binary classification machine learning model.

You need to use the precision as the evaluation metric.

Which visualization should you use?

A. Violin plot
B. Gradient descent
C. Box plot
D. Binary classification confusion matrix
Suggested answer: D

Explanation:

Incorrect Answers:

A: A violin plot is a visual that traditionally combines a box plot and a kernel density plot.

B: Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.

C: A box plot lets you see basic distribution information about your data, such as median, mean, range and quartiles but doesn't show you how your data looks throughout its range.

Reference:

https://machinelearningknowledge.ai/confusion-matrix-and-performance-metrics-machine-learning/
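Precision is read directly off the confusion matrix as TP / (TP + FP). A short sketch with hypothetical counts:

```python
# Hypothetical binary confusion-matrix counts for illustration.
tp, fp = 40, 10   # predicted positive: correct / incorrect
fn, tn = 5, 45    # predicted negative: incorrect / correct

precision = tp / (tp + fp)  # of all positive predictions, how many were right
recall = tp / (tp + fn)     # of all actual positives, how many were found

print(precision)         # 0.8
print(round(recall, 3))  # 0.889
```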

You create a multi-class image classification deep learning model that uses the PyTorch deep learning framework.

You must configure Azure Machine Learning Hyperdrive to optimize the hyperparameters for the classification model.

You need to define a primary metric to determine the hyperparameter values that result in the model with the best accuracy score.

Which three actions must you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to maximize.
B. Add code to the bird_classifier_train.py script to calculate the validation loss of the model and log it as a float value with the key loss.
C. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to minimize.
D. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to accuracy.
E. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to loss.
F. Add code to the bird_classifier_train.py script to calculate the validation accuracy of the model and log it as a float value with the key accuracy.
Suggested answer: A, D, F

Explanation:

A, D: Set primary_metric_name="accuracy" and primary_metric_goal=PrimaryMetricGoal.MAXIMIZE to optimize the runs to maximize "accuracy". Make sure to log this value in your training script.

Note: primary_metric_name is the name of the primary metric to optimize; it needs to exactly match the name of the metric logged by the training script. primary_metric_goal can be either PrimaryMetricGoal.MAXIMIZE or PrimaryMetricGoal.MINIMIZE and determines whether the primary metric will be maximized or minimized when evaluating the runs.

F: The training script calculates the val_accuracy and logs it as "accuracy", which is used as the primary metric.
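In effect, primary_metric_goal = MAXIMIZE means Hyperdrive keeps the child run whose logged "accuracy" metric is highest. A toy simulation of that selection, using made-up run results:

```python
# Hypothetical child-run results (run IDs and accuracy values are made up).
runs = [
    {'run_id': 'run_1', 'accuracy': 0.87},
    {'run_id': 'run_2', 'accuracy': 0.91},
    {'run_id': 'run_3', 'accuracy': 0.89},
]

# primary_metric_goal = MAXIMIZE: the best run is the one with the
# highest value of the primary metric, matched by its logged name.
best = max(runs, key=lambda r: r['accuracy'])
print(best['run_id'])  # run_2
```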

You are performing a filter-based feature selection for a dataset to build a multi-class classifier by using Azure Machine Learning Studio.

The dataset contains categorical features that are highly correlated to the output label column.

You need to select the appropriate feature scoring statistical method to identify the key predictors.

Which method should you use?

A. Kendall correlation
B. Spearman correlation
C. Chi-squared
D. Pearson correlation
Suggested answer: D

Explanation:

Pearson's correlation statistic, or Pearson's correlation coefficient, is also known in statistical models as the r value. For any two variables, it returns a value that indicates the strength of the correlation. Pearson's correlation coefficient is the test statistic that measures the statistical relationship, or association, between two continuous variables. It is regarded as the best method of measuring the association between variables of interest because it is based on the method of covariance. It gives information about the magnitude of the association, or correlation, as well as the direction of the relationship.

Incorrect Answers:

C: The two-way chi-squared test is a statistical method that measures how close expected values are to actual results.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/filter-based-feature-selection https://www.statisticssolutions.com/pearsons-correlation-coefficient/
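Pearson's r can be computed with NumPy; the sample values below are made up and roughly follow y = 2x, so r should come out close to +1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # approximately y = 2x

# Pearson's r: covariance of x and y divided by the product of
# their standard deviations.
r = np.corrcoef(x, y)[0, 1]
print(r > 0.99)  # True: strong positive linear association
```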

You plan to use automated machine learning to train a regression model. You have data that has features which have missing values, and categorical features with few distinct values.

You need to configure automated machine learning to automatically impute missing values and encode categorical features as part of the training task.

Which parameter and value pair should you use in the AutoMLConfig class?

A. featurization = 'auto'
B. enable_voting_ensemble = True
C. task = 'classification'
D. exclude_nan_labels = True
E. enable_tf = True
Suggested answer: A

Explanation:

featurization: str or FeaturizationConfig

Values: 'auto' / 'off' / FeaturizationConfig

Indicator for whether the featurization step should be done automatically, turned off, or whether customized featurization should be used.

Column type is automatically detected. Based on the detected column type preprocessing/featurization is done as follows:

Categorical: Target encoding, one hot encoding, drop high cardinality categories, impute missing values.

Numeric: Impute missing values, cluster distance, weight of evidence.

DateTime: Several features such as day, seconds, minutes, hours etc.

Text: Bag of words, pre-trained Word embedding, text target encoding.

Reference:

https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig
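A configuration sketch, assuming Azure ML SDK v1 (azureml-train-automl-client); train_dataset and the label column name are placeholders:

```python
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task='regression',
    training_data=train_dataset,    # assumed TabularDataset (placeholder)
    label_column_name='target',     # hypothetical label column
    featurization='auto',           # auto-impute missing values and
                                    # encode categorical features
    primary_metric='normalized_root_mean_squared_error',
)
```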

You are building a regression model for estimating the number of calls during an event.

You need to determine whether the feature values achieve the conditions to build a Poisson regression model.

Which two conditions must the feature set contain? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. The label data must be a negative value.
B. The label data must be whole numbers.
C. The label data must be non-discrete.
D. The label data must be a positive value.
E. The label data can be positive or negative.
Suggested answer: B, D

Explanation:

Poisson regression is intended for use in regression models that are used to predict numeric values, typically counts. Therefore, you should use this module to create your regression model only if the values you are trying to predict fit the following conditions:

The response variable has a Poisson distribution.

Counts cannot be negative. The method will fail outright if you attempt to use it with negative labels.

A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this method with non-whole numbers.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/poisson-regression
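The two conditions above reduce to a simple check on the label column; a sketch with hypothetical label values:

```python
def valid_poisson_labels(labels):
    # Poisson regression targets are counts: non-negative whole numbers.
    return all(v >= 0 and float(v).is_integer() for v in labels)

print(valid_poisson_labels([0, 3, 12, 7]))    # True: valid counts
print(valid_poisson_labels([2.5, 4.0, 1.0]))  # False: 2.5 is not a whole number
print(valid_poisson_labels([3, -1, 2]))       # False: counts cannot be negative
```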

Total 433 questions