Microsoft DP-100 Practice Test - Questions Answers, Page 14

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You are creating a new experiment in Azure Machine Learning Studio.

One class has a much smaller number of observations than the other classes in the training set.

You need to select an appropriate data sampling strategy to compensate for the class imbalance.

Solution: You use the Principal Components Analysis (PCA) sampling mode.

Does the solution meet the goal?

A. Yes
B. No
Suggested answer: B

Explanation:

Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.

Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.

Incorrect Answers:

The Principal Component Analysis module in Azure Machine Learning Studio (classic) is used to reduce the dimensionality of your training data. The module analyzes your data and creates a reduced feature set that captures all the information contained in the dataset, but in a smaller number of features.

Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/principal-component-analysis
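For illustration, a minimal sketch of SMOTE-style oversampling using the open-source imbalanced-learn package (an assumption; the exam scenario uses the Studio SMOTE module rather than standalone Python):

from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Imbalanced toy data: roughly 5% of observations belong to class 1.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print('Before:', Counter(y))

# SMOTE synthesizes new minority-class examples by interpolating between
# existing minority samples and their nearest neighbors, rather than simply
# duplicating rows.
X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X, y)
print('After: ', Counter(y_resampled))  # classes are now balanced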

You are performing feature engineering on a dataset.

You must add a feature named CityName and populate the column value with the text London.

You need to add the new feature to the dataset.

Which Azure Machine Learning Studio module should you use?

A. Edit Metadata
B. Filter Based Feature Selection
C. Execute Python Script
D. Latent Dirichlet Allocation
Suggested answer: A

Explanation:

Typical metadata changes might include marking columns as features.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/edit-metadata
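For illustration only, a minimal pandas sketch (an assumption; in Studio the change happens inside the pipeline, not in standalone Python) of the result the feature-engineering step must produce, a CityName column populated with the constant text London; the sample columns are hypothetical:

import pandas as pd

# Hypothetical input data.
df = pd.DataFrame({'id': [1, 2, 3], 'sales': [10.0, 12.5, 9.8]})

# Add the new feature and populate every row with the text 'London'.
df['CityName'] = 'London'
print(df)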

You are evaluating a completed binary classification machine learning model.

You need to use the precision as the evaluation metric.

Which visualization should you use?

A. Violin plot
B. Gradient descent
C. Scatter plot
D. Receiver Operating Characteristic (ROC) curve
Suggested answer: D

Explanation:

A receiver operating characteristic (ROC) curve plots the rate of correctly classified positive labels (true positive rate) against the rate of incorrectly classified negative labels (false positive rate) for a particular model across classification thresholds. In the Studio evaluation results, the ROC chart is accompanied by a threshold slider that reports threshold-dependent metrics such as precision.

Incorrect Answers:

A: A violin plot is a visual that traditionally combines a box plot and a kernel density plot.

B: Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.

C: A scatter plot graphs the actual values in your data against the values predicted by the model. The scatter plot displays the actual values along the X-axis, and displays the predicted values along the Y-axis. It also displays a line that illustrates the perfect prediction, where the predicted value exactly matches the actual value.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#confusion-matrix
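As an illustration, a minimal scikit-learn sketch (an assumption; not part of the Studio evaluation module) showing how the ROC curve and precision are computed from a binary classifier's scores:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

# ROC curve: true positive rate vs. false positive rate across thresholds.
fpr, tpr, thresholds = roc_curve(y_test, scores)
print('AUC:', roc_auc_score(y_test, scores))

# Precision at the default 0.5 decision threshold.
print('Precision:', precision_score(y_test, (scores >= 0.5).astype(int)))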

You are solving a classification task.

You must evaluate your model on a limited data sample by using k-fold cross-validation. You start by configuring a k parameter as the number of splits.

You need to configure the k parameter for the cross-validation.

Which value should you use?

A. k=1
B. k=10
C. k=0.5
D. k=0.9
Suggested answer: B

Explanation:

Leave One Out (LOO) cross-validation

Setting K = n (the number of observations) yields n-fold cross-validation and is called leave-one-out cross-validation (LOO), a special case of the K-fold approach.

LOO CV is sometimes useful, but typically it doesn't shake up the data enough. The estimates from each fold are highly correlated, and hence their average can have high variance. This is why the usual choice is K = 5 or K = 10, which provides a good compromise for the bias-variance tradeoff.
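A minimal sketch of 10-fold cross-validation (assumption: scikit-learn rather than the Studio Cross Validate Model module); each observation is used for validation exactly once across the 10 folds:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# A limited data sample.
X, y = make_classification(n_samples=200, random_state=0)

# k=10: the data is split into 10 folds; each fold is held out once.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print('Per-fold accuracy:', scores)
print('Mean accuracy:', scores.mean())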

You use the Azure Machine Learning service to create a tabular dataset named training_data. You plan to use this dataset in a training script.

You create a variable that references the dataset using the following code:

training_ds = workspace.datasets.get("training_data")

You define an estimator to run the script.

You need to set the correct property of the estimator to ensure that your script can access the training_data dataset.

Which property should you set?

A. environment_definition = {"training_data":training_ds}
B. inputs = [training_ds.as_named_input('training_ds')]
C. script_params = {"--training_ds":training_ds}
D. source_directory = training_ds
Suggested answer: B

Explanation:

Example:

# Get the training dataset
diabetes_ds = ws.datasets.get("Diabetes Dataset")

# Create an estimator that uses the remote compute
hyper_estimator = SKLearn(source_directory=experiment_folder,
                          inputs=[diabetes_ds.as_named_input('diabetes')],  # Pass the dataset as an input
                          compute_target=cpu_cluster,
                          conda_packages=['pandas', 'ipykernel', 'matplotlib'],
                          pip_packages=['azureml-sdk', 'argparse', 'pyarrow'],
                          entry_script='diabetes_training.py')

Reference: https://notebooks.azure.com/GraemeMalcolm/projects/azureml-primers/html/04%20-%20Optimizing%20Model%20Training.ipynb
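Inside the entry script, the named input can then be retrieved from the run context. A minimal sketch (using the azureml-core Run API; the input name 'diabetes' matches the example above):

from azureml.core import Run

run = Run.get_context()

# The dataset passed with as_named_input('diabetes') is exposed through
# the run's input_datasets dictionary.
dataset = run.input_datasets['diabetes']
df = dataset.to_pandas_dataframe()  # TabularDataset -> pandas DataFrame
print(df.shape)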

You register a file dataset named csv_folder that references a folder. The folder includes multiple comma-separated values (CSV) files in an Azure storage blob container.

You plan to use the following code to run a script that loads data from the file dataset. You create and instantiate the following variables:

You have the following code:

You need to pass the dataset to ensure that the script can read the files it references.

Which code segment should you insert to replace the code comment?

A. inputs=[file_dataset.as_named_input('training_files')],
B. inputs=[file_dataset.as_named_input('training_files').as_mount()],
C. inputs=[file_dataset.as_named_input('training_files').to_pandas_dataframe()],
D. script_params={'--training_files': file_dataset},
Suggested answer: B

Explanation:

Example:

from azureml.train.estimator import Estimator

script_params = {

# to mount files referenced by mnist dataset

'--data-folder': mnist_file_dataset.as_named_input('mnist_opendataset').as_mount(),

'--regularization': 0.5

}

est = Estimator(source_directory=script_folder,

script_params=script_params,

compute_target=compute_target,

environment_definition=env,

entry_script='train.py')

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-train-models-with-aml
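On the script side, the mounted file dataset appears as a local folder path. A minimal sketch (assumptions: an argparse option named --data-folder matching the script_params above, and pandas for reading the CSV files):

import argparse
import glob
import os

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder')
args = parser.parse_args()

# The mount point contains the files referenced by the file dataset.
csv_files = glob.glob(os.path.join(args.data_folder, '**', '*.csv'), recursive=True)
data = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)
print(data.shape)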

You are creating a new Azure Machine Learning pipeline using the designer.

The pipeline must train a model using data in a comma-separated values (CSV) file that is published on a website. You have not created a dataset for this file.

You need to ingest the data from the CSV file into the designer pipeline using the minimal administrative effort.

Which module should you add to the pipeline in Designer?

A. Convert to CSV
B. Enter Data Manually
C. Import Data
D. Dataset
Suggested answer: D

Explanation:


You define a datastore named ml-data for an Azure Storage blob container. In the container, you have a folder named train that contains a file named data.csv. You plan to use the file to train a model by using the Azure Machine Learning

SDK.

You plan to train the model by using the Azure Machine Learning SDK to run an experiment on local compute.

You define a DataReference object by running the following code:

You need to load the training data.

Which code segment should you use?

A.
B.
C.
D.
E.
Suggested answer: E

Explanation:

Example:

data_folder = args.data_folder

# Load Train and Test data
train_data = pd.read_csv(os.path.join(data_folder, 'data.csv'))

Reference:

https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai
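For context, a sketch of the control-script side (assumptions: the azureml-core Datastore/DataReference API and hypothetical names such as the 'training' source folder and train.py) that passes the ml-data train folder to the script shown above as --data-folder:

from azureml.core import Datastore, Workspace
from azureml.train.estimator import Estimator

ws = Workspace.from_config()
ml_data = Datastore.get(ws, 'ml-data')

script_params = {
    # Download the train folder from the blob container to the compute target.
    '--data-folder': ml_data.path('train').as_download(path_on_compute='data'),
}

est = Estimator(source_directory='training',
                script_params=script_params,
                compute_target='local',
                entry_script='train.py',
                pip_packages=['pandas'])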

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You create an Azure Machine Learning service datastore in a workspace. The datastore contains the following files:

/data/2018/Q1.csv

/data/2018/Q2.csv

/data/2018/Q3.csv

/data/2018/Q4.csv

/data/2019/Q1.csv

All files store data in the following format:

id,f1,f2,I

1,1,2,0

2,1,1,1

3,2,1,0

4,2,2,1

You run the following code:

You need to create a dataset named training_data and load the data from all files into a single data frame by using the following code:

Solution: Run the following code:

Does the solution meet the goal?

A. Yes
B. No
Suggested answer: B

Explanation:

Define paths with two file paths instead.

Use Dataset.Tabular.from_delimited_files, as the data isn't cleansed.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You create an Azure Machine Learning service datastore in a workspace. The datastore contains the following files:

/data/2018/Q1.csv

/data/2018/Q2.csv

/data/2018/Q3.csv

/data/2018/Q4.csv

/data/2019/Q1.csv

All files store data in the following format:

id,f1,f2,I

1,1,2,0

2,1,1,1

3,2,1,0

4,2,2,1

You run the following code:

You need to create a dataset named training_data and load the data from all files into a single data frame by using the following code:

Solution: Run the following code:

Does the solution meet the goal?

A. Yes
B. No
Suggested answer: B

Explanation:

Use two file paths.

Use Dataset.Tabular.from_delimited_files instead of Dataset.File.from_files, as the data isn't cleansed.

Note:

A FileDataset references single or multiple files in your datastores or public URLs. If your data is already cleansed, and ready to use in training experiments, you can download or mount the files to your compute as a FileDataset object.

A TabularDataset represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas or Spark DataFrame so you can work with familiar data preparation and training libraries without having to leave your notebook. You can create a TabularDataset object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
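A minimal sketch of the working approach (assumptions: the azureml-core Dataset API and a datastore name of 'workspaceblobstore'), creating a TabularDataset from both year folders and materializing it as a single data frame:

from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()
datastore = Datastore.get(ws, 'workspaceblobstore')

# Two file paths: one per year folder.
paths = [(datastore, 'data/2018/*.csv'),
         (datastore, 'data/2019/*.csv')]

training_data = Dataset.Tabular.from_delimited_files(path=paths)
data_frame = training_data.to_pandas_dataframe()  # columns: id, f1, f2 and the label
print(data_frame.shape)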
