ExamGecko

DP-100: Designing and Implementing a Data Science Solution on Azure

Vendor: Microsoft

Exam Questions: 433

Exam Number: DP-100

Exam Name: Designing and Implementing a Data Science Solution on Azure

Length of test: 120 mins

Exam Format: Multiple-choice, Drag and Drop, and HOTSPOT questions.

Exam Language: English

Number of questions in the actual exam: 40-60 questions

Passing Score: 700/1000

This study guide should help you understand what to expect on the DP-100 exam. It includes a summary of the topics the exam might cover and links to additional resources. The information and materials in this document should help you focus your studies as you prepare for the exam.

Skills at a glance

  • Design and prepare a machine learning solution (20–25%)

  • Explore data, and train models (35–40%)

  • Prepare a model for deployment (20–25%)

  • Deploy and retrain a model (10–15%)

Related questions

HOTSPOT

You create an Azure Machine Learning dataset containing automobile price data. The dataset includes 10,000 rows and 10 columns. You use Azure Machine Learning Designer to transform the dataset by using an Execute Python Script component and custom code.

The code must combine three columns to create a new column.

You need to configure the code function.

Which configurations should you use? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.
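The full answer to this question is behind the paywall, but as background, the Execute Python Script component in Azure ML Designer calls an entry-point function named azureml_main that receives up to two pandas DataFrames and must return a sequence of DataFrames. The sketch below shows that shape; the column names ("make", "body-style", "price") and the concatenation logic are illustrative assumptions, not taken from the actual question.

```python
import pandas as pd

# Sketch of the Execute Python Script entry point in Azure ML Designer.
# The component invokes azureml_main with up to two pandas DataFrames
# and expects a sequence of DataFrames back.
def azureml_main(dataframe1=None, dataframe2=None):
    # Illustrative assumption: combine three columns into one new column
    # by string concatenation. Column names are hypothetical.
    dataframe1["summary"] = (
        dataframe1["make"].astype(str)
        + " " + dataframe1["body-style"].astype(str)
        + " " + dataframe1["price"].astype(str)
    )
    # Designer expects a tuple/list of output DataFrames.
    return dataframe1,
```

You can exercise the function locally by passing it a small DataFrame and inspecting the returned output, which is how the component would call it inside the Designer pipeline.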



DRAG DROP

You create an Azure Machine Learning workspace and an Azure Synapse Analytics workspace with a Spark pool. The workspaces are contained within the same Azure subscription.

You must manage the Synapse Spark pool from the Azure Machine Learning workspace.

You need to attach the Synapse Spark pool in Azure Machine Learning by using the Python SDK v2.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.




You use Azure Machine Learning studio to analyze an mltable data asset containing a decimal column named column1. You need to verify that the column1 values are normally distributed.

Which statistic should you use?
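The answer to this question is also behind the paywall, but as background, the data-profile view in Azure Machine Learning studio reports distribution statistics such as skewness and kurtosis, and for a normally distributed column both are close to zero. The sketch below computes these two statistics with population-moment formulas in plain Python; the function name and formulas are my own illustration, not the studio's implementation.

```python
import math

def moments(values):
    """Return (skewness, excess kurtosis) using population formulas.

    For a normally distributed column, both statistics are close to 0,
    so large deviations suggest the values are not normally distributed.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    std = math.sqrt(var)
    skew = sum(((x - mean) / std) ** 3 for x in values) / n
    # Subtracting 3 gives *excess* kurtosis, which is 0 for a normal distribution.
    kurt = sum(((x - mean) / std) ** 4 for x in values) / n - 3
    return skew, kurt
```

For example, a perfectly symmetric sample such as [1, 2, 3, 4, 5] has skewness 0 and negative excess kurtosis (flatter than a normal distribution).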


You register a model that you plan to use in a batch inference pipeline.

The batch inference pipeline must use a ParallelRunStep step to process files in a file dataset. The script that the ParallelRunStep step runs must process six input files each time the inferencing function is called.

You need to configure the pipeline.

Which configuration setting should you specify in the ParallelRunConfig object for the ParallelRunStep step?

A. process_count_per_node= "6"

B. node_count= "6"

C. mini_batch_size= "6"

D. error_threshold= "6"
Suggested answer: C

Explanation:

mini_batch_size: for FileDataset input, this field is the number of files the user script can process in one run() call. Because the script must process six input files each time the inferencing function is called, specify mini_batch_size= "6". (For TabularDataset input, this field is instead the approximate size of data the user script can process in one run() call, for example 1024KB, 10MB, or 1GB.)

Incorrect Answers:

A: process_count_per_node is the number of processes executed on each node (optional; the default value is the number of cores on the node).

B: node_count is the number of nodes in the compute target used for running the ParallelRunStep.

D: error_threshold is the number of record failures for TabularDataset and file failures for FileDataset that should be ignored during processing. If the error count goes above this value, the job is aborted.
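As a plain-Python illustration (not the azureml SDK itself), the sketch below shows how a mini-batch size of 6 conceptually groups a FileDataset's files so that each call to the user script's run() entry point receives six files. The function names and file names here are hypothetical.

```python
# Offline illustration of mini_batch_size semantics for FileDataset input:
# files are grouped so each run() call receives mini_batch_size files.
def split_into_mini_batches(files, mini_batch_size):
    """Yield successive groups of at most mini_batch_size files."""
    for start in range(0, len(files), mini_batch_size):
        yield files[start:start + mini_batch_size]

def run(mini_batch):
    """Stand-in for the user script's run() entry point."""
    return [f"processed {name}" for name in mini_batch]

files = [f"file_{i:03d}.csv" for i in range(18)]
batches = list(split_into_mini_batches(files, mini_batch_size=6))
# 18 files with mini_batch_size=6 -> run() is called 3 times, 6 files each.
results = [run(batch) for batch in batches]
```

In the real ParallelRunStep, this grouping happens inside the service; the user script only implements init() and run(mini_batch).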

Reference:

https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-steps/azureml.contrib.pipeline.steps.parallelrunconfig?view=azure-ml-py

HOTSPOT

You create an Azure Machine Learning model that includes model files and a scoring script. You must deploy the model. The deployment solution must meet the following requirements:

* Provide near real-time inferencing.

* Enable endpoint and deployment level cost estimates.

* Support logging to Azure Log Analytics.

You need to configure the deployment solution.

What should you configure? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.



You train and publish a machine learning model.

You need to run a pipeline that retrains the model based on a trigger from an external system.

What should you configure?

A. Azure Data Catalog

B. Azure Batch

C. Azure Logic App
Suggested answer: C

You create a multi-class image classification deep learning model.

You train the model by using PyTorch version 1.2.

You need to ensure that the correct version of PyTorch can be identified for the inferencing environment when the model is deployed.

What should you do?

A. Save the model locally as a .pt file, and deploy the model as a local web service.

B. Deploy the model on compute that is configured to use the default Azure Machine Learning conda environment.

C. Register the model with a .pt file extension and the default version property.

D. Register the model, specifying the model_framework and model_framework_version properties.
Suggested answer: D

Explanation:

Registering the model with the model_framework and model_framework_version properties (here, PyTorch 1.2) records the framework version so that the correct version can be identified for the inferencing environment when the model is deployed.

Reference: https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn.pytorch?view=azure-ml-py

You plan to create a speech recognition deep learning model.

The model must support the latest version of Python.

You need to recommend a deep learning framework for speech recognition to include in the Data Science Virtual Machine (DSVM).

What should you recommend?

A. Rattle

B. TensorFlow

C. Weka

D. Scikit-learn
Suggested answer: B

Explanation:

TensorFlow is an open-source library for numerical computation and large-scale machine learning. It uses Python to provide a convenient front-end API for building applications with the framework. TensorFlow can train and run deep neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks, sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential equation) based simulations.

Incorrect Answers:

A: Rattle is the R analytical tool that gets you started with data analytics and machine learning.

C: Weka is used for visual data mining and machine learning software in Java.

D: Scikit-learn is one of the most useful libraries for machine learning in Python. Built on NumPy, SciPy, and matplotlib, this library contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.

Reference:

https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html

You create an Azure Machine Learning workspace named workspace1. The workspace contains a Python SDK v2 notebook that uses MLflow to collect model training metrics and artifacts from your local computer.

You must reuse the notebook to run on an Azure Machine Learning compute instance in workspace1.

You need to continue to log training metrics and artifacts from your data science code.

What should you do?


You deploy a real-time inference service for a trained model.

The deployed model supports a business-critical application, and it is important to be able to monitor the data submitted to the web service and the predictions the data generates.

You need to implement a monitoring solution for the deployed model using minimal administrative effort.

What should you do?

A. View the explanations for the registered model in Azure ML studio.

B. Enable Azure Application Insights for the service endpoint and view logged data in the Azure portal.

C. View the log files generated by the experiment used to train the model.

D. Create an MLflow tracking URI that references the endpoint, and view the data logged by MLflow.

Suggested answer: B

Explanation:

Configure logging with Azure Machine Learning studio

You can also enable Azure Application Insights from Azure Machine Learning studio. When you're ready to deploy your model as a web service, use the following steps to enable Application Insights:

1. Sign in to the studio at https://ml.azure.com.

2. Go to Models and select the model you want to deploy.

3. Select +Deploy.

4. Populate the Deploy model form.

5. Expand the Advanced menu.

6. Select Enable Application Insights diagnostics and data collection.

Reference:

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-enable-app-insights