Question 344 - Professional Data Engineer discussion

You are planning to load some of your existing on-premises data into BigQuery on Google Cloud. You want to either stream or batch-load data, depending on your use case. Additionally, you want to mask some sensitive data before loading into BigQuery. You need to do this in a programmatic way while keeping costs to a minimum. What should you do?

A.
Use the BigQuery Data Transfer Service to schedule your migration. After the data is populated in BigQuery, use the connection to the Cloud Data Loss Prevention (Cloud DLP) API to de-identify the necessary data.
B.
Create your pipeline with Dataflow through the Apache Beam SDK for Python, customizing separate options within your code for streaming, batch processing, and Cloud DLP. Select BigQuery as your data sink.
C.
Use Cloud Data Fusion to design your pipeline, use the Cloud DLP plug-in to de-identify data within your pipeline, and then move the data into BigQuery.
D.
Set up Datastream to replicate your on-premises data to BigQuery.
Suggested answer: B

Explanation:

To load on-premises data into BigQuery while masking sensitive data, we need a solution that offers flexibility for both streaming and batch processing, as well as data masking capabilities. Here's a detailed explanation of why option B is the best choice:

Apache Beam and Dataflow:

Apache Beam SDK provides a unified programming model for both batch and stream data processing.

Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines, offering scalability and ease of use.

Customization for Different Use Cases:

By using the Apache Beam SDK, you can write custom pipelines that can handle both streaming and batch processing within the same framework.

This allows you to switch between streaming and batch modes based on your use case without changing the core logic of your data pipeline.
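For illustration, here is a minimal sketch of that switch, assuming batch data is first staged to Cloud Storage and streaming data arrives on a Pub/Sub topic (the bucket, topic, and project names are placeholders):

```python
# A minimal sketch: one pipeline whose mode follows the standard
# --streaming flag, so batch and streaming share the same core logic.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()  # picks up command-line flags such as --streaming
is_streaming = options.view_as(StandardOptions).streaming

with beam.Pipeline(options=options) as p:
    if is_streaming:
        # Streaming source; the topic name is a placeholder.
        records = p | beam.io.ReadFromPubSub(
            topic='projects/my-project/topics/ingest')
    else:
        # Batch source; assumes data was staged to Cloud Storage first.
        records = p | beam.io.ReadFromText('gs://my-bucket/staged/*.csv')
```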

Data Masking with Cloud DLP:

Google Cloud Data Loss Prevention (DLP) API can be integrated into your Apache Beam pipeline to de-identify and mask sensitive data programmatically before loading it into BigQuery.

This ensures that sensitive data is handled securely and complies with privacy requirements.
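A minimal sketch of that integration, using the google-cloud-dlp client inside a Beam DoFn to mask a single hypothetical email field (the project ID, infoType, and record layout are illustrative assumptions, not requirements):

```python
# A sketch of a Beam DoFn that masks one field per record via the
# Cloud DLP API; project, infoType, and field name are placeholders.
import apache_beam as beam

class MaskWithDlp(beam.DoFn):
    def setup(self):
        # Create one DLP client per worker when the DoFn is initialized.
        from google.cloud import dlp_v2
        self.client = dlp_v2.DlpServiceClient()
        self.parent = 'projects/my-project/locations/global'  # placeholder

    def process(self, record):
        # Replace detected values with the infoType name, e.g.
        # 'jane@example.com' becomes '[EMAIL_ADDRESS]'.
        response = self.client.deidentify_content(
            request={
                'parent': self.parent,
                'inspect_config': {'info_types': [{'name': 'EMAIL_ADDRESS'}]},
                'deidentify_config': {
                    'info_type_transformations': {
                        'transformations': [{
                            'primitive_transformation': {
                                'replace_with_info_type_config': {}
                            }
                        }]
                    }
                },
                'item': {'value': record['email']},  # hypothetical field
            }
        )
        record['email'] = response.item.value
        yield record
```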

Cost Efficiency:

Using Dataflow can be cost-effective because it is a fully managed service, reducing the operational overhead associated with managing your own infrastructure.

The pay-as-you-go model ensures you only pay for the resources you consume, which can help keep costs under control.

Implementation Steps:

Set up Apache Beam Pipeline:

Write a pipeline using the Apache Beam SDK for Python that reads data from your on-premises storage.

Add transformations for data processing, including the integration with Cloud DLP for data masking.
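Continuing the sketches above, the masking step plugs in as an ordinary ParDo; the parse_csv_line helper and its two-column layout are hypothetical:

```python
# Continuing the sketches above: parse raw input, then mask it.
# parse_csv_line and its two-column layout are hypothetical.
import apache_beam as beam

def parse_csv_line(line):
    # Pub/Sub delivers bytes and text files deliver str; normalize first.
    if isinstance(line, bytes):
        line = line.decode('utf-8')
    user_id, email = line.split(',', 1)
    return {'user_id': user_id, 'email': email}

masked = (
    records
    | 'Parse' >> beam.Map(parse_csv_line)
    | 'MaskPII' >> beam.ParDo(MaskWithDlp())
)
```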

Configure Dataflow:

Deploy the Apache Beam pipeline on Google Cloud Dataflow.

Customize the pipeline options for both streaming and batch use cases.
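A sketch of the options that target the Dataflow runner, building on the same PipelineOptions object shown earlier (all resource names are placeholders):

```python
# A sketch of pointing the same pipeline at the Dataflow runner;
# project, region, and bucket values are placeholders.
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions, StandardOptions)

options = PipelineOptions(runner='DataflowRunner')
gcp = options.view_as(GoogleCloudOptions)
gcp.project = 'my-project'                # placeholder
gcp.region = 'us-central1'                # placeholder
gcp.temp_location = 'gs://my-bucket/tmp'  # placeholder
# Flip this (or pass --streaming on the command line) per use case.
options.view_as(StandardOptions).streaming = True
```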

Load Data into BigQuery:

Set BigQuery as the sink for your data in the Apache Beam pipeline.

Ensure the processed and masked data is loaded into the appropriate BigQuery tables.
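A sketch of that final step, assuming the masked records and placeholder schema from the sketches above:

```python
# A sketch of the BigQuery sink step; the table and schema are
# placeholders matching the masked records from the sketches above.
from apache_beam.io.gcp.bigquery import BigQueryDisposition, WriteToBigQuery

masked | 'WriteToBQ' >> WriteToBigQuery(
    table='my-project:my_dataset.masked_users',  # placeholder
    schema='user_id:STRING,email:STRING',
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_APPEND,
)
```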

References:

Apache Beam Documentation

Google Cloud Dataflow Documentation

Google Cloud DLP Documentation

BigQuery Documentation
