Question 344 - Professional Data Engineer discussion

You are planning to load some of your existing on-premises data into BigQuery on Google Cloud. You want to either stream or batch-load data, depending on your use case. Additionally, you want to mask some sensitive data before loading into BigQuery. You need to do this in a programmatic way while keeping costs to a minimum. What should you do?

A.
Use the BigQuery Data Transfer Service to schedule your migration. After the data is populated in BigQuery, use the connection to the Cloud Data Loss Prevention (Cloud DLP) API to de-identify the necessary data.
B.
Create your pipeline with Dataflow through the Apache Beam SDK for Python, customizing separate options within your code for streaming, batch processing, and Cloud DLP. Select BigQuery as your data sink.
C.
Use Cloud Data Fusion to design your pipeline, use the Cloud DLP plug-in to de-identify data within your pipeline, and then move the data into BigQuery.
D.
Set up Datastream to replicate your on-premises data to BigQuery.
Suggested answer: B

Explanation:

To load on-premises data into BigQuery while masking sensitive data, we need a solution that offers flexibility for both streaming and batch processing, as well as data masking capabilities. Here's a detailed explanation of why option B is the best choice:

Apache Beam and Dataflow:

Apache Beam SDK provides a unified programming model for both batch and stream data processing.

Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines, offering scalability and ease of use.

Customization for Different Use Cases:

By using the Apache Beam SDK, you can write custom pipelines that can handle both streaming and batch processing within the same framework.

This allows you to switch between streaming and batch modes based on your use case without changing the core logic of your data pipeline.
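For illustration, here is a minimal sketch of that switch, assuming batch data is first staged to Cloud Storage and streaming data arrives on a Pub/Sub topic (the bucket, topic, and project names are placeholders):

```python
# A minimal sketch: one pipeline whose mode follows the standard
# --streaming flag, so batch and streaming share the same core logic.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()  # picks up command-line flags such as --streaming
is_streaming = options.view_as(StandardOptions).streaming

with beam.Pipeline(options=options) as p:
    if is_streaming:
        # Streaming source; the topic name is a placeholder.
        records = p | beam.io.ReadFromPubSub(
            topic='projects/my-project/topics/ingest')
    else:
        # Batch source; assumes data was staged to Cloud Storage first.
        records = p | beam.io.ReadFromText('gs://my-bucket/staged/*.csv')
```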

Data Masking with Cloud DLP:

Google Cloud Data Loss Prevention (DLP) API can be integrated into your Apache Beam pipeline to de-identify and mask sensitive data programmatically before loading it into BigQuery.

This ensures that sensitive data is handled securely and complies with privacy requirements.
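A minimal sketch of that integration, using the google-cloud-dlp client inside a Beam DoFn to mask a single hypothetical email field (the project ID, infoType, and record layout are illustrative assumptions, not requirements):

```python
# A sketch of a Beam DoFn that masks one field per record via the
# Cloud DLP API; project, infoType, and field name are placeholders.
import apache_beam as beam

class MaskWithDlp(beam.DoFn):
    def setup(self):
        # Create one DLP client per worker when the DoFn is initialized.
        from google.cloud import dlp_v2
        self.client = dlp_v2.DlpServiceClient()
        self.parent = 'projects/my-project/locations/global'  # placeholder

    def process(self, record):
        # Replace detected values with the infoType name, e.g.
        # 'jane@example.com' becomes '[EMAIL_ADDRESS]'.
        response = self.client.deidentify_content(
            request={
                'parent': self.parent,
                'inspect_config': {'info_types': [{'name': 'EMAIL_ADDRESS'}]},
                'deidentify_config': {
                    'info_type_transformations': {
                        'transformations': [{
                            'primitive_transformation': {
                                'replace_with_info_type_config': {}
                            }
                        }]
                    }
                },
                'item': {'value': record['email']},  # hypothetical field
            }
        )
        record['email'] = response.item.value
        yield record
```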

Cost Efficiency:

Using Dataflow can be cost-effective because it is a fully managed service, reducing the operational overhead associated with managing your own infrastructure.

The pay-as-you-go model ensures you only pay for the resources you consume, which can help keep costs under control.

Implementation Steps:

Set up Apache Beam Pipeline:

Write a pipeline using the Apache Beam SDK for Python that reads data from your on-premises storage.

Add transformations for data processing, including the integration with Cloud DLP for data masking.
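Continuing the sketches above, the masking step plugs in as an ordinary ParDo; the parse_csv_line helper and its two-column layout are hypothetical:

```python
# Continuing the sketches above: parse raw input, then mask it.
# parse_csv_line and its two-column layout are hypothetical.
import apache_beam as beam

def parse_csv_line(line):
    # Pub/Sub delivers bytes and text files deliver str; normalize first.
    if isinstance(line, bytes):
        line = line.decode('utf-8')
    user_id, email = line.split(',', 1)
    return {'user_id': user_id, 'email': email}

masked = (
    records
    | 'Parse' >> beam.Map(parse_csv_line)
    | 'MaskPII' >> beam.ParDo(MaskWithDlp())
)
```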

Configure Dataflow:

Deploy the Apache Beam pipeline on Google Cloud Dataflow.

Customize the pipeline options for both streaming and batch use cases.
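A sketch of the options that target the Dataflow runner, building on the same PipelineOptions object shown earlier (all resource names are placeholders):

```python
# A sketch of pointing the same pipeline at the Dataflow runner;
# project, region, and bucket values are placeholders.
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions, StandardOptions)

options = PipelineOptions(runner='DataflowRunner')
gcp = options.view_as(GoogleCloudOptions)
gcp.project = 'my-project'                # placeholder
gcp.region = 'us-central1'                # placeholder
gcp.temp_location = 'gs://my-bucket/tmp'  # placeholder
# Flip this (or pass --streaming on the command line) per use case.
options.view_as(StandardOptions).streaming = True
```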

Load Data into BigQuery:

Set BigQuery as the sink for your data in the Apache Beam pipeline.

Ensure the processed and masked data is loaded into the appropriate BigQuery tables.
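A sketch of that final step, assuming the masked records and placeholder schema from the sketches above:

```python
# A sketch of the BigQuery sink step; the table and schema are
# placeholders matching the masked records from the sketches above.
from apache_beam.io.gcp.bigquery import BigQueryDisposition, WriteToBigQuery

masked | 'WriteToBQ' >> WriteToBigQuery(
    table='my-project:my_dataset.masked_users',  # placeholder
    schema='user_id:STRING,email:STRING',
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_APPEND,
)
```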

References:

Apache Beam Documentation

Google Cloud Dataflow Documentation

Google Cloud DLP Documentation

BigQuery Documentation
