Question 337 - Professional Data Engineer discussion

Your infrastructure team has set up an interconnect link between Google Cloud and the on-premises network. You are designing a high-throughput streaming pipeline to ingest streaming data from an Apache Kafka cluster hosted on-premises. You want to store the data in BigQuery, with as minimal latency as possible. What should you do?

A. Use a proxy host in the VPC in Google Cloud connecting to Kafka. Write a Dataflow pipeline, read data from the proxy host, and write the data to BigQuery.
B. Set up a Kafka Connect bridge between Kafka and Pub/Sub. Use a Google-provided Dataflow template to read the data from Pub/Sub, and write the data to BigQuery.
C. Set up a Kafka Connect bridge between Kafka and Pub/Sub. Write a Dataflow pipeline, read the data from Pub/Sub, and write the data to BigQuery.
D. Use Dataflow, write a pipeline that reads the data from Kafka, and writes the data to BigQuery.
Suggested answer: C

Explanation:

Here's a detailed breakdown of why this solution is optimal and why others fall short:

Why Option C is the Best Solution:

Kafka Connect Bridge: This bridge acts as a reliable and scalable conduit between your on-premises Kafka cluster and Google Cloud's Pub/Sub messaging service. It handles the complexities of securely transferring data over the interconnect link.
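
For illustration, here is a minimal sketch of registering such a bridge through the Kafka Connect REST API using Google's Pub/Sub sink connector. The Connect host, project, topic, and field values are placeholders, and the connector class and configuration keys should be verified against the version of the Pub/Sub Kafka connector you actually deploy:

    import json
    import requests  # assumes the 'requests' package is installed

    # Hypothetical endpoint and names -- replace with your own values.
    CONNECT_URL = "http://kafka-connect.onprem.example:8083/connectors"

    connector_config = {
        "name": "events-to-pubsub",
        "config": {
            # Sink connector class from Google's Pub/Sub Kafka connector
            # (pubsub-group-kafka-connector); check the class name for your version.
            "connector.class": "com.google.pubsub.kafka.sink.CloudPubSubSinkConnector",
            "tasks.max": "4",
            "topics": "events",               # Kafka topic to bridge (placeholder)
            "cps.project": "my-gcp-project",  # Pub/Sub project (placeholder)
            "cps.topic": "events",            # Pub/Sub topic (placeholder)
        },
    }

    # Register the connector; Kafka Connect then streams records from the
    # Kafka topic into the Pub/Sub topic over the interconnect.
    resp = requests.post(CONNECT_URL, json=connector_config)
    resp.raise_for_status()
    print(resp.json())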

Pub/Sub as a Buffer: Pub/Sub serves as a highly scalable buffer, decoupling the Kafka producer from the Dataflow consumer. This is crucial for handling fluctuations in message volume and ensuring smooth data flow even during spikes.

Custom Dataflow Pipeline: Writing a custom Dataflow pipeline gives you the flexibility to implement any necessary transformations or enrichments to the data before it's written to BigQuery. This is often required in real-world streaming scenarios.
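
As a rough sketch of what such a pipeline could look like (subscription, table, and schema names are placeholders, and the parsing step assumes JSON-encoded messages), a streaming Beam pipeline in Python reading from Pub/Sub and writing to BigQuery might be structured like this:

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.io.gcp.pubsub import ReadFromPubSub
    from apache_beam.io.gcp.bigquery import WriteToBigQuery, BigQueryDisposition

    def run():
        # Placeholder project/region/bucket values; streaming=True is required
        # because Pub/Sub is an unbounded source.
        options = PipelineOptions(
            streaming=True,
            runner="DataflowRunner",
            project="my-gcp-project",
            region="us-central1",
            temp_location="gs://my-bucket/tmp",
        )

        with beam.Pipeline(options=options) as p:
            (
                p
                | "ReadFromPubSub" >> ReadFromPubSub(
                    subscription="projects/my-gcp-project/subscriptions/events-sub")
                | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
                | "WriteToBigQuery" >> WriteToBigQuery(
                    table="my-gcp-project:analytics.events",
                    schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
                    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
                    write_disposition=BigQueryDisposition.WRITE_APPEND,
                    method=WriteToBigQuery.Method.STREAMING_INSERTS,
                )
            )

    if __name__ == "__main__":
        run()

On recent Beam SDK versions, the BigQuery Storage Write API method can reduce write latency and cost further; check which write methods your SDK version supports before choosing one.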

Minimal Latency: By using Pub/Sub as a buffer and Dataflow for efficient processing, you minimize the latency between the data being produced in Kafka and being available for querying in BigQuery.

Why Other Options Are Not Ideal:

Option A: Using a proxy host introduces an additional point of failure and can create a bottleneck, especially with high-throughput streaming.

Option B: While Google-provided Dataflow templates can be helpful, they might lack the customization needed for specific transformations or handling complex data structures.

Option D: Although Beam does provide a Kafka connector, having Dataflow workers read directly from an on-premises Kafka cluster means the brokers must be reachable and correctly advertised across the interconnect, which requires complex networking configuration and can lead to performance issues at high throughput.

Additional Considerations:

Schema Management: Ensure that the schema of the data being produced in Kafka is compatible with the schema expected in BigQuery. Consider using tools like Schema Registry for schema evolution management.
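
For example, the target table could be created up front with an explicit schema (an illustrative sketch with assumed field names) using the BigQuery client library, so that incompatible records surface as insert errors rather than silent schema drift:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")  # placeholder project

    # Schema mirroring the fields produced on the Kafka topic (assumed names).
    schema = [
        bigquery.SchemaField("event_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("event_ts", "TIMESTAMP", mode="REQUIRED"),
        bigquery.SchemaField("payload", "STRING", mode="NULLABLE"),
    ]

    table = bigquery.Table("my-gcp-project.analytics.events", schema=schema)
    # Partitioning on the event timestamp keeps downstream queries cheap.
    table.time_partitioning = bigquery.TimePartitioning(field="event_ts")

    client.create_table(table, exists_ok=True)

Whatever approach you take, keep this table definition and the schema used by the Dataflow write step in sync.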

Monitoring: Set up robust monitoring and alerting to detect any issues in the pipeline, such as message backlogs or processing errors.

By following Option C, you leverage the strengths of Kafka Connect, Pub/Sub, and Dataflow to create a high-throughput, low-latency streaming pipeline that seamlessly integrates your on-premises Kafka data with BigQuery.

asked 18/09/2024
Vladimir Kornfeld