Question 74 - Professional Machine Learning Engineer discussion

You need to analyze user activity data from your company's mobile applications. Your team will use BigQuery for data analysis, transformation, and experimentation with ML algorithms. You need to ensure real-time ingestion of the user activity data into BigQuery. What should you do?

A. Configure Pub/Sub to stream the data into BigQuery.
B. Run an Apache Spark streaming job on Dataproc to ingest the data into BigQuery.
C. Run a Dataflow streaming job to ingest the data into BigQuery.
D. Configure Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery.
Suggested answer: C

Explanation:

The suggested answer is to run a Dataflow streaming job that ingests the user activity data into BigQuery. Dataflow is a fully managed service for both batch and stream processing, and it integrates with BigQuery and other Google Cloud services. Pipelines are written with Apache Beam, which provides a unified, portable programming model. Because the service is managed, you do not provision or operate the underlying infrastructure and can focus on the transformation logic. A streaming pipeline can handle structured, unstructured, or binary data and apply windowing, aggregation, and other operations to the stream before the results are written to BigQuery.
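
To make the windowing and aggregation mentioned above concrete, here is a minimal plain-Python sketch of fixed (tumbling) windows. It is not Dataflow or Apache Beam code; it only illustrates the idea of grouping events by time window and key. The event fields `user_id` and `ts` and the 60-second window size are assumptions chosen for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window size, for illustration only


def window_start(event_ts: float, size: int = WINDOW_SECONDS) -> int:
    """Return the start (in epoch seconds) of the fixed window containing event_ts."""
    return int(event_ts // size) * size


def count_events_per_window(events, size: int = WINDOW_SECONDS) -> dict:
    """Count events per (window_start, user_id) pair."""
    counts = Counter()
    for event in events:
        key = (window_start(event["ts"], size), event["user_id"])
        counts[key] += 1
    return dict(counts)


# Hypothetical activity events: "ts" is seconds since the epoch.
events = [
    {"user_id": "u1", "ts": 5.0},
    {"user_id": "u1", "ts": 42.0},
    {"user_id": "u2", "ts": 59.9},
    {"user_id": "u1", "ts": 61.0},
]

print(count_events_per_window(events))
```

A streaming engine applies the same grouping continuously as events arrive and also has to deal with late or out-of-order data, which this sketch ignores.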

The other options are not optimal for the following reasons:

A) Configuring Pub/Sub to stream the data into BigQuery is considered the weaker option here because Pub/Sub is a messaging service: it delivers published messages to subscribers and is not a general-purpose processing engine. Pub/Sub can write messages directly to a BigQuery table through a BigQuery subscription, and it is commonly used as a source for Dataflow, but on its own it does not provide windowing, aggregation, or custom transformation logic before the data lands in BigQuery.

B) Running an Apache Spark streaming job on Dataproc to ingest the data into BigQuery is not a good option, because you must provision, size, and manage a cluster of virtual machines, which increases the cost and operational complexity of the solution. In addition, Spark writes to BigQuery through a separate connector, which in some configurations stages data in intermediate storage first and can add latency.

D) Configuring Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery is not a bad option, but the suggested answer treats Pub/Sub as unnecessary on the grounds that the Dataflow job can ingest the data without an intermediary. Pub/Sub adds buffering and reliability between the applications and the pipeline, at the price of one more service to pay for and operate and a small amount of added delivery latency. In practice a Dataflow streaming job still needs a streaming source to read from, and Pub/Sub is a common choice for that role.
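
For reference, the Pub/Sub plus Dataflow architecture described in option D could be sketched with the Apache Beam Python SDK roughly as follows. This is a hedged sketch, not a production pipeline: the subscription path, table name, schema, and JSON message fields (`user_id`, `event_type`, `ts`) are all hypothetical, and the source is assumed to be a Pub/Sub subscription because the question does not say what the job reads from.

```python
import json
from datetime import datetime, timezone

# Hypothetical resource names and schema; replace with real ones.
SUBSCRIPTION = "projects/my-project/subscriptions/user-activity-sub"
TABLE = "my-project:analytics.user_activity"
SCHEMA = "user_id:STRING,event_type:STRING,event_time:TIMESTAMP"


def to_bq_row(message: bytes) -> dict:
    """Convert one JSON-encoded message payload into a BigQuery row dict."""
    event = json.loads(message.decode("utf-8"))
    event_time = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return {
        "user_id": event["user_id"],
        "event_type": event["event_type"],
        "event_time": event_time.isoformat(),
    }


def run(argv=None):
    """Build and run the streaming pipeline (requires the apache-beam[gcp] package)."""
    # Imported here so that to_bq_row can be used and tested without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv, streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadEvents" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
            | "ToRow" >> beam.Map(to_bq_row)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                TABLE,
                schema=SCHEMA,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

Calling `run()` with pipeline flags such as `--runner=DataflowRunner`, `--project`, `--region`, and `--temp_location` would submit the job to Dataflow; without a runner flag Beam falls back to its local direct runner.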

asked 18/09/2024
Nathalie Yip