ExamGecko
Question 242 - Professional Data Engineer discussion


You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over twelve hours to run. To expedite development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting speed and processing requirements?

A. Convert your PySpark commands into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
B. Ingest your data into Cloud SQL, convert your PySpark commands into SparkSQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
C. Ingest your data into BigQuery from Cloud Storage, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.
D. Use the Apache Beam Python SDK to build the transformation pipelines, and write the data into BigQuery.
Suggested answer: A
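Whichever option is chosen, the common step is translating PySpark DataFrame transformations into equivalent SQL (SparkSQL for option A, BigQuery SQL for option C). A minimal sketch of such a translation, assuming a hypothetical `orders` table with `amount` and `region` columns (none of these names come from the question):

```python
# Hypothetical PySpark transformation to be rewritten as SQL:
#   df.filter(df.amount > 100).groupBy("region").sum("amount")

def to_sql(table: str, min_amount: float) -> str:
    """Build the SQL equivalent of the PySpark filter/groupBy above.

    The same statement works as SparkSQL (option A) or BigQuery SQL
    (option C); only the execution engine differs.
    """
    return (
        "SELECT region, SUM(amount) AS total_amount "
        f"FROM `{table}` "
        f"WHERE amount > {min_amount} "
        "GROUP BY region"
    )

# In option C this query would run in BigQuery and its result would be
# written to a new table (e.g. via CREATE TABLE ... AS SELECT).
query = to_sql("my_project.my_dataset.orders", 100)
print(query)
```

In BigQuery, wrapping this query in a `CREATE TABLE ... AS SELECT` statement materializes the transformation into a new table, which is the final step option C describes.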
asked 18/09/2024
Wessel Beulink