Question 339 - Professional Data Engineer discussion


You have thousands of Apache Spark jobs running in your on-premises Apache Hadoop cluster. You want to migrate the jobs to Google Cloud. You want to use managed services to run your jobs instead of maintaining a long-lived Hadoop cluster yourself. You have a tight timeline and want to keep code changes to a minimum. What should you do?

A. Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances.
B. Move your data to Cloud Storage. Run your jobs on Dataproc.
C. Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach.
D. Rewrite your jobs in Apache Beam. Run your jobs in Dataflow.
Suggested answer: B

Explanation:

Dataproc's Compatibility with Apache Spark: Dataproc is Google Cloud's managed service for running Hadoop and Spark clusters, so existing Apache Spark jobs run on it with little to no modification.
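As a rough illustration, here is a minimal sketch of submitting an unmodified PySpark script to Dataproc with the google-cloud-dataproc Python client; the project ID, region, cluster name, and gs:// path are placeholder assumptions, not values from the question.

from google.cloud import dataproc_v1

PROJECT = "my-project"         # placeholder project ID
REGION = "us-central1"         # placeholder region
CLUSTER = "migration-cluster"  # placeholder Dataproc cluster name

# The Dataproc job controller endpoint is regional.
client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": CLUSTER},
    # The script referenced here is the existing on-premises Spark job,
    # uploaded to Cloud Storage as-is.
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/etl_job.py"},
}

result = client.submit_job(
    request={"project_id": PROJECT, "region": REGION, "job": job}
)
print(f"Submitted job {result.reference.job_id}")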

Cloud Storage as a Scalable Data Lake: Cloud Storage provides highly scalable, durable storage for the large volumes of data that Spark jobs typically process, and the Cloud Storage connector preinstalled on Dataproc lets Spark read and write gs:// paths as a drop-in replacement for HDFS.
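In practice, the only code change most jobs need is swapping HDFS URIs for Cloud Storage URIs. The sketch below uses invented paths and a made-up column name purely for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl_job").getOrCreate()

# Before (on-premises HDFS):
# df = spark.read.parquet("hdfs://namenode:8020/data/events/")

# After (Cloud Storage): only the URI scheme and host change.
df = spark.read.parquet("gs://my-bucket/data/events/")

df.groupBy("event_type").count().write.parquet("gs://my-bucket/output/counts/")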

Minimizing Operational Overhead: With Dataproc, you no longer manage and maintain a Hadoop cluster yourself; Google Cloud handles the infrastructure, letting you focus on your data processing tasks.
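Clusters can also be created on demand and deleted when the jobs finish (and Dataproc Serverless for Spark avoids cluster management entirely). A minimal sketch of creating a short-lived cluster with the Python client, using placeholder names and machine types, might look like this:

from google.cloud import dataproc_v1

PROJECT = "my-project"  # placeholder project ID
REGION = "us-central1"  # placeholder region

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": PROJECT,
    "cluster_name": "migration-cluster",  # placeholder name
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
    },
}

# create_cluster returns a long-running operation; result() blocks until
# the cluster is ready.
operation = client.create_cluster(
    request={"project_id": PROJECT, "region": REGION, "cluster": cluster}
)
print(f"Cluster ready: {operation.result().cluster_name}")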

Tight Timeline and Minimal Code Changes: This option directly addresses both constraints in the question, offering a quick migration of the Spark jobs to Google Cloud with minimal disruption to the existing codebase.

Why other options are not suitable:

A. Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances: This option requires you to manage the underlying infrastructure yourself, which contradicts the requirement of using managed services.

C. Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach: While BigQuery is a powerful data warehouse, converting Spark scripts to SQL would require substantial code changes and might not be feasible within a tight timeline.

D. Rewrite your jobs in Apache Beam. Run your jobs in Dataflow: Rewriting jobs in Apache Beam would be a significant undertaking and not suitable for a quick migration with minimal code changes.
