Question 261 - Professional Data Engineer discussion
You've migrated a Hadoop job from an on-premises cluster to Dataproc and Cloud Storage. Your Spark job is a complex analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB each). You see some performance degradation after the migration to Dataproc, so you'd like to optimize for it. Your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptible VMs (with only 2 non-preemptible workers) for this workload. What should you do?
A.
Switch from HDDs to SSDs; override the preemptible VMs configuration to increase the boot disk size.
B.
Increase the size of your Parquet files to ensure they are at least 1 GB each.
C.
Switch to TFRecords format (approx. 200 MB per file) instead of Parquet files.
D.
Switch from HDDs to SSDs; copy the initial data from Cloud Storage to the Hadoop Distributed File System (HDFS), run the Spark job, and copy the results back to Cloud Storage.
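
For context on what options A and D propose at the infrastructure level: on Dataproc, Spark writes shuffle data to each worker's local disk, and standard persistent disk throughput scales with disk size, so small HDD boot disks on preemptible workers are a common bottleneck for shuffle-heavy jobs. Below is a minimal sketch, assuming the google-cloud-dataproc Python client, of a cluster whose preemptible secondary workers are provisioned with larger SSD boot disks. The project ID, region, cluster name, worker count, and disk sizes are illustrative placeholders, not values from the question.

# Sketch (assumptions noted above): create a Dataproc cluster whose
# preemptible secondary workers get larger pd-ssd boot disks, as in option A.
from google.cloud import dataproc_v1

project_id = "my-project"   # placeholder
region = "us-central1"      # placeholder

# The Dataproc client must target a regional endpoint.
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "shuffle-heavy-spark",  # placeholder name
    "config": {
        # The two non-preemptible primary workers from the question.
        "worker_config": {
            "num_instances": 2,
            "disk_config": {"boot_disk_type": "pd-ssd", "boot_disk_size_gb": 500},
        },
        # Preemptible secondary workers. Spark shuffle data spills to the
        # boot disk by default, so its type and size bound shuffle throughput.
        "secondary_worker_config": {
            "num_instances": 8,  # illustrative count
            "preemptibility": "PREEMPTIBLE",
            "disk_config": {"boot_disk_type": "pd-ssd", "boot_disk_size_gb": 500},
        },
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
operation.result()  # blocks until the cluster is running

One more point worth weighing against option D: Dataproc preemptible secondary workers do not run HDFS DataNodes, so copying the input data into HDFS would place it only on the 2 non-preemptible workers.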