Question 222 - Professional Data Engineer discussion
Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC). All ORC files have been successfully copied to a Cloud Storage bucket. You need to replicate some data to the cluster's local Hadoop Distributed File
System (HDFS) to maximize performance. What are two ways to start using Hive in Cloud Dataproc? (Choose two.)
A.
Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to HDFS. Mount the Hive tables locally.
B.
Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to any node of the Dataproc cluster. Mount the Hive tables locally.
C.
Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to the master node of the Dataproc cluster. Then run the Hadoop utility to copy them to HDFS. Mount the Hive tables from HDFS.
D.
Leverage the Cloud Storage connector for Hadoop to mount the ORC files as external Hive tables. Replicate the external Hive tables to the native ones.
E.
Load the ORC files into BigQuery. Leverage BigQuery connector for Hadoop to mount the BigQuery tables as external Hive tables. Replicate external Hive tables to the native ones.
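To make the two HDFS-replication approaches in options C and D concrete, here is a minimal sketch. The bucket name, paths, and table/column names are hypothetical placeholders, not from the question; the commands assume you are on the Dataproc master node of a cluster created with the Cloud Storage connector (installed by default on Dataproc).

```shell
# Sketch of option C: stage the ORC files on the master node, then copy
# them into HDFS with the Hadoop utility.
# (Bucket and path names are hypothetical.)
gsutil -m cp "gs://example-bucket/warehouse/orders/*.orc" /tmp/orders/

# Copy the staged files into HDFS so Hive can read them locally:
hadoop fs -mkdir -p /user/hive/warehouse/orders
hadoop fs -put /tmp/orders/*.orc /user/hive/warehouse/orders/

# Sketch of option D: the Cloud Storage connector lets Hive address the
# bucket directly through a gs:// LOCATION, so the ORC files can be mounted
# as an external table and then replicated into a native (HDFS-backed) table:
hive -e "
  CREATE EXTERNAL TABLE orders_ext (id BIGINT, total DOUBLE)
  STORED AS ORC
  LOCATION 'gs://example-bucket/warehouse/orders/';
  CREATE TABLE orders STORED AS ORC AS SELECT * FROM orders_ext;
"
```

Note the contrast with options A and B: gsutil alone cannot write into HDFS (it only moves data between Cloud Storage and a local filesystem), which is why option C needs the extra `hadoop fs -put` step.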