Question 341 - Professional Data Engineer discussion

You migrated your on-premises Apache Hadoop Distributed File System (HDFS) data lake to Cloud Storage. The data scientist team needs to process the data by using Apache Spark and SQL. Security policies need to be enforced at the column level. You need a cost-effective solution that can scale into a data mesh. What should you do?

A.
1. Deploy a long-living Dataproc cluster with Apache Hive and Ranger enabled. 2. Configure Ranger for column-level security. 3. Process with Dataproc Spark or Hive SQL.
B.
1. Define a BigLake table. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.
C.
1. Load the data to BigQuery tables. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.
D.
1. Apply an Identity and Access Management (IAM) policy at the file level in Cloud Storage. 2. Define a BigQuery external table for SQL processing. 3. Use Dataproc Spark to process the Cloud Storage files.
Suggested answer: B

Explanation:

Option B keeps the data where it already is, in Cloud Storage, while still enforcing security at the column level, which makes it the most cost-effective choice and the one that scales naturally into a data mesh. The reasoning, step by step:

Define a BigLake table:

A BigLake table is an external table defined over the files that already sit in Cloud Storage, so nothing is loaded or duplicated; BigQuery stores only the metadata and brokers access through a connection.

Because access goes through BigQuery, fine-grained (column-level) policies can be enforced even when the data is read by open-source engines such as Spark (see the sketch after this step).
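A minimal sketch of step 1, assuming the google-cloud-bigquery Python client and Parquet files; the project, dataset, bucket, connection, and table names are hypothetical placeholders:

```python
from google.cloud import bigquery

PROJECT = "my-project"                     # hypothetical project ID
CONNECTION = "my-project.us.biglake-conn"  # hypothetical BigQuery connection with access to the bucket

client = bigquery.Client(project=PROJECT)

# Define a BigLake table over the Parquet files already in Cloud Storage.
# The data is not copied or loaded; BigQuery stores only the table metadata.
ddl = f"""
CREATE EXTERNAL TABLE `{PROJECT}.lakehouse.customer_events`
WITH CONNECTION `{CONNECTION}`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-datalake-bucket/customer_events/*.parquet']
)
"""
client.query(ddl).result()  # wait for the DDL job to complete
```

The same DDL can be run directly in the BigQuery console or with the bq command-line tool; the client library is used here only so the whole setup can live in one script.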

Create a taxonomy of policy tags in Data Catalog and add policy tags to columns:

A taxonomy with fine-grained access control groups policy tags such as "pii" or "confidential", and attaching a tag to a column restricts who can read that column.

Only principals granted the Fine-Grained Reader role on a policy tag can query the tagged columns; everyone else can still query the untagged columns (see the sketch after this step).
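A minimal sketch of step 2, assuming the google-cloud-datacatalog and google-cloud-bigquery client libraries; the taxonomy, tag, table, and column names are hypothetical:

```python
from google.cloud import bigquery, datacatalog_v1

PROJECT = "my-project"   # hypothetical project ID
LOCATION = "us"          # taxonomy location should match the dataset location

ptm = datacatalog_v1.PolicyTagManagerClient()

# 1. Create a taxonomy with fine-grained access control enabled.
taxonomy = ptm.create_taxonomy(
    parent=f"projects/{PROJECT}/locations/{LOCATION}",
    taxonomy=datacatalog_v1.Taxonomy(
        display_name="data-sensitivity",
        activated_policy_types=[
            datacatalog_v1.Taxonomy.PolicyType.FINE_GRAINED_ACCESS_CONTROL
        ],
    ),
)

# 2. Create a policy tag for personally identifiable information.
pii_tag = ptm.create_policy_tag(
    parent=taxonomy.name,
    policy_tag=datacatalog_v1.PolicyTag(display_name="pii"),
)

# 3. Attach the policy tag to a sensitive column of the BigLake table.
#    (Simplified: nested fields and descriptions are not carried over.)
bq = bigquery.Client(project=PROJECT)
table = bq.get_table(f"{PROJECT}.lakehouse.customer_events")
new_schema = []
for field in table.schema:
    if field.name == "email":  # hypothetical sensitive column
        field = bigquery.SchemaField(
            field.name,
            field.field_type,
            mode=field.mode,
            policy_tags=bigquery.PolicyTagList(names=[pii_tag.name]),
        )
    new_schema.append(field)
table.schema = new_schema
bq.update_table(table, ["schema"])
```

Access to the tagged column is then granted by giving specific principals the Data Catalog Fine-Grained Reader role on the "pii" policy tag.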

Process with the Spark-BigQuery connector or BigQuery SQL:

The data science team can read the BigLake table from Spark (for example on Dataproc or Dataproc Serverless) through the Spark-BigQuery connector, or query it directly with BigQuery SQL.

In both paths the policy tags are enforced at read time, so column-level security does not depend on which engine is used (see the sketch after this step).
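A minimal sketch of step 3 using the Spark-BigQuery connector (available as a package on Dataproc, or preinstalled on recent Dataproc images); the table and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("biglake-column-security-demo")
    .getOrCreate()
)

# Read the BigLake table through the connector. Policy tags are enforced at
# read time: a user without the Fine-Grained Reader role on the "pii" tag
# cannot read the tagged column.
events = (
    spark.read.format("bigquery")
    .option("table", "my-project.lakehouse.customer_events")  # hypothetical table
    .load()
)

# Untagged columns remain available to every authorized reader.
events.groupBy("event_type").count().show()
```

The same table can be queried with BigQuery SQL (for example, SELECT event_type, COUNT(*) FROM lakehouse.customer_events GROUP BY event_type) with the same column-level enforcement.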

Why the other options fall short:

Option A keeps a long-living Dataproc cluster running, which is not cost-effective, and Ranger policies apply only to workloads on that cluster. Option C loads the data into native BigQuery tables, duplicating storage that already exists in Cloud Storage. Option D relies on Cloud Storage IAM, which works at the bucket and object level and therefore cannot enforce column-level security.

BigLake also integrates with Dataplex, so governed tables can be grouped into domains and published as data products, which is how this design scales into a data mesh.

Google Data Engineer

Reference:

BigLake documentation

Introduction to column-level access control (BigQuery)

Policy tags and taxonomies (Data Catalog)

Spark-BigQuery connector

Dataplex overview

By defining a BigLake table over the existing Cloud Storage data, tagging sensitive columns with Data Catalog policy tags, and processing with the Spark-BigQuery connector or BigQuery SQL, the team gets column-level security without duplicating data or running long-lived clusters, and the same governed tables can later be organized into a Dataplex-based data mesh.

Asked 18/09/2024 by Paul LOUIS DIT PICARD