Google Associate Data Practitioner Practice Test - Questions Answers, Page 3

Question 21

You are working with a large dataset of customer reviews stored in Cloud Storage. The dataset contains several inconsistencies, such as missing values, incorrect data types, and duplicate entries. You need to clean the data to ensure that it is accurate and consistent before using it for analysis. What should you do?

A. Use the PythonOperator in Cloud Composer to clean the data and load it into BigQuery. Use SQL for analysis.
B. Use BigQuery to batch load the data into BigQuery. Use SQL for cleaning and analysis.
C. Use Storage Transfer Service to move the data to a different Cloud Storage bucket. Use event triggers to invoke Cloud Run functions to load the data into BigQuery. Use SQL for analysis.
D. Use Cloud Run functions to clean the data and load it into BigQuery. Use SQL for analysis.

Suggested answer: B
Explanation:

Using BigQuery to batch load the data and perform cleaning and analysis with SQL is the best approach for this scenario. BigQuery provides powerful SQL capabilities to handle missing values, enforce correct data types, and remove duplicates efficiently. This method simplifies the pipeline by leveraging BigQuery's built-in processing power for both cleaning and analysis, reducing the need for additional tools or services and minimizing complexity.
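As a rough illustration of what the cleaning step could look like once the raw reviews are batch loaded into a staging table, here is a minimal sketch using the BigQuery Python client. The project, dataset, table, and column names (reviews_raw, reviews_clean, review_id, review_text, rating, load_time) are assumptions for illustration, not part of the question:

# Hedged sketch: deduplicate, fix data types, and fill missing values with SQL.
# All table and column names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

cleaning_sql = """
CREATE OR REPLACE TABLE `my_project.reviews.reviews_clean` AS
SELECT
  review_id,
  COALESCE(review_text, '(no text)') AS review_text,   -- handle missing values
  SAFE_CAST(rating AS INT64) AS rating                  -- enforce the correct data type
FROM `my_project.reviews.reviews_raw`
WHERE review_id IS NOT NULL                              -- drop rows missing the key
QUALIFY ROW_NUMBER() OVER (PARTITION BY review_id ORDER BY load_time DESC) = 1  -- remove duplicates
"""

client.query(cleaning_sql).result()  # wait for the cleaning job to finish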

Question 22

Your retail organization stores sensitive application usage data in Cloud Storage. You need to encrypt the data without the operational overhead of managing encryption keys. What should you do?

A. Use Google-managed encryption keys (GMEK).
B. Use customer-managed encryption keys (CMEK).
C. Use customer-supplied encryption keys (CSEK).
D. Use customer-supplied encryption keys (CSEK) for the sensitive data and customer-managed encryption keys (CMEK) for the less sensitive data.

Suggested answer: A
Explanation:

Using Google-managed encryption keys (GMEK) is the best choice when you want to encrypt sensitive data in Cloud Storage without the operational overhead of managing encryption keys. GMEK is the default encryption mechanism in Google Cloud, and it ensures that data is automatically encrypted at rest with no additional setup or maintenance required. It provides strong security while eliminating the need for manual key management.
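For context, a minimal sketch of an upload where Google-managed encryption applies by default; note that no key material or key configuration appears anywhere in the code. The bucket and object names are placeholders:

# With Google-managed encryption keys (the default), the object is encrypted at
# rest automatically; no encryption_key argument or KMS setup is needed.
# Bucket and object names are hypothetical.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-usage-data-bucket")
blob = bucket.blob("usage/2025/02/usage_events.json")

blob.upload_from_filename("usage_events.json")  # GMEK applied transparently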

Question 23

You work for a financial organization that stores transaction data in BigQuery. Your organization has a regulatory requirement to retain data for a minimum of seven years for auditing purposes. You need to ensure that the data is retained for seven years using an efficient and cost-optimized approach. What should you do?

A. Create a partition by transaction date, and set the partition expiration policy to seven years.
B. Set the table-level retention policy in BigQuery to seven years.
C. Set the dataset-level retention policy in BigQuery to seven years.
D. Export the BigQuery tables to Cloud Storage daily, and enforce a lifecycle management policy that has a seven-year retention rule.

Suggested answer: B
Explanation:

Setting a table-level retention policy in BigQuery to seven years is the most efficient and cost-optimized solution to meet the regulatory requirement. A table-level retention policy ensures that the data cannot be deleted or overwritten before the specified retention period expires, providing compliance with auditing requirements while keeping the data within BigQuery for easy access and analysis. This approach avoids the complexity and additional costs of exporting data to Cloud Storage.

Question 24

Your organization has a petabyte of application logs stored as Parquet files in Cloud Storage. You need to quickly perform a one-time SQL-based analysis of the files and join them to data that already resides in BigQuery. What should you do?

A. Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.
B. Launch a Cloud Data Fusion environment, use plugins to connect to BigQuery and Cloud Storage, and use the SQL join operation to analyze the data.
C. Create external tables over the files in Cloud Storage, and perform SQL joins to tables in BigQuery to analyze the data.
D. Use the bq load command to load the Parquet files into BigQuery, and perform SQL joins to analyze the data.

Suggested answer: C
Explanation:

Creating external tables over the Parquet files in Cloud Storage allows you to perform SQL-based analysis and joins with data already in BigQuery without needing to load the files into BigQuery. This approach is efficient for a one-time analysis as it avoids the time and cost associated with loading large volumes of data into BigQuery. External tables provide seamless integration with Cloud Storage, enabling quick and cost-effective analysis of data stored in Parquet format.
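A minimal sketch of this pattern with the BigQuery Python client: define an external table over the Parquet files, then join it to a native table in SQL. The project, dataset, table, bucket, and column names are assumptions for illustration:

# Hedged sketch: query Parquet files in place via an external table and join
# them to data already in BigQuery. All names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Define the external table over the Parquet files (no data is loaded).
client.query("""
CREATE OR REPLACE EXTERNAL TABLE `my_project.logs.app_logs_ext`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-log-bucket/parquet/*.parquet']
)
""").result()

# One-time analysis: join the external table to a native BigQuery table.
results = client.query("""
SELECT u.customer_id, COUNT(*) AS error_count
FROM `my_project.logs.app_logs_ext` AS l
JOIN `my_project.warehouse.users` AS u
  ON l.user_id = u.user_id
WHERE l.severity = 'ERROR'
GROUP BY u.customer_id
""").result()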

Question 25

Your team is building several data pipelines that contain a collection of complex tasks and dependencies that you want to execute on a schedule, in a specific order. The tasks and dependencies consist of files in Cloud Storage, Apache Spark jobs, and data in BigQuery. You need to design a system that can schedule and automate these data processing tasks using a fully managed approach. What should you do?

A. Use Cloud Scheduler to schedule the jobs to run.
B. Use Cloud Tasks to schedule and run the jobs asynchronously.
C. Create directed acyclic graphs (DAGs) in Cloud Composer. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.
D. Create directed acyclic graphs (DAGs) in Apache Airflow deployed on Google Kubernetes Engine. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.

Suggested answer: C
Explanation:

Using Cloud Composer to create Directed Acyclic Graphs (DAGs) is the best solution because it is a fully managed, scalable workflow orchestration service based on Apache Airflow. Cloud Composer allows you to define complex task dependencies and schedules while integrating seamlessly with Google Cloud services such as Cloud Storage, BigQuery, and Dataproc for Apache Spark jobs. This approach minimizes operational overhead, supports scheduling and automation, and provides an efficient and fully managed way to orchestrate your data pipelines.
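An illustrative Cloud Composer (Airflow) DAG sketch showing how the three kinds of tasks can be chained in order on a schedule. The bucket, cluster, file, project, and table names are placeholders, and operator parameters are kept minimal:

# Hedged sketch of a Cloud Composer DAG; all resource names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="daily_processing_pipeline",
    schedule_interval="@daily",
    start_date=datetime(2025, 1, 1),
    catchup=False,
) as dag:
    # Load files from Cloud Storage into a BigQuery staging table.
    load_files = GCSToBigQueryOperator(
        task_id="load_files_to_bq",
        bucket="my-landing-bucket",
        source_objects=["exports/*.csv"],
        destination_project_dataset_table="my_project.staging.raw_events",
        write_disposition="WRITE_TRUNCATE",
    )

    # Run an Apache Spark job on Dataproc.
    run_spark = DataprocSubmitJobOperator(
        task_id="run_spark_job",
        project_id="my_project",
        region="us-central1",
        job={
            "placement": {"cluster_name": "my-cluster"},
            "pyspark_job": {"main_python_file_uri": "gs://my-code-bucket/transform.py"},
        },
    )

    # Aggregate results in BigQuery.
    aggregate = BigQueryInsertJobOperator(
        task_id="aggregate_results",
        configuration={
            "query": {
                "query": "SELECT event_date, COUNT(*) AS n FROM `my_project.staging.raw_events` GROUP BY event_date",
                "useLegacySql": False,
            }
        },
    )

    load_files >> run_spark >> aggregate  # tasks execute in this order, on schedule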

Question 26

You are responsible for managing Cloud Storage buckets for a research company. Your company has well-defined data tiering and retention rules. You need to optimize storage costs while achieving your data retention needs. What should you do?

A. Configure the buckets to use the Archive storage class.
B. Configure a lifecycle management policy on each bucket to downgrade the storage class and remove objects based on age.
C. Configure the buckets to use the Standard storage class and enable Object Versioning.
D. Configure the buckets to use the Autoclass feature.

Suggested answer: B
Explanation:

Configuring a lifecycle management policy on each Cloud Storage bucket allows you to automatically transition objects to lower-cost storage classes (such as Nearline, Coldline, or Archive) based on their age or other criteria. Additionally, the policy can automate the removal of objects once they are no longer needed, ensuring compliance with retention rules and optimizing storage costs. This approach aligns well with well-defined data tiering and retention needs, providing cost efficiency and automation.
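A minimal sketch of configuring such a policy with the Cloud Storage Python client. The bucket name and the age thresholds are assumptions standing in for your organization's actual tiering and retention rules:

# Hedged sketch: downgrade objects to colder storage classes and delete them
# based on age. Bucket name and thresholds are hypothetical examples.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("research-data-bucket")

bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)   # tier down after 30 days
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=365)  # colder after 1 year
bucket.add_lifecycle_delete_rule(age=1825)                        # delete after ~5 years (example)
bucket.patch()  # apply the updated lifecycle configuration to the bucket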

Question 27

You are using your own data to demonstrate the capabilities of BigQuery to your organization's leadership team. You need to perform a one-time load of the files stored on your local machine into BigQuery using as little effort as possible. What should you do?

A. Write and execute a Python script using the BigQuery Storage Write API library.
B. Create a Dataproc cluster, copy the files to Cloud Storage, and write an Apache Spark job using the spark-bigquery-connector.
C. Execute the bq load command on your local machine.
D. Create a Dataflow job using the Apache Beam FileIO and BigQueryIO connectors with a local runner.

Suggested answer: C
Explanation:

Using the bq load command is the simplest and most efficient way to perform a one-time load of files from your local machine into BigQuery. This command-line tool is easy to use, requires minimal setup, and supports direct uploads from local files to BigQuery tables. It meets the requirement for minimal effort while allowing you to quickly demonstrate BigQuery's capabilities to your organization's leadership team.
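The answer itself is a single bq load invocation (for example, something along the lines of bq load --autodetect mydataset.customers ./customers.csv, with placeholder names). As an illustration of the same one-step load done programmatically, here is a hedged equivalent using the BigQuery Python client; the file, dataset, and table names are assumptions:

# Hedged sketch: one-time load of a local file into BigQuery.
# File, dataset, and table names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,        # let BigQuery infer the schema
    skip_leading_rows=1,    # skip the header row
)

with open("customers.csv", "rb") as f:
    load_job = client.load_table_from_file(
        f, "my_project.demo.customers", job_config=job_config
    )

load_job.result()  # wait for the load job to complete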

Question 28

Your organization uses Dataflow pipelines to process real-time financial transactions. You discover that one of your Dataflow jobs has failed. You need to troubleshoot the issue as quickly as possible. What should you do?

A. Set up a Cloud Monitoring dashboard to track key Dataflow metrics, such as data throughput, error rates, and resource utilization.
B. Create a custom script to periodically poll the Dataflow API for job status updates, and send email alerts if any errors are identified.
C. Navigate to the Dataflow Jobs page in the Google Cloud console. Use the job logs and worker logs to identify the error.
D. Use the gcloud CLI tool to retrieve job metrics and logs, and analyze them for errors and performance bottlenecks.

Suggested answer: C
Explanation:

To troubleshoot a failed Dataflow job as quickly as possible, you should navigate to the Dataflow Jobs page in the Google Cloud console. The console provides access to detailed job logs and worker logs, which can help you identify the cause of the failure. The graphical interface also allows you to visualize pipeline stages, monitor performance metrics, and pinpoint where the error occurred, making it the most efficient way to diagnose and resolve the issue promptly.

Question 29

Your company uses Looker to generate and share reports with various stakeholders. You have a complex dashboard with several visualizations that needs to be delivered to specific stakeholders on a recurring basis, with customized filters applied for each recipient. You need an efficient and scalable solution to automate the delivery of this customized dashboard. You want to follow the Google-recommended approach. What should you do?

A. Create a separate LookML model for each stakeholder with predefined filters, and schedule the dashboards using the Looker Scheduler.
B. Create a script using the Looker Python SDK, and configure user attribute filter values. Generate a new scheduled plan for each stakeholder.
C. Embed the Looker dashboard in a custom web application, and use the application's scheduling features to send the report with personalized filters.
D. Use the Looker Scheduler with a user attribute filter on the dashboard, and send the dashboard with personalized filters to each stakeholder based on their attributes.

Suggested answer: D
Explanation:

Using the Looker Scheduler with user attribute filters is the Google-recommended approach to efficiently automate the delivery of a customized dashboard. User attribute filters allow you to dynamically customize the dashboard's content based on the recipient's attributes, ensuring each stakeholder sees data relevant to them. This approach is scalable, does not require creating separate models or custom scripts, and leverages Looker's built-in functionality to automate recurring deliveries effectively.

Question 30

You are predicting customer churn for a subscription-based service. You have a 50 PB historical customer dataset in BigQuery that includes demographics, subscription information, and engagement metrics. You want to build a churn prediction model with minimal overhead. You want to follow the Google-recommended approach. What should you do?

A. Export the data from BigQuery to a local machine. Use scikit-learn in a Jupyter notebook to build the churn prediction model.
B. Use Dataproc to create a Spark cluster. Use the Spark MLlib within the cluster to build the churn prediction model.
C. Create a Looker dashboard that is connected to BigQuery. Use LookML to predict churn.
D. Use the BigQuery Python client library in a Jupyter notebook to query and preprocess the data in BigQuery. Use the CREATE MODEL statement in BigQueryML to train the churn prediction model.

Suggested answer: D
Explanation:

Using the BigQuery Python client library to query and preprocess data directly in BigQuery and then leveraging BigQueryML to train the churn prediction model is the Google-recommended approach for this scenario. BigQueryML allows you to build machine learning models directly within BigQuery using SQL, eliminating the need to export data or manage additional infrastructure. This minimizes overhead, scales effectively for a dataset as large as 50 PB, and simplifies the end-to-end process of building and training the churn prediction model.
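A minimal sketch of this approach: train a classification model inside BigQuery with a CREATE MODEL statement issued from the Python client, then score customers with ML.PREDICT. The project, dataset, table, and feature column names are illustrative assumptions:

# Hedged sketch: BigQuery ML churn model trained entirely inside BigQuery.
# All table, column, and project names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Train a logistic regression churn model on a feature table.
client.query("""
CREATE OR REPLACE MODEL `my_project.analytics.churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churned']
) AS
SELECT
  age,
  subscription_plan,
  monthly_sessions,
  days_since_last_login,
  churned
FROM `my_project.analytics.customer_features`
""").result()

# Score current customers with the trained model.
predictions = client.query("""
SELECT customer_id, predicted_churned, predicted_churned_probs
FROM ML.PREDICT(
  MODEL `my_project.analytics.churn_model`,
  TABLE `my_project.analytics.customer_features_current`
)
""").result()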
