Google Professional Data Engineer Practice Test - Questions Answers, Page 6

Your company is loading comma-separated values (CSV) files into Google BigQuery. The data is imported successfully; however, the imported data does not match the source file byte for byte.

What is the most likely cause of this problem?

A. The CSV data loaded in BigQuery is not flagged as CSV.
B. The CSV data has invalid rows that were skipped on import.
C. The CSV data loaded in BigQuery is not using BigQuery's default encoding.
D. The CSV data has not gone through an ETL phase before loading into BigQuery.
Suggested answer: B

Your company produces 20,000 files every hour. Each data file is formatted as a comma-separated values (CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to transmit the CSV files as-is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even though the bandwidth utilization is rather low.

You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (Choose two.)

A. Introduce data compression for each file to increase the rate of file transfer.
B. Contact your Internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.
C. Redesign the data ingestion process to use the gsutil tool to send the CSV files to a storage bucket in parallel.
D. Assemble 1,000 files into a tape archive (TAR) file. Transmit the TAR files instead, and disassemble the CSV files in the cloud upon receiving them.
E. Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premises data to the designated storage bucket.
Suggested answer: C, E

You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required.

You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)

A. Redis
B. HBase
C. MySQL
D. MongoDB
E. Cassandra
F. HDFS with Hive
Suggested answer: B, D, F

Topic 5, Practice Questions

Suppose you have a table that includes a nested column called "city" inside a column called "person", but when you try to submit the following query in BigQuery, it gives you an error.

SELECT person FROM `project1.example.table1` WHERE city = "London"

How would you correct the error?

A. Add ", UNNEST(person)" before the WHERE clause.
B. Change "person" to "person.city".
C. Change "person" to "city.person".
D. Add ", UNNEST(city)" before the WHERE clause.
Suggested answer: A

Explanation:

To access the nested city field, you need to UNNEST(person) and join it to table1 by adding it after a comma in the FROM clause.
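Applying option A to the query in the question gives the following corrected statement (a minimal sketch using the table and field names from the question):

SELECT person FROM `project1.example.table1`, UNNEST(person) WHERE city = "London"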

Reference:

https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacysql#nested_repeated_results

What are two of the benefits of using denormalized data structures in BigQuery?

A. Reduces the amount of data processed, reduces the amount of storage required
B. Increases query speed, makes queries simpler
C. Reduces the amount of storage required, increases query speed
D. Reduces the amount of data processed, increases query speed
Suggested answer: B

Explanation:

Denormalization increases query speed for tables with billions of rows because BigQuery's performance degrades when doing JOINs on large tables. With a denormalized data structure, you don't have to use JOINs, since all of the data has been combined into one table.

Denormalization also makes queries simpler because you do not have to use JOIN clauses.

Denormalization increases the amount of data processed and the amount of storage required because it creates redundant data.
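A minimal sketch of the difference, using hypothetical orders and customers tables:

Normalized (requires a JOIN):
SELECT c.name, SUM(o.amount) AS total FROM `project.dataset.orders` AS o JOIN `project.dataset.customers` AS c ON o.customer_id = c.id GROUP BY c.name

Denormalized (customer fields repeated on each order row, so no JOIN is needed):
SELECT customer_name, SUM(amount) AS total FROM `project.dataset.orders_denormalized` GROUP BY customer_name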

Reference:

https://cloud.google.com/solutions/bigquery-data-warehouse#denormalizing_data

Which of these statements about exporting data from BigQuery is false?

A. To export more than 1 GB of data, you need to put a wildcard in the destination filename.
B. The only supported export destination is Google Cloud Storage.
C. Data can only be exported in JSON or Avro format.
D. The only compression option available is GZIP.
Suggested answer: C

Explanation:

Data can be exported in CSV, JSON, or Avro format. If you are exporting nested or repeated data, then CSV format is not supported.
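As an illustration only, one way to express such an export in GoogleSQL is the EXPORT DATA statement; the bucket path and table name below are hypothetical, and the '*' wildcard in the URI lets the export be split across multiple files when it exceeds 1 GB:

EXPORT DATA OPTIONS (uri = 'gs://my-bucket/exports/table1-*.json', format = 'JSON', compression = 'GZIP') AS (SELECT * FROM `project1.example.table1`)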

Reference: https://cloud.google.com/bigquery/docs/exporting-data

What are all of the BigQuery operations that Google charges for?

A. Storage, queries, and streaming inserts
B. Storage, queries, and loading data from a file
C. Storage, queries, and exporting data
D. Queries and streaming inserts
Suggested answer: A

Explanation:

Google charges for storage, queries, and streaming inserts. Loading data from a file and exporting data are free operations.

Reference: https://cloud.google.com/bigquery/pricing

Which of the following is not possible using primitive roles?

A. Give a user viewer access to BigQuery and owner access to Google Compute Engine instances.
B. Give UserA owner access and UserB editor access for all datasets in a project.
C. Give a user access to view all datasets in a project, but not run queries on them.
D. Give GroupA owner access and GroupB editor access for all datasets in a project.
Suggested answer: C

Explanation:

Primitive roles can be used to give owner, editor, or viewer access to a user or group, but they can't be used to separate data access permissions from job-running permissions.

Reference: https://cloud.google.com/bigquery/docs/access-control#primitive_iam_roles

Which of these statements about BigQuery caching is true?

A. By default, a query's results are not cached.
B. BigQuery caches query results for 48 hours.
C. Query results are cached even if you specify a destination table.
D. There is no charge for a query that retrieves its results from cache.
Suggested answer: D

Explanation:

When query results are retrieved from a cached results table, you are not charged for the query.

BigQuery caches query results for 24 hours, not 48 hours.

A query's results are cached by default, except under certain conditions, such as when you specify a destination table.

Reference: https://cloud.google.com/bigquery/querying-data#query-caching

Which of these sources can you not load data into BigQuery from?

A. File upload
B. Google Drive
C. Google Cloud Storage
D. Google Cloud SQL
Suggested answer: D

Explanation:

You can load data into BigQuery from a file upload, Google Cloud Storage, Google Drive, or Google Cloud Bigtable. It is not possible to load data into BigQuery directly from Google Cloud SQL. One way to get data from Cloud SQL to BigQuery would be to export data from Cloud SQL to Cloud Storage and then load it from there.

Reference: https://cloud.google.com/bigquery/loading-data
