Google Professional Data Engineer Practice Test - Questions Answers, Page 7

Which of the following statements about Legacy SQL and Standard SQL is not true?

A. Standard SQL is the preferred query language for BigQuery.
B. If you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.
C. One difference between the two query languages is how you specify fully-qualified table names (i.e. table names that include their associated project name).
D. You need to set a query language for each dataset and the default is Standard SQL.
Suggested answer: D

Explanation:

You do not set a query language for each dataset. It is set each time you run a query and the default query language is Legacy SQL.

Standard SQL has been the preferred query language since BigQuery 2.0 was released.

In legacy SQL, to query a table with a project-qualified name, you use a colon, :, as a separator. In standard SQL, you use a period, ., instead.

Due to the differences in syntax between the two query languages (such as with project-qualified table names), if you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.

Reference:

https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql
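As a sketch of the naming difference, a legacy SQL table reference wraps the name in square brackets and separates the project with a colon (e.g. `[project:dataset.table]`), while standard SQL uses periods throughout and backticks. A small helper can rewrite one form into the other (the public table used below is just an example):

```python
def legacy_to_standard(table_ref: str) -> str:
    """Convert a legacy SQL table reference, [project:dataset.table],
    into the standard SQL form, `project.dataset.table`."""
    inner = table_ref.strip("[]")            # drop the legacy brackets
    return "`" + inner.replace(":", ".") + "`"  # colon -> period, add backticks

standard = legacy_to_standard("[bigquery-public-data:samples.shakespeare]")
```

This only covers the separator difference described above; queries can differ in other ways as well, which is why a legacy query may fail outright under standard SQL.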

How would you query specific partitions in a BigQuery table?

A. Use the DAY column in the WHERE clause
B. Use the EXTRACT(DAY) clause
C. Use the _PARTITIONTIME pseudo-column in the WHERE clause
D. Use DATE BETWEEN in the WHERE clause
Suggested answer: C

Explanation:

Partitioned tables include a pseudo column named _PARTITIONTIME that contains a date-based timestamp for data loaded into the table. To limit a query to particular partitions (such as Jan 1st and 2nd of 2017), use a clause similar to this:

WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')

Reference: https://cloud.google.com/bigquery/docs/partitioned-tables#the_partitiontime_pseudo_column

Which SQL keyword can be used to reduce the number of columns processed by BigQuery?

A. BETWEEN
B. WHERE
C. SELECT
D. LIMIT
Suggested answer: C

Explanation:

SELECT allows you to query specific columns rather than the whole table.

LIMIT, BETWEEN, and WHERE clauses will not reduce the number of columns processed by BigQuery.

Reference: https://cloud.google.com/bigquery/launchchecklist#architecture_design_and_development_checklist

To give a user read permission for only the first three columns of a table, which access control method would you use?

A. Primitive role
B. Predefined role
C. Authorized view
D. It's not possible to give access to only the first three columns of a table.
Suggested answer: C

Explanation:

An authorized view allows you to share query results with particular users and groups without giving them read access to the underlying tables. Authorized views can only be created in a dataset that does not contain the tables queried by the view.

When you create an authorized view, you use the view's SQL query to restrict access to only the rows and columns you want the users to see.

Reference: https://cloud.google.com/bigquery/docs/views#authorized-views

What are two methods that can be used to denormalize tables in BigQuery?

A. 1) Split table into multiple tables; 2) Use a partitioned table
B. 1) Join tables into one table; 2) Use nested repeated fields
C. 1) Use a partitioned table; 2) Join tables into one table
D. 1) Use nested repeated fields; 2) Use a partitioned table
Suggested answer: B

Explanation:

The conventional method of denormalizing data involves simply writing a fact, along with all its dimensions, into a flat table structure. For example, if you are dealing with sales transactions, you would write each individual fact to a record, along with the accompanying dimensions such as order and customer information.

The other method for denormalizing data takes advantage of BigQuery's native support for nested and repeated structures in JSON or Avro input data. Expressing records using nested and repeated structures can provide a more natural representation of the underlying data. In the case of the sales order, the outer part of a JSON structure would contain the order and customer information, and the inner part of the structure would contain the individual line items of the order, which would be represented as nested, repeated elements.

Reference: https://cloud.google.com/solutions/bigquery-data-warehouse#denormalizing_data
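To illustrate the second method, here is a denormalized sales order expressed with nested and repeated structures (all field names below are made up for the example). Serialized as newline-delimited JSON, a record like this corresponds to a BigQuery schema in which `line_items` is a REPEATED RECORD field:

```python
import json

# Order and customer information live in the outer record; the order's
# line items are a nested, repeated structure rather than rows in a
# separate, normalized table.
order = {
    "order_id": "O-1001",
    "customer": {"name": "Alice", "email": "alice@example.com"},
    "line_items": [                      # repeated: ARRAY of STRUCT in BigQuery
        {"sku": "A-1", "quantity": 2, "price": 9.99},
        {"sku": "B-7", "quantity": 1, "price": 24.50},
    ],
}

# One line of newline-delimited JSON, ready to load into BigQuery.
ndjson_row = json.dumps(order)
```

The nested form keeps each fact together with its dimensions, so queries avoid the joins a normalized layout would require.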

Which of these is not a supported method of putting data into a partitioned table?

A. If you have existing data in a separate file for each day, then create a partitioned table and upload each file into the appropriate partition.
B. Run a query to get the records for a specific day from an existing table and for the destination table, specify a partitioned table ending with the day in the format "$YYYYMMDD".
C. Create a partitioned table and stream new records to it every day.
D. Use ORDER BY to put a table's rows into chronological order and then change the table's type to "Partitioned".
Suggested answer: D

Explanation:

You cannot change an existing table into a partitioned table. You must create a partitioned table from scratch. Then you can either stream data into it every day and the data will automatically be put in the right partition, or you can load data into a specific partition by using "$YYYYMMDD" at the end of the table name.

Reference: https://cloud.google.com/bigquery/docs/partitioned-tables
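The "$YYYYMMDD" partition decorator is simply a suffix on the destination table name. A minimal helper to build one (the table name below is illustrative):

```python
from datetime import date

def partition_decorator(table: str, day: date) -> str:
    """Append the "$YYYYMMDD" partition decorator to a table name,
    targeting one specific day's partition for a load or query result."""
    return f"{table}${day:%Y%m%d}"

dest = partition_decorator("mydataset.mytable", date(2017, 1, 1))
```

You would pass a name like this as the destination table when loading or writing query results into a single partition.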

Which of these operations can you perform from the BigQuery Web UI?

A. Upload a file in SQL format.
B. Load data with nested and repeated fields.
C. Upload a 20 MB file.
D. Upload multiple files using a wildcard.
Suggested answer: B

Explanation:

You can load data with nested and repeated fields using the Web UI.

You cannot use the Web UI to:

- Upload a file greater than 10 MB in size

- Upload multiple files at the same time

- Upload a file in SQL format

All three of the above operations can be performed using the "bq" command.

Reference: https://cloud.google.com/bigquery/loading-data

Which methods can be used to reduce the number of rows processed by BigQuery?

A. Splitting tables into multiple tables; putting data in partitions
B. Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause
C. Putting data in partitions; using the LIMIT clause
D. Splitting tables into multiple tables; using the LIMIT clause
Suggested answer: A

Explanation:

If you split a table into multiple tables (such as one table for each day), then you can limit your query to the data in specific tables (such as for particular days). A better method is to use a partitioned table, as long as your data can be separated by the day.

If you use the LIMIT clause, BigQuery will still process the entire table.

Reference: https://cloud.google.com/bigquery/docs/partitioned-tables

Why do you need to split a machine learning dataset into training data and test data?

A. So you can try two different sets of features
B. To make sure your model is generalized for more than just the training data
C. To allow you to create unit tests in your code
D. So you can use one dataset for a wide model and one for a deep model
Suggested answer: B

Explanation:

The flaw in evaluating a predictive model on its training data is that it tells you nothing about how well the model generalizes to new, unseen data. A model selected for its accuracy on the training dataset rather than on an unseen test dataset is very likely to perform worse on new data, because it has specialized to the structure of the training dataset instead of generalizing. This is called overfitting.

Reference: https://machinelearningmastery.com/a-simple-intuition-for-overfitting/
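A minimal sketch of such a split, assuming the dataset fits in memory and an 80/20 train/test ratio:

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle and split rows so the model can be evaluated on data it
    never saw during training, guarding against overfitting."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = rows[:]              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(100)))
```

Libraries such as scikit-learn provide an equivalent utility; the point is that the test rows are held out and never used to fit the model.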

Which of these numbers are adjusted by a neural network as it learns from a training dataset (select 2 answers)?

A. Weights
B. Biases
C. Continuous features
D. Input values
Suggested answer: A, B

Explanation:

A neural network is a simple mechanism that's implemented with basic math. The only difference between the traditional programming model and a neural network is that you let the computer determine the parameters (weights and biases) by learning from training datasets.

Reference: https://cloud.google.com/blog/big-data/2016/07/understanding-neural-networks-with-tensorflow-playground
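A toy illustration: a single neuron y = w*x + b trained by gradient descent on squared error. Only the weight and bias are adjusted during learning; the input values and targets stay fixed:

```python
def sgd_step(w, b, x, target, lr=0.1):
    """One gradient-descent update for a single neuron y = w*x + b
    with squared-error loss. Returns the adjusted weight and bias."""
    error = (w * x + b) - target
    # For loss = error**2: dLoss/dw = 2*error*x, dLoss/db = 2*error
    return w - lr * 2 * error * x, b - lr * 2 * error

w, b = 0.0, 0.0
for _ in range(200):
    w, b = sgd_step(w, b, x=1.0, target=3.0)
# After training, w*1.0 + b is very close to the target 3.0.
```

Real networks have many such parameters and nonlinear activations, but the principle is the same: training adjusts weights and biases, nothing else.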

Total 372 questions