ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 30 - Professional Machine Learning Engineer discussion

Report
Export

You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?

A.
Normalize the data using Google Kubernetes Engine
Answers
A.
Normalize the data using Google Kubernetes Engine
B.
Translate the normalization algorithm into SQL for use with BigQuery
Answers
B.
Translate the normalization algorithm into SQL for use with BigQuery
C.
Use the normalizer_fn argument in TensorFlow's Feature Column API
Answers
C.
Use the normalizer_fn argument in TensorFlow's Feature Column API
D.
Normalize the data with Apache Spark using the Dataproc connector for BigQuery
Answers
D.
Normalize the data with Apache Spark using the Dataproc connector for BigQuery
Suggested answer: B

Explanation:

Z-score normalization is a technique that transforms the values of a numeric variable into standardized units, such that the mean is zero and the standard deviation is one. Z-score normalization can help to compare variables with different scales and ranges, and to reduce the effect of outliers and skewness. The formula for z-score normalization is:

z = (x - mu) / sigma

where x is the original value, mu is the mean of the variable, and sigma is the standard deviation of the variable.

Dataflow is a service that allows you to create and run data processing pipelines on Google Cloud. You can use Dataflow to preprocess raw data prior to model training and prediction, such as applying z-score normalization on data stored in BigQuery. However, using Dataflow for this task may not be the most efficient option, as it involves reading and writing data from and to BigQuery, which can be time-consuming and costly. Moreover, using Dataflow requires manual intervention to update the pipeline whenever new training data is added.

A more efficient way to perform z-score normalization on data stored in BigQuery is to translate the normalization algorithm into SQL and use it with BigQuery. BigQuery is a service that allows you to analyze large-scale and complex data using SQL queries. You can use BigQuery to perform z-score normalization on your data using SQL functions such as AVG(), STDDEV_POP(), and OVER(). For example, the following SQL query can normalize the values of a column called temperature in a table called weather:

SELECT (temperature - AVG(temperature) OVER ()) / STDDEV_POP(temperature) OVER () AS normalized_temperature FROM weather;

By using SQL to perform z-score normalization on BigQuery, you can make the process more efficient by minimizing computation time and manual intervention. You can also leverage the scalability and performance of BigQuery to handle large and complex datasets. Therefore, translating the normalization algorithm into SQL for use with BigQuery is the best option for this use case.

asked 18/09/2024
Jeffrey Sammaritano
33 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first