Question 248 - Professional Machine Learning Engineer discussion


You developed a BigQuery ML linear regressor model by using a training dataset stored in a BigQuery table. New data is added to the table every minute. You are using Cloud Scheduler and Vertex AI Pipelines to automate hourly model training, and use the model for direct inference. The feature preprocessing logic includes quantile bucketization and MinMax scaling on data received in the last hour. You want to minimize storage and computational overhead. What should you do?

A. Create a component in the Vertex AI Pipelines directed acyclic graph (DAG) to calculate the required statistics, and pass the statistics on to subsequent components.
B. Preprocess and stage the data in BigQuery prior to feeding it to the model during training and inference.
C. Create SQL queries to calculate and store the required statistics in separate BigQuery tables that are referenced in the CREATE MODEL statement.
D. Use the TRANSFORM clause in the CREATE MODEL statement in the SQL query to calculate the required statistics.
Suggested answer: D

Explanation:

The best option to minimize storage and computational overhead is to use the TRANSFORM clause in the CREATE MODEL statement. The TRANSFORM clause lets you specify feature preprocessing logic that applies to both training and prediction: the preprocessing is executed in the same query that creates the model, so there is no need to create and store intermediate tables, and BigQuery ML automatically applies the same transformations at inference time. The TRANSFORM clause also supports quantile bucketization and MinMax scaling, which are the preprocessing steps required in this scenario.

Option A is incorrect because a separate Vertex AI Pipelines component that calculates the required statistics adds computational overhead, since it runs apart from model creation, and storage overhead, since the statistics must be passed on to subsequent components.

Option B is incorrect because preprocessing and staging the data in BigQuery requires creating and maintaining additional tables for the preprocessed data, and you must ensure that the preprocessing logic stays consistent between training and inference.

Option C is incorrect because calculating and storing the required statistics in separate BigQuery tables also requires creating and maintaining additional tables, and the statistics must be refreshed regularly to reflect the new data.

Reference:

BigQuery ML documentation

Using the TRANSFORM clause

Feature preprocessing with BigQuery ML
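
For illustration, a CREATE MODEL statement that performs the required preprocessing inside the TRANSFORM clause might look like the following sketch. The project, dataset, table, column, and model names, as well as the ingestion_time column used to select the last hour of data, are hypothetical.

-- Hypothetical names throughout; adjust to your own project and schema.
CREATE OR REPLACE MODEL `my_project.my_dataset.hourly_linear_reg`
  TRANSFORM(
    -- Preprocessing statistics are computed here and re-applied automatically at prediction time.
    ML.QUANTILE_BUCKETIZE(feature_a, 10) OVER() AS feature_a_bucketized,
    ML.MIN_MAX_SCALER(feature_b) OVER() AS feature_b_scaled,
    label
  )
  OPTIONS(
    model_type = 'linear_reg',
    input_label_cols = ['label']
  )
AS
SELECT feature_a, feature_b, label
FROM `my_project.my_dataset.training_table`
-- Train only on data received in the last hour (hypothetical ingestion_time column).
WHERE ingestion_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

At inference time, ML.PREDICT applies the same TRANSFORM logic automatically, so the raw feature columns can be passed directly:

SELECT *
FROM ML.PREDICT(
  MODEL `my_project.my_dataset.hourly_linear_reg`,
  (SELECT feature_a, feature_b FROM `my_project.my_dataset.new_rows`)
);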
