ExamGecko
Question 688 - SAA-C03 discussion

A company has an Amazon S3 data lake. The company needs a solution that transforms the data from the data lake and loads the data into a data warehouse every day. The data warehouse must have massively parallel processing (MPP) capabilities.

Data analysts then need to create and train machine learning (ML) models by using SQL commands on the data. The solution must use serverless AWS services wherever possible.

Which solution will meet these requirements?

A.
Run a daily Amazon EMR job to transform the data and load the data into Amazon Redshift. Use Amazon Redshift ML to create and train the ML models.
B.
Run a daily Amazon EMR job to transform the data and load the data into Amazon Aurora Serverless. Use Amazon Aurora ML to create and train the ML models.
C.
Run a daily AWS Glue job to transform the data and load the data into Amazon Redshift Serverless. Use Amazon Redshift ML to create and train the ML models.
D.
Run a daily AWS Glue job to transform the data and load the data into Amazon Athena tables. Use Amazon Athena ML to create and train the ML models.
Suggested answer: C

Explanation:

AWS Glue is a fully managed, serverless extract, transform, and load (ETL) service for preparing and loading data for analytics. Glue crawlers can discover data in Amazon S3 and register it in the AWS Glue Data Catalog, so that services such as Amazon Athena can query it with SQL. Glue ETL jobs run on Apache Spark or as Python shell scripts and can write the transformed data to destinations such as Amazon Redshift, Amazon S3, or Amazon Aurora. Because the service is serverless, you pay only for the resources your jobs consume, and you do not provision or manage any infrastructure.
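The "every day" part of the requirement is commonly handled with a scheduled Glue trigger. The following is a minimal sketch that only builds the arguments for boto3's `glue.create_trigger` call; the job name `lake-to-redshift-etl`, the trigger name, and the 02:00 UTC run time are hypothetical choices, not part of the question.

```python
def daily_glue_trigger(job_name: str, hour_utc: int = 2) -> dict:
    """Build keyword arguments for boto3's glue.create_trigger call."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "Name": f"{job_name}-daily",  # hypothetical naming scheme
        "Type": "SCHEDULED",
        # AWS cron fields: minute hour day-of-month month day-of-week year
        "Schedule": f"cron(0 {hour_utc} * * ? *)",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# Hypothetical job name, used for illustration only.
params = daily_glue_trigger("lake-to-redshift-etl")
print(params["Schedule"])

# To create the trigger for real (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").create_trigger(**params)
```

Keeping the argument construction separate from the API call makes the schedule easy to inspect without touching an AWS account.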

Amazon Redshift is a fully managed, petabyte-scale data warehouse service that lets you analyze your data with standard SQL and your existing business intelligence (BI) tools. Amazon Redshift uses a massively parallel processing (MPP) architecture, which means it distributes data and query execution across multiple nodes in parallel, delivering fast performance and scalability. Amazon Redshift Serverless is a deployment option that automatically scales compute capacity to match the workload, so you do not need to manage clusters or capacity. You pay for the compute used while queries run and for the storage your data consumes.

Amazon Redshift ML is a feature that enables you to create, train, and deploy machine learning (ML) models using familiar SQL commands. When you run a CREATE MODEL statement, Redshift ML exports the training data to Amazon S3 and uses Amazon SageMaker, a fully managed service for building, training, and deploying ML models, to select a model type and tune hyperparameters automatically. The trained model is then made available in Amazon Redshift as a SQL function, which you call in your queries to generate predictions.
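To make the SQL-based training workflow concrete, here is a small sketch that assembles a Redshift ML `CREATE MODEL` statement as a string. The model, table, column, function, role, and bucket names are all hypothetical placeholders, and the helper assumes its inputs are trusted identifiers rather than user-supplied text.

```python
def create_model_sql(model: str, training_query: str, target: str,
                     function: str, iam_role_arn: str, s3_bucket: str) -> str:
    """Assemble a Redshift ML CREATE MODEL statement from trusted inputs."""
    return (
        f"CREATE MODEL {model}\n"
        f"FROM ({training_query})\n"
        f"TARGET {target}\n"          # column the model learns to predict
        f"FUNCTION {function}\n"      # SQL function used later for predictions
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


# All names below are hypothetical placeholders.
sql = create_model_sql(
    model="customer_churn",
    training_query="SELECT age, plan, monthly_spend, churned FROM customers",
    target="churned",
    function="predict_churn",
    iam_role_arn="arn:aws:iam::123456789012:role/RedshiftMLRole",
    s3_bucket="example-redshift-ml-bucket",
)
print(sql)

# One way to submit it to Redshift Serverless (requires boto3 and credentials):
#   import boto3
#   boto3.client("redshift-data").execute_statement(
#       WorkgroupName="example-workgroup", Database="dev", Sql=sql)
```

Once training finishes, analysts would call the generated function in an ordinary query, for example `SELECT predict_churn(age, plan, monthly_spend) FROM customers;`.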

The combination of AWS Glue, Amazon Redshift Serverless, and Amazon Redshift ML meets the requirements of the question, as it provides a serverless, scalable, and SQL-based solution to transform, load, and analyze the data from the Amazon S3 data lake, and to create and train ML models on the data.

Option A is not correct because a standard Amazon EMR job is not serverless. Amazon EMR is a managed service that simplifies running Apache Spark, Apache Hadoop, and other big data frameworks on AWS, but a cluster-based EMR job requires you to launch and configure clusters of EC2 instances, which adds complexity and cost compared to AWS Glue. Option A also loads the data into Amazon Redshift rather than Amazon Redshift Serverless, so it does not use serverless services wherever possible.

Option B is not correct, because Amazon Aurora Serverless is not a data warehouse service, and it does not support MPP. Amazon Aurora Serverless is an on-demand, auto-scaling configuration for Amazon Aurora, a relational database service that is compatible with MySQL and PostgreSQL. Amazon Aurora Serverless can automatically adjust the database capacity based on the traffic, but it does not distribute the data and queries across multiple nodes like Amazon Redshift does. Amazon Aurora Serverless is more suitable for transactional workloads than analytical workloads.

Option D is not correct because Amazon Athena is an interactive query service, not a data warehouse. Athena is serverless, so you pay only for the queries you run, and it analyzes data in place in Amazon S3 using standard SQL rather than loading it into managed warehouse storage. It therefore does not manage data distribution, compression, or storage layout the way Amazon Redshift does; it reads whatever format the data already has in S3. In addition, the ML integration in Athena invokes existing Amazon SageMaker models for inference from SQL; it does not create and train models. Athena is better suited to ad hoc queries than to a daily-loaded warehouse with SQL-based model training.
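The elimination reasoning above can be summarized as a small check. This sketch encodes only two of the stated requirements (serverless ETL and an MPP data warehouse), with the flag values taken from the explanation; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    etl_serverless: bool       # is the transform step serverless?
    mpp_data_warehouse: bool   # is the target an MPP data warehouse?


# Flag values follow the explanation: EMR (as used here) is not serverless;
# Aurora Serverless and Athena are not treated as MPP data warehouses.
OPTIONS = [
    Option("A", etl_serverless=False, mpp_data_warehouse=True),
    Option("B", etl_serverless=False, mpp_data_warehouse=False),
    Option("C", etl_serverless=True, mpp_data_warehouse=True),
    Option("D", etl_serverless=True, mpp_data_warehouse=False),
]


def meets_requirements(option: Option) -> bool:
    """An option qualifies only if it satisfies both encoded requirements."""
    return option.etl_serverless and option.mpp_data_warehouse


matching = [o.label for o in OPTIONS if meets_requirements(o)]
print(matching)  # → ['C']
```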

asked 16/09/2024
Kevin Boddy