ExamGecko

Question 774 - SAA-C03 discussion


A marketing company receives a large amount of new clickstream data in Amazon S3 from a marketing campaign. The company needs to analyze the clickstream data in Amazon S3 quickly. Then the company needs to determine whether to process the data further in the data pipeline.

Which solution will meet these requirements with the LEAST operational overhead?

A. Create external tables in a Spark catalog. Configure jobs in AWS Glue to query the data.
B. Configure an AWS Glue crawler to crawl the data. Configure Amazon Athena to query the data.
C. Create external tables in a Hive metastore. Configure Spark jobs in Amazon EMR to query the data.
D. Configure an AWS Glue crawler to crawl the data. Configure Amazon Kinesis Data Analytics to use SQL to query the data.
Suggested answer: B

Explanation:

AWS Glue Crawler: AWS Glue is a fully managed ETL (Extract, Transform, Load) service that makes it easy to prepare and load data for analytics. A Glue crawler can automatically discover new data and schema in Amazon S3, making it easy to keep the data catalog up-to-date.

Crawling the Data:

Set up an AWS Glue crawler to scan the S3 bucket containing the clickstream data.

The crawler will automatically detect the schema and create/update the tables in the AWS Glue Data Catalog.
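
As a rough illustration of the two crawling steps above, the following Python sketch uses the boto3 Glue client calls create_crawler and start_crawler. The crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders, not values from the question, and the role is assumed to already have read access to the bucket.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for Glue's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,              # IAM role the crawler assumes (hypothetical)
        "DatabaseName": database,      # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, request):
    """Register the crawler, then start a run that infers the schema."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


# Example usage (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   glue = boto3.client("glue")
#   request = build_crawler_request(
#       "clickstream-crawler",
#       "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
#       "marketing",
#       "s3://example-clickstream-bucket/campaign/",
#   )
#   create_and_start_crawler(glue, request)
```

Passing the client in as a parameter keeps the sketch easy to try against a stub before pointing it at a real account.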

Amazon Athena:

Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

Once the data catalog is updated by the Glue crawler, use Athena to query the clickstream data directly in S3.
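
A minimal sketch of the Athena step, assuming the crawler has already created a table in the Data Catalog. It uses the boto3 Athena client calls start_query_execution and get_query_execution; the database name, table name, column name, and results location in the usage comment are hypothetical.

```python
import time

# States after which Athena will not change the query status any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(athena_client, sql, database, output_location, poll_seconds=1.0):
    """Submit a SQL query to Athena and poll until it reaches a terminal state."""
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)


# Example usage (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   athena = boto3.client("athena")
#   query_id, state = run_query(
#       athena,
#       "SELECT page, COUNT(*) AS hits FROM clickstream GROUP BY page LIMIT 10",
#       "marketing",
#       "s3://example-athena-results/",
#   )
#   if state == "SUCCEEDED":
#       rows = athena.get_query_results(QueryExecutionId=query_id)
```

Because the query runs directly against the files in S3, the company can inspect the data first and decide afterwards whether further pipeline processing is worthwhile.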

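Operational Efficiency: Both services are fully managed and serverless, so there is no infrastructure to provision or maintain. Glue crawlers automate data cataloging, and Athena's pay-per-query model supports quick, ad hoc analysis. By contrast, options A and C require writing and operating Spark jobs (and, for C, an Amazon EMR cluster and a Hive metastore), and option D relies on Amazon Kinesis Data Analytics, which is designed for streaming data rather than data already stored in Amazon S3.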

AWS Glue

Amazon Athena

asked 16/09/2024
Maurice Melgert