ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 124 - DEA-C01 discussion

Report
Export

A company needs to load customer data that comes from a third party into an Amazon Redshift data warehouse. The company stores order data and product data in the same data warehouse. The company wants to use the combined dataset to identify potential new customers.

A data engineer notices that one of the fields in the source data includes values that are in JSON format.

How should the data engineer load the JSON data into the data warehouse with the LEAST effort?

A.

Use the SUPER data type to store the data in the Amazon Redshift table.

Answers
A.

Use the SUPER data type to store the data in the Amazon Redshift table.

B.

Use AWS Glue to flatten the JSON data and ingest it into the Amazon Redshift table.

Answers
B.

Use AWS Glue to flatten the JSON data and ingest it into the Amazon Redshift table.

C.

Use Amazon S3 to store the JSON data. Use Amazon Athena to query the data.

Answers
C.

Use Amazon S3 to store the JSON data. Use Amazon Athena to query the data.

D.

Use an AWS Lambda function to flatten the JSON data. Store the data in Amazon S3.

Answers
D.

Use an AWS Lambda function to flatten the JSON data. Store the data in Amazon S3.

Suggested answer: A

Explanation:

In Amazon Redshift, the SUPER data type is designed specifically to handle semi-structured data like JSON, Parquet, ORC, and others. By using the SUPER data type, Redshift can ingest and query JSON data without requiring complex data flattening processes, thus reducing the amount of preprocessing required before loading the data. The SUPER data type also works seamlessly with Redshift Spectrum, enabling complex queries that can combine both structured and semi-structured datasets, which aligns with the company's need to use combined datasets to identify potential new customers.

Using the SUPER data type also allows for automatic parsing and query processing of nested data structures through Amazon Redshift's PARTITION BY and JSONPATH expressions, which makes this option the most efficient approach with the least effort involved. This reduces the overhead associated with using tools like AWS Glue or Lambda for data transformation.

Amazon Redshift Documentation - SUPER Data Type

AWS Certified Data Engineer - Associate Training: Building Batch Data Analytics Solutions on AWS

AWS Certified Data Engineer - Associate Study Guide

By directly leveraging the capabilities of Redshift with the SUPER data type, the data engineer ensures streamlined JSON ingestion with minimal effort while maintaining query efficiency.

asked 29/10/2024
Francisco Julian Mota Fraile
41 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first