Question 240 - DP-203 discussion

You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload. You need to recommend a format for the transformed files. The solution must meet the following requirements:

Contain information about the data types of each column in the files.

Support querying a subset of columns in the files.

Support read-heavy analytical workloads.

Minimize the file size.

What should you recommend?

A. JSON

B. CSV

C. Apache Avro

D. Apache Parquet
Suggested answer: D

Explanation:

Parquet, an open-source file format for Hadoop, stores nested data structures in a flat columnar format. Compared to a traditional row-oriented format, Parquet is more efficient in both storage and query performance. It is especially well suited to queries that read particular columns from a "wide" table (one with many columns), since only the needed columns are read and IO is minimized. Parquet files also embed a schema, so the data type of every column is stored in the file itself, and its built-in compression keeps file sizes small.

Reference: https://www.clairvoyant.ai/blog/big-data-file-formats
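As a rough illustration of these properties, the sketch below converts line-delimited JSON to Parquet and reads back only a subset of columns. It assumes pandas with the pyarrow engine is installed; the file names and column names (raw_events.json, user_id, amount) are hypothetical placeholders, not part of the exam scenario.

```python
# Minimal sketch, assuming pandas + pyarrow and line-delimited JSON input.
import pandas as pd

# Transform raw JSON into Parquet. Parquet embeds a schema, so the
# data type of each column is preserved inside the file (requirement 1),
# and Snappy compression minimizes file size (requirement 4).
df = pd.read_json("raw_events.json", lines=True)  # hypothetical input file
df.to_parquet("events.parquet", compression="snappy")

# Read only a subset of columns (requirement 2). Because Parquet is
# columnar, only the requested columns are scanned, which minimizes IO
# for read-heavy analytical queries (requirement 3).
subset = pd.read_parquet("events.parquet", columns=["user_id", "amount"])
print(subset.dtypes)  # dtypes are recovered from the Parquet schema
```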
