ExamGecko
Question 109 - DAS-C01 discussion


A company that produces network devices has millions of users. Data is collected from the devices on an hourly basis and stored in an Amazon S3 data lake.

The company analyzes the last 24 hours of data flow logs to detect anomalies and to troubleshoot and resolve user issues. The company also analyzes historical logs dating back 2 years to discover patterns and identify improvement opportunities.

The data flow logs contain many fields, such as date, timestamp, source IP, and target IP. There are about 10 billion events every day. How should this data be stored for optimal performance?
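Both query patterns above filter on time (the last 24 hours, or a range within the last 2 years), so a date partition key lets a query engine skip whole S3 prefixes. The following is a minimal sketch of that partition pruning, assuming Hive-style `date=YYYY-MM-DD` prefixes; the `flow_logs/` prefix, the helper names, and the dates are all hypothetical. With a source-IP partition key instead, a time filter matches no partition key, so every partition would have to be opened, and millions of users would also produce a very large number of small partitions.

```python
from datetime import date, datetime, timedelta

def date_partitions(first_day: date, days: int) -> list[str]:
    """Build one hypothetical Hive-style prefix per day, e.g. 'flow_logs/date=2024-09-15/'."""
    return [f"flow_logs/date={first_day + timedelta(days=i)}/" for i in range(days)]

def dates_in_window(end: datetime, hours: int) -> set[date]:
    """Return every calendar date overlapped by the window [end - hours, end]."""
    start = end - timedelta(hours=hours)
    days = set()
    d = start.date()
    while d <= end.date():
        days.add(d)
        d += timedelta(days=1)
    return days

def prune(partitions: list[str], wanted: set[date]) -> list[str]:
    """Keep only partitions whose date key satisfies the query's date filter."""
    return [
        p for p in partitions
        if date.fromisoformat(p.split("date=")[1].rstrip("/")) in wanted
    ]

# About two years of daily partitions (730 prefixes); dates are illustrative only.
all_parts = date_partitions(date(2022, 9, 17), 730)
# A "last 24 hours" query ending mid-morning spans two calendar dates.
last_24h = dates_in_window(datetime(2024, 9, 15, 10, 0), hours=24)
scanned = prune(all_parts, last_24h)
print(len(all_parts), len(scanned))
```

Under these assumptions the 24-hour query opens 2 of 730 date prefixes, while a 2-year historical query still reads everything, which is expected for that workload.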

A.
In Apache ORC partitioned by date and sorted by source IP
B.
In compressed .csv partitioned by date and sorted by source IP
C.
In Apache Parquet partitioned by source IP and sorted by date
D.
In compressed nested JSON partitioned by source IP and sorted by date
Suggested answer: A
asked 16/09/2024
Ksu doo Makek
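Within each date partition, sorting rows by source IP helps a columnar format such as ORC skip data, because ORC records min/max column statistics per stripe and row group (Parquet keeps similar per-row-group statistics). The sketch below is a simplified model of that effect, not the ORC reader itself: IPv4 addresses are modeled as 32-bit integers, the 1,000-row chunk size is arbitrary, and all helper names are hypothetical. It counts how many chunks a lookup on one source IP would still need to read when the data is unsorted versus sorted.

```python
import random

def chunk_stats(values: list[int], chunk_size: int) -> list[tuple[int, int]]:
    """Record (min, max) for each fixed-size chunk, like per-stripe column statistics."""
    stats = []
    for i in range(0, len(values), chunk_size):
        chunk = values[i:i + chunk_size]
        stats.append((min(chunk), max(chunk)))
    return stats

def chunks_to_read(stats: list[tuple[int, int]], target: int) -> int:
    """Count chunks whose [min, max] range could contain the target value."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)

rng = random.Random(0)
# 100,000 synthetic source IPs, each modeled as an integer in the IPv4 range.
ips = [rng.randrange(2**32) for _ in range(100_000)]
# Look up a mid-range address so the comparison does not depend on outliers.
target = sorted(ips)[len(ips) // 2]

unsorted_reads = chunks_to_read(chunk_stats(ips, 1_000), target)
sorted_reads = chunks_to_read(chunk_stats(sorted(ips), 1_000), target)
print(unsorted_reads, sorted_reads)
```

With random row order nearly every chunk's range covers the target, so almost all 100 chunks must be read; after sorting, only the one or two chunks whose range brackets the target qualify.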