ExamGecko
Question 173 - DAS-C01 discussion

A gaming company is building a serverless data lake. The company ingests streaming data into Amazon Kinesis Data Streams and writes the data to Amazon S3 through Amazon Kinesis Data Firehose, using an S3 buffer size of 10 MB and a buffer interval of 90 seconds. The company runs an AWS Glue ETL job that merges the data and transforms it into a different format before writing it back to Amazon S3.

Recently, the company's data volume has grown substantially, and the AWS Glue ETL jobs frequently fail with an OutOfMemoryError.

Which solutions will resolve this issue without incurring additional costs? (Select TWO.)

A.
Place the small files into one S3 folder. Define a single table for the small S3 files in the AWS Glue Data Catalog. Rerun the AWS Glue ETL jobs against this AWS Glue table.
B.
Create an AWS Lambda function that merges small S3 files and invoke it periodically. Run the AWS Glue ETL jobs after the Lambda function completes successfully.
C.
Run the S3DistCp utility in Amazon EMR to merge a large number of small S3 files before running the AWS Glue ETL jobs.
D.
Use the groupFiles setting in the AWS Glue ETL job to merge small S3 files and rerun the AWS Glue ETL jobs.
E.
Update the Kinesis Data Firehose S3 buffer size to 128 MB. Update the buffer interval to 900 seconds.
Suggested answer: D, E

Explanation:

The groupFiles setting is an AWS Glue option that makes an ETL job group files together when it reads them from an Amazon S3 data store. Grouping reduces the number of ETL tasks and in-memory partitions, which improves the performance and memory efficiency of the job [1]. By enabling groupFiles (for example, with the value inPartition) in the AWS Glue ETL job, the gaming company can read its many small S3 files as larger groups, without an extra merge step or an extra service, and avoid the OutOfMemoryError.
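
As a rough illustration, the sketch below builds the S3 connection options that turn on grouping for a Glue read (groupFiles set to inPartition, groupSize as a target size in bytes passed as a string). The helper names grouped_s3_options and plan_groups are hypothetical, the 128 MB group size is an assumed value, and plan_groups is only a simplified greedy model of how small files end up packed into larger read units; it is not Glue's actual grouping algorithm.

```python
"""Sketch: AWS Glue file grouping for many small S3 objects (illustrative only)."""

from typing import Dict, Iterable, List

MB = 1024 * 1024


def grouped_s3_options(paths: Iterable[str], group_size_bytes: int = 128 * MB) -> Dict[str, object]:
    """Hypothetical helper: S3 connection options with file grouping enabled.

    groupFiles="inPartition" asks Glue to group files within an S3 partition;
    groupSize is the target group size in bytes, passed as a string.
    """
    return {
        "paths": list(paths),
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": str(group_size_bytes),
    }


def plan_groups(file_sizes: Iterable[int], group_size_bytes: int) -> List[List[int]]:
    """Simplified greedy model (NOT Glue's real algorithm) of packing small
    files into read groups of roughly group_size_bytes each."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in file_sizes:
        # Close the current group if adding this file would exceed the target.
        if current and current_size + size > group_size_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical input: 1,000 files of 10 MB each, as a 10 MB Firehose buffer might produce.
    files = [10 * MB] * 1000
    print(len(files), "files ->", len(plan_groups(files, 128 * MB)), "read groups")
    # Hypothetical bucket path, for illustration only.
    print(grouped_s3_options(["s3://example-bucket/raw/"]))
```

Inside a Glue job, the dictionary would be passed as connection_options, for example glueContext.create_dynamic_frame_from_options(connection_type="s3", connection_options=grouped_s3_options(["s3://example-bucket/raw/"]), format="json"); the JSON format and bucket path are assumptions. In this toy model, 1,000 files of 10 MB collapse into 84 read groups, which is the kind of reduction in tasks and partitions the explanation describes.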

The Kinesis Data Firehose S3 buffer size and buffer interval determine how much data Firehose buffers before delivering it to Amazon S3; Firehose delivers a file as soon as either threshold is reached. Increasing both values results in fewer, larger files in Amazon S3, which reduces the number of small files and improves the performance of downstream processing [2]. By raising the buffer size to 128 MB and the buffer interval to 900 seconds (the maximum values for S3 delivery), the gaming company can produce fewer, larger S3 files without additional cost and avoid the OutOfMemoryError.
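
The effect of the buffering hints can be estimated with a back-of-the-envelope model. The Python sketch below assumes a constant, hypothetical ingest rate and that Firehose flushes whenever the first of the two thresholds is hit; it ignores compression, record format conversion, and dynamic partitioning, so real file sizes will differ. The function name files_per_hour and the limit constants are assumptions for illustration.

```python
"""Sketch: idealized model of Kinesis Data Firehose S3 buffering."""

from typing import Tuple

# Assumed upper limits for S3 buffering hints; re-check current Firehose quotas.
MAX_BUFFER_MB = 128
MAX_INTERVAL_S = 900


def files_per_hour(rate_mb_per_s: float, buffer_mb: float, interval_s: float) -> Tuple[float, float]:
    """Return (files delivered per hour, average file size in MB) for a
    constant ingest rate, assuming a flush as soon as EITHER the buffer size
    OR the buffer interval is reached."""
    if rate_mb_per_s <= 0:
        raise ValueError("ingest rate must be positive")
    if buffer_mb > MAX_BUFFER_MB or interval_s > MAX_INTERVAL_S:
        raise ValueError("buffering hints exceed the assumed Firehose maximums")
    seconds_to_fill = buffer_mb / rate_mb_per_s
    # Whichever threshold is reached first triggers delivery.
    flush_every_s = min(interval_s, seconds_to_fill)
    return 3600 / flush_every_s, rate_mb_per_s * flush_every_s


if __name__ == "__main__":
    rate = 1.0  # hypothetical ingest rate: 1 MB/s into one delivery stream
    for buffer_mb, interval_s in [(10, 90), (128, 900)]:
        count, size = files_per_hour(rate, buffer_mb, interval_s)
        print(f"{buffer_mb} MB / {interval_s} s -> {count:.1f} files/hour of ~{size:.0f} MB")
```

At an assumed 1 MB/s, the current 10 MB / 90 s settings fill the buffer every 10 seconds (360 files of about 10 MB per hour), while 128 MB / 900 s yields about 28 files of 128 MB per hour. At low ingest rates the interval dominates instead, which is why raising both values matters.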

asked 16/09/2024
Kishen Morar
45 questions