ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 38 - DEA-C01 discussion

Report
Export

A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to perform big data analysis. The company requires high reliability. A big data team must follow best practices for running cost-optimized and long-running workloads on Amazon EMR. The team must find a solution that will maintain the company's current level of performance.

Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)

A.

Use Hadoop Distributed File System (HDFS) as a persistent data store.

Answers
A.

Use Hadoop Distributed File System (HDFS) as a persistent data store.

B.

Use Amazon S3 as a persistent data store.

Answers
B.

Use Amazon S3 as a persistent data store.

C.

Use x86-based instances for core nodes and task nodes.

Answers
C.

Use x86-based instances for core nodes and task nodes.

D.

Use Graviton instances for core nodes and task nodes.

Answers
D.

Use Graviton instances for core nodes and task nodes.

E.

Use Spot Instances for all primary nodes.

Answers
E.

Use Spot Instances for all primary nodes.

Suggested answer: B, D

Explanation:

The best combination of resources to meet the requirements of high reliability, cost-optimization, and performance for running Apache Spark jobs on Amazon EMR is to use Amazon S3 as a persistent data store and Graviton instances for core nodes and task nodes.

Amazon S3 is a highly durable, scalable, and secure object storage service that can store any amount of data for a variety of use cases, including big data analytics1. Amazon S3 is a better choice than HDFS as a persistent data store for Amazon EMR, as it decouples the storage from the compute layer, allowing for more flexibility and cost-efficiency.Amazon S3 also supports data encryption, versioning, lifecycle management, and cross-region replication1.Amazon EMR integrates seamlessly with Amazon S3, using EMR File System (EMRFS) to access data stored in Amazon S3 buckets2.EMRFS also supports consistent view, which enables Amazon EMR to provide read-after-write consistency for Amazon S3 objects that are accessed through EMRFS2.

Graviton instances are powered by Arm-based AWS Graviton2 processors that deliver up to 40% better price performance over comparable current generation x86-based instances3.Graviton instances are ideal for running workloads that are CPU-bound, memory-bound, or network-bound, such as big data analytics, web servers, and open-source databases3. Graviton instances are compatible with Amazon EMR, and can be used for both core nodes and task nodes. Core nodes are responsible for running the data processing frameworks, such as Apache Spark, and storing data in HDFS or the local file system. Task nodes are optional nodes that can be added to a cluster to increase the processing power and throughput. By using Graviton instances for both core nodes and task nodes, you can achieve higher performance and lower cost than using x86-based instances.

Using Spot Instances for all primary nodes is not a good option, as it can compromise the reliability and availability of the cluster. Spot Instances are spare EC2 instances that are available at up to 90% discount compared to On-Demand prices, but they can be interrupted by EC2 with a two-minute notice when EC2 needs the capacity back. Primary nodes are the nodes that run the cluster software, such as Hadoop, Spark, Hive, and Hue, and are essential for the cluster operation. If a primary node is interrupted by EC2, the cluster will fail or become unstable. Therefore, it is recommended to use On-Demand Instances or Reserved Instances for primary nodes, and use Spot Instances only for task nodes that can tolerate interruptions.Reference:

Amazon S3 - Cloud Object Storage

EMR File System (EMRFS)

AWS Graviton2 Processor-Powered Amazon EC2 Instances

[Plan and Configure EC2 Instances]

[Amazon EC2 Spot Instances]

[Best Practices for Amazon EMR]

asked 29/10/2024
Ehsan Ali
40 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first