Question 841 - SAA-C03 discussion


A company has a large data workload that runs for 6 hours each day. The company cannot lose any data while the process is running. A solutions architect is designing an Amazon EMR cluster configuration to support this critical data workload.

Which solution will meet these requirements MOST cost-effectively?

A. Configure a long-running cluster that runs the primary node and core nodes on On-Demand Instances and the task nodes on Spot Instances.

B. Configure a transient cluster that runs the primary node and core nodes on On-Demand Instances and the task nodes on Spot Instances.

C. Configure a transient cluster that runs the primary node on an On-Demand Instance and the core nodes and task nodes on Spot Instances.

D. Configure a long-running cluster that runs the primary node on an On-Demand Instance, the core nodes on Spot Instances, and the task nodes on Spot Instances.

Suggested answer: B

Explanation:

To avoid data loss while keeping costs low, the best approach is a transient cluster (one that runs only for the duration of the job and then terminates) with On-Demand Instances for the primary and core nodes and Spot Instances for the task nodes. Here's why:

Primary and core nodes on On-Demand Instances: The primary node manages the cluster, and the core nodes store data in HDFS. Running both on On-Demand Instances keeps them stable for the whole run, whereas a Spot Instance can be interrupted, and losing a core node can mean losing the HDFS data it holds.

Task nodes on Spot Instances: Task nodes only add processing capacity and do not store HDFS data, so they can run on Spot Instances to reduce costs. Spot Instances are much cheaper but can be interrupted, which is acceptable here because the framework retries the affected tasks.
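The node layout described above can be sketched as a request dictionary shaped like the arguments to boto3's EMR `run_job_flow` call. This is a minimal illustration, not a production configuration: the cluster name, release label, instance types, and node counts are all assumptions, and a real call would also need `Steps`, `JobFlowRole`, and `ServiceRole`.

```python
def build_transient_cluster_request(core_count=2, task_count=4):
    """Build a dict shaped like boto3 EMR run_job_flow keyword arguments.

    All concrete values (name, release label, instance types, counts)
    are hypothetical placeholders chosen for illustration.
    """

    def group(name, role, market, instance_type, count):
        return {
            "Name": name,
            "InstanceRole": role,    # MASTER, CORE, or TASK
            "Market": market,        # ON_DEMAND or SPOT
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    return {
        "Name": "daily-data-workload",   # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",    # assumed release label
        "Instances": {
            "InstanceGroups": [
                # Primary and core nodes: On-Demand, so they are not interrupted.
                group("primary", "MASTER", "ON_DEMAND", "m5.xlarge", 1),
                group("core", "CORE", "ON_DEMAND", "m5.xlarge", core_count),
                # Task nodes: Spot, since they hold no HDFS data.
                group("task", "TASK", "SPOT", "m5.xlarge", task_count),
            ],
            # False makes the cluster transient: it terminates after its steps.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
    }


request = build_transient_cluster_request()
markets = {g["InstanceRole"]: g["Market"]
           for g in request["Instances"]["InstanceGroups"]}
print(markets)
```

Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what makes the cluster transient in this sketch; the `Market` field on each instance group is what separates the On-Demand nodes from the Spot nodes.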

A transient cluster is more cost-effective than a long-running cluster for workloads that only run for 6 hours a day. Transient clusters automatically terminate after the workload completes, saving costs by not keeping the cluster running when it's not needed.
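The saving from a transient cluster comes down to simple arithmetic: it is billed for roughly 6 hours a day instead of 24. The sketch below uses a made-up hourly rate purely for illustration (it is not an AWS price) and ignores cluster startup time.

```python
HOURS_PER_DAY = 24.0


def daily_cluster_hours(workload_hours, transient):
    """Hours per day the cluster is running (and therefore billed)."""
    return float(workload_hours) if transient else HOURS_PER_DAY


def daily_cost(hourly_rate, workload_hours, transient):
    """Daily cost for a cluster with the given combined hourly rate."""
    return hourly_rate * daily_cluster_hours(workload_hours, transient)


# Hypothetical combined hourly rate for the whole cluster, not a real price.
HYPOTHETICAL_RATE = 10.0

transient_cost = daily_cost(HYPOTHETICAL_RATE, 6, transient=True)
long_running_cost = daily_cost(HYPOTHETICAL_RATE, 6, transient=False)

print(transient_cost, long_running_cost, long_running_cost / transient_cost)
```

Whatever the actual rate, the ratio stays the same under these assumptions: a long-running cluster costs four times as much per day as a transient one for a 6-hour workload.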

Option A: A long-running cluster keeps incurring costs during the 18 hours each day when the workload is not running.

Option C: Running core nodes on Spot Instances risks data loss if the Spot Instances are interrupted, violating the requirement for zero data loss.

Option D: Running core nodes on Spot Instances carries the same data-loss risk as Option C, and the long-running cluster adds the idle cost of Option A.
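The elimination reasoning can be written as a small filter over the four options. This is only a sketch of the argument in this explanation: the `Option` class and the two rules (core nodes must be On-Demand; then prefer transient clusters and Spot task nodes) are modeling assumptions, not an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    transient: bool
    core_market: str   # "ON_DEMAND" or "SPOT"
    task_market: str   # "ON_DEMAND" or "SPOT"


OPTIONS = [
    Option("A", transient=False, core_market="ON_DEMAND", task_market="SPOT"),
    Option("B", transient=True, core_market="ON_DEMAND", task_market="SPOT"),
    Option("C", transient=True, core_market="SPOT", task_market="SPOT"),
    Option("D", transient=False, core_market="SPOT", task_market="SPOT"),
]


def protects_hdfs_data(option):
    """Core nodes hold HDFS data, so they must not run on Spot."""
    return option.core_market == "ON_DEMAND"


def best_option(options):
    """Drop options that risk data loss, then pick the cheapest layout."""
    safe = [o for o in options if protects_hdfs_data(o)]
    # Prefer transient clusters first, then Spot task nodes.
    return min(safe, key=lambda o: (not o.transient, o.task_market != "SPOT"))


print(best_option(OPTIONS).label)
```

Options C and D are removed by the data-loss rule, and B beats A because it is transient.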


asked 27/10/2024
Andrew Vogel