AWS DAS-C01 Practice Test – Member Shared Questions, Page 3

Question 21

A company wants to improve user satisfaction for its smart home system by adding more features to its recommendation engine. Each sensor asynchronously pushes its nested JSON data into Amazon Kinesis Data Streams using the Kinesis Producer Library (KPL) in Java. Statistics from a set of failed sensors showed that, when a sensor is malfunctioning, its recorded data is not always sent to the cloud. The company needs a solution that offers near-real-time analytics on the data from the most updated sensors. Which solution enables the company to meet these requirements?

A.

Set the RecordMaxBufferedTime property of the KPL to "-1" to disable the buffering on the sensor side. Use Kinesis Data Analytics to enrich the data based on a company-developed anomaly detection SQL script. Push the enriched datato a fleet of Kinesis data streams and enable the data transformation feature to flatten the JSON file. Instantiate a dense storage Amazon Redshift cluster and use it as the destination for the Kinesis Data Firehose delivery stream.

B.

Update the sensors code to use the PutRecord/PutRecords call from the Kinesis Data Streams API with the AWS SDK for Java. Use Kinesis Data Analytics to enrich the data based on a companydeveloped anomaly detection SQL script.Direct the output of KDA application to a Kinesis Data Firehose delivery stream, enable the data transformation feature to flatten the JSON file, and set the Kinesis Data Firehose destination to an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.

C.

Set the RecordMaxBufferedTime property of the KPL to "0" to disable the buffering on the sensor side. Connect for each stream a dedicated Kinesis Data Firehose delivery stream and enable the data transformation feature to flatten theJSON file before sending it to an Amazon S3 bucket. Load the S3 data into an Amazon Redshift cluster.

D.

Update the sensors code to use the PutRecord/PutRecords call from the Kinesis Data Streams API with the AWS SDK for Java. Use AWS Glue to fetch and process data from the stream using the Kinesis Client Library (KCL). Instantiatean Amazon Elasticsearch Service cluster and use AWS Lambda to directly push data into it.

Show Answer Comment (0)

Question 22

A company has a data warehouse in Amazon Redshift that is approximately 500 TB in size. New data is imported every few hours and read-only queries are run throughout the day and evening. There is a particularly heavy load with no writes for several hours each morning on business days. During those hours, some queries are queued and take a long time to execute. The company needs to optimize query execution and avoid any downtime. What is the MOST cost-effective solution?

A.

Enable concurrency scaling in the workload management (WLM) queue.

B.

Add more nodes using the AWS Management Console during peak hours. Set the distribution style to ALL.

C.

Use elastic resize to quickly add nodes during peak times. Remove the nodes when they are not needed.

D.

Use a snapshot, restore, and resize operation. Switch to the new target cluster.

Show Answer Comment (0)

Question 23

A data engineer is using AWS Glue ETL jobs to process data at frequent intervals. The processed data is then copied into Amazon S3. The ETL jobs run every 15 minutes. The AWS Glue Data Catalog partitions need to be updated automatically after the completion of each job.

Which solution will meet these requirements MOST cost-effectively?

A.

Use the AWS Glue Data Catalog to manage the data catalog. Define an AWS Glue workflow for the ETL process. Define a trigger within the workflow that can start the crawler when an ETL job run is complete.

B.

Use the AWS Glue Data Catalog to manage the data catalog. Use AWS Glue Studio to manage ETL jobs. Use the AWS Glue Studio feature that supports updates to the AWS Glue Data Catalog during job runs.

C.

Use an Apache Hive metastore to manage the data catalog. Update the AWS Glue ETL code to include the enableUpdateCatalog and partitionKeys arguments.

D.

Use the AWS Glue Data Catalog to manage the data catalog. Update the AWS Glue ETL code to include the enableUpdateCatalog and partitionKeys arguments.

Show Answer Comment (0)

Question 24

A global pharmaceutical company receives test results for new drugs from various testing facilities worldwide. The results are sent in millions of 1 KB-sized JSON objects to an Amazon S3 bucket owned by the company. The data engineering team needs to process those files, convert them into Apache Parquet format, and load them into Amazon Redshift for data analysts to perform dashboard reporting. The engineering team uses AWS Glue to process the objects, AWS Step Functions for process orchestration, and Amazon CloudWatch for job scheduling.

More testing facilities were recently added, and the time to process files is increasing. What will MOST efficiently decrease the data processing time?

A.

Use AWS Lambda to group the small files into larger files. Write the files back to Amazon S3. Process the files using AWS Glue and load them into Amazon Redshift tables.

B.

Use the AWS Glue dynamic frame file grouping option while ingesting the raw input files. Process the files and load them into Amazon Redshift tables.

C.

Use the Amazon Redshift COPY command to move the files from Amazon S3 into Amazon Redshift tables directly.Process the files in Amazon Redshift.

D.

Use Amazon EMR instead of AWS Glue to group the small input files. Process the files in Amazon EMR and load them into Amazon Redshift tables.

Show Answer Comment (0)

Question 25

A company has an application that uses the Amazon Kinesis Client Library (KCL) to read records from a Kinesis data stream.

After a successful marketing campaign, the application experienced a significant increase in usage. As a result, a data analyst had to split some shards in the data stream. When the shards were split, the application started throwing an ExpiredIteratorExceptions error sporadically.

What should the data analyst do to resolve this?

A.

Increase the number of threads that process the stream records.

B.

Increase the provisioned read capacity units assigned to the stream’s Amazon DynamoDB table.

C.

Increase the provisioned write capacity units assigned to the stream’s Amazon DynamoDB table.

D.

Decrease the provisioned write capacity units assigned to the stream’s Amazon DynamoDB table.

Show Answer Comment (0)

Question 26

A marketing company collects data from third-party providers and uses transient Amazon EMR clusters to process this data.

The company wants to host an Apache Hive metastore that is persistent, reliable, and can be accessed by EMR clusters and multiple AWS services and accounts simultaneously. The metastore must also be available at all times. Which solution meets these requirements with the LEAST operational overhead?

A.

Use AWS Glue Data Catalog as the metastore

B.

Use an external Amazon EC2 instance running MySQL as the metastore

C.

Use Amazon RDS for MySQL as the metastore

D.

Use Amazon S3 as the metastore

Show Answer Comment (0)

Question 27

A company stores its sales and marketing data that includes personally identifiable information (PII) in Amazon S3. The company allows its analysts to launch their own Amazon EMR cluster and run analytics reports with the data. To meet compliance requirements, the company must ensure the data is not publicly accessible throughout this process. A data engineer has secured Amazon S3 but must ensure the individual EMR clusters created by the analysts are not exposed to the public internet.

Which solution should the data engineer to meet this compliance requirement with LEAST amount of effort?

A.

Create an EMR security configuration and ensure the security configuration is associated with the EMR clusters when they are created.

B.

Check the security group of the EMR clusters regularly to ensure it does not allow inbound traffic from IPv4 0.0.0.0/0 or IPv6 ::/0.

C.

Enable the block public access setting for Amazon EMR at the account level before any EMR cluster is created.

D.

Use AWS WAF to block public internet access to the EMR clusters across the board.

Show Answer Comment (0)

Question 28

A media company is using Amazon QuickSight dashboards to visualize its national sales data. The dashboard is using a dataset with these fields: ID, date, time_zone, city, state, country, longitude, latitude, sales_volume, and number_of_items.

To modify ongoing campaigns, the company wants an interactive and intuitive visualization of which states across the country recorded a significantly lower sales volume compared to the national average.

Which addition to the company’s QuickSight dashboard will meet this requirement?

A.

A geospatial color-coded chart of sales volume data across the country.

B.

A pivot table of sales volume data summed up at the state level.

C.

A drill-down layer for state-level sales volume data.

D.

A drill through to other dashboards containing state-level sales volume data.

Show Answer Comment (0)

Question 29

A human resources company maintains a 10-node Amazon Redshift cluster to run analytics queries on the company’s data.

The Amazon Redshift cluster contains a product table and a transactions table, and both tables have a product_sku column.

The tables are over 100 GB in size. The majority of queries run on both tables.

Which distribution style should the company use for the two tables to achieve optimal query performance?

A.

An EVEN distribution style for both tables

B.

A KEY distribution style for both tables

C.

An ALL distribution style for the product table and an EVEN distribution style for the transactions table

D.

An EVEN distribution style for the product table and an KEY distribution style for the transactions table

Show Answer Comment (0)

Question 30

An energy company collects voltage data in real time from sensors that are attached to buildings. The company wants to receive notifications when a sequence of two voltage drops is detected within 10 minutes of a sudden voltage increase at the same building. All notifications must be delivered as quickly as possible. The system must be highly available. The company needs a solution that will automatically scale when this monitoring feature is implemented in other cities.

The notification system is subscribed to an Amazon Simple Notification Service (Amazon SNS) topic for remediation. Which solution will meet these requirements?

A.

Create an Amazon Managed Streaming for Apache Kafka cluster to ingest the data. Use an Apache Spark Streaming with Apache Kafka consumer API in an automatically scaled Amazon EMR cluster to process the incoming data. Usethe Spark Streaming application to detect the known event sequence and send the SNS message.

B.

Create a REST-based web service by using Amazon API Gateway in front of an AWS Lambda function. Create an Amazon RDS for PostgreSQL database with sufficient Provisioned IOPS to meet current demand. Configure the Lambdafunction to store incoming events in the RDS for PostgreSQL database, query the latest data to detect the known event sequence, and send the SNS message.

C.

Create an Amazon Kinesis Data Firehose delivery stream to capture the incoming sensor data. Use an AWS Lambda transformation function to detect the known event sequence and send the SNS message.

D.

Create an Amazon Kinesis data stream to capture the incoming sensor data. Create another stream for notifications. Set up AWS Application Auto Scaling on both streams. Create an Amazon Kinesis Data Analytics for Java application to detect the known event sequence, and add a message to the message stream Configure an AWS Lambda function to poll the message stream and publish to the SNS topic.