Amazon DAS-C01 Practice Test - Questions Answers, Page 18


A machinery company wants to collect data from sensors. A data analytics specialist needs to implement a solution that aggregates the data in near-real time and saves the data to a persistent data store. The data must be stored in nested JSON format and must be queried from the data store with a latency of single-digit milliseconds.

Which solution will meet these requirements?

A. Use Amazon Kinesis Data Streams to receive the data from the sensors. Use Amazon Kinesis Data Analytics to read the stream, aggregate the data, and send the data to an AWS Lambda function. Configure the Lambda function to store the data in Amazon DynamoDB.
B. Use Amazon Kinesis Data Firehose to receive the data from the sensors. Use Amazon Kinesis Data Analytics to aggregate the data. Use an AWS Lambda function to read the data from Kinesis Data Analytics and store the data in Amazon S3.
C. Use Amazon Kinesis Data Firehose to receive the data from the sensors. Use an AWS Lambda function to aggregate the data during capture. Store the data from Kinesis Data Firehose in Amazon DynamoDB.
D. Use Amazon Kinesis Data Firehose to receive the data from the sensors. Use an AWS Lambda function to aggregate the data during capture. Store the data in Amazon S3.
Suggested answer: C

Explanation:

This solution meets the requirements because:

Amazon Kinesis Data Firehose is a fully managed service that can capture, transform, and load streaming data into AWS data stores, such as Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Amazon DynamoDB. It can receive data from sensors and other sources and deliver it to a destination with near-real-time latency.

AWS Lambda is a serverless compute service that runs code in response to events and automatically manages the underlying compute resources. It can be used to perform custom transformations on the data during capture by Kinesis Data Firehose, aggregating the data according to the desired logic and output format.

Amazon DynamoDB is a fully managed NoSQL database service that supports key-value and document data models. It can store nested JSON data as document attributes and provide single-digit millisecond latency for queries. It can be used as a persistent data store for the aggregated sensor data.
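
For illustration only, here is a minimal sketch of the aggregation-and-store step: an AWS Lambda handler that rolls up a batch of sensor readings and writes the result to DynamoDB as a nested document. The event shape, table name (SensorAggregates), and key attributes are assumptions, not part of the question.

from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("SensorAggregates")  # hypothetical table name

def handler(event, context):
    # "event" is assumed to carry a batch of sensor readings; the real payload
    # shape depends on how the upstream aggregation step invokes this function.
    readings = event.get("readings", [])
    if not readings:
        return {"written": 0}

    avg_temp = sum(r["temperature"] for r in readings) / len(readings)
    item = {
        "sensor_id": readings[0]["sensor_id"],       # assumed partition key
        "window_start": readings[0]["timestamp"],    # assumed sort key
        "stats": {                                   # nested JSON stored as a DynamoDB map
            "reading_count": len(readings),
            "avg_temperature": Decimal(str(avg_temp)),  # DynamoDB numbers must be Decimal
        },
    }
    table.put_item(Item=item)
    return {"written": 1}

Because DynamoDB stores the map natively, the nested attributes remain available through single-digit-millisecond GetItem and Query calls on the table key.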

A financial company uses Amazon Athena to query data from an Amazon S3 data lake. Files are stored in the S3 data lake in Apache ORC format. Data analysts recently introduced nested fields in the data lake ORC files and noticed that queries are taking longer to run in Athena. A data analyst discovered that more data than required is being scanned for the queries.

What is the MOST operationally efficient solution to improve query performance?

A. Flatten nested data and create separate files for each nested dataset.
B. Use the Athena query engine V2 and push the query filter to the source ORC file.
C. Use Apache Parquet format instead of ORC format.
D. Recreate the data partition strategy and further narrow down the data filter criteria.
Suggested answer: B

Explanation:

This solution meets the requirement because:

The Athena query engine V2 is a new version of the Athena query engine that introduces several improvements and new features, such as federated queries, geospatial functions, prepared statements, schema evolution support, and more.

One of the improvements of the Athena query engine V2 is that it supports predicate pushdown for nested fields in ORC files. Predicate pushdown is a technique that allows filtering data at the source before it is scanned and loaded into memory. This can reduce the amount of data scanned and processed by Athena, which can improve query performance and reduce cost.

By using the Athena query engine V2 and pushing the query filter to the source ORC file, the data analysts can leverage predicate pushdown for nested fields and avoid scanning more data than is required for the queries. This improves query performance without changing the data format or partitioning strategy.
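
As a rough sketch of what such a query might look like when submitted through the Athena API, the example below filters on a nested field so the predicate can be pushed down to the ORC reader. The database, table, column names, and S3 output location are hypothetical.

import boto3

athena = boto3.client("athena")

# Hypothetical table "sensor_events" with a nested struct column "device";
# the WHERE clause on the nested field lets Athena skip data at the source.
query = """
SELECT device.id, device.status, event_time
FROM sensor_events
WHERE device.status = 'ERROR'
  AND event_time BETWEEN timestamp '2023-01-01' AND timestamp '2023-01-31'
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "analytics_db"},                # assumed database name
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    WorkGroup="primary",  # the workgroup determines which engine version is used
)
print(response["QueryExecutionId"])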

A gaming company is building a serverless data lake. The company is ingesting streaming data into Amazon Kinesis Data Streams and is writing the data to Amazon S3 through Amazon Kinesis Data Firehose. The company is using 10 MB as the S3 buffer size and 90 seconds as the buffer interval. The company runs an AWS Glue ETL job to merge and transform the data to a different format before writing the data back to Amazon S3.

Recently, the company has experienced substantial growth in its data volume. The AWS Glue ETL jobs are frequently failing with an OutOfMemoryError.

Which solutions will resolve this issue without incurring additional costs? (Select TWO.)

A. Place the small files into one S3 folder. Define a single table for the small S3 files in the AWS Glue Data Catalog. Rerun the AWS Glue ETL jobs against this AWS Glue table.
B. Create an AWS Lambda function to merge small S3 files and invoke it periodically. Run the AWS Glue ETL jobs after successful completion of the Lambda function.
C. Run the S3DistCp utility in Amazon EMR to merge a large number of small S3 files before running the AWS Glue ETL jobs.
D. Use the groupFiles setting in the AWS Glue ETL job to merge small S3 files, and rerun the AWS Glue ETL jobs.
E. Update the Kinesis Data Firehose S3 buffer size to 128 MB. Update the buffer interval to 900 seconds.
Suggested answer: D, E

Explanation:

The groupFiles setting is a feature of AWS Glue that enables an ETL job to group files when they are read from an Amazon S3 data store. This can reduce the number of ETL tasks and in-memory partitions, and improve the performance and memory efficiency of the job. By using the groupFiles setting in the AWS Glue ETL job, the gaming company can merge small S3 files and avoid the OutOfMemoryError.
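
A minimal sketch of a Glue ETL (PySpark) read that applies the groupFiles setting is shown below; the S3 path, file format, and group size are assumptions.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Group many small S3 objects into ~128 MB read tasks (path and size are assumptions)
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-bucket/raw/events/"],
        "groupFiles": "inPartition",
        "groupSize": "134217728",  # target group size in bytes (~128 MB)
    },
    format="json",
)

# ... transform dyf as needed, then write it back out ...
job.commit()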

The Kinesis Data Firehose S3 buffer size and buffer interval are parameters that determine how much data is buffered before delivering it to Amazon S3. Increasing the buffer size and buffer interval results in larger files being delivered to Amazon S3, which reduces the number of small files and improves the performance of downstream processing. By updating the Kinesis Data Firehose S3 buffer size to 128 MB and the buffer interval to 900 seconds, the gaming company can create fewer, larger S3 files and avoid the OutOfMemoryError.
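
A hedged sketch of updating the buffering hints on an existing delivery stream with the AWS SDK might look like the following; the stream name is a placeholder.

import boto3

firehose = boto3.client("firehose")
stream_name = "example-delivery-stream"  # hypothetical stream name

# The current version ID and destination ID are required for an update
description = firehose.describe_delivery_stream(DeliveryStreamName=stream_name)
stream = description["DeliveryStreamDescription"]

firehose.update_destination(
    DeliveryStreamName=stream_name,
    CurrentDeliveryStreamVersionId=stream["VersionId"],
    DestinationId=stream["Destinations"][0]["DestinationId"],
    ExtendedS3DestinationUpdate={
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 900}
    },
)

128 MB and 900 seconds are the largest buffering hints Firehose accepts for S3 delivery, so this configuration yields the fewest, largest objects.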

A company uses Amazon Connect to manage its contact center. The company uses Salesforce to manage its customer relationship management (CRM) data. The company must build a pipeline to ingest data from Amazon Connect and Salesforce into a data lake that is built on Amazon S3.

Which solution will meet this requirement with the LEAST operational overhead?

A. Use Amazon Kinesis Data Streams to ingest the Amazon Connect data. Use Amazon AppFlow to ingest the Salesforce data.
B. Use Amazon Kinesis Data Firehose to ingest the Amazon Connect data. Use Amazon Kinesis Data Streams to ingest the Salesforce data.
C. Use Amazon Kinesis Data Firehose to ingest the Amazon Connect data. Use Amazon AppFlow to ingest the Salesforce data.
D. Use Amazon AppFlow to ingest the Amazon Connect data. Use Amazon Kinesis Data Firehose to ingest the Salesforce data.
Suggested answer: B

A company is designing a data warehouse to support business intelligence reporting. Users will access the executive dashboard heavily each Monday and Friday morning for 1 hour. These read-only queries will run on the active Amazon Redshift cluster, which runs on dc2.8xlarge compute nodes 24 hours a day, 7 days a week. There are three queues set up in workload management: Dashboard, ETL, and System. The Amazon Redshift cluster needs to process the queries without wait time.

What is the MOST cost-effective way to ensure that the cluster processes these queries?

A. Perform a classic resize to place the cluster in read-only mode while adding an additional node to the cluster.
B. Enable automatic workload management.
C. Perform an elastic resize to add an additional node to the cluster.
D. Enable concurrency scaling for the Dashboard workload queue.
Suggested answer: D
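
As a rough illustration, concurrency scaling is turned on per queue through the cluster's wlm_json_configuration parameter. The sketch below assumes a manual WLM layout and a hypothetical parameter group name; the exact queue definitions would follow the company's existing Dashboard, ETL, and System setup.

import json

import boto3

redshift = boto3.client("redshift")

# Manual WLM with concurrency scaling enabled only for the Dashboard queue
# (queue layout and parameter group name are assumptions)
wlm_config = [
    {"query_group": ["dashboard"], "query_concurrency": 5, "concurrency_scaling": "auto"},
    {"query_group": ["etl"], "query_concurrency": 5, "concurrency_scaling": "off"},
    {"query_group": [], "query_concurrency": 5},  # default queue
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="bi-cluster-params",  # hypothetical parameter group
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
        }
    ],
)

With concurrency scaling set to auto on the Dashboard queue, Redshift adds transient capacity only during the Monday and Friday peaks, which is what makes this the most cost-effective option.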

A marketing company has an application that stores event data in an Amazon RDS database. The company is replicating this data to Amazon Redshift for reporting and business intelligence (BI) purposes. New event data is continuously generated and ingested into the RDS database throughout the day and captured by a change data capture (CDC) replication task in AWS Database Migration Service (AWS DMS). The company requires that the new data be replicated to Amazon Redshift in near-real time.

Which solution meets these requirements?

A. Use Amazon Kinesis Data Streams as the destination of the CDC replication task in AWS DMS. Use an AWS Glue streaming job to read changed records from Kinesis Data Streams and perform an upsert into the Redshift cluster.
B. Use Amazon S3 as the destination of the CDC replication task in AWS DMS. Use the COPY command to load data into the Redshift cluster.
C. Use Amazon DynamoDB as the destination of the CDC replication task in AWS DMS. Use the COPY command to load data into the Redshift cluster.
D. Use Amazon Kinesis Data Firehose as the destination of the CDC replication task in AWS DMS. Use an AWS Glue streaming job to read changed records from Kinesis Data Firehose and perform an upsert into the Redshift cluster.
Suggested answer: A
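
For illustration, a minimal sketch of the DMS target endpoint that points the CDC replication task at a Kinesis data stream; all ARNs and names are placeholders.

import boto3

dms = boto3.client("dms")

# Target endpoint for the CDC replication task (ARNs are placeholders)
dms.create_endpoint(
    EndpointIdentifier="cdc-to-kinesis",
    EndpointType="target",
    EngineName="kinesis",
    KinesisSettings={
        "StreamArn": "arn:aws:kinesis:us-east-1:123456789012:stream/cdc-changes",
        "MessageFormat": "json",
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/dms-kinesis-role",
    },
)

A downstream AWS Glue streaming job can then consume the change records from the stream and apply them to Amazon Redshift in near-real time.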

A company is creating a data lake by using AWS Lake Formation. The data that will be stored in the data lake contains sensitive customer information and must be encrypted at rest using an AWS Key Management Service (AWS KMS) customer managed key to meet regulatory requirements.

How can the company store the data in the data lake to meet these requirements?

A. Store the data in an encrypted Amazon Elastic Block Store (Amazon EBS) volume. Register the Amazon EBS volume with Lake Formation.
B. Store the data in an Amazon S3 bucket by using server-side encryption with AWS KMS (SSE-KMS). Register the S3 location with Lake Formation.
C. Encrypt the data on the client side and store the encrypted data in an Amazon S3 bucket. Register the S3 location with Lake Formation.
D. Store the data in an Amazon S3 Glacier Flexible Retrieval vault bucket. Register the S3 Glacier Flexible Retrieval vault with Lake Formation.
Suggested answer: B
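
A minimal sketch of this setup with the AWS SDK: enable default SSE-KMS on the bucket with the customer managed key, then register the location with Lake Formation. Bucket, key, and role names are placeholders, and the registration role is assumed to have permission to use the KMS key.

import boto3

s3 = boto3.client("s3")
lakeformation = boto3.client("lakeformation")

bucket = "example-data-lake-bucket"                              # placeholder bucket name
kms_key_arn = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE"   # customer managed key

# Default encryption with the customer managed KMS key
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    },
)

# Register the encrypted S3 location with Lake Formation
lakeformation.register_resource(
    ResourceArn=f"arn:aws:s3:::{bucket}",
    UseServiceLinkedRole=False,
    RoleArn="arn:aws:iam::123456789012:role/LakeFormationRegistrationRole",  # role must be able to use the key
)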

An ecommerce company uses Amazon Aurora PostgreSQL to process and store live transactional data and uses Amazon Redshift for its data warehouse solution. A nightly ETL job has been implemented to update the Redshift cluster with new data from the PostgreSQL database. The business has grown rapidly and so has the size and cost of the Redshift cluster. The company's data analytics team needs to create a solution to archive historical data and only keep the most recent 12 months of data in Amazon Redshift to reduce costs. Data analysts should also be able to run analytics queries that effectively combine data from live transactional data in PostgreSQL, current data in Redshift, and archived historical data.

Which combination of tasks will meet these requirements? (Select THREE.)

A. Configure the Amazon Redshift Federated Query feature to query live transactional data in the PostgreSQL database.
B. Configure Amazon Redshift Spectrum to query live transactional data in the PostgreSQL database.
C. Schedule a monthly job to copy data older than 12 months to Amazon S3 by using the UNLOAD command, and then delete that data from the Redshift cluster. Configure Amazon Redshift Spectrum to access historical data in Amazon S3.
D. Schedule a monthly job to copy data older than 12 months to Amazon S3 Glacier Flexible Retrieval by using the UNLOAD command, and then delete that data from the Redshift cluster. Configure Redshift Spectrum to access historical data with S3 Glacier Flexible Retrieval.
E. Create a late-binding view in Amazon Redshift that combines live, current, and historical data from different sources.
F. Create a materialized view in Amazon Redshift that combines live, current, and historical data from different sources.
Suggested answer: A, C, E
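
As an illustrative sketch of the monthly archive step (task C), the statements below UNLOAD rows older than 12 months to Amazon S3 and then delete them from the cluster, submitted through the Redshift Data API. The table, bucket, IAM role, secret, and cluster identifiers are placeholders.

import boto3

redshift_data = boto3.client("redshift-data")

cluster_id = "analytics-cluster"  # placeholder identifiers
database = "dev"
secret_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds"

unload_sql = """
UNLOAD ('SELECT * FROM sales WHERE sale_date < DATEADD(month, -12, CURRENT_DATE)')
TO 's3://example-archive-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
FORMAT AS PARQUET;
"""

delete_sql = "DELETE FROM sales WHERE sale_date < DATEADD(month, -12, CURRENT_DATE);"

for sql in (unload_sql, delete_sql):
    redshift_data.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        SecretArn=secret_arn,
        Sql=sql,
    )

The archived Parquet files can then be exposed through a Redshift Spectrum external table, and a late-binding view can union the federated PostgreSQL table, the local Redshift table, and the Spectrum table for analysts.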

A large marketing company needs to store all of its streaming logs and create near-real-time dashboards. The dashboards will be used to help the company make critical business decisions and must be highly available.

Which solution meets these requirements?

A. Store the streaming logs in Amazon S3 with replication to an S3 bucket in a different Availability Zone. Create the dashboards by using Amazon QuickSight.
B. Deploy an Amazon Redshift cluster with at least three nodes in a VPC that spans two Availability Zones. Store the streaming logs and use the Redshift cluster as a source to create the dashboards by using Amazon QuickSight.
C. Store the streaming logs in Amazon S3 with replication to an S3 bucket in a different Availability Zone. Every time a new log is added in the bucket, invoke an AWS Lambda function to update the dashboards in Amazon QuickSight.
D. Store the streaming logs in Amazon OpenSearch Service deployed across three Availability Zones with three dedicated master nodes. Create the dashboards by using OpenSearch Dashboards.
Suggested answer: D

Explanation:

This solution meets the requirements because:

Amazon OpenSearch Service is a fully managed service that makes it easy for you to deploy, secure, and run OpenSearch cost-effectively at scale. You can build, monitor, and troubleshoot your applications using the tools you love at the scale you need. The service provides support for open-source OpenSearch APIs, managed Kibana, integration with Logstash and other AWS services, and built-in alerting and SQL querying.

Amazon OpenSearch Service can store and analyze streaming logs from various sources, such as Amazon CloudWatch, Amazon Kinesis Data Firehose, AWS IoT, and AWS Lambda. You can use the built-in connectors for these services to ingest data into your OpenSearch Service domain.

Amazon OpenSearch Service allows you to create near-real-time dashboards using OpenSearch Dashboards (formerly Kibana), which is a powerful visualization tool that is integrated with OpenSearch Service. You can use OpenSearch Dashboards to explore and visualize your data, create interactive charts and graphs, and share your insights with others.

Amazon OpenSearch Service supports high availability by allowing you to deploy your domain across multiple Availability Zones and use dedicated master nodes to ensure cluster stability. You can also enable snapshots and cross-cluster replication to back up and restore your data in case of a failure.
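
A hedged sketch of such a domain created with the AWS SDK is shown below; the domain name, engine version, instance types, and storage settings are assumptions.

import boto3

opensearch = boto3.client("opensearch")

opensearch.create_domain(
    DomainName="streaming-logs",        # assumed domain name
    EngineVersion="OpenSearch_2.11",    # assumed engine version
    ClusterConfig={
        "InstanceType": "r6g.large.search",
        "InstanceCount": 3,             # data nodes spread across three AZs
        "ZoneAwarenessEnabled": True,
        "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
        "DedicatedMasterEnabled": True,
        "DedicatedMasterType": "m6g.large.search",
        "DedicatedMasterCount": 3,
    },
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp3", "VolumeSize": 100},
)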

A company's system operators and security engineers need to analyze activities within specific date ranges of AWS CloudTrail logs. All log files are stored in an Amazon S3 bucket, and the size of the logs is more than 5 TB. The solution must be cost-effective and maximize query performance.

Which solution meets these requirements?

A. Copy the logs to a new S3 bucket with a prefix structure of <PARTITION COLUMN_NAME>. Use the date column as a partition key. Create a table on Amazon Athena based on the objects in the new bucket. Automatically add metadata partitions by using the MSCK REPAIR TABLE command in Athena. Use Athena to query the table and partitions.
B. Create a table on Amazon Athena. Manually add metadata partitions by using the ALTER TABLE ADD PARTITION statement, and use multiple columns for the partition key. Use Athena to query the table and partitions.
C. Launch an Amazon EMR cluster and use Amazon S3 as a data store for Apache HBase. Load the logs from the S3 bucket to an HBase table on Amazon EMR. Use Amazon Athena to query the table and partitions.
D. Create an AWS Glue job to copy the logs from the S3 source bucket to a new S3 bucket and create a table using Apache Parquet file format, Snappy as compression codec, and partition by date. Use Amazon Athena to query the table and partitions.
Suggested answer: D

Explanation:

This solution meets the requirements because:

AWS Glue is a fully managed extract, transform, and load (ETL) service that can be used to prepare and load data for analytics. You can use AWS Glue to create a job that copies the CloudTrail logs from the source S3 bucket to a new S3 bucket and converts them to Apache Parquet format. Parquet is a columnar storage format that is optimized for analytics and supports compression. Snappy is a compression codec that provides a good balance between compression ratio and speed.

AWS Glue can also create a table based on the Parquet files in the new S3 bucket and partition the table by date. Partitioning is a technique that divides a large dataset into smaller subsets based on a partition key, such as date. Partitioning can improve query performance by reducing the amount of data scanned and filtering out irrelevant data.

Amazon Athena is an interactive query service that allows you to analyze data in S3 using standard SQL. You can use Athena to query the table created by AWS Glue, and specify the partitions you want to query based on the date range. Athena can leverage the benefits of Parquet format and partitioning to run queries faster and more cost-effectively.
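
For illustration, a minimal Glue (PySpark) sketch of the conversion step: read the raw logs, then write Snappy-compressed Parquet partitioned by date. The S3 paths and the presence of a date column in the transformed records are assumptions.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw CloudTrail logs (source path is an assumption)
logs = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-cloudtrail-logs/AWSLogs/"], "recurse": True},
    format="json",
)

# Write Snappy-compressed Parquet partitioned by an assumed "date" column
glue_context.write_dynamic_frame.from_options(
    frame=logs,
    connection_type="s3",
    connection_options={
        "path": "s3://example-cloudtrail-parquet/",
        "partitionKeys": ["date"],
    },
    format="parquet",
    format_options={"compression": "snappy"},
)

job.commit()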
