
Amazon DAS-C01 Practice Test - Questions Answers, Page 8


A company that monitors weather conditions from remote construction sites is setting up a solution to collect temperature data from the following two weather stations:

Station A, which has 10 sensors

Station B, which has five sensors

These weather stations were placed by onsite subject-matter experts.

Each sensor has a unique ID. The data from each sensor will be collected using Amazon Kinesis Data Streams.

Based on the total incoming and outgoing data throughput, a single Amazon Kinesis data stream with two shards is created.

Two partition keys are created based on the station names. During testing, there is a bottleneck on data coming from Station A, but not from Station B. Upon review, it is confirmed that the total stream throughput is still less than the allocated Kinesis Data Streams throughput.

How can this bottleneck be resolved without increasing the overall cost and complexity of the solution, while retaining the data collection quality requirements?

A. Increase the number of shards in Kinesis Data Streams to increase the level of parallelism.
B. Create a separate Kinesis data stream for Station A with two shards, and stream Station A sensor data to the new stream.
C. Modify the partition key to use the sensor ID instead of the station name.
D. Reduce the number of sensors in Station A from 10 to 5 sensors.

Suggested answer: A
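
For illustration, here is a minimal producer sketch (Python with boto3) of the setup the question describes, where the station name is the partition key; the stream name and record fields are assumptions, not values from the question.

```python
import json

import boto3

kinesis = boto3.client("kinesis")


def publish_reading(station, sensor_id, temperature_c):
    """Send one temperature reading. Records that share a partition key
    (here the station name) always land on the same shard."""
    kinesis.put_record(
        StreamName="weather-telemetry",  # assumed stream name
        Data=json.dumps(
            {"station": station, "sensor_id": sensor_id, "temp_c": temperature_c}
        ).encode("utf-8"),
        PartitionKey=station,  # "StationA" or "StationB"
    )


publish_reading("StationA", "sensor-07", 21.4)
```

Because all ten Station A sensors share one partition key, their combined throughput is pinned to a single shard, which produces the bottleneck described above even though the stream's total allocated throughput is not exceeded.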

A company is migrating its existing on-premises ETL jobs to Amazon EMR. The code consists of a series of jobs written in Java. The company needs to reduce overhead for the system administrators without changing the underlying code. Due to the sensitivity of the data, compliance requires that the company use root device volume encryption on all nodes in the cluster. Corporate standards require that environments be provisioned through AWS CloudFormation when possible.

Which solution satisfies these requirements?

A. Install open-source Hadoop on Amazon EC2 instances with encrypted root device volumes. Configure the cluster in the CloudFormation template.
B. Use a CloudFormation template to launch an EMR cluster. In the configuration section of the cluster, define a bootstrap action to enable TLS.
C. Create a custom AMI with encrypted root device volumes. Configure Amazon EMR to use the custom AMI by specifying the CustomAmiId property in the CloudFormation template.
D. Use a CloudFormation template to launch an EMR cluster. In the configuration section of the cluster, define a bootstrap action to encrypt the root device volume of every node.

Suggested answer: C

Explanation:


Reference: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-custom-ami.html
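
As a rough illustration of the suggested answer, here is a minimal CloudFormation template body, built as a Python dictionary and submitted with boto3, that points an EMR cluster at a custom AMI whose root device volume is encrypted. The AMI ID, instance sizes, and role names are placeholders, and a production template would need additional properties.

```python
import json

import boto3

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "EtlCluster": {
            "Type": "AWS::EMR::Cluster",
            "Properties": {
                "Name": "java-etl-cluster",
                "ReleaseLabel": "emr-6.10.0",
                # Custom AMI built with an encrypted root device volume (placeholder ID).
                "CustomAmiId": "ami-0123456789abcdef0",
                "JobFlowRole": "EMR_EC2_DefaultRole",
                "ServiceRole": "EMR_DefaultRole",
                "Instances": {
                    "MasterInstanceGroup": {
                        "InstanceCount": 1,
                        "InstanceType": "m5.xlarge",
                        "Market": "ON_DEMAND",
                    },
                    "CoreInstanceGroup": {
                        "InstanceCount": 2,
                        "InstanceType": "m5.xlarge",
                        "Market": "ON_DEMAND",
                    },
                },
            },
        }
    },
}

boto3.client("cloudformation").create_stack(
    StackName="emr-custom-ami-demo",
    TemplateBody=json.dumps(template),
)
```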

A company provides an incentive to users who are physically active. The company wants to determine how active the users are by using an application on their mobile devices to track the number of steps they take each day. The company needs to ingest and perform near-real-time analytics on live data. The processed data must be stored and must remain available for 1 year for analytics purposes. Which solution will meet these requirements with the LEAST operational overhead?

A. Use Amazon Cognito to write the data from the application to Amazon DynamoDB. Use an AWS Step Functions workflow to create a transient Amazon EMR cluster every hour and process the new data from DynamoDB. Output the processed data to Amazon Redshift for analytics. Archive the data from Amazon Redshift after 1 year.
B. Ingest the data into Amazon DynamoDB by using an Amazon API Gateway API as a DynamoDB proxy. Use an AWS Step Functions workflow to create a transient Amazon EMR cluster every hour and process the new data from DynamoDB. Output the processed data to Amazon Redshift to run analytics calculations. Archive the data from Amazon Redshift after 1 year.
C. Ingest the data into Amazon Kinesis Data Streams by using an Amazon API Gateway API as a Kinesis proxy. Run Amazon Kinesis Data Analytics on the stream data. Output the processed data into Amazon S3 by using Amazon Kinesis Data Firehose. Use Amazon Athena to run analytics calculations. Use S3 Lifecycle rules to transition objects to S3 Glacier after 1 year.
D. Write the data from the application into Amazon S3 by using Amazon Kinesis Data Firehose. Use Amazon Athena to run the analytics on the data in Amazon S3. Use S3 Lifecycle rules to transition objects to S3 Glacier after 1 year.

Suggested answer: C
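
To make the archival step of the suggested answer concrete, here is a sketch (Python with boto3) of an S3 Lifecycle rule that transitions the Kinesis Data Firehose output to S3 Glacier after one year; the bucket name and prefix are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Transition processed step-count objects to Glacier 365 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="step-analytics-processed",  # assumed bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "glacier-after-1-year",
                "Filter": {"Prefix": "processed/"},  # assumed Firehose delivery prefix
                "Status": "Enabled",
                "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```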

A company is streaming its high-volume billing data (100 MBps) to Amazon Kinesis Data Streams. A data analyst partitioned the data on account_id to ensure that all records belonging to an account go to the same Kinesis shard and order is maintained. While building a custom consumer using the Kinesis Java SDK, the data analyst notices that, sometimes, the messages arrive out of order for account_id.

Upon further investigation, the data analyst discovers the messages that are out of order seem to be arriving from different shards for the same account_id and are seen when a stream resize runs. What is an explanation for this behavior and what is the solution?

A. There are multiple shards in a stream and order needs to be maintained in the shard. The data analyst needs to make sure there is only a single shard in the stream and no stream resize runs.
B. The hash key generation process for the records is not working correctly. The data analyst should generate an explicit hash key on the producer side so the records are directed to the appropriate shard accurately.
C. The records are not being received by Kinesis Data Streams in order. The producer should use the PutRecords API call instead of the PutRecord API call with the SequenceNumberForOrdering parameter.
D. The consumer is not processing the parent shard completely before processing the child shards after a stream resize. The data analyst should process the parent shard completely first before processing the child shards.

Suggested answer: A
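
For context on the behavior in this question, here is a small inspection sketch (Python with boto3): after a resize, ListShards reports each shard's ParentShardId, which is how a consumer library can tell which child shards were produced by a split or merge. The stream name is an assumption.

```python
import boto3

kinesis = boto3.client("kinesis")

# After a resharding operation, child shards reference the shard they came
# from through ParentShardId (and AdjacentParentShardId for merges).
response = kinesis.list_shards(StreamName="billing-records")  # assumed stream name
for shard in response["Shards"]:
    print(
        shard["ShardId"],
        "parent:", shard.get("ParentShardId", "-"),
        "hash range:",
        shard["HashKeyRange"]["StartingHashKey"],
        "to",
        shard["HashKeyRange"]["EndingHashKey"],
    )
```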


A large ride-sharing company has thousands of drivers globally serving millions of unique customers every day. The company has decided to migrate an existing data mart to Amazon Redshift. The existing schema includes the following tables.

A trips fact table for information on completed rides.

A drivers dimension table for driver profiles.

A customers fact table holding customer profile information.

The company analyzes trip details by date and destination to examine profitability by region. The drivers data rarely changes. The customers data frequently changes. What table design provides optimal query performance?

A. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers and customers tables.
B. Use DISTSTYLE EVEN for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
C. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
D. Use DISTSTYLE EVEN for the drivers table and sort by date. Use DISTSTYLE ALL for both fact tables.

Suggested answer: A
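
To show what the suggested answer's distribution and sort choices look like as DDL, here is a hedged sketch that submits the statements through the Amazon Redshift Data API from Python; the cluster, database, user, and column definitions are illustrative assumptions.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Trips fact table: distribute on destination and sort by date (per the answer).
# Drivers dimension table: small and rarely changing, so DISTSTYLE ALL.
statements = [
    """CREATE TABLE trips (
           trip_id     BIGINT,
           trip_date   DATE,
           destination VARCHAR(64),
           driver_id   BIGINT,
           customer_id BIGINT,
           fare_amount DECIMAL(10,2)
       )
       DISTSTYLE KEY
       DISTKEY (destination)
       SORTKEY (trip_date);""",
    """CREATE TABLE drivers (
           driver_id   BIGINT,
           driver_name VARCHAR(128),
           home_city   VARCHAR(64)
       )
       DISTSTYLE ALL;""",
]

for sql in statements:
    redshift_data.execute_statement(
        ClusterIdentifier="rides-dw",  # assumed cluster identifier
        Database="analytics",          # assumed database
        DbUser="awsuser",              # assumed database user
        Sql=sql,
    )
```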

An education provider’s learning management system (LMS) is hosted in a 100 TB data lake that is built on Amazon S3. The provider’s LMS supports hundreds of schools. The provider wants to build an advanced analytics reporting platform using Amazon Redshift to handle complex queries with optimal performance. System users will query the most recent 4 months of data 95% of the time while 5% of the queries will leverage data from the previous 12 months. Which solution meets these requirements in the MOST cost-effective way?

A. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query data in the data lake. Use S3 lifecycle management rules to store data from the previous 12 months in Amazon S3 Glacier storage.
B. Leverage DS2 nodes for the Amazon Redshift cluster. Migrate all data from Amazon S3 to Amazon Redshift. Decommission the data lake.
C. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query data in the data lake. Ensure the S3 Standard storage class is in use with objects in the data lake.
D. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift federated queries to join cluster data with the data lake to reduce costs. Ensure the S3 Standard storage class is in use with objects in the data lake.

Suggested answer: C

Explanation:


Reference: https://aws.amazon.com/redshift/pricing/
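
As a hedged sketch of the Spectrum piece of the suggested answer, the statement below (submitted through the Redshift Data API from Python) creates an external schema over the AWS Glue Data Catalog so the older history stays in the S3 data lake while the most recent 4 months live in local cluster tables. The schema, catalog database, cluster, and IAM role ARN are assumptions.

```python
import boto3

boto3.client("redshift-data").execute_statement(
    ClusterIdentifier="lms-reporting",  # assumed cluster identifier
    Database="reporting",               # assumed database
    DbUser="awsuser",                   # assumed database user
    Sql="""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS lms_history
        FROM DATA CATALOG
        DATABASE 'lms_lake'
        IAM_ROLE 'arn:aws:iam::111122223333:role/SpectrumRole'
        CREATE EXTERNAL DATABASE IF NOT EXISTS;
    """,
)
```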

A large financial company is running its ETL process. Part of this process is to move data from Amazon S3 into an Amazon Redshift cluster. The company wants to use the most cost-efficient method to load the dataset into Amazon Redshift.

Which combination of steps would meet these requirements? (Choose two.)

A. Use the COPY command with the manifest file to load data into Amazon Redshift.
B. Use S3DistCp to load files into Amazon Redshift.
C. Use temporary staging tables during the loading process.
D. Use the UNLOAD command to upload data into Amazon Redshift.
E. Use Amazon Redshift Spectrum to query files from Amazon S3.

Suggested answer: C, E

Explanation:


Reference: https://aws.amazon.com/blogs/big-data/top-8-best-practices-for-high-performance-etl-processing-using-amazonredshift/
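
Here is a sketch of the staging-table pattern from answer C (Python via the Redshift Data API): COPY the new files into a temporary table, then merge them into the target table. BatchExecuteStatement runs the statements as a single transaction. The table names, S3 prefix, and IAM role are assumptions.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Stage the incoming batch with COPY, then upsert it into the target table.
statements = [
    "CREATE TEMP TABLE stage_billing (LIKE billing);",
    """COPY stage_billing
       FROM 's3://finance-etl/incoming/'
       IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
       FORMAT AS PARQUET;""",
    """DELETE FROM billing
       USING stage_billing
       WHERE billing.record_id = stage_billing.record_id;""",
    "INSERT INTO billing SELECT * FROM stage_billing;",
]

redshift_data.batch_execute_statement(
    ClusterIdentifier="finance-dw",  # assumed cluster identifier
    Database="billing",              # assumed database
    DbUser="awsuser",                # assumed database user
    Sqls=statements,
)
```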

A telecommunications company is looking for an anomaly-detection solution to identify fraudulent calls. The company currently uses Amazon Kinesis to stream voice call records in a JSON format from its on-premises database to Amazon S3.

The existing dataset contains voice call records with 200 columns. To detect fraudulent calls, the solution would need to look at 5 of these columns only. The company is interested in a cost-effective solution using AWS that requires minimal effort and experience in anomaly-detection algorithms. Which solution meets these requirements?

A. Use an AWS Glue job to transform the data from JSON to Apache Parquet. Use AWS Glue crawlers to discover the schema and build the AWS Glue Data Catalog. Use Amazon Athena to create a table with a subset of columns. Use Amazon QuickSight to visualize the data and then use Amazon QuickSight machine learning-powered anomaly detection.
B. Use Kinesis Data Firehose to detect anomalies on a data stream from Kinesis by running SQL queries, which compute an anomaly score for all calls and store the output in Amazon RDS. Use Amazon Athena to build a dataset and Amazon QuickSight to visualize the results.
C. Use an AWS Glue job to transform the data from JSON to Apache Parquet. Use AWS Glue crawlers to discover the schema and build the AWS Glue Data Catalog. Use Amazon SageMaker to build an anomaly detection model that can detect fraudulent calls by ingesting data from Amazon S3.
D. Use Kinesis Data Analytics to detect anomalies on a data stream from Kinesis by running SQL queries, which compute an anomaly score for all calls. Connect Amazon QuickSight to Kinesis Data Analytics to visualize the anomaly scores.

Suggested answer: A
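
A hedged sketch of the first step in the suggested answer: an AWS Glue (PySpark) job that reads the JSON call records from S3, keeps only the handful of columns the fraud analysis needs, and writes Parquet back to S3 for Athena and QuickSight. The bucket paths and column names are assumptions.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw JSON call records delivered to S3 (assumed input path).
calls = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://telecom-cdr-raw/"]},
    format="json",
)

# Keep only the columns needed for anomaly detection (assumed column names).
subset = calls.select_fields(
    ["call_id", "caller_number", "destination_number", "duration_seconds", "charge_amount"]
)

# Write the trimmed dataset back to S3 as Parquet (assumed output path).
glue_context.write_dynamic_frame.from_options(
    frame=subset,
    connection_type="s3",
    connection_options={"path": "s3://telecom-cdr-curated/calls/"},
    format="parquet",
)

job.commit()
```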

A financial company uses Amazon S3 as its data lake and has set up a data warehouse using a multi-node Amazon Redshift cluster. The data files in the data lake are organized in folders based on the data source of each data file. All the data files are loaded to one table in the Amazon Redshift cluster using a separate COPY command for each data file location. With this approach, loading all the data files into Amazon Redshift takes a long time to complete. Users want a faster solution with little or no increase in cost while maintaining the segregation of the data files in the S3 data lake. Which solution meets these requirements?

A. Use Amazon EMR to copy all the data files into one folder and issue a COPY command to load the data into Amazon Redshift.
B. Load all the data files in parallel to Amazon Aurora, and run an AWS Glue job to load the data into Amazon Redshift.
C. Use an AWS Glue job to copy all the data files into one folder and issue a COPY command to load the data into Amazon Redshift.
D. Create a manifest file that contains the data file locations and issue a COPY command to load the data into Amazon Redshift.

Suggested answer: A

Explanation:


Reference: https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
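
To show what the single load in the suggested answer looks like once the files sit under one consolidated prefix, here is a brief sketch using the Redshift Data API from Python; the table, prefix, and IAM role are assumptions. COPY loads every file found under the prefix in parallel across the cluster's slices.

```python
import boto3

boto3.client("redshift-data").execute_statement(
    ClusterIdentifier="datalake-dw",  # assumed cluster identifier
    Database="lake_loads",            # assumed database
    DbUser="awsuser",                 # assumed database user
    Sql="""
        COPY transactions
        FROM 's3://finance-data-lake/consolidated/'
        IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """,
)
```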

A financial services company is building a data lake solution on Amazon S3. The company plans to use analytics offerings from AWS to meet user needs for one-time querying and business intelligence reports. A portion of the columns will contain personally identifiable information (PII). Only authorized users should be able to see plaintext PII data. What is the MOST operationally efficient solution that meets these requirements?

A. Define a bucket policy for each S3 bucket of the data lake to allow access to users who have authorization to see PII data. Catalog the data by using AWS Glue. Create two IAM roles. Attach a permissions policy with access to PII columns to one role. Attach a policy without these permissions to the other role.
B. Register the S3 locations with AWS Lake Formation. Create two IAM roles. Use Lake Formation data permissions to grant Select permissions to all of the columns for one role. Grant Select permissions to only columns that contain non-PII data for the other role.
C. Register the S3 locations with AWS Lake Formation. Create an AWS Glue job to create an ETL workflow that removes the PII columns from the data and creates a separate copy of the data in another data lake S3 bucket. Register the new S3 locations with Lake Formation. Grant users permissions to each data lake based on whether the users are authorized to see PII data.
D. Register the S3 locations with AWS Lake Formation. Create two IAM roles. Attach a permissions policy with access to PII columns to one role. Attach a policy without these permissions to the other role. For each downstream analytics service, use its native security functionality and the IAM roles to secure the PII data.

Suggested answer: C

Explanation:


Reference: https://docs.aws.amazon.com/lake-formation/latest/dg/lake-formation-dg.pdf
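
As a rough sketch of the final grant step in the suggested answer, the call below (Python with boto3) gives a non-PII analyst role SELECT on the table that the Glue job wrote to the sanitized data lake location; the role ARN, database, and table names are assumptions.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Analysts who are not authorized to see PII only get the sanitized copy.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AnalystNoPII"  # assumed role
    },
    Resource={
        "Table": {
            "DatabaseName": "datalake_sanitized",  # assumed Glue database
            "Name": "transactions_no_pii",         # assumed table from the Glue job
        }
    },
    Permissions=["SELECT"],
)
```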

Total 214 questions