
Amazon BDS-C00 Practice Test - Questions Answers, Page 5


An organization uses Amazon Elastic MapReduce (EMR) to process a series of extract-transform-load (ETL) steps that run in sequence. The output of each step must be fully processed in subsequent steps but will not be retained.

Which of the following techniques will meet this requirement most efficiently?

A. Use the EMR File System (EMRFS) to store the outputs from each step as objects in Amazon Simple Storage Service (S3).
B. Use the s3n URI to store the data to be processed as objects in Amazon S3.
C. Define the ETL steps as separate AWS Data Pipeline activities.
D. Load the data to be processed into HDFS, and then write the final output to Amazon S3.
Suggested answer: B
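
For context, a minimal boto3 sketch of submitting sequential ETL steps to an existing EMR cluster; the cluster ID, script paths, and bucket names are hypothetical placeholders. Intermediate outputs could target hdfs:/// paths (not retained after the cluster ends) while only the final step writes to s3:// via EMRFS.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Hypothetical cluster ID and S3 locations for illustration only.
response = emr.add_job_flow_steps(
    JobFlowId="j-EXAMPLE12345",
    Steps=[
        {
            "Name": "etl-step-1",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Intermediate output written to HDFS, which is discarded with the cluster.
                "Args": ["spark-submit", "s3://example-bucket/scripts/step1.py",
                         "--output", "hdfs:///intermediate/step1/"],
            },
        },
        {
            "Name": "etl-step-2",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Final output persisted to S3 through EMRFS.
                "Args": ["spark-submit", "s3://example-bucket/scripts/step2.py",
                         "--input", "hdfs:///intermediate/step1/",
                         "--output", "s3://example-bucket/final/"],
            },
        },
    ],
)
print(response["StepIds"])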

The department of transportation for a major metropolitan area has placed sensors on roads at key locations around the city. The goal is to analyze the flow of traffic and notifications from emergency services to identify potential issues and to help planners correct trouble spots.

A data engineer needs a scalable and fault-tolerant solution that allows planners to respond to issues within 30 seconds of their occurrence. Which solution should the data engineer choose?

A. Collect the sensor data with Amazon Kinesis Firehose and store it in Amazon Redshift for analysis. Collect emergency services events with Amazon SQS and store them in Amazon DynamoDB for analysis.
B. Collect the sensor data with Amazon SQS and store it in Amazon DynamoDB for analysis. Collect emergency services events with Amazon Kinesis Firehose and store them in Amazon Redshift for analysis.
C. Collect both sensor data and emergency services events with Amazon Kinesis Streams and use DynamoDB for analysis.
D. Collect both sensor data and emergency services events with Amazon Kinesis Firehose and use Amazon Redshift for analysis.
Suggested answer: A
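
To illustrate the ingestion side of the streaming options, a minimal boto3 producer sketch; the stream name and sensor payload are hypothetical.

import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Hypothetical stream name and sensor reading for illustration only.
record = {"sensor_id": "intersection-42", "speed_kmh": 18, "timestamp": "2017-05-01T08:30:00Z"}

kinesis.put_record(
    StreamName="traffic-sensor-stream",
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["sensor_id"],  # spreads writes across shards by sensor
)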

A telecommunications company needs to predict customer churn (i.e., customers who decide to switch to a competitor). The company has historic records of each customer, including monthly consumption patterns, calls to customer service, and whether the customer ultimately quit the service. All of this data is stored in Amazon S3. The company needs to know which customers are likely going to churn soon so that they can win back their loyalty. What is the optimal approach to meet these requirements?

A. Use the Amazon Machine Learning service to build a binary classification model based on the dataset stored in Amazon S3. The model will be used regularly to predict the churn attribute for existing customers.
B. Use Amazon QuickSight to connect to the data stored in Amazon S3 to obtain the necessary business insight. Plot the churn trend graph to extrapolate churn likelihood for existing customers.
C. Use EMR to run Hive queries to build a profile of a churning customer. Apply the profile to existing customers to determine the likelihood of churn.
D. Use a Redshift cluster to COPY the data from Amazon S3. Create a User Defined Function in Redshift that computes the likelihood of churn.
Suggested answer: B
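
For reference, the binary classification workflow mentioned in option A looked roughly like the sketch below in the Amazon Machine Learning service (now a legacy service); the IDs, S3 paths, and schema file are hypothetical.

import boto3

ml = boto3.client("machinelearning", region_name="us-east-1")

# Hypothetical data source ID and S3 locations for illustration only.
ml.create_data_source_from_s3(
    DataSourceId="ds-churn-training",
    DataSpec={
        "DataLocationS3": "s3://example-bucket/churn/training.csv",
        "DataSchemaLocationS3": "s3://example-bucket/churn/schema.json",  # schema marks a yes/no "churned" target
    },
    ComputeStatistics=True,
)

# BINARY model type -> predicts a yes/no churn attribute per customer.
ml.create_ml_model(
    MLModelId="ml-churn-model",
    MLModelType="BINARY",
    TrainingDataSourceId="ds-churn-training",
)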

A system needs to collect on-premises application spool files into a persistent storage layer in AWS. Each spool file is 2 KB. The application generates 1 million files per hour. Each source file is automatically deleted from the local server after an hour. What is the most cost-efficient option to meet these requirements?

A. Write file contents to an Amazon DynamoDB table.
B. Copy files to Amazon S3 Standard Storage.
C. Write file contents to Amazon ElastiCache.
D. Copy files to Amazon S3 Infrequent Access Storage.
Suggested answer: C
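
A quick back-of-the-envelope calculation of the volumes involved, using only the figures given in the question; no per-unit prices are assumed here.

# Figures from the question: 2 KB per file, 1 million files per hour.
file_size_kb = 2
files_per_hour = 1_000_000

files_per_month = files_per_hour * 24 * 30
data_per_month_gb = files_per_month * file_size_kb / (1024 * 1024)

print(f"Objects/items per month: {files_per_month:,}")     # 720,000,000
print(f"Data per month:          {data_per_month_gb:,.0f} GB")  # roughly 1,373 GB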

An administrator receives about 100 files per hour into Amazon S3 and will be loading the files into Amazon Redshift. Customers who analyze the data within Redshift gain significant value when they receive data as quickly as possible. The customers have agreed to a maximum loading interval of 5 minutes. Which loading approach should the administrator use to meet this objective?

A. Load each file as it arrives because getting data into the cluster as quickly as possible is the priority.
B. Load the cluster as soon as the administrator has the same number of files as nodes in the cluster.
C. Load the cluster when the administrator has an even multiple of files relative to the Cluster Slice Count, or 5 minutes, whichever comes first.
D. Load the cluster when the number of files is less than the Cluster Slice Count.
Suggested answer: C
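
A sketch of the batching idea, assuming a psycopg2 connection to the cluster and a pre-built manifest on S3; the table, manifest path, IAM role, and endpoint are hypothetical. Grouping files into a multiple of the slice count lets every slice load files in parallel during a single COPY.

import psycopg2

SLICE_COUNT = 4               # assumption: e.g. 2 nodes with 2 slices each
BATCH_INTERVAL_SECONDS = 300  # the agreed 5-minute maximum

def load_batch(conn, manifest_s3_path):
    """Issue one COPY for a manifest listing a multiple of SLICE_COUNT files."""
    with conn.cursor() as cur:
        cur.execute(
            """
            COPY analytics.events
            FROM %s
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            MANIFEST
            GZIP;
            """,
            (manifest_s3_path,),
        )
    conn.commit()

conn = psycopg2.connect(host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="analytics", user="loader", password="...")
load_batch(conn, "s3://example-bucket/manifests/batch-0001.manifest")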

An enterprise customer is migrating to Redshift and is considering using dense storage nodes in its Redshift cluster. The customer wants to migrate 50 TB of data. The customer's query patterns involve performing many joins with thousands of rows.

The customer needs to know how many nodes are needed in its target Redshift cluster. The customer has a limited budget and needs to avoid performing tests unless absolutely needed.

Which approach should this customer use?

A. Start with many small nodes.
B. Start with fewer large nodes.
C. Have two separate clusters with a mix of small and large nodes.
D. Insist on performing multiple tests to determine the optimal configuration.
Suggested answer: A
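
For a rough sense of scale, the arithmetic below assumes the approximate per-node storage of the dense storage (ds2) node types at the time this exam was current (about 2 TB for ds2.xlarge and 16 TB for ds2.8xlarge); verify against current documentation before sizing a real cluster.

import math

data_tb = 50  # from the question

# Assumed per-node usable storage for dense storage nodes (verify against current docs).
node_capacities_tb = {"ds2.xlarge": 2, "ds2.8xlarge": 16}

for node_type, capacity_tb in node_capacities_tb.items():
    nodes_needed = math.ceil(data_tb / capacity_tb)
    print(f"{node_type}: at least {nodes_needed} nodes for {data_tb} TB (before compression and growth)")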

A company is centralizing a large number of unencrypted small files from multiple Amazon S3 buckets. The company needs to verify that the files contain the same data after centralization.

Which method meets the requirements?

A. Compare the S3 ETags from the source and destination objects.
B. Call the S3 CompareObjects API for the source and destination objects.
C. Place a HEAD request against the source and destination objects, comparing SIG v4 headers.
D. Compare the size of the source and destination objects.
Suggested answer: A
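
A sketch of comparing ETags with boto3 HEAD requests; bucket and key names are hypothetical. For unencrypted objects uploaded in a single PUT (as small files typically are), the ETag is the MD5 of the object body, whereas multipart uploads produce a different ETag format.

import boto3

s3 = boto3.client("s3")

def etags_match(src_bucket, src_key, dst_bucket, dst_key):
    """Compare the ETags of a source and destination object via HEAD requests."""
    src = s3.head_object(Bucket=src_bucket, Key=src_key)
    dst = s3.head_object(Bucket=dst_bucket, Key=dst_key)
    return src["ETag"] == dst["ETag"]

# Hypothetical bucket and key names for illustration only.
print(etags_match("source-bucket", "logs/file-0001.csv",
                  "central-bucket", "imported/file-0001.csv"))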

An online gaming company uses DynamoDB to store user activity logs and is experiencing throttled writes on the company's DynamoDB table. The company is NOT consuming close to the provisioned capacity. The table contains a large number of items and is partitioned on user and sorted by date. The table is 200 GB and is currently provisioned at 10K WCU and 20K RCU.

Which two additional pieces of information are required to determine the cause of the throttling? (Choose two.)

A. The structure of any GSIs that have been defined on the table
B. CloudWatch data showing consumed and provisioned write capacity when writes are being throttled
C. Application-level metrics showing the average item size and peak update rates for each attribute
D. The structure of any LSIs that have been defined on the table
E. The maximum historical WCU and RCU for the table
Suggested answer: A, D
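
Two boto3 calls that would retrieve the kinds of information referenced by the answer choices: the table's index definitions and the CloudWatch write metrics around the throttling window. The table name and time range are hypothetical.

from datetime import datetime, timedelta
import boto3

TABLE_NAME = "user-activity-logs"  # hypothetical

dynamodb = boto3.client("dynamodb")
cloudwatch = boto3.client("cloudwatch")

# Index structure (GSIs/LSIs) defined on the table.
table = dynamodb.describe_table(TableName=TABLE_NAME)["Table"]
print(table.get("GlobalSecondaryIndexes", []))
print(table.get("LocalSecondaryIndexes", []))

# Consumed write capacity during the throttling window.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedWriteCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": TABLE_NAME}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Sum"],
)
print(stats["Datapoints"])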

A city has been collecting data on its public bicycle share program for the past three years. The 5 PB dataset currently resides on Amazon S3. The data contains the following datapoints:

Bicycle origination points
Bicycle destination points
Mileage between the points
Number of bicycle slots available at the station (which is variable based on the station location)
Number of slots available and taken at a given time

The program has received additional funds to increase the number of bicycle stations available. All data is regularly archived to Amazon Glacier. The new bicycle stations must be located to provide the most riders access to bicycles. How should this task be performed?

A. Move the data from Amazon S3 into Amazon EBS-backed volumes and use an EC2-based Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent optimization.
B. Use the Amazon Redshift COPY command to move the data from Amazon S3 into Redshift and perform a SQL query that outputs the most popular bicycle stations.
C. Persist the data on Amazon S3 and use a transient EMR cluster with spot instances to run a Spark Streaming job that will move the data into Amazon Kinesis.
D. Keep the data on Amazon S3 and use an Amazon EMR-based Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent optimization over EMRFS.
Suggested answer: B
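
To illustrate the COPY-then-query pattern described in option B, a psycopg2 sketch with hypothetical cluster endpoint, table, column, role, and path names.

import psycopg2

# Hypothetical connection details for illustration only.
conn = psycopg2.connect(host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="bikes", user="analyst", password="...")

with conn.cursor() as cur:
    # Load the trip history from S3 (IAM role and S3 prefix are placeholders).
    cur.execute("""
        COPY trips
        FROM 's3://example-bucket/bike-share/trips/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV;
    """)
    # Rank stations by trip volume to see where demand is highest.
    cur.execute("""
        SELECT origin_station, COUNT(*) AS trips
        FROM trips
        GROUP BY origin_station
        ORDER BY trips DESC
        LIMIT 20;
    """)
    for station, trips in cur.fetchall():
        print(station, trips)
conn.commit()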

An administrator tries to use the Amazon Machine Learning service to classify social media posts that mention the administrator's company into posts that require a response and posts that do not. The training dataset of 10,000 posts contains the details of each post including the timestamp, author, and full text of the post. The administrator is missing the target labels that are required for training. Which Amazon Machine Learning model is the most appropriate for the task?

A. Binary classification model, where the target class is the require-response post
B. Binary classification model, where the two classes are the require-response post and does-not-require-response
C. Multi-class prediction model, with two classes: require-response post and does-not-require-response
D. Regression model where the predicted value is the probability that the post requires a response
Suggested answer: A
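
For completeness, once target labels have been added and a binary model trained in Amazon Machine Learning (a legacy service), a single post could be scored roughly as below; the model ID, endpoint, and record fields are hypothetical.

import boto3

ml = boto3.client("machinelearning", region_name="us-east-1")

# Hypothetical model ID, endpoint, and record fields for illustration only.
result = ml.predict(
    MLModelId="ml-social-posts-model",
    Record={
        "author": "example_user",
        "timestamp": "2017-05-01T08:30:00Z",
        "text": "Your service has been down for two hours, please help!",
    },
    PredictEndpoint="https://realtime.machinelearning.us-east-1.amazonaws.com",
)

# For a BINARY model, predictedLabel is "1" (requires a response) or "0".
print(result["Prediction"]["predictedLabel"], result["Prediction"]["predictedScores"])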