Amazon BDS-C00 Practice Test - Questions and Answers

A data engineer in a manufacturing company is designing a data processing platform that receives a large volume of unstructured data. The data engineer must populate a well-structured star schema in Amazon Redshift. What is the most efficient architecture strategy for this purpose?

A. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV data into the analysis schema within Redshift.
B. Load the unstructured data into Redshift, and use string parsing functions to extract structured data for inserting into the analysis schema.
C. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform the file contents. Insert the data into the analysis schema on Redshift.
D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and use AWS Lambda to INSERT the data into Redshift.

Suggested answer: A
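
As a rough illustration of option A's final load step, the sketch below issues a COPY of the EMR-generated CSV output into the analysis schema through the Redshift Data API; the bucket, cluster, role, and table names are made up for the example.

# Hypothetical names throughout; assumes the cluster's IAM role can read the S3 prefix.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

copy_sql = """
COPY analysis.fact_orders
FROM 's3://example-bucket/emr-output/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
GZIP;
"""

# COPY loads the CSV files in parallel across the cluster's slices.
rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="analytics",
    DbUser="etl_user",
    Sql=copy_sql,
)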

A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage. Which AWS service strategy is best for this use case?

A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.
B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.
C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.
D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.

Suggested answer: C

Explanation:

Reference: https://aws.amazon.com/blogs/database/indexing-metadata-in-amazon-elasticsearch-service-using-aws-lambda-and-python/
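
To make the suggested answer concrete, here is a minimal sketch of querying an Amazon Elasticsearch Service domain with the Python Elasticsearch Client (7.x-style API), assuming the e-mail text has already been indexed; the endpoint, index, and field names are illustrative.

# Illustrative endpoint, index, and field names; assumes the e-mails are already indexed.
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://search-spam-demo.us-east-1.es.amazonaws.com:443"])

# Full-text match against the message body; matching documents can feed the scoring model.
response = es.search(
    index="emails",
    body={"query": {"match": {"body": "limited time offer"}}, "size": 100},
)

for hit in response["hits"]["hits"]:
    print(hit["_id"], hit["_score"])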

A data engineer chooses Amazon DynamoDB as a data store for a regulated application. This application must be submitted to regulators for review. The data engineer needs to provide a control framework that lists the security controls, from the process to follow to add new users down to the physical controls of the data center, including items like security guards and cameras. How should this control mapping be achieved using AWS?

A. Request AWS third-party audit reports and/or the AWS quality addendum and map the AWS responsibilities to the controls that must be provided.
B. Request data center Temporary Auditor access to an AWS data center to verify the control mapping.
C. Request relevant SLAs and security guidelines for Amazon DynamoDB and define these guidelines within the application's architecture to map to the control framework.
D. Request Amazon DynamoDB system architecture designs to determine how to map the AWS responsibilities to the controls that must be provided.

Suggested answer: A

An administrator needs to design a distribution strategy for a star schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which three circumstances would choosing Key-based distribution be most appropriate? (Select three.)

A. When the administrator needs to optimize a large, slowly changing dimension table.
B. When the administrator needs to reduce cross-node traffic.
C. When the administrator needs to optimize the fact table for parity with the number of slices.
D. When the administrator needs to balance data distribution and collocation of data.
E. When the administrator needs to take advantage of data locality on a local node for joins and aggregates.

Suggested answer: A, C, D
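
For context, the DDL below shows what KEY distribution looks like in practice, run here through the Redshift Data API; the schema, table, and column names are purely illustrative. Rows that share the distribution key are stored on the same slice, which supports the collocation and reduced cross-node traffic described above.

# Illustrative fact table using KEY distribution; all identifiers are hypothetical.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

ddl = """
CREATE TABLE analysis.fact_sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id)  -- rows with the same customer_id land on the same slice
SORTKEY (sale_date);
"""

rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="analytics",
    DbUser="etl_user",
    Sql=ddl,
)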

Company A operates in Country X. Company A maintains a large dataset of historical purchase orders that contains personal data of their customers in the form of full names and telephone numbers. The dataset consists of 5 text files, 1 TB each. Currently the dataset resides on-premises due to legal requirements of storing personal data in-country. The research and development department needs to run a clustering algorithm on the dataset and wants to use the Elastic MapReduce (EMR) service in the closest AWS region. Due to geographic distance, the minimum latency between the on-premises system and the closest AWS region is 200 ms.

Which option allows Company A to do clustering in the AWS Cloud and meet the legal requirement of maintaining personal data in-country?

A. Anonymize the personal data portions of the dataset and transfer the data files into Amazon S3 in the AWS region. Have the EMR cluster read the dataset using EMRFS.
B. Establish a Direct Connect link between the on-premises system and the AWS region to reduce latency. Have the EMR cluster read the data directly from the on-premises storage system over Direct Connect.
C. Encrypt the data files according to encryption standards of Country X and store them in Amazon S3 in the AWS region. Have the EMR cluster read the dataset using EMRFS.
D. Use an AWS Import/Export Snowball device to securely transfer the data to the AWS region and copy the files onto an EBS volume. Have the EMR cluster read the dataset using EMRFS.

Suggested answer: B

An administrator needs to design a strategy for the schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which two circumstances would choosing EVEN distribution be most appropriate? (Choose two.)

A. When the tables are highly denormalized and do NOT participate in frequent joins.
B. When data must be grouped based on a specific key on a defined slice.
C. When data transfer between nodes must be eliminated.
D. When a new table has been loaded and it is unclear how it will be joined to dimension tables.

Suggested answer: B, D
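
As a sketch of the scenario in option D, the snippet below creates a newly loaded table with EVEN distribution and later switches it to KEY distribution once its join pattern is understood; the identifiers are hypothetical, and the ALTER step assumes a Redshift release that supports ALTER DISTSTYLE.

# Hypothetical table: start with EVEN distribution, switch to KEY once joins are known.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

def run_sql(sql: str) -> None:
    rsd.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="analytics",
        DbUser="etl_user",
        Sql=sql,
    )

# EVEN spreads rows round-robin across slices while join behavior is still unknown.
run_sql("CREATE TABLE staging.new_events (event_id BIGINT, customer_id BIGINT, payload VARCHAR(65535)) DISTSTYLE EVEN;")

# Later, once customer_id proves to be the common join column:
run_sql("ALTER TABLE staging.new_events ALTER DISTSTYLE KEY DISTKEY customer_id;")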

A large grocery distributor receives daily depletion reports from the field in the form of gzip archives of CSV files uploaded to Amazon S3. The files range from 500 MB to 5 GB. These files are processed daily by an EMR job.

Recently it has been observed that the file sizes vary, and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the EMR job.

Which recommendation should an administrator provide?

A. Reduce the HDFS block size to increase the number of task processors.
B. Use bzip2 or Snappy rather than gzip for the archives.
C. Decompress the gzip archives and store the data as CSV files.
D. Use Avro rather than gzip for the archives.

Suggested answer: B
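
The suggestion hinges on gzip not being splittable, so a single large archive becomes one long-running map task. A minimal sketch of re-packing one report as bzip2 (which EMR can split) is shown below; the file names are illustrative, and in practice this would happen before or during upload to Amazon S3.

# Re-pack a gzip'd CSV as bzip2 so EMR can split it across multiple map tasks.
# File names are illustrative.
import bz2
import gzip
import shutil

with gzip.open("depletion_report.csv.gz", "rb") as src, \
        bz2.open("depletion_report.csv.bz2", "wb") as dst:
    shutil.copyfileobj(src, dst)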

A web-hosting company is building a web analytics tool to capture clickstream data from all of the websites hosted within its platform and to provide near-real-time business intelligence. This entire system is built on AWS services. The web-hosting company is interested in using Amazon Kinesis to collect this data and perform sliding window analytics.

What is the most reliable and fault-tolerant technique to get each website to send data to Amazon Kinesis with every click?

A. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis PutRecord API. Use the sessionID as a partition key and set up a loop to retry until a success response is received.
B. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis Producer Library .addRecords method.
C. Each web server buffers the requests until the count reaches 500 and sends them to Amazon Kinesis using the Amazon Kinesis PutRecord API call.
D. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis PutRecord API. Use the exponential back-off algorithm for retries until a successful response is received.

Suggested answer: A
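
A minimal sketch of the producer described in option A is shown below: one PutRecord call per click, the sessionID as the partition key, and a retry loop until Kinesis acknowledges the record. The stream name, event layout, and retry pause are illustrative.

# Illustrative per-click producer: PutRecord with sessionID as the partition key.
import json
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def send_click(session_id: str, click_event: dict) -> None:
    payload = json.dumps(click_event).encode("utf-8")
    while True:  # retry until Kinesis returns a success response
        try:
            kinesis.put_record(
                StreamName="clickstream",
                Data=payload,
                PartitionKey=session_id,  # keeps one session's clicks on the same shard
            )
            return
        except kinesis.exceptions.ProvisionedThroughputExceededException:
            time.sleep(0.1)  # brief pause before retrying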

A customer has an Amazon S3 bucket. Objects are uploaded simultaneously by a cluster of servers from multiple streams of data. The customer maintains a catalog of objects uploaded in Amazon S3 using an Amazon DynamoDB table. This catalog has the following fields: StreamName, TimeStamp, and ServerName, from which ObjectName can be obtained.

The customer needs to define the catalog to support querying for a given stream or server within a defined time range.

Which DynamoDB table scheme is most efficient to support these queries?

A. Define a Primary Key with ServerName as Partition Key and TimeStamp as Sort Key. Do NOT define a Local Secondary Index or Global Secondary Index.
B. Define a Primary Key with StreamName as Partition Key and TimeStamp followed by ServerName as Sort Key. Define a Global Secondary Index with ServerName as partition key and TimeStamp followed by StreamName.
C. Define a Primary Key with ServerName as Partition Key. Define a Local Secondary Index with StreamName as Partition Key. Define a Global Secondary Index with TimeStamp as Partition Key.
D. Define a Primary Key with ServerName as Partition Key. Define a Local Secondary Index with TimeStamp as Partition Key. Define a Global Secondary Index with StreamName as Partition Key and TimeStamp as Sort Key.

Suggested answer: A
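
To illustrate the query pattern behind the suggested schema, the sketch below runs a time-range Query against a table keyed on ServerName (partition key) and TimeStamp (sort key); the table name and values are made up for the example.

# Illustrative time-range query against a ServerName / TimeStamp primary key.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb", region_name="us-east-1").Table("ObjectCatalog")

response = table.query(
    KeyConditionExpression=(
        Key("ServerName").eq("ingest-node-07")
        & Key("TimeStamp").between("2017-06-01T00:00:00Z", "2017-06-02T00:00:00Z")
    )
)

for item in response["Items"]:
    print(item["StreamName"], item["TimeStamp"])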

A company has several teams of analysts. Each team of analysts has their own cluster. The teams need to run SQL queries using Hive, Spark-SQL, and Presto with Amazon EMR. The company needs to enable a centralized metadata layer to expose the Amazon S3 objects as tables to the analysts. Which approach meets the requirement for a centralized metadata layer?

A. EMRFS consistent view with a common Amazon DynamoDB table
B. Bootstrap action to change the Hive Metastore to an Amazon RDS database
C. s3distcp with the outputManifest option to generate RDS DDL
D. Naming scheme support with automatic partition discovery from Amazon S3

Suggested answer: A
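
The suggested answer points the Hive Metastore at a shared Amazon RDS database so every team's cluster sees the same table definitions. On current EMR release labels the same result is usually achieved with the hive-site configuration classification rather than a custom bootstrap script; the sketch below takes that route, and every endpoint, credential, and cluster setting in it is hypothetical.

# Illustrative configuration pointing each cluster's Hive metastore at one shared
# Amazon RDS (MySQL) database; endpoint, credentials, and cluster settings are hypothetical.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

hive_site = {
    "Classification": "hive-site",
    "Properties": {
        "javax.jdo.option.ConnectionURL":
            "jdbc:mysql://shared-metastore.abc123.us-east-1.rds.amazonaws.com:3306/hive",
        "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": "hive",
        "javax.jdo.option.ConnectionPassword": "example-password",
    },
}

emr.run_job_flow(
    Name="analyst-team-cluster",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Hive"}, {"Name": "Spark"}, {"Name": "Presto"}],
    Configurations=[hive_site],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)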