
Amazon DAS-C01 Practice Test - Questions Answers, Page 20

A central government organization is collecting events from various internal applications using Amazon Managed Streaming for Apache Kafka (Amazon MSK). The organization has configured a separate Kafka topic for each application to separate the data. For security reasons, the Kafka cluster has been configured to only allow TLS encrypted data and it encrypts the data at rest.

A recent application update showed that one of the applications was configured incorrectly, resulting in writing data to a Kafka topic that belongs to another application. This resulted in multiple errors in the analytics pipeline as data from different applications appeared on the same topic. After this incident, the organization wants to prevent applications from writing to a topic different than the one they should write to.

Which solution meets these requirements with the least amount of effort?

A.
Create a different Amazon EC2 security group for each application. Configure each security group to have access to a specific topic in the Amazon MSK cluster. Attach the security group to each application based on the topic that the applications should read and write to.
B.
Install Kafka Connect on each application instance and configure each Kafka Connect instance to write to a specific topic only.
C.
Use Kafka ACLs and configure read and write permissions for each topic. Use the distinguished name of the clients' TLS certificates as the principal of the ACL.
D.
Create a different Amazon EC2 security group for each application. Create an Amazon MSK cluster and Kafka topic for each application. Configure each security group to have access to the specific cluster.
Suggested answer: C

Explanation:

Kafka ACLs are a way to control access to Kafka resources, such as topics, consumer groups, or clusters, based on the principal of the client. A principal is an identity that can be authenticated by Kafka. In this case, since the Kafka cluster only allows TLS encrypted data, the principal can be derived from the distinguished name of the clients' TLS certificates. For example, if a client has a TLS certificate with the subject name CN=app1.example.com,OU=IT,O=Org,L=City,ST=State,C=US, then the principal name will be in the form of CN=app1.example.com,OU=IT,O=Org,L=City,ST=State,C=US.

The organization can use Kafka ACLs to configure read and write permissions for each topic. For example, to allow only app1 to write to topic1, the organization can use the following command:

kafka-acls --authorizer-properties zookeeper.connect=zookeeper:2181 --add --allow-principal User:CN=app1.example.com,OU=IT,O=Org,L=City,ST=State,C=US --operation Write --topic topic1

Similarly, to allow only app2 to read from topic2, the organization can use the following command:

kafka-acls --authorizer-properties zookeeper.connect=zookeeper:2181 --add --allow-principal User:CN=app2.example.com,OU=IT,O=Org,L=City,ST=State,C=US --operation Read --topic topic2

By using Kafka ACLs, the organization can prevent applications from writing to a topic different than the one they should write to. If an application tries to write to a topic that it does not have permission for, it will get an authorization error.
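
On the client side, each application authenticates with the TLS certificate whose distinguished name appears in the ACL. The following is a minimal sketch, assuming the confluent-kafka Python client and hypothetical broker addresses and certificate paths (not part of the original question), of a producer that connects to MSK over TLS and writes to its permitted topic only:

from confluent_kafka import Producer

# Hypothetical broker endpoint and certificate paths; replace with your own.
producer = Producer({
    "bootstrap.servers": "b-1.example-msk.amazonaws.com:9094",   # MSK TLS listener
    "security.protocol": "SSL",
    "ssl.ca.location": "/etc/msk/ca.pem",                        # broker trust chain
    "ssl.certificate.location": "/etc/msk/app1-cert.pem",        # DN: CN=app1.example.com,...
    "ssl.key.location": "/etc/msk/app1-key.pem",
})

# With the ACL above in place, writes to any topic other than topic1
# are rejected with a topic authorization error.
producer.produce("topic1", value=b'{"event": "example"}')
producer.flush()

Note that Kafka only enforces these ACLs strictly when the broker setting allow.everyone.if.no.acl.found is false; otherwise topics without ACLs remain open to all authenticated clients.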

A company uses Amazon Redshift as its data warehouse. The Redshift cluster is not encrypted. A data analytics specialist needs to use hardware security module (HSM) managed encryption keys to encrypt the data that is stored in the Redshift cluster.

Which combination of steps will meet these requirements? (Select THREE.)

A.
Stop all write operations on the source cluster. Unload data from the source cluster.
B.
Copy the data to a new target cluster that is encrypted with AWS Key Management Service (AWS KMS).
C.
Modify the source cluster by activating AWS CloudHSM encryption. Configure Amazon Redshift to automatically migrate data to a new encrypted cluster.
D.
Modify the source cluster by activating encryption from an external HSM. Configure Amazon Redshift to automatically migrate data to a new encrypted cluster.
E.
Copy the data to a new target cluster that is encrypted with an HSM from AWS CloudHSM.
F.
Rename the source cluster and the target cluster after the migration so that the target cluster is using the original endpoint.
Suggested answer: A, E, F
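
No explanation is given for this question, so the following is only a hedged sketch of the unload-and-reload approach implied by options A, E, and F. It assumes boto3, placeholder cluster identifiers and credentials, and pre-existing HSM client certificate and HSM configuration resources; the UNLOAD/COPY SQL itself would be run against the clusters separately.

import boto3

redshift = boto3.client("redshift")

# 1. Create the new target cluster encrypted with HSM-managed keys (placeholder values).
redshift.create_cluster(
    ClusterIdentifier="dw-encrypted",
    NodeType="ra3.4xlarge",
    NumberOfNodes=4,
    MasterUsername="admin",
    MasterUserPassword="REPLACE_ME",
    Encrypted=True,
    HsmClientCertificateIdentifier="my-hsm-client-cert",
    HsmConfigurationIdentifier="my-hsm-config",
)

# 2. After stopping writes, UNLOADing from the source, and COPYing into "dw-encrypted",
#    rename the clusters so the target takes over the original endpoint.
redshift.modify_cluster(ClusterIdentifier="dw-source", NewClusterIdentifier="dw-source-old")
redshift.modify_cluster(ClusterIdentifier="dw-encrypted", NewClusterIdentifier="dw-source")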

An analytics team uses Amazon OpenSearch Service for an analytics API to be used by data analysts. The OpenSearch Service cluster is configured with three master nodes. The analytics team uses Amazon Managed Streaming for Apache Kafka (Amazon MSK) and a customized data pipeline to ingest and store 2 months of data in an OpenSearch Service cluster. The cluster stopped responding, which regularly causes requests to time out. The analytics team discovers that the cluster is handling too many bulk indexing requests.

Which actions would improve the performance of the OpenSearch Service cluster? (Select TWO.)

A.
Reduce the number of API bulk requests on the OpenSearch Service cluster and reduce the size of each bulk request.
B.
Scale out the OpenSearch Service cluster by increasing the number of nodes.
C.
Reduce the number of API bulk requests on the OpenSearch Service cluster, but increase the size of each bulk request.
D.
Increase the number of master nodes for the OpenSearch Service cluster. Scale down the pipeline component that is used to ingest the data into the OpenSearch Service cluster.
Suggested answer: A, B
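
As a hedged illustration of option A, the sketch below uses the opensearch-py bulk helper to send fewer _bulk requests while bounding the document count and payload size of each one. The domain endpoint, index name, and batch limits are hypothetical, and authentication options are omitted for brevity.

from opensearchpy import OpenSearch, helpers

# Placeholder OpenSearch Service domain endpoint.
client = OpenSearch(
    hosts=[{"host": "search-analytics-xyz.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

events = [{"sensor_id": i, "value": i * 0.1} for i in range(10_000)]  # sample documents

def actions(docs):
    for doc in docs:
        yield {"_index": "sensor-events", "_source": doc}

# One batched call instead of thousands of small requests; chunk_size and
# max_chunk_bytes cap the size of each individual _bulk request.
helpers.bulk(client, actions(events), chunk_size=500, max_chunk_bytes=5 * 1024 * 1024)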

A data analytics specialist has a 50 GB data file in .csv format and wants to perform a data transformation task. The data analytics specialist is using the Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to perform the transformation. The resulting output will be used to query the data from Amazon Redshift Spectrum.

Which CTAS statement should the data analytics specialist use to provide the MOST efficient performance?

A.
Option A
B.
Option B
C.
Option C
D.
Option D
Suggested answer: B
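
The option text for this question is not reproduced above (the choices were images), so only an illustrative pattern can be shown. As a hedged sketch, a CTAS statement that writes compressed Parquet and partitions the output is generally the most efficient layout for Athena and Redshift Spectrum to query; the database, table, column names, and S3 locations below are hypothetical, and the query is submitted through the boto3 Athena client.

import boto3

athena = boto3.client("athena")

# Hypothetical source/target tables and S3 locations.
ctas = """
CREATE TABLE analytics.orders_parquet
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    external_location = 's3://example-curated-bucket/orders_parquet/',
    partitioned_by = ARRAY['order_year']
) AS
SELECT order_id, customer_id, amount, order_year
FROM analytics.orders_csv
"""

athena.start_query_execution(
    QueryString=ctas,
    ResultConfiguration={"OutputLocation": "s3://example-query-results-bucket/"},
)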

A healthcare company ingests patient data from multiple data sources and stores it in an Amazon S3 staging bucket. An AWS Glue ETL job transforms the data, which is written to an S3-based data lake to be queried using Amazon Athena. The company wants to match patient records even when the records do not have a common unique identifier.

Which solution meets this requirement?

A.
Use Amazon Macie pattern matching as part of the ETL job
B.
Train and use the AWS Glue PySpark filter class in the ETL job
C.
Partition tables and use the ETL job to partition the data on patient name
D.
Train and use the AWS Glue FindMatches ML transform in the ETL job
Suggested answer: D
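
As a hedged sketch of how a trained FindMatches ML transform is typically applied inside a Glue ETL script, the snippet below assumes a PySpark Glue job, a hypothetical catalog table of patient records, and a placeholder transform ID obtained after labeling and training the transform.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglueml.transforms import FindMatches
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())

# Patient records from the staging catalog table (hypothetical names).
patients = glueContext.create_dynamic_frame.from_catalog(
    database="healthcare_staging", table_name="patients_raw")

# Apply the trained FindMatches transform; it adds a match_id column that
# groups records believed to refer to the same patient even without a shared key.
matched = FindMatches.apply(frame=patients, transformId="tfm-0123456789abcdef")

glueContext.write_dynamic_frame.from_options(
    frame=matched,
    connection_type="s3",
    connection_options={"path": "s3://example-data-lake/patients_matched/"},
    format="parquet",
)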

An online food delivery company wants to optimize its storage costs. The company has been collecting operational data for the last 10 years in a data lake that was built on Amazon S3 by using a Standard storage class. The company does not keep data that is older than 7 years. The data analytics team frequently uses data from the past 6 months for reporting and runs queries on data from the last 2 years about once a month. Data that is more than 2 years old is rarely accessed and is only used for audit purposes.

Which combination of solutions will optimize the company's storage costs? (Select TWO.)

A.
Create an S3 Lifecycle configuration rule to transition data that is older than 6 months to the S3 Standard-Infrequent Access (S3 Standard-IA) storage class.
B.
Create another S3 Lifecycle configuration rule to transition data that is older than 2 years to the S3 Glacier Deep Archive storage class. Create an S3 Lifecycle configuration rule to transition data that is older than 6 months to the S3 One Zone-Infrequent Access (S3 One Zone-IA) storage class.
C.
Create another S3 Lifecycle configuration rule to transition data that is older than 2 years to the S3 Glacier Flexible Retrieval storage class.
D.
Use the S3 Intelligent-Tiering storage class to store data instead of the S3 Standard storage class.
E.
Create an S3 Lifecycle expiration rule to delete data that is older than 7 years.
F.
Create an S3 Lifecycle configuration rule to transition data that is older than 7 years to the S3 Glacier Deep Archive storage class.
Suggested answer: A, B

Explanation:

These solutions are based on the following facts:

The S3 Standard-IA storage class is designed for data that is accessed less frequently but requires rapid access when needed. It offers a lower storage cost than S3 Standard, but charges a retrieval fee. This storage class is suitable for data that is used for reporting and queries every few months, such as data that is older than 6 months but less than 2 years in this case.

The S3 Glacier Deep Archive storage class is the lowest-cost storage class and supports long-term retention and digital preservation for data that may be accessed once or twice a year. It has a default retrieval time of 12 hours. This storage class is suitable for data that is rarely accessed and only used for audit purposes, such as data that is older than 2 years in this case.

Creating S3 Lifecycle configuration rules to transition data to different storage classes based on their age can help optimize the storage costs by reducing the amount of data stored in higher-cost storage classes. For more information, see Managing your storage lifecycle.
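
A minimal sketch, assuming a hypothetical bucket name, of how lifecycle transitions like the ones discussed above are expressed with boto3 (ages approximated in days):

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket; 6 months ~ 180 days, 2 years ~ 730 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tiering-for-aging-data",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [
                    {"Days": 180, "StorageClass": "STANDARD_IA"},
                    {"Days": 730, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)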

A company receives datasets from partners at various frequencies. The datasets include baseline data and incremental data. The company needs to merge and store all the datasets without reprocessing the data.

Which solution will meet these requirements with the LEAST development effort?

A.
Use an AWS Glue job with a temporary table to process the datasets. Store the data in an Amazon RDS table.
B.
Use an Apache Spark job in an Amazon EMR cluster to process the datasets. Store the data in EMR File System (EMRFS).
C.
Use an AWS Glue job with job bookmarks enabled to process the datasets. Store the data in Amazon S3.
D.
Use an AWS Lambda function to process the datasets. Store the data in Amazon S3.
Suggested answer: C

Explanation:

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics. It can process datasets from various sources and formats, such as JDBC, Amazon S3, Amazon RDS, etc.

AWS Glue job bookmarks are a feature that helps AWS Glue track data that has already been processed during a previous run of an ETL job. This can prevent the reprocessing of old data and enable the processing of new data when rerunning on a scheduled interval. Job bookmarks can handle both baseline data and incremental data from different sources.

Amazon S3 is a highly scalable, durable, and secure object storage service that can store any amount and type of data. It can be used as a data lake to store the merged and processed datasets from AWS Glue. It can also integrate with other AWS services, such as Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, etc., for further analysis and processing.
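
As a hedged sketch of option C, the Glue PySpark skeleton below shows the pieces that make job bookmarks work: the job is started with --job-bookmark-option set to job-bookmark-enable, each source read carries a transformation_ctx so Glue can track what it has already processed, and job.commit() records the bookmark state. Bucket paths and names are hypothetical.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# The transformation_ctx is what job bookmarks key on: with bookmarks enabled,
# only S3 objects not seen in a previous run are read here.
incoming = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-partner-drop/"]},
    format="csv",
    format_options={"withHeader": True},
    transformation_ctx="incoming",
)

glueContext.write_dynamic_frame.from_options(
    frame=incoming,
    connection_type="s3",
    connection_options={"path": "s3://example-merged-datasets/"},
    format="parquet",
    transformation_ctx="merged_out",
)

job.commit()  # persists the bookmark so the next run skips already-processed objects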

A manufacturing company is storing data from its operational systems in Amazon S3. The company's business analysts need to perform one-time queries of the data in Amazon S3 with Amazon Athena. The company needs to access the Athena service from the on-premises network by using a JDBC connection. The company has created a VPC. Security policies mandate that requests to AWS services cannot traverse the internet.

Which combination of steps should a data analytics specialist take to meet these requirements? (Select TWO.)

A.
Establish an AWS Direct Connect connection between the on-premises network and the VPC.
B.
Configure the JDBC connection to connect to Athena through Amazon API Gateway.
C.
Configure the JDBC connection to use a gateway VPC endpoint for Amazon S3.
D.
Configure the JDBC connection to use an interface VPC endpoint for Athena.
E.
Deploy Athena within a private subnet.
Suggested answer: A, D

Explanation:

AWS Direct Connect is a service that establishes a dedicated network connection between your on-premises network and AWS. It can help you reduce network costs, increase bandwidth throughput, and provide a more consistent network experience than internet-based connections. It can also help you meet the security policy that requires requests to AWS services not to traverse the internet.

An interface VPC endpoint is a type of VPC endpoint that enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by AWS PrivateLink. It is represented by one or more elastic network interfaces (ENIs) with private IP addresses in your VPC subnets. It can also help you meet the security policy that requires requests to AWS services not to traverse the internet.

Amazon Athena provides an interface VPC endpoint that allows you to connect directly to Athena through an interface VPC endpoint in your VPC. You can create an interface VPC endpoint for Athena using the AWS console or AWS CLI commands. You can also configure the JDBC connection to use the interface VPC endpoint for Athena by specifying the endpoint URL as the JDBC URL.
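
A minimal sketch of creating the Athena interface endpoint with boto3, assuming hypothetical VPC, subnet, and security group IDs and the us-east-1 Region; the resulting endpoint DNS names are what the on-premises JDBC clients would be pointed at over Direct Connect.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hypothetical VPC, subnet, and security group identifiers.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.athena",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)

# The endpoint's DNS names resolve privately inside the VPC (and from on premises,
# if on-premises DNS forwards to the VPC resolver across Direct Connect).
print(response["VpcEndpoint"]["DnsEntries"])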

A company has a process that writes two datasets in CSV format to an Amazon S3 bucket every 6 hours. The company needs to join the datasets, convert the data to Apache Parquet, and store the data within another bucket for users to query using Amazon Athena. The data also needs to be loaded to Amazon Redshift for advanced analytics. The company needs a solution that is resilient to the failure of any individual job component and can be restarted in case of an error.

Which solution meets these requirements with the LEAST amount of operational overhead?

A.
Use AWS Step Functions to orchestrate an Amazon EMR cluster running Apache Spark. Use PySpark to generate data frames of the datasets in Amazon S3, transform the data, join the data, write the data back to Amazon S3, and load the data to Amazon Redshift.
B.
Create an AWS Glue job using Python Shell that generates dynamic frames of the datasets in Amazon S3, transforms the data, joins the data, writes the data back to Amazon S3, and loads the data to Amazon Redshift. Use an AWS Glue workflow to orchestrate the AWS Glue job at the desired frequency.
C.
Use AWS Step Functions to orchestrate the AWS Glue job. Create an AWS Glue job using Python Shell that creates dynamic frames of the datasets in Amazon S3, transforms the data, joins the data, writes the data back to Amazon S3, and loads the data to Amazon Redshift.
D.
Create an AWS Glue job using PySpark that creates dynamic frames of the datasets in Amazon S3, transforms the data, joins the data, writes the data back to Amazon S3, and loads the data to Amazon Redshift. Use an AWS Glue workflow to orchestrate the AWS Glue job.
Suggested answer: D

Explanation:

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics. It can process datasets from various sources and formats, such as CSV and Parquet, and write them to different destinations, such as Amazon S3 and Amazon Redshift.

AWS Glue provides two types of jobs: Spark and Python Shell. Spark jobs run on Apache Spark, a distributed processing framework that supports a wide range of data processing tasks. Python Shell jobs run Python scripts on a managed serverless infrastructure. Spark jobs are more suitable for complex data transformations and joins than Python Shell jobs.

AWS Glue provides dynamic frames, which are an extension of Apache Spark data frames. Dynamic frames handle schema variations and errors in the data more easily than data frames. They also provide a set of transformations that can be applied to the data, such as join, filter, map, etc.

AWS Glue provides workflows, which are directed acyclic graphs (DAGs) that orchestrate multiple ETL jobs and crawlers. Workflows can handle dependencies, retries, error handling, and concurrency for ETL jobs and crawlers. They can also be triggered by schedules or events.

By creating an AWS Glue job using PySpark that creates dynamic frames of the datasets in Amazon S3, transforms the data, joins the data, writes the data back to Amazon S3, and loads the data to Amazon Redshift, the company can perform the required ETL tasks with a single job. By using an AWS Glue workflow to orchestrate the AWS Glue job, the company can schedule and monitor the job execution with minimal operational overhead.
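
To illustrate the orchestration side of option D, the following hedged boto3 sketch creates a Glue workflow with a scheduled trigger that starts the (already created) PySpark job every 6 hours; the workflow and job names are hypothetical. A failed run can simply be retried or restarted from the workflow without reprocessing other components.

import boto3

glue = boto3.client("glue")

glue.create_workflow(Name="merge-and-load-workflow")

# A scheduled trigger inside the workflow starts the ETL job every 6 hours.
glue.create_trigger(
    Name="merge-and-load-every-6-hours",
    WorkflowName="merge-and-load-workflow",
    Type="SCHEDULED",
    Schedule="cron(0 0/6 * * ? *)",
    Actions=[{"JobName": "merge-and-load-job"}],
    StartOnCreation=True,
)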

An IoT company is collecting data from multiple sensors and is streaming the data to Amazon Managed Streaming for Apache Kafka (Amazon MSK). Each sensor type has its own topic, and each topic has the same number of partitions.

The company is planning to turn on more sensors. However, the company wants to evaluate which sensor types are producing the most data so that the company can scale accordingly. The company needs to know which sensor types have the largest values for the following metrics: BytesInPerSec and MessagesInPerSec.

Which level of monitoring for Amazon MSK will meet these requirements?

A.
DEFAULT level
B.
PER TOPIC PER BROKER level
C.
PER BROKER level
D.
PER TOPIC level
Suggested answer: B
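
As a hedged sketch, enhanced monitoring at the PER_TOPIC_PER_BROKER level can be enabled on an existing cluster with the boto3 Kafka client; the cluster ARN below is a placeholder, and the current cluster version must be read first because update_monitoring requires it.

import boto3

kafka = boto3.client("kafka")

cluster_arn = "arn:aws:kafka:us-east-1:111122223333:cluster/sensors/abcd1234"  # placeholder
current = kafka.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]["CurrentVersion"]

# PER_TOPIC_PER_BROKER emits BytesInPerSec and MessagesInPerSec per topic,
# which lets the company compare sensor types (one topic per sensor type).
kafka.update_monitoring(
    ClusterArn=cluster_arn,
    CurrentVersion=current,
    EnhancedMonitoring="PER_TOPIC_PER_BROKER",
)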