Amazon DEA-C01 Practice Test - Questions Answers, Page 12

A data engineer is launching an Amazon EMR cluster. The data that the data engineer needs to load into the new cluster is currently in an Amazon S3 bucket. The data engineer needs to ensure that data is encrypted both at rest and in transit.

The data that is in the S3 bucket is encrypted by an AWS Key Management Service (AWS KMS) key. The data engineer has an Amazon S3 path that has a Privacy Enhanced Mail (PEM) file.

Which solution will meet these requirements?

A. Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Create a second security configuration. Specify the Amazon S3 path of the PEM file for in-transit encryption. Create the EMR cluster, and attach both security configurations to the cluster.

B. Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for local disk encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Use the security configuration during EMR cluster creation.

C. Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Use the security configuration during EMR cluster creation.

D. Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Create the EMR cluster, and attach the security configuration to the cluster.
Suggested answer: C

Explanation:

The data engineer needs to ensure that the data in an Amazon EMR cluster is encrypted both at rest and in transit. The data in Amazon S3 is already encrypted using an AWS KMS key. To meet the requirements, the most suitable solution is to create an EMR security configuration that specifies the correct KMS key for at-rest encryption and use the PEM file for in-transit encryption.

Option C: Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Use the security configuration during EMR cluster creation. This option configures encryption for both data at rest (using KMS keys) and data in transit (using the PEM file for SSL/TLS encryption). This approach ensures that data is fully protected during storage and transfer.

Options A, B, and D are incorrect: an EMR cluster uses a single security configuration, so creating and attaching two configurations (option A) is not possible; local disk encryption (option B) protects instance storage rather than the S3 data at rest; and a security configuration cannot be attached after the cluster exists (option D) because it must be specified at cluster creation time.

Amazon EMR Security Configuration

Amazon S3 Encryption
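
For illustration, the single security configuration described in option C can be created with the AWS SDK. The following boto3 sketch is an outline under assumptions: the Region, KMS key ARN, PEM archive path, and cluster settings are placeholders rather than values from the question.

import json
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # Region is a placeholder

# One security configuration covers both requirements: SSE-KMS for EMRFS data at rest
# in Amazon S3, and TLS certificates (the zipped PEM files) for data in transit.
security_configuration = {
    "EncryptionConfiguration": {
        "EnableAtRestEncryption": True,
        "EnableInTransitEncryption": True,
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {
                "EncryptionMode": "SSE-KMS",
                "AwsKmsKey": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",  # placeholder
            }
        },
        "InTransitEncryptionConfiguration": {
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": "s3://example-bucket/certs/emr-certs.zip",  # placeholder PEM path
            }
        },
    }
}

emr.create_security_configuration(
    Name="at-rest-and-in-transit",
    SecurityConfiguration=json.dumps(security_configuration),
)

# The security configuration is supplied when the cluster is created, not attached afterward.
emr.run_job_flow(
    Name="example-cluster",
    ReleaseLabel="emr-6.15.0",
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    SecurityConfiguration="at-rest-and-in-transit",
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)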

A company analyzes data in a data lake every quarter to perform inventory assessments. A data engineer uses AWS Glue DataBrew to detect any personally identifiable information (PII) about customers within the data. The company's privacy policy considers some custom categories of information to be PII. However, the categories are not included in standard DataBrew data quality rules.

The data engineer needs to modify the current process to scan for the custom PII categories across multiple datasets within the data lake.

Which solution will meet these requirements with the LEAST operational overhead?

A. Manually review the data for custom PII categories.

B. Implement custom data quality rules in DataBrew. Apply the custom rules across datasets.

C. Develop custom Python scripts to detect the custom PII categories. Call the scripts from DataBrew.

D. Implement regex patterns to extract PII information from fields during extract, transform, and load (ETL) operations into the data lake.
Suggested answer: B

Explanation:

The data engineer needs to detect custom categories of PII within the data lake using AWS Glue DataBrew. While DataBrew provides standard data quality rules, the solution must support custom PII categories.

Option B: Implement custom data quality rules in DataBrew. Apply the custom rules across datasets. This option is the most efficient because DataBrew allows the creation of custom data quality rules that can be applied to detect specific data patterns, including custom PII categories. This approach minimizes operational overhead while ensuring that the specific privacy requirements are met.

Options A, C, and D rely on manual review, custom Python scripts, or hand-maintained regex logic in the ETL pipeline, all of which add operational effort compared with DataBrew's built-in rule capabilities.

AWS Glue DataBrew Documentation
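
As an illustration of option B, custom checks can be packaged into a DataBrew ruleset and then attached to profile jobs for each dataset in the data lake. The boto3 sketch below is a rough outline under assumptions: the dataset ARN, column name, and pattern are placeholders, and the exact check-expression grammar should be verified against the DataBrew rule reference.

import boto3

databrew = boto3.client("databrew", region_name="us-east-1")  # Region is a placeholder

# A ruleset groups the custom PII checks; TargetArn points at a DataBrew dataset.
databrew.create_ruleset(
    Name="custom-pii-rules",
    Description="Company-specific PII categories not covered by standard rules",
    TargetArn="arn:aws:databrew:us-east-1:111122223333:dataset/customer-data",  # placeholder
    Rules=[
        {
            "Name": "detect-internal-customer-ids",
            # Illustrative expression only; confirm supported operators in the DataBrew docs.
            "CheckExpression": "(:col1 matches :val1)",
            "SubstitutionMap": {
                ":col1": "`customer_notes`",     # placeholder column
                ":val1": '"CUST-[0-9]{8}"',      # placeholder custom PII pattern
            },
        }
    ],
)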

A marketing company uses Amazon S3 to store marketing data. The company uses versioning in some buckets. The company runs several jobs to read and load data into the buckets.

To help cost-optimize its storage, the company wants to gather information about incomplete multipart uploads and outdated versions that are present in the S3 buckets.

Which solution will meet these requirements with the LEAST operational effort?

A. Use the AWS CLI to gather the information.

B. Use Amazon S3 Inventory configuration reports to gather the information.

C. Use the Amazon S3 Storage Lens dashboard to gather the information.

D. Use AWS usage reports for Amazon S3 to gather the information.
Suggested answer: B

Explanation:

The company wants to gather information about incomplete multipart uploads and outdated versions in its Amazon S3 buckets to optimize storage costs.

Option B: Use Amazon S3 Inventory configuration reports to gather the information. S3 Inventory provides reports that can list incomplete multipart uploads and versions of objects stored in S3. It offers an easy, automated way to track object metadata across buckets, including data necessary for cost optimization, without manual effort.

Options A (AWS CLI), C (S3 Storage Lens), and D (usage reports) either do not specifically gather the required information about incomplete uploads and outdated versions or require more manual intervention.

Amazon S3 Inventory Documentation
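
A minimal boto3 sketch of option B follows; the bucket names and account ID are placeholders. Setting IncludedObjectVersions to All makes noncurrent (outdated) versions appear in the report, and the optional fields add per-object metadata that is useful for cost review.

import boto3

s3 = boto3.client("s3")

# Daily inventory report for one bucket, delivered to a separate reporting bucket.
s3.put_bucket_inventory_configuration(
    Bucket="marketing-data-bucket",          # placeholder source bucket
    Id="daily-all-versions",
    InventoryConfiguration={
        "Id": "daily-all-versions",
        "IsEnabled": True,
        "IncludedObjectVersions": "All",      # include noncurrent versions, not just current ones
        "Schedule": {"Frequency": "Daily"},
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass", "IsMultipartUploaded"],
        "Destination": {
            "S3BucketDestination": {
                "AccountId": "111122223333",                        # placeholder
                "Bucket": "arn:aws:s3:::inventory-reports-bucket",  # placeholder destination
                "Format": "CSV",
                "Prefix": "marketing-data",
            }
        },
    },
)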

A telecommunications company collects network usage data throughout each day at a rate of several thousand data points each second. The company runs an application to process the usage data in real time. The company aggregates and stores the data in an Amazon Aurora DB instance.

Sudden drops in network usage usually indicate a network outage. The company must be able to identify sudden drops in network usage so the company can take immediate remedial actions.

Which solution will meet this requirement with the LEAST latency?

A. Create an AWS Lambda function to query Aurora for drops in network usage. Use Amazon EventBridge to automatically invoke the Lambda function every minute.

B. Modify the processing application to publish the data to an Amazon Kinesis data stream. Create an Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) application to detect drops in network usage.

C. Replace the Aurora database with an Amazon DynamoDB table. Create an AWS Lambda function to query the DynamoDB table for drops in network usage every minute. Use DynamoDB Accelerator (DAX) between the processing application and DynamoDB table.

D. Create an AWS Lambda function within the Database Activity Streams feature of Aurora to detect drops in network usage.
Suggested answer: B

Explanation:

The telecommunications company needs a low-latency solution to detect sudden drops in network usage from real-time data collected throughout the day.

Option B: Modify the processing application to publish the data to an Amazon Kinesis data stream. Create an Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) application to detect drops in network usage. Using Amazon Kinesis with Managed Service for Apache Flink is ideal for real-time stream processing with minimal latency. Flink can analyze the incoming data stream in real time and detect anomalies, such as sudden drops in usage, which makes it the best fit for this scenario.

Other options (A, C, and D) either introduce unnecessary delays (e.g., querying databases) or do not provide the same real-time, low-latency processing that is critical for this use case.

Amazon Kinesis Data Analytics for Apache Flink

Amazon Kinesis Documentation
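
As a rough illustration of option B, a Managed Service for Apache Flink application can aggregate the stream into one-minute windows and compare consecutive totals. The PyFlink sketch below is an outline only: the stream name, field names, and Kinesis connector options are assumptions and depend on the connector version packaged with the application.

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source table over the Kinesis data stream (connector options are illustrative).
t_env.execute_sql("""
    CREATE TABLE network_usage (
        device_id STRING,
        bytes_used BIGINT,
        proc_time AS PROCTIME()
    ) WITH (
        'connector' = 'kinesis',
        'stream' = 'network-usage-stream',
        'aws.region' = 'us-east-1',
        'scan.stream.initpos' = 'LATEST',
        'format' = 'json'
    )
""")

# Per-minute totals; downstream logic compares each window with the previous one and
# raises an alert when the total drops sharply, indicating a possible outage.
per_minute = t_env.sql_query("""
    SELECT
        TUMBLE_START(proc_time, INTERVAL '1' MINUTE) AS window_start,
        SUM(bytes_used) AS total_bytes
    FROM network_usage
    GROUP BY TUMBLE(proc_time, INTERVAL '1' MINUTE)
""")

per_minute.execute().print()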

A company maintains a data warehouse in an on-premises Oracle database. The company wants to build a data lake on AWS. The company wants to load data warehouse tables into Amazon S3 and synchronize the tables with incremental data that arrives from the data warehouse every day.

Each table has a column that contains monotonically increasing values. The size of each table is less than 50 GB. The data warehouse tables are refreshed every night between 1 AM and 2 AM. A business intelligence team queries the tables between 10 AM and 8 PM every day.

Which solution will meet these requirements in the MOST operationally efficient way?

A. Use an AWS Database Migration Service (AWS DMS) full load plus CDC job to load tables that contain monotonically increasing data columns from the on-premises data warehouse to Amazon S3. Use custom logic in AWS Glue to append the daily incremental data to a full-load copy that is in Amazon S3.

B. Use an AWS Glue Java Database Connectivity (JDBC) connection. Configure a job bookmark for a column that contains monotonically increasing values. Write custom logic to append the daily incremental data to a full-load copy that is in Amazon S3.

C. Use an AWS Database Migration Service (AWS DMS) full load migration to load the data warehouse tables into Amazon S3 every day. Overwrite the previous day's full-load copy every day.

D. Use AWS Glue to load a full copy of the data warehouse tables into Amazon S3 every day. Overwrite the previous day's full-load copy every day.
Suggested answer: A

Explanation:

The company needs to load data warehouse tables into Amazon S3 and perform incremental synchronization with daily updates. The most efficient solution is to use AWS Database Migration Service (AWS DMS) with a combination of full load and change data capture (CDC) to handle the initial load and daily incremental updates.

Option A: Use an AWS Database Migration Service (AWS DMS) full load plus CDC job to load tables that contain monotonically increasing data columns from the on-premises data warehouse to Amazon S3. Use custom logic in AWS Glue to append the daily incremental data to a full-load copy that is in Amazon S3. DMS is designed to migrate databases to AWS, and the combination of full load plus CDC is ideal for handling incremental data changes efficiently. AWS Glue can then be used to append the incremental data to the full data set in S3. This solution is highly operationally efficient because it automates both the full load and incremental updates.

Options B, C, and D are less operationally efficient because they either require writing custom logic to handle bookmarks manually or involve unnecessary daily full loads.

AWS Database Migration Service Documentation

AWS Glue Documentation
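
The full load plus CDC task in option A can be defined through the AWS SDK once the DMS endpoints and replication instance exist. The boto3 sketch below is illustrative; every ARN and the schema name are placeholders.

import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")  # Region is a placeholder

# Full load of the warehouse tables followed by ongoing change data capture (CDC).
dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-dw-to-s3-full-load-and-cdc",
    SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-warehouse-tables",
                "object-locator": {"schema-name": "DW", "table-name": "%"},  # placeholder schema
                "rule-action": "include",
            }
        ]
    }),
)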

A company is using Amazon Redshift to build a data warehouse solution. The company is loading hundreds of files into a fact table that is in a Redshift cluster.

The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the fact table.

Which solution will meet these requirements?

A. Use multiple COPY commands to load the data into the Redshift cluster.

B. Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster.

C. Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node.

D. Use a single COPY command to load the data into the Redshift cluster.
Suggested answer: D

Explanation:

To achieve the highest throughput and efficiently use cluster resources while loading data into an Amazon Redshift cluster, the optimal approach is to use a single COPY command that ingests data in parallel.

Option D: Use a single COPY command to load the data into the Redshift cluster. The COPY command is designed to load data from multiple files in parallel into a Redshift table, using all the cluster nodes to optimize the load process. Redshift is optimized for parallel processing, and a single COPY command can load multiple files at once, maximizing throughput.

Options A, B, and C either involve unnecessary complexity or inefficient approaches, such as using multiple COPY commands or INSERT statements, which are not optimized for bulk loading.

Amazon Redshift COPY Command Documentation
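
For reference, a single COPY command can point at an S3 prefix (or a manifest) so that Redshift splits the underlying files across the cluster's slices and loads them in parallel. The sketch below issues the command through the Redshift Data API with boto3; the cluster, database, IAM role, table, and S3 prefix are placeholders.

import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")  # Region is a placeholder

# One COPY against a prefix loads every file under it in parallel across the cluster.
copy_sql = """
    COPY sales_fact
    FROM 's3://example-bucket/sales-fact/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)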

A company has a data warehouse that contains a table that is named Sales. The company stores the table in Amazon Redshift. The table includes a column that is named city_name. The company wants to query the table to find all rows that have a city_name that starts with 'San' or 'El'.

Which SQL query will meet this requirement?

A. Select * from Sales where city_name ~ '$(San|El)';

B. Select * from Sales where city_name ~ '^(San|El)';

C. Select * from Sales where city_name ~ '$(San&El)';

D. Select * from Sales where city_name ~ '^(San&El)';
Suggested answer: B

Explanation:

To query the Sales table in Amazon Redshift for city names that start with 'San' or 'El,' the appropriate query uses a regular expression (regex) pattern to match city names that begin with those prefixes.

Option B: Select * from Sales where city_name ~ '^(San|El)'; In Amazon Redshift, the ~ operator is used to perform pattern matching using regular expressions. The ^(San|El) pattern matches city names that start with 'San' or 'El.' This is the correct SQL syntax for this use case.

Other options (A, C, and D) use the $ end-of-string anchor or a literal & character instead of the ^ start anchor and the | alternation, so they do not match city names that begin with 'San' or 'El'.

Amazon Redshift Pattern Matching Documentation
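
The anchored alternation can be sanity-checked locally before running it in Redshift. The short Python snippet below only demonstrates the pattern's behavior on sample values; the city names are arbitrary.

import re

# Same anchored alternation that Redshift's ~ operator evaluates.
pattern = re.compile(r"^(San|El)")

for city in ["San Diego", "El Paso", "Dallas"]:
    print(city, bool(pattern.match(city)))
# San Diego True, El Paso True, Dallas False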

A company plans to use Amazon Kinesis Data Firehose to store data in Amazon S3. The source data consists of 2 MB .csv files. The company must convert the .csv files to JSON format. The company must store the files in Apache Parquet format.

Which solution will meet these requirements with the LEAST development effort?

A. Use Kinesis Data Firehose to convert the .csv files to JSON. Use an AWS Lambda function to store the files in Parquet format.

B. Use Kinesis Data Firehose to convert the .csv files to JSON and to store the files in Parquet format.

C. Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON and stores the files in Parquet format.

D. Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON. Use Kinesis Data Firehose to store the files in Parquet format.
Suggested answer: B

Explanation:

The company wants to use Amazon Kinesis Data Firehose to transform CSV files into JSON format and store the files in Apache Parquet format with the least development effort.

Option B: Use Kinesis Data Firehose to convert the CSV files to JSON and to store the files in Parquet format. Kinesis Data Firehose supports data format conversion natively, including converting incoming CSV data to JSON format and storing the resulting files in Parquet format in Amazon S3. This solution requires the least development effort because it uses built-in transformation features of Kinesis Data Firehose.

Other options (A, C, D) involve invoking AWS Lambda functions, which would introduce additional complexity and development effort compared to Kinesis Data Firehose's native format conversion capabilities.

Amazon Kinesis Data Firehose Documentation
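
For illustration, record format conversion to Parquet is enabled on the delivery stream's S3 destination. The boto3 sketch below is an outline only; the role ARNs, bucket, and the AWS Glue Data Catalog table that supplies the output schema are placeholders.

import boto3

firehose = boto3.client("firehose", region_name="us-east-1")  # Region is a placeholder

firehose.create_delivery_stream(
    DeliveryStreamName="usage-to-parquet",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/FirehoseDeliveryRole",  # placeholder
        "BucketARN": "arn:aws:s3:::example-data-lake",                     # placeholder
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            # Incoming records are deserialized as JSON and written out as Parquet.
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
            "SchemaConfiguration": {
                "RoleARN": "arn:aws:iam::111122223333:role/FirehoseDeliveryRole",  # placeholder
                "DatabaseName": "analytics_db",   # placeholder Glue database
                "TableName": "usage_events",      # placeholder Glue table
                "Region": "us-east-1",
            },
        },
    },
)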

A data engineer maintains a materialized view that is based on an Amazon Redshift database. The view has a column named load_date that stores the date when each row was loaded.

The data engineer needs to reclaim database storage space by deleting all the rows from the materialized view.

Which command will reclaim the MOST database storage space?

A. Option A

B. Option B

C. Option C

D. Option D
Suggested answer: A

Explanation:

To reclaim the most storage space from a materialized view in Amazon Redshift, you should use a DELETE operation that removes all rows from the view. The most efficient way to remove all rows is to use a condition that always evaluates to true, such as 1=1. This will delete all rows without needing to evaluate each row individually based on specific column values like load_date.

Option A: DELETE FROM materialized_view_name WHERE 1=1; This statement will delete all rows in the materialized view and free up the space. Since materialized views in Redshift store precomputed data, performing a DELETE operation will remove all stored rows.

Other options either involve inappropriate SQL statements (e.g., VACUUM in option C is used for reclaiming storage space in tables, not materialized views), or they don't remove data effectively in the context of a materialized view (e.g., TRUNCATE cannot be used directly on a materialized view).

Amazon Redshift Materialized Views Documentation

Deleting Data from Redshift
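
The statement below reflects the DELETE described in the explanation, issued through the Redshift Data API with boto3; the view name, cluster, and database are placeholders.

import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")  # Region is a placeholder

# The 1=1 predicate is always true, so every row in the materialized view is deleted.
redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="DELETE FROM sales_mv WHERE 1=1;",  # placeholder view name
)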

A company wants to migrate data from an Amazon RDS for PostgreSQL DB instance in the eu-east-1 Region of an AWS account named Account_A. The company will migrate the data to an Amazon Redshift cluster in the eu-west-1 Region of an AWS account named Account_B.

Which solution will give AWS Database Migration Service (AWS DMS) the ability to replicate data between the two data stores?

A. Set up an AWS DMS replication instance in Account_B in eu-west-1.

B. Set up an AWS DMS replication instance in Account_B in eu-east-1.

C. Set up an AWS DMS replication instance in a new AWS account in eu-west-1.

D. Set up an AWS DMS replication instance in Account_A in eu-east-1.
Suggested answer: A

Explanation:

To migrate data from an Amazon RDS for PostgreSQL DB instance in the eu-east-1 Region (Account_A) to an Amazon Redshift cluster in the eu-west-1 Region (Account_B), AWS DMS needs a replication instance located in the target region (in this case, eu-west-1) to facilitate the data transfer between regions.

Option A: Set up an AWS DMS replication instance in Account_B in eu-west-1. Placing the DMS replication instance in the target account and region (Account_B in eu-west-1) is the most efficient solution. The replication instance can connect to the source RDS PostgreSQL in eu-east-1 and migrate the data to the Redshift cluster in eu-west-1. This setup ensures data is replicated across AWS accounts and regions.

Options B, C, and D place the replication instance in either the wrong account or region, which increases complexity without adding any benefit.

AWS Database Migration Service (DMS) Documentation

Cross-Region and Cross-Account Replication
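
A boto3 sketch of option A follows: the replication instance is created in Account_B in eu-west-1, with endpoints that reach the source database in Account_A and the local Redshift target. The instance class, hostnames, and credentials are placeholders; in practice the credentials would come from AWS Secrets Manager.

import boto3

# The replication instance runs in Account_B's eu-west-1 Region, alongside the Redshift target.
dms = boto3.client("dms", region_name="eu-west-1")

dms.create_replication_instance(
    ReplicationInstanceIdentifier="rds-to-redshift",
    ReplicationInstanceClass="dms.t3.medium",  # placeholder instance class
)

# Source endpoint for the RDS for PostgreSQL instance in Account_A (placeholder values).
dms.create_endpoint(
    EndpointIdentifier="account-a-postgres-source",
    EndpointType="source",
    EngineName="postgres",
    ServerName="account-a-db.example.rds.amazonaws.com",  # placeholder hostname
    Port=5432,
    DatabaseName="sourcedb",
    Username="dms_user",
    Password="example-password",  # placeholder
)

# Target endpoint for the Redshift cluster in Account_B (placeholder values).
dms.create_endpoint(
    EndpointIdentifier="account-b-redshift-target",
    EndpointType="target",
    EngineName="redshift",
    ServerName="example-cluster.abc123.eu-west-1.redshift.amazonaws.com",  # placeholder hostname
    Port=5439,
    DatabaseName="dev",
    Username="awsuser",
    Password="example-password",  # placeholder
)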
