Amazon DEA-C01 Practice Test - Questions Answers, Page 11


A company hosts its applications on Amazon EC2 instances. The company must use SSL/TLS connections that encrypt data in transit to communicate securely with AWS infrastructure that is managed by a customer.

A data engineer needs to implement a solution to simplify the generation, distribution, and rotation of digital certificates. The solution must automatically renew and deploy SSL/TLS certificates.

Which solution will meet these requirements with the LEAST operational overhead?

A. Store self-managed certificates on the EC2 instances.

B. Use AWS Certificate Manager (ACM).

C. Implement custom automation scripts in AWS Secrets Manager.

D. Use Amazon Elastic Container Service (Amazon ECS) Service Connect.

Suggested answer: B

Explanation:

The best solution for managing SSL/TLS certificates on EC2 instances with minimal operational overhead is to use AWS Certificate Manager (ACM). ACM simplifies certificate management by automating the provisioning, renewal, and deployment of certificates.

AWS Certificate Manager (ACM):

ACM manages SSL/TLS certificates for EC2 and other AWS resources, including automatic certificate renewal. This reduces the need for manual management and avoids operational complexity.

ACM also integrates with other AWS services to simplify secure connections between AWS infrastructure and customer-managed environments.
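
As a minimal sketch (the domain name and Region below are placeholders), a certificate can be requested through the ACM API with DNS validation; once validation succeeds, ACM handles renewal on its own:

import boto3

# Request a public certificate that ACM validates via DNS and renews automatically.
# The domain name and Region are illustrative placeholders.
acm = boto3.client("acm", region_name="us-east-1")

response = acm.request_certificate(
    DomainName="app.example.com",
    ValidationMethod="DNS",
    SubjectAlternativeNames=["*.app.example.com"],
)
print("Certificate ARN:", response["CertificateArn"])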

Alternatives Considered:

A (Self-managed certificates): Managing certificates manually on EC2 instances increases operational overhead and lacks automatic renewal.

C (Secrets Manager automation): While Secrets Manager can store keys and certificates, it requires custom automation for rotation and does not handle SSL/TLS certificates directly.

D (ECS Service Connect): This is unrelated to SSL/TLS certificate management and would not address the operational need.

AWS Certificate Manager Documentation

A company saves customer data to an Amazon S3 bucket. The company uses server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the bucket. The dataset includes personally identifiable information (PII) such as social security numbers and account details.

Data that is tagged as PII must be masked before the company uses customer data for analysis. Some users must have secure access to the PII data during the preprocessing phase. The company needs a low-maintenance solution to mask and secure the PII data throughout the entire engineering pipeline.

Which combination of solutions will meet these requirements? (Select TWO.)

A. Use AWS Glue DataBrew to perform extract, transform, and load (ETL) tasks that mask the PII data before analysis.

B. Use Amazon GuardDuty to monitor access patterns for the PII data that is used in the engineering pipeline.

C. Configure an Amazon Macie discovery job for the S3 bucket.

D. Use AWS Identity and Access Management (IAM) to manage permissions and to control access to the PII data.

E. Write custom scripts in an application to mask the PII data and to control access.

Suggested answer: A, D

Explanation:

To address the requirement of masking PII data and ensuring secure access throughout the data pipeline, the combination of AWS Glue DataBrew and IAM provides a low-maintenance solution.

A . AWS Glue DataBrew for Masking:

AWS Glue DataBrew provides a visual tool to perform data transformations, including masking PII data. It allows for easy configuration of data transformation tasks without requiring manual coding, making it ideal for this use case.

D . AWS Identity and Access Management (IAM):

Using IAM policies allows fine-grained control over access to PII data, ensuring that only authorized users can view or process sensitive data during the pipeline stages.
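
A minimal sketch of how the two pieces could fit together, assuming a DataBrew recipe job with masking steps (mask-pii-recipe-job) has already been created and that the bucket, prefix, and role names are placeholders:

import json
import boto3

iam = boto3.client("iam")
databrew = boto3.client("databrew")

# Hypothetical names for illustration only.
PII_BUCKET = "customer-data-bucket"
PREPROCESSING_ROLE = "pii-preprocessing-role"

# Limit read access to the raw PII prefix to the preprocessing role only.
iam.put_role_policy(
    RoleName=PREPROCESSING_ROLE,
    PolicyName="AllowRawPiiRead",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{PII_BUCKET}/raw-pii/*",
        }],
    }),
)

# Run the pre-built DataBrew recipe job that masks the PII columns and writes
# the masked output to a separate prefix for downstream analysis.
databrew.start_job_run(Name="mask-pii-recipe-job")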

Alternatives Considered:

B (Amazon GuardDuty): GuardDuty is for threat detection and does not handle data masking or access control for PII.

C (Amazon Macie): Macie can help discover sensitive data but does not handle the masking of PII or access control.

E (Custom scripts): Custom scripting increases the operational burden compared to a built-in solution like DataBrew.

AWS Glue DataBrew for Data Masking

IAM Policies for PII Access Control

A data engineer needs to onboard a new data producer into AWS. The data producer needs to migrate data products to AWS.

The data producer maintains many data pipelines that support a business application. Each pipeline must have service accounts and their corresponding credentials. The data engineer must establish a secure connection from the data producer's on-premises data center to AWS. The data engineer must not use the public internet to transfer data from an on-premises data center to AWS.

Which solution will meet these requirements?

A. Instruct the new data producer to create Amazon Machine Images (AMIs) on Amazon Elastic Container Service (Amazon ECS) to store the code base of the application. Create security groups in a public subnet that allow connections only to the on-premises data center.

B. Create an AWS Direct Connect connection to the on-premises data center. Store the service account credentials in AWS Secrets Manager.

C. Create a security group in a public subnet. Configure the security group to allow only connections from the CIDR blocks that correspond to the data producer. Create Amazon S3 buckets that contain presigned URLs that have one-day expiration dates.

D. Create an AWS Direct Connect connection to the on-premises data center. Store the application keys in AWS Secrets Manager. Create Amazon S3 buckets that contain presigned URLs that have one-day expiration dates.

Suggested answer: B

Explanation:

For secure migration of data from an on-premises data center to AWS without using the public internet, AWS Direct Connect is the most secure and reliable method. Storing the service account credentials in Secrets Manager keeps them managed securely and allows them to be rotated automatically.

AWS Direct Connect:

Direct Connect establishes a dedicated, private connection between the on-premises data center and AWS, avoiding the public internet. This is ideal for secure, high-speed data transfers.

AWS Secrets Manager:

Secrets Manager securely stores and rotates service account credentials, reducing operational overhead while ensuring security.
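
A minimal sketch, assuming the secret name, credential values, and rotation Lambda ARN below are placeholders:

import json
import boto3

secrets = boto3.client("secretsmanager")

# Store one service account's credentials per pipeline.
secrets.create_secret(
    Name="pipeline-orders/service-account",
    SecretString=json.dumps({"username": "svc_orders", "password": "CHANGE_ME"}),
)

# Optionally enable automatic rotation through a rotation Lambda function.
secrets.rotate_secret(
    SecretId="pipeline-orders/service-account",
    RotationLambdaARN="arn:aws:lambda:us-east-1:111122223333:function:rotate-service-account",
    RotationRules={"AutomaticallyAfterDays": 30},
)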

Alternatives Considered:

A (ECS with security groups): This does not address the need for a secure, private connection from the on-premises data center.

C (Public subnet with presigned URLs): This involves using the public internet, which does not meet the requirement.

D (Direct Connect with presigned URLs): While Direct Connect is correct, presigned URLs with short expiration dates are unnecessary for this use case.

AWS Direct Connect Documentation

AWS Secrets Manager Documentation

A company uses AWS Glue Data Catalog to index data that is uploaded to an Amazon S3 bucket every day. The company uses a daily batch process in an extract, transform, and load (ETL) pipeline to upload data from external sources into the S3 bucket.

The company runs a daily report on the S3 data. Some days, the company runs the report before all the daily data has been uploaded to the S3 bucket. A data engineer must be able to send a message that identifies any incomplete data to an existing Amazon Simple Notification Service (Amazon SNS) topic.

Which solution will meet this requirement with the LEAST operational overhead?

A. Create data quality checks for the source datasets that the daily reports use. Create a new AWS managed Apache Airflow cluster. Run the data quality checks by using Airflow tasks that run data quality queries on the columns' data types and the presence of null values. Configure Airflow Directed Acyclic Graphs (DAGs) to send an email notification that informs the data engineer about the incomplete datasets to the SNS topic.

B. Create data quality checks on the source datasets that the daily reports use. Create a new Amazon EMR cluster. Use Apache Spark SQL to create Apache Spark jobs in the EMR cluster that run data quality queries on the columns' data types and the presence of null values. Orchestrate the ETL pipeline by using an AWS Step Functions workflow. Configure the workflow to send an email notification that informs the data engineer about the incomplete datasets to the SNS topic.

C. Create data quality checks on the source datasets that the daily reports use. Create data quality actions by using AWS Glue workflows to confirm the completeness and consistency of the datasets. Configure the data quality actions to create an event in Amazon EventBridge if a dataset is incomplete. Configure EventBridge to send the event that informs the data engineer about the incomplete datasets to the Amazon SNS topic.

D. Create AWS Lambda functions that run data quality queries on the columns' data types and the presence of null values. Orchestrate the ETL pipeline by using an AWS Step Functions workflow that runs the Lambda functions. Configure the Step Functions workflow to send an email notification that informs the data engineer about the incomplete datasets to the SNS topic.

Suggested answer: C

Explanation:

AWS Glue workflows are designed to orchestrate the ETL pipeline, and you can create data quality checks to ensure the uploaded datasets are complete before running reports. If there is an issue with the data, AWS Glue workflows can trigger an Amazon EventBridge event that sends a message to an SNS topic.

AWS Glue Workflows:

AWS Glue workflows allow users to automate and monitor complex ETL processes. You can include data quality actions that check for missing data, incorrect data types, null values, and other consistency issues.

If a dataset is incomplete, Glue emits an event to EventBridge, and an EventBridge rule routes the notification to the SNS topic.
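
A minimal sketch of the notification path, assuming the SNS topic ARN is a placeholder and that the event source and detail type in the pattern match what the Glue data quality evaluation emits (verify these values against your rulesets):

import json
import boto3

events = boto3.client("events")
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:data-quality-alerts"  # existing topic (placeholder ARN)

# Route Glue data quality evaluation results to the existing SNS topic.
# The source/detail-type values are assumptions; adjust them to the events
# that your data quality rulesets actually publish.
events.put_rule(
    Name="glue-dq-incomplete-data",
    EventPattern=json.dumps({
        "source": ["aws.glue-dataquality"],
        "detail-type": ["Data Quality Evaluation Results Available"],
    }),
)

events.put_targets(
    Rule="glue-dq-incomplete-data",
    Targets=[{"Id": "sns-alert", "Arn": SNS_TOPIC_ARN}],
)
# The SNS topic's resource policy must also allow events.amazonaws.com to publish.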

Alternatives Considered:

A (Airflow cluster): Managed Airflow introduces more operational overhead and complexity compared to Glue workflows.

B (EMR cluster): Setting up an EMR cluster is also more complex compared to the Glue-centric solution.

D (Lambda functions): While Lambda functions can work, using Glue workflows offers a more integrated and lower operational overhead solution.

AWS Glue Workflow Documentation

Two developers are working on separate application releases. The developers have created feature branches named Branch A and Branch B by using a GitHub repository's master branch as the source.

The developer for Branch A deployed code to the production system. The code for Branch B will merge into the master branch in the following week's scheduled application release.

Which command should the developer for Branch B run before the developer raises a pull request to the master branch?

A. git diff branchB master
git commit -m <message>

B. git pull master

C. git rebase master

D. git fetch -b master

Suggested answer: C

Explanation:

To ensure that Branch B is up to date with the latest changes in the master branch before submitting a pull request, the correct approach is to perform a git rebase. This command rewrites the commit history so that Branch B will be based on the latest changes in the master branch.

git rebase master:

This command replays the commits of Branch B on top of the latest state of the master branch, producing a clean, linear history. In practice, the developer first updates the local master branch (for example, with git fetch), runs git rebase master from Branch B, resolves any conflicts, and then pushes the rebased branch before opening the pull request.

Alternatives Considered:

A (git diff): This will only show differences between Branch B and master but won't resolve conflicts or bring Branch B up to date.

B (git pull master): git pull expects a remote name, so git pull master is not a valid way to update the branch. Even git pull origin master would create a merge commit instead of the clean, linear history that a rebase produces.

D (git fetch -b): This is not a valid command; git fetch has no -b option (that flag belongs to git clone and git checkout).

Git Rebase Best Practices

A company needs a solution to manage costs for an existing Amazon DynamoDB table. The company also needs to control the size of the table. The solution must not disrupt any ongoing read or write operations. The company wants to use a solution that automatically deletes data from the table after 1 month.

Which solution will meet these requirements with the LEAST ongoing maintenance?

A. Use the DynamoDB TTL feature to automatically expire data based on timestamps.

B. Configure a scheduled Amazon EventBridge rule to invoke an AWS Lambda function to check for data that is older than 1 month. Configure the Lambda function to delete old data.

C. Configure a stream on the DynamoDB table to invoke an AWS Lambda function. Configure the Lambda function to delete data in the table that is older than 1 month.

D. Use an AWS Lambda function to periodically scan the DynamoDB table for data that is older than 1 month. Configure the Lambda function to delete old data.

Suggested answer: A

Explanation:

The requirement is to manage the size of an Amazon DynamoDB table by automatically deleting data older than 1 month without disrupting ongoing read or write operations. The simplest and most maintenance-free solution is to use DynamoDB Time-to-Live (TTL).

Option A: Use the DynamoDB TTL feature to automatically expire data based on timestamps. DynamoDB TTL lets you designate an attribute (a Number that holds a Unix epoch timestamp in seconds) that defines when each item expires. After the expiration time passes, DynamoDB deletes the items in the background, freeing up storage and keeping the table size under control without manual intervention and without disrupting ongoing read or write operations.
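
A minimal sketch, assuming the table name, key attribute, and TTL attribute name are placeholders:

import time
import boto3

dynamodb = boto3.client("dynamodb")
TABLE_NAME = "customer-events"  # placeholder table name

# Enable TTL on an attribute that stores the expiration time as epoch seconds.
dynamodb.update_time_to_live(
    TableName=TABLE_NAME,
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expire_at"},
)

# Each item then carries its own expiration, roughly one month after it is written.
one_month = 30 * 24 * 60 * 60
dynamodb.put_item(
    TableName=TABLE_NAME,
    Item={
        "pk": {"S": "order#1234"},
        "expire_at": {"N": str(int(time.time()) + one_month)},
    },
)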

Other options involve higher maintenance and manual scheduling or scanning operations, which increase complexity unnecessarily compared to the native TTL feature.

DynamoDB Time-to-Live (TTL)

A company stores CSV files in an Amazon S3 bucket. A data engineer needs to process the data in the CSV files and store the processed data in a new S3 bucket.

The process needs to rename a column, remove specific columns, ignore the second row of each file, create a new column based on the values of the first row of the data, and filter the results by a numeric value of a column.

Which solution will meet these requirements with the LEAST development effort?

A. Use AWS Glue Python jobs to read and transform the CSV files.

B. Use an AWS Glue custom crawler to read and transform the CSV files.

C. Use an AWS Glue workflow to build a set of jobs to crawl and transform the CSV files.

D. Use AWS Glue DataBrew recipes to read and transform the CSV files.

Suggested answer: D

Explanation:

The requirement involves transforming CSV files by renaming and removing columns, skipping a row, deriving a new column, and filtering on a numeric value, all with minimal development effort. AWS Glue DataBrew is the best solution here because it allows you to visually create transformation recipes without writing extensive code.

Option D: Use AWS Glue DataBrew recipes to read and transform the CSV files. DataBrew provides a visual interface where you can build transformation steps (e.g., renaming columns, filtering rows, creating new columns, etc.) as a 'recipe' that can be applied to datasets, making it easy to handle complex transformations on CSV files with minimal coding.
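
A minimal sketch, assuming the recipe with the rename, remove, skip, derive, and filter steps was authored separately (for example, in the DataBrew console) and that the bucket names, recipe name, and role ARN are placeholders:

import boto3

databrew = boto3.client("databrew")

# Register the raw CSV files as a DataBrew dataset.
databrew.create_dataset(
    Name="raw-customer-csv",
    Format="CSV",
    Input={"S3InputDefinition": {"Bucket": "raw-bucket", "Key": "incoming/"}},
)

# Attach the visually authored recipe and write the transformed output
# to the new S3 bucket.
databrew.create_recipe_job(
    Name="transform-customer-csv",
    DatasetName="raw-customer-csv",
    RecipeReference={"Name": "customer-csv-recipe", "RecipeVersion": "LATEST_PUBLISHED"},
    RoleArn="arn:aws:iam::111122223333:role/databrew-job-role",
    Outputs=[{"Format": "CSV", "Location": {"Bucket": "processed-bucket", "Key": "clean/"}}],
)

databrew.start_job_run(Name="transform-customer-csv")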

Other options (A, B, C) involve more manual development and configuration effort (e.g., writing Python jobs or creating custom workflows in Glue) compared to the low-code/no-code approach of DataBrew.

AWS Glue DataBrew Documentation

A company uses Amazon Redshift as its data warehouse. Data encoding is applied to the existing tables of the data warehouse. A data engineer discovers that the compression encoding applied to some of the tables is not the best fit for the data.

The data engineer needs to improve the data encoding for the tables that have sub-optimal encoding.

Which solution will meet this requirement?

A. Run the ANALYZE command against the identified tables. Manually update the compression encoding of columns based on the output of the command.

B. Run the ANALYZE COMPRESSION command against the identified tables. Manually update the compression encoding of columns based on the output of the command.

C. Run the VACUUM REINDEX command against the identified tables.

D. Run the VACUUM RECLUSTER command against the identified tables.

Suggested answer: B

Explanation:

To improve data encoding for Amazon Redshift tables where sub-optimal encoding has been applied, the correct approach is to analyze the table to determine the optimal encoding based on the data distribution and characteristics.

Option B: Run the ANALYZE COMPRESSION command against the identified tables. Manually update the compression encoding of columns based on the output of the command. The ANALYZE COMPRESSION command in Amazon Redshift analyzes the columnar data and suggests the best compression encoding for each column. The output provides recommendations for changing the current encoding to improve storage efficiency and query performance. After analyzing, you can manually apply the recommended encoding to the columns.
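
A minimal sketch using the Redshift Data API, assuming the cluster identifier, database, user, table, column, and chosen encoding are placeholders; the encoding to apply comes from the ANALYZE COMPRESSION output:

import boto3

rsd = boto3.client("redshift-data")

COMMON = {
    "ClusterIdentifier": "analytics-cluster",  # placeholder
    "Database": "dev",
    "DbUser": "admin",
}

# Ask Redshift to recommend encodings for the table's columns.
resp = rsd.execute_statement(Sql="ANALYZE COMPRESSION sales.orders;", **COMMON)
print("Statement id:", resp["Id"])  # fetch rows with get_statement_result once finished

# After reviewing the recommendations, apply the suggested encoding per column,
# for example (column name and encoding are illustrative):
rsd.execute_statement(
    Sql="ALTER TABLE sales.orders ALTER COLUMN order_status ENCODE zstd;",
    **COMMON,
)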

Option A (ANALYZE command) is incorrect because it is primarily used to update statistics on tables, not to analyze or suggest compression encoding.

Options C and D (VACUUM commands) deal with reclaiming disk space and reorganizing data, not optimizing compression encoding.

Amazon Redshift ANALYZE COMPRESSION Command

A retail company is using an Amazon Redshift cluster to support real-time inventory management. The company has deployed an ML model on a real-time endpoint in Amazon SageMaker.

The company wants to make real-time inventory recommendations. The company also wants to make predictions about future inventory needs.

Which solutions will meet these requirements? (Select TWO.)

A. Use Amazon Redshift ML to generate inventory recommendations.

B. Use SQL to invoke a remote SageMaker endpoint for prediction.

C. Use Amazon Redshift ML to schedule regular data exports for offline model training.

D. Use SageMaker Autopilot to create inventory management dashboards in Amazon Redshift.

E. Use Amazon Redshift as a file storage system to archive old inventory management reports.

Suggested answer: A, B

Explanation:

The company needs to use machine learning models for real-time inventory recommendations and future inventory predictions while leveraging both Amazon Redshift and Amazon SageMaker.

Option A: Use Amazon Redshift ML to generate inventory recommendations. Amazon Redshift ML allows you to build, train, and deploy machine learning models directly from Redshift using SQL statements. It integrates with SageMaker to train models and run inference. This feature is useful for generating inventory recommendations directly from the data stored in Redshift.

Option B: Use SQL to invoke a remote SageMaker endpoint for prediction. You can use SQL in Redshift to call a SageMaker endpoint for real-time inference. By invoking a SageMaker endpoint from within Redshift, the company can get real-time predictions on inventory, allowing for integration between the data warehouse and the machine learning model hosted in SageMaker.
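
A minimal sketch using the Redshift Data API, assuming the cluster identifier, endpoint name, function signature, table, and IAM role are placeholders; the CREATE MODEL ... SAGEMAKER form registers the existing SageMaker endpoint as a SQL function that can then be called for real-time predictions:

import boto3

rsd = boto3.client("redshift-data")
RUN = {"ClusterIdentifier": "inventory-cluster", "Database": "dev", "DbUser": "admin"}  # placeholders

# Register the existing SageMaker real-time endpoint as a SQL function.
rsd.execute_statement(
    Sql="""
    CREATE MODEL demand_forecast
    FUNCTION predict_demand(INT, DECIMAL(10,2))
    RETURNS DECIMAL(10,2)
    SAGEMAKER 'inventory-endpoint'
    IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-ml-role';
    """,
    **RUN,
)

# Real-time recommendation query: call the endpoint-backed function from SQL.
rsd.execute_statement(
    Sql="SELECT item_id, predict_demand(item_id, current_stock) AS predicted_demand FROM inventory;",
    **RUN,
)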

Option C (offline model training) and Option D (creating dashboards with SageMaker Autopilot) are not relevant to the real-time prediction and recommendation requirements.

Option E (archiving inventory reports in Redshift) is not related to making predictions or recommendations.

Amazon Redshift ML Documentation

Invoking SageMaker Endpoints from SQL

A company implements a data mesh that has a central governance account. The company needs to catalog all data in the governance account. The governance account uses AWS Lake Formation to centrally share data and grant access permissions.

The company has created a new data product that includes a group of Amazon Redshift Serverless tables. A data engineer needs to share the data product with a marketing team. The marketing team must have access to only a subset of columns. The data engineer needs to share the same data product with a compliance team. The compliance team must have access to a different subset of columns than the marketing team needs access to.

Which combination of steps should the data engineer take to meet these requirements? (Select TWO.)

A. Create views of the tables that need to be shared. Include only the required columns.

B. Create an Amazon Redshift data share that includes the tables that need to be shared.

C. Create an Amazon Redshift managed VPC endpoint in the marketing team's account. Grant the marketing team access to the views.

D. Share the Amazon Redshift data share to the Lake Formation catalog in the governance account.

E. Share the Amazon Redshift data share to the Amazon Redshift Serverless workgroup in the marketing team's account.

Suggested answer: A, E

Explanation:

The company is using a data mesh architecture with AWS Lake Formation for governance and needs to share specific subsets of data with different teams (marketing and compliance) using Amazon Redshift Serverless.

Option A: Create views of the tables that need to be shared. Include only the required columns. Creating views in Amazon Redshift that include only the necessary columns allows for fine-grained access control. This method ensures that each team has access to only the data they are authorized to view.

Option E: Share the Amazon Redshift data share to the Amazon Redshift Serverless workgroup in the marketing team's account. Amazon Redshift data sharing enables live access to data across Redshift clusters or Serverless workgroups. By sharing data with specific workgroups, you can ensure that the marketing team and compliance team each access the relevant subset of data based on the views created.
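
A minimal sketch run against the producer workgroup, assuming the schema, view, workgroup, and consumer namespace GUID are placeholders; the compliance team would get its own column-subset view and datashare grant in the same way:

import boto3

rsd = boto3.client("redshift-data")
RUN = {"WorkgroupName": "data-product-wg", "Database": "dev"}  # producer workgroup (placeholders)

statements = [
    # Column-subset view for the marketing team.
    """CREATE VIEW marketing.customer_orders_v AS
       SELECT order_id, order_date, region, revenue
       FROM product.customer_orders;""",
    # Share the view (not the base table) through a datashare.
    "CREATE DATASHARE marketing_share;",
    "ALTER DATASHARE marketing_share ADD SCHEMA marketing;",
    "ALTER DATASHARE marketing_share ADD TABLE marketing.customer_orders_v;",
    # Grant the consumer namespace (the marketing team's workgroup) access;
    # the namespace GUID is a placeholder.
    "GRANT USAGE ON DATASHARE marketing_share TO NAMESPACE 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';",
]

for sql in statements:
    rsd.execute_statement(Sql=sql, **RUN)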

Option B (creating a Redshift data share) is close but does not address the fine-grained column-level access.

Option C (creating a managed VPC endpoint) is unnecessary for sharing data with specific teams.

Option D (sharing with the Lake Formation catalog) is incorrect because Redshift data shares do not integrate directly with Lake Formation catalogs; they are specific to Redshift workgroups.

Amazon Redshift Data Sharing

AWS Lake Formation Documentation
