Amazon DEA-C01 Practice Test - Questions Answers, Page 10
A company stores logs in an Amazon S3 bucket. When a data engineer attempts to access several log files, the data engineer discovers that some files have been unintentionally deleted.
The data engineer needs a solution that will prevent unintentional file deletion in the future.
Which solution will meet this requirement with the LEAST operational overhead?
Manually back up the S3 bucket on a regular basis.
Enable S3 Versioning for the S3 bucket.
Configure replication for the S3 bucket.
Use an Amazon S3 Glacier storage class to archive the data that is in the S3 bucket.
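For reference, enabling S3 Versioning is a single API call. The sketch below uses boto3 with a hypothetical bucket name; once versioning is on, a delete only adds a delete marker and earlier object versions remain recoverable.

```python
import boto3

s3 = boto3.client("s3")

# Turn on versioning so deleted or overwritten log files are kept as
# previous object versions. "example-log-bucket" is a placeholder name.
s3.put_bucket_versioning(
    Bucket="example-log-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```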
A company currently uses a provisioned Amazon EMR cluster that includes general purpose Amazon EC2 instances. The EMR cluster uses EMR managed scaling between one and five task nodes for the company's long-running Apache Spark extract, transform, and load (ETL) job. The company runs the ETL job every day.
When the company runs the ETL job, the EMR cluster quickly scales up to five nodes. The EMR cluster often reaches maximum CPU usage, but the memory usage remains under 30%.
The company wants to modify the EMR cluster configuration to reduce the EMR costs to run the daily ETL job.
Which solution will meet these requirements MOST cost-effectively?
Increase the maximum number of task nodes for EMR managed scaling to 10.
Change the task node type from general purpose EC2 instances to memory optimized EC2 instances.
Switch the task node type from general purpose EC2 instances to compute optimized EC2 instances.
Reduce the scaling cooldown period for the provisioned EMR cluster.
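For reference, moving the task nodes to a compute optimized instance type is a cluster configuration change rather than a change to the job itself. The sketch below uses boto3 with a hypothetical cluster ID, instance type, and count; EMR managed scaling continues to control how many task nodes actually run.

```python
import boto3

emr = boto3.client("emr")

# Add a compute optimized task instance group to the existing cluster.
# The cluster ID, instance type, and count are placeholders.
emr.add_instance_groups(
    JobFlowId="j-EXAMPLECLUSTERID",
    InstanceGroups=[
        {
            "Name": "task-compute-optimized",
            "InstanceRole": "TASK",
            "InstanceType": "c5.2xlarge",  # CPU-bound workload, low memory use
            "InstanceCount": 1,
        }
    ],
)
```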
A company is building a data stream processing application. The application runs in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. The application stores processed data in an Amazon DynamoDB table.
The company needs the application containers in the EKS cluster to have secure access to the DynamoDB table. The company does not want to embed AWS credentials in the containers.
Which solution will meet these requirements?
Store the AWS credentials in an Amazon S3 bucket. Grant the EKS containers access to the S3 bucket to retrieve the credentials.
Attach an IAM role to the EKS worker nodes. Grant the IAM role access to DynamoDB. Use the IAM role to set up IAM Roles for Service Accounts (IRSA) functionality.
Create an IAM user that has an access key to access the DynamoDB table. Use environment variables in the EKS containers to store the IAM user access key data.
Create an IAM user that has an access key to access the DynamoDB table. Use Kubernetes secrets that are mounted in a volume of the EKS cluster nodes to store the user access key data.
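For reference, the IAM Roles for Service Accounts (IRSA) option hinges on an IAM role whose trust policy points at the EKS cluster's OIDC provider. The sketch below uses boto3; the account ID, OIDC provider ID, namespace, service account name, and managed policy are hypothetical placeholders.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets one Kubernetes service account assume the role
# through the EKS cluster's OIDC identity provider (all IDs are placeholders).
oidc = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEOIDCID"
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::111122223333:oidc-provider/{oidc}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{oidc}:sub": "system:serviceaccount:app-namespace:app-service-account"
                }
            },
        }
    ],
}

iam.create_role(
    RoleName="eks-dynamodb-irsa-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Grant the role DynamoDB access; pods that use the annotated service account
# then receive temporary credentials automatically, with no embedded keys.
iam.attach_role_policy(
    RoleName="eks-dynamodb-irsa-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
)
```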
A technology company currently uses Amazon Kinesis Data Streams to collect log data in real time. The company wants to use Amazon Redshift for downstream real-time queries and to enrich the log data.
Which solution will ingest data into Amazon Redshift with the LEAST operational overhead?
Set up an Amazon Data Firehose delivery stream to send data to a Redshift provisioned cluster table.
Set up an Amazon Data Firehose delivery stream to send data to Amazon S3. Configure a Redshift provisioned cluster to load data every minute.
Configure Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to send data directly to a Redshift provisioned cluster table.
Use Amazon Redshift streaming ingestion from Kinesis Data Streams to present data as a materialized view.
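For reference, the Redshift streaming ingestion option is configured entirely in SQL: an external schema over the stream plus a materialized view. The sketch below submits that SQL through the Redshift Data API with boto3; the cluster, database, IAM role, stream, and view names are hypothetical.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# 1) External schema that maps to the Kinesis data stream.
# 2) Materialized view that Redshift auto-refreshes from the stream.
sql_statements = [
    """
    CREATE EXTERNAL SCHEMA kinesis_schema
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-streaming-role';
    """,
    """
    CREATE MATERIALIZED VIEW app_logs_view AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
    FROM kinesis_schema."application-log-stream";
    """,
]

for sql in sql_statements:
    redshift_data.execute_statement(
        ClusterIdentifier="example-redshift-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql=sql,
    )
```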
A company has three subsidiaries. Each subsidiary uses a different data warehousing solution. The first subsidiary hosts its data warehouse in Amazon Redshift. The second subsidiary uses Teradata Vantage on AWS. The third subsidiary uses Google BigQuery.
The company wants to aggregate all the data into a central Amazon S3 data lake. The company wants to use Apache Iceberg as the table format.
A data engineer needs to build a new pipeline to connect to all the data sources, run transformations by using each source engine, join the data, and write the data to Iceberg.
Which solution will meet these requirements with the LEAST operational effort?
Use native Amazon Redshift, Teradata, and BigQuery connectors to build the pipeline in AWS Glue. Use native AWS Glue transforms to join the data. Run a Merge operation on the data lake Iceberg table.
Use the Amazon Athena federated query connectors for Amazon Redshift, Teradata, and BigQuery to build the pipeline in Athena. Write a SQL query to read from all the data sources, join the data, and run a Merge operation on the data lake Iceberg table.
Use the native Amazon Redshift connector, the Java Database Connectivity (JDBC) connector for Teradata, and the open source Apache Spark BigQuery connector to build the pipeline in Amazon EMR. Write code in PySpark to join the data. Run a Merge operation on the data lake Iceberg table.
Use the native Amazon Redshift, Teradata, and BigQuery connectors in Amazon AppFlow to write data to Amazon S3 and the AWS Glue Data Catalog. Use Amazon Athena to join the data. Run a Merge operation on the data lake Iceberg table.
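For reference, the Athena federated query option reduces to a single SQL statement once the connectors are registered as data source catalogs. The sketch below submits a MERGE through boto3; the catalog, database, table, column, and result-bucket names are all hypothetical.

```python
import boto3

athena = boto3.client("athena")

# Join the three federated sources and merge the result into the Iceberg
# table in the S3 data lake (all identifiers are placeholders).
merge_sql = """
MERGE INTO iceberg_db.unified_orders AS t
USING (
    SELECT r.order_id, r.amount, b.product_name
    FROM redshift_catalog.sales.orders AS r
    JOIN teradata_catalog.sales.order_details AS d ON r.order_id = d.order_id
    JOIN bigquery_catalog.sales.products AS b ON d.product_id = b.product_id
) AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET amount = s.amount, product_name = s.product_name
WHEN NOT MATCHED THEN INSERT (order_id, amount, product_name)
    VALUES (s.order_id, s.amount, s.product_name)
"""

athena.start_query_execution(
    QueryString=merge_sql,
    QueryExecutionContext={"Database": "iceberg_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```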
A company has a data lake in Amazon S3. The company collects AWS CloudTrail logs for multiple applications. The company stores the logs in the data lake, catalogs the logs in AWS Glue, and partitions the logs based on the year. The company uses Amazon Athena to analyze the logs.
Recently, customers reported that a query on one of the Athena tables did not return any data. A data engineer must resolve the issue.
Which combination of troubleshooting steps should the data engineer take? (Select TWO.)
Confirm that Athena is pointing to the correct Amazon S3 location.
Increase the query timeout duration.
Use the MSCK REPAIR TABLE command.
Restart Athena.
Delete and recreate the problematic Athena table.
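For reference, MSCK REPAIR TABLE loads partitions that exist in Amazon S3 but are missing from the AWS Glue Data Catalog, a common cause of empty results on a partitioned table. The sketch below runs the command through boto3; the table, database, and result-bucket names are hypothetical.

```python
import boto3

athena = boto3.client("athena")

# Register any partitions that exist in S3 but are missing from the catalog.
athena.start_query_execution(
    QueryString="MSCK REPAIR TABLE cloudtrail_logs",
    QueryExecutionContext={"Database": "logs_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```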
A company is using an AWS Transfer Family server to migrate data from an on-premises environment to AWS. Company policy mandates the use of TLS 1.2 or above to encrypt the data in transit.
Which solution will meet these requirements?
Generate new SSH keys for the Transfer Family server. Make the old keys and the new keys available for use.
Update the security group rules for the on-premises network to allow only connections that use TLS 1.2 or above.
Update the security policy of the Transfer Family server to specify a minimum protocol version of TLS 1.2.
Install an SSL certificate on the Transfer Family server to encrypt data transfers by using TLS 1.2.
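For reference, the security policy of a Transfer Family server is a single server attribute. The sketch below updates it with boto3; the server ID is a placeholder, and the policy name is only an example of a TLS 1.2 minimum policy, so confirm the currently published policy names in the Transfer Family documentation.

```python
import boto3

transfer = boto3.client("transfer")

# Attach a security policy that only negotiates TLS 1.2 or above.
# The server ID is a placeholder; verify the policy name against the
# security policies currently offered by Transfer Family.
transfer.update_server(
    ServerId="s-1234567890abcdef0",
    SecurityPolicyName="TransferSecurityPolicy-TLS-1-2-2018-11",
)
```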
A data engineer configured an AWS Glue Data Catalog for data that is stored in Amazon S3 buckets. The data engineer needs to configure the Data Catalog to receive incremental updates.
The data engineer sets up event notifications for the S3 bucket and creates an Amazon Simple Queue Service (Amazon SQS) queue to receive the S3 events.
Which combination of steps should the data engineer take to meet these requirements with LEAST operational overhead? (Select TWO.)
Create an S3 event-based AWS Glue crawler to consume events from the SQS queue.
Define a time-based schedule to run the AWS Glue crawler, and perform incremental updates to the Data Catalog.
Use an AWS Lambda function to directly update the Data Catalog based on S3 events that the SQS queue receives.
Manually initiate the AWS Glue crawler to perform updates to the Data Catalog when there is a change in the S3 bucket.
Use AWS Step Functions to orchestrate the process of updating the Data Catalog based on S3 events that the SQS queue receives.
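For reference, an S3 event-based crawler only recrawls the objects reported through the SQS queue. The sketch below creates one with boto3; the crawler name, IAM role, database, S3 path, and queue ARN are hypothetical.

```python
import boto3

glue = boto3.client("glue")

# Crawler that consumes S3 event notifications from the SQS queue and
# performs incremental updates instead of full rescans.
glue.create_crawler(
    Name="s3-event-mode-crawler",
    Role="arn:aws:iam::111122223333:role/glue-crawler-role",
    DatabaseName="data_lake_db",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://example-data-bucket/raw/",
                "EventQueueArn": "arn:aws:sqs:us-east-1:111122223333:s3-events-queue",
            }
        ]
    },
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},
)
```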
A company uploads .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas.
An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.
If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.
Which solution will meet these requirements?
Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.
Modify the AWS Glue job to load the previously inserted data into a MySQL database. Perform an upsert operation in the MySQL database. Copy the results to the Amazon Redshift tables.
Use Apache Spark's DataFrame dropDuplicates() API to eliminate duplicates. Write the data to the Redshift tables.
Use the AWS Glue ResolveChoice built-in transform to select the value of the column from the most recent record.
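For reference, the staging-table option follows the common Redshift upsert pattern: delete matching rows from the target, insert everything from the staging table, then clear the staging table. The sketch below submits that pattern through the Redshift Data API with boto3; the cluster, database, table, and key names are hypothetical.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Upsert from the staging table that the Glue job loads, so a rerun updates
# existing rows instead of inserting duplicates (names are placeholders).
redshift_data.batch_execute_statement(
    ClusterIdentifier="example-redshift-cluster",
    Database="dev",
    DbUser="awsuser",
    Sqls=[
        "DELETE FROM public.orders USING public.orders_staging "
        "WHERE public.orders.order_id = public.orders_staging.order_id;",
        "INSERT INTO public.orders SELECT * FROM public.orders_staging;",
        "DELETE FROM public.orders_staging;",
    ],
)
```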
A financial company recently added more features to its mobile app. The new features required the company to create a new topic in an existing Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster.
A few days after the company added the new topic, Amazon CloudWatch raised an alarm on the RootDiskUsed metric for the MSK cluster.
How should the company address the CloudWatch alarm?
Expand the storage of the MSK broker. Configure the MSK cluster storage to expand automatically.
Expand the storage of the Apache ZooKeeper nodes.
Update the MSK broker instance to a larger instance type. Restart the MSK cluster.
Specify the Target-Volume-in-GiB parameter for the existing topic.
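For reference, automatic storage expansion for an MSK cluster is configured through Application Auto Scaling on broker storage utilization. The sketch below uses boto3; the cluster ARN, capacity limits, and target value are hypothetical, and the minimum capacity should be at least the brokers' current volume size.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

cluster_arn = "arn:aws:kafka:us-east-1:111122223333:cluster/example-cluster/abcd1234-ef56"

# Register broker storage as a scalable target (GiB limits are placeholders).
autoscaling.register_scalable_target(
    ServiceNamespace="kafka",
    ResourceId=cluster_arn,
    ScalableDimension="kafka:broker-storage:VolumeSize",
    MinCapacity=1000,
    MaxCapacity=4000,
)

# Expand storage automatically when disk utilization crosses the target value.
autoscaling.put_scaling_policy(
    PolicyName="msk-broker-storage-expansion",
    ServiceNamespace="kafka",
    ResourceId=cluster_arn,
    ScalableDimension="kafka:broker-storage:VolumeSize",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "KafkaBrokerStorageUtilization"
        },
    },
)
```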