Amazon DEA-C01 Practice Test - Questions and Answers, Page 13
A data engineer uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to run data pipelines in an AWS account. A workflow recently failed to run. The data engineer needs to use Apache Airflow logs to diagnose the failure of the workflow. Which log type should the data engineer use to diagnose the cause of the failure?
YourEnvironmentName-WebServer
YourEnvironmentName-Scheduler
YourEnvironmentName-DAGProcessing
YourEnvironmentName-Task
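For reference, MWAA publishes each Apache Airflow log type to its own CloudWatch log group, and the Task logs are where a failed task's traceback appears. A minimal sketch of pulling recent error events with boto3; the environment name in the log group is a placeholder:

    import boto3

    logs = boto3.client("logs")

    # MWAA task logs land in a CloudWatch log group named after the
    # environment (placeholder name shown here).
    response = logs.filter_log_events(
        logGroupName="airflow-YourEnvironmentName-Task",
        filterPattern="ERROR",
    )
    for event in response["events"]:
        print(event["message"])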
An ecommerce company wants to use AWS to migrate data pipelines from an on-premises environment into the AWS Cloud. The company currently uses a third-party tool in the on-premises environment to orchestrate data ingestion processes.
The company wants a migration solution that does not require the company to manage servers. The solution must be able to orchestrate Python and Bash scripts. The solution must not require the company to refactor any code.
Which solution will meet these requirements with the LEAST operational overhead?
AWS Lambda
Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
AWS Step Functions
AWS Glue
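Amazon MWAA runs standard Apache Airflow, so existing Python and Bash steps can be wired into a DAG without refactoring the scripts themselves. A minimal sketch of such a DAG; the DAG name, script path, and callable are illustrative:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def run_ingestion():
        # Existing Python ingestion logic can be called here unchanged.
        pass

    with DAG(
        dag_id="migrated_ingestion",       # illustrative name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract",
            # Illustrative path to an unmodified Bash script
            bash_command="bash /usr/local/airflow/dags/scripts/extract.sh",
        )
        transform = PythonOperator(task_id="transform", python_callable=run_ingestion)

        extract >> transform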
A data engineer needs to use Amazon Neptune to develop graph applications.
Which programming languages should the engineer use to develop the graph applications? (Select TWO.)
Gremlin
SQL
ANSI SQL
SPARQL
Spark SQL
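Neptune supports Gremlin for property graphs and SPARQL for RDF graphs. A minimal Gremlin sketch using the gremlinpython client; the cluster endpoint is a placeholder:

    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    # Placeholder Neptune cluster endpoint
    conn = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
    g = traversal().withRemote(conn)

    # Return the names of up to five vertices labeled "person"
    names = g.V().hasLabel("person").values("name").limit(5).toList()
    print(names)

    conn.close()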
A company needs to load customer data that comes from a third party into an Amazon Redshift data warehouse. The company stores order data and product data in the same data warehouse. The company wants to use the combined dataset to identify potential new customers.
A data engineer notices that one of the fields in the source data includes values that are in JSON format.
How should the data engineer load the JSON data into the data warehouse with the LEAST effort?
Use the SUPER data type to store the data in the Amazon Redshift table.
Use AWS Glue to flatten the JSON data and ingest it into the Amazon Redshift table.
Use Amazon S3 to store the JSON data. Use Amazon Athena to query the data.
Use an AWS Lambda function to flatten the JSON data. Store the data in Amazon S3.
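The SUPER data type stores semi-structured values natively, so a JSON field can be loaded as-is and navigated with PartiQL dot notation instead of being flattened first. A minimal sketch using the Redshift Data API; the cluster, database, and table names are placeholders:

    import boto3

    rsd = boto3.client("redshift-data")

    def run_sql(sql):
        # Placeholder cluster and database identifiers
        return rsd.execute_statement(
            ClusterIdentifier="my-cluster",
            Database="dev",
            DbUser="awsuser",
            Sql=sql,
        )

    run_sql("CREATE TABLE customer_stage (customer_id INT, attributes SUPER);")
    # JSON_PARSE ingests the raw JSON string into the SUPER column unchanged
    run_sql("""INSERT INTO customer_stage
               VALUES (1, JSON_PARSE('{"email": "a@example.com", "tier": "gold"}'));""")
    # Dot notation queries the JSON without any flattening step
    run_sql("SELECT customer_id, attributes.email FROM customer_stage;")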
A company stores employee data in Amazon Redshift. A table named Employee uses columns named Region ID, Department ID, and Role ID as a compound sort key. Which queries will benefit MOST from the table's compound sort key? (Select TWO.)
Select * from Employee where Region ID='North America';
Select * from Employee where Region ID='North America' and Department ID=20;
Select * from Employee where Department ID=20 and Region ID='North America';
Select * from Employee where Role ID=50;
Select * from Employee where Region ID='North America' and Role ID=50;
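For context, a compound sort key only prunes blocks when a query filters on a prefix of the sort key columns, so predicates that include the leading Region ID column benefit, while a filter on Role ID alone does not. A sketch of the DDL this question implies, with simplified column names, issued through the same Data API pattern as above:

    import boto3

    rsd = boto3.client("redshift-data")

    # Illustrative DDL: block pruning applies to queries that filter on
    # region_id, or on region_id plus department_id, in that order.
    rsd.execute_statement(
        ClusterIdentifier="my-cluster",   # placeholder
        Database="dev",
        DbUser="awsuser",
        Sql="""
            CREATE TABLE employee (
                region_id     VARCHAR(32),
                department_id INT,
                role_id       INT
            )
            COMPOUND SORTKEY (region_id, department_id, role_id);
        """,
    )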
A car sales company maintains data about cars that are listed for sale in an area. The company receives data about new car listings from vendors who upload the data daily as compressed files into Amazon S3. The compressed files are up to 5 KB in size. The company wants to see the most up-to-date listings as soon as the data is uploaded to Amazon S3.
A data engineer must automate and orchestrate the data processing workflow of the listings to feed a dashboard. The data engineer must also provide the ability to perform one-time queries and analytical reporting. The query solution must be scalable.
Which solution will meet these requirements MOST cost-effectively?
Use an Amazon EMR cluster to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Apache Hive for one-time queries and analytical reporting. Use Amazon OpenSearch Service to bulk ingest the data into compute-optimized instances. Use OpenSearch Dashboards in OpenSearch Service for the dashboard.
Use a provisioned Amazon EMR cluster to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Amazon Athena for one-time queries and analytical reporting. Use Amazon QuickSight for the dashboard.
Use AWS Glue to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Amazon Redshift Spectrum for one-time queries and analytical reporting. Use OpenSearch Dashboards in Amazon OpenSearch Service for the dashboard.
Use AWS Glue to process incoming data. Use AWS Lambda and S3 Event Notifications to orchestrate workflows. Use Amazon Athena for one-time queries and analytical reporting. Use Amazon QuickSight for the dashboard.
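With S3 Event Notifications driving a Lambda function, each upload can trigger the Glue processing job directly, keeping the listings dashboard current without a provisioned cluster. A minimal sketch of such a handler; the job name and argument key are illustrative:

    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        # One record per uploaded listings file in the S3 event notification
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            glue.start_job_run(
                JobName="process-car-listings",   # illustrative job name
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )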
A data engineer needs to debug an AWS Glue job that reads from Amazon S3 and writes to Amazon Redshift. The data engineer enabled the bookmark feature for the AWS Glue job. The data engineer has set the maximum concurrency for the AWS Glue job to 1.
The AWS Glue job is successfully writing the output to Amazon Redshift. However, the Amazon S3 files that were loaded during previous runs of the AWS Glue job are being reprocessed by subsequent runs.
What is the likely reason the AWS Glue job is reprocessing the files?
The AWS Glue job does not have the s3:GetObjectAcl permission that is required for bookmarks to work correctly.
The maximum concurrency for the AWS Glue job is set to 1.
The data engineer incorrectly specified an older version of AWS Glue for the Glue job.
The AWS Glue job does not have a required commit statement.
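Job bookmarks persist their state only when the script calls job.commit() at the end of the run; without that call, each run resumes from the old bookmark and reprocesses the same S3 files. A minimal Glue job skeleton showing where the call belongs:

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)   # bookmark state is keyed to the job name

    # ... read from Amazon S3, transform, write to Amazon Redshift ...

    job.commit()  # persists the bookmark; omitting this causes reprocessing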
A company is building a data lake for a new analytics team. The company is using Amazon S3 for storage and Amazon Athena for query analysis. All data that is in Amazon S3 is in Apache Parquet format.
The company is running a new Oracle database as a source system in the company's data center. The company has 70 tables in the Oracle database. All the tables have primary keys. Data can occasionally change in the source system. The company wants to ingest the tables every day into the data lake.
Which solution will meet this requirement with the LEAST effort?
Create an Apache Sqoop job in Amazon EMR to read the data from the Oracle database. Configure the Sqoop job to write the data to Amazon S3 in Parquet format.
Create an AWS Glue connection to the Oracle database. Create an AWS Glue bookmark job to ingest the data incrementally and to write the data to Amazon S3 in Parquet format.
Create an AWS Database Migration Service (AWS DMS) task for ongoing replication. Set the Oracle database as the source. Set Amazon S3 as the target. Configure the task to write the data in Parquet format.
Create an Oracle database in Amazon RDS. Use AWS Database Migration Service (AWS DMS) to migrate the on-premises Oracle database to Amazon RDS. Configure triggers on the tables to invoke AWS Lambda functions to write changed records to Amazon S3 in Parquet format.
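AWS DMS writes Parquet to S3 when the target endpoint's S3 settings specify the Parquet data format, and a full-load-and-CDC task picks up the occasional source changes. A minimal sketch of the target endpoint; the identifier, bucket, and role ARN are placeholders:

    import boto3

    dms = boto3.client("dms")

    dms.create_endpoint(
        EndpointIdentifier="s3-parquet-target",   # placeholder
        EndpointType="target",
        EngineName="s3",
        S3Settings={
            "BucketName": "my-data-lake",         # placeholder
            "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/dms-s3-role",
            "DataFormat": "parquet",              # write Parquet instead of the default CSV
        },
    )

    # The replication task would then use MigrationType="full-load-and-cdc"
    # so ongoing source changes keep flowing into the data lake.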
A data engineer needs to build an enterprise data catalog based on the company's Amazon S3 buckets and Amazon RDS databases. The data catalog must include storage format metadata for the data in the catalog.
Which solution will meet these requirements with the LEAST effort?
Use an AWS Glue crawler to scan the S3 buckets and RDS databases and build a data catalog. Use data stewards to inspect the data and update the data catalog with the data format.
Use an AWS Glue crawler to build a data catalog. Use AWS Glue crawler classifiers to recognize the format of data and store the format in the catalog.
Use Amazon Macie to build a data catalog and to identify sensitive data elements. Collect the data format information from Macie.
Use scripts to scan data elements and to assign data classifications based on the format of the data.
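A single Glue crawler can target both the S3 buckets and the RDS databases (the latter through a Glue connection), and its built-in classifiers record the detected format on each catalog table. A minimal sketch; the crawler name, role, connection, and paths are placeholders:

    import boto3

    glue = boto3.client("glue")

    glue.create_crawler(
        Name="enterprise-catalog-crawler",   # placeholder
        Role="arn:aws:iam::123456789012:role/glue-crawler-role",
        DatabaseName="enterprise_catalog",
        Targets={
            "S3Targets": [{"Path": "s3://my-data-lake/"}],
            "JdbcTargets": [
                # Requires a pre-created Glue connection to the RDS instance
                {"ConnectionName": "rds-connection", "Path": "mydb/%"},
            ],
        },
    )
    glue.start_crawler(Name="enterprise-catalog-crawler")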