
Amazon DAS-C01 Practice Test - Questions Answers, Page 19


A company has multiple data workflows to ingest data from its operational databases into its data lake on Amazon S3. The workflows use AWS Glue and Amazon EMR for data processing and ETL. The company wants to enhance its architecture to provide automated orchestration and minimize manual intervention. Which solution should the company use to manage the data workflows to meet these requirements?

A. AWS Glue workflows
B. AWS Step Functions
C. AWS Lambda
D. AWS Batch
Suggested answer: B

Explanation:

This solution meets the requirements because:

AWS Step Functions is a fully managed service that allows you to create and orchestrate workflows that connect various AWS services, such as AWS Glue, Amazon EMR, Amazon S3, and others. You can use Step Functions to automate your data workflows and handle complex logic, such as branching, parallel processing, error handling, retries, and timeouts.

AWS Step Functions provides a graphical interface that lets you design and visualize your workflows as state machines, which are composed of a series of steps or tasks. You can use the AWS Step Functions console, the AWS CLI, or the AWS SDKs to create and manage your state machines.

AWS Step Functions integrates with AWS Glue and Amazon EMR to enable you to run data processing and ETL jobs as part of your workflows. You can use the built-in connectors for these services to invoke them from your state machines. You can also use Step Functions to monitor the status of your jobs and trigger actions based on the job outcomes.

AWS Step Functions can help you minimize manual intervention by providing features such as automatic retries, catch blocks, and fallback states, which allow you to handle errors and failures gracefully in your workflows. You can also use Step Functions to trigger your workflows based on events, such as a new file in S3 or a CloudWatch alarm.
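
As an illustration of this orchestration pattern, the following sketch (using the AWS SDK for Python, boto3) defines a minimal state machine that runs an AWS Glue job and then an Amazon EMR step. The job name, cluster ID, script path, and role ARN are placeholders rather than values from the question.

import json
import boto3

# Minimal sketch: an Amazon States Language (ASL) definition that runs an AWS Glue
# job and then an Amazon EMR step, with a retry on the Glue task. The job name,
# cluster ID, script path, and role ARN below are placeholders.
definition = {
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "daily-etl-job"},
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2}],
            "Next": "RunEmrStep"
        },
        "RunEmrStep": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
            "Parameters": {
                "ClusterId": "j-EXAMPLE",
                "Step": {
                    "Name": "spark-transform",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args": ["spark-submit", "s3://example-bucket/scripts/transform.py"]
                    }
                }
            },
            "End": True
        }
    }
}

sfn = boto3.client("stepfunctions")
response = sfn.create_state_machine(
    name="data-lake-orchestration",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole"  # placeholder
)
sfn.start_execution(stateMachineArn=response["stateMachineArn"])

The .sync suffix on the resource ARNs tells Step Functions to wait for each job to finish before moving to the next state, which is what removes the need for manual hand-offs between the Glue and EMR stages.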

A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company's data analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data.

The amount of data that is ingested into Amazon S3 has increased to 5 PB over time. The query latency also has increased. The company needs to segment the data to reduce the amount of data that is scanned.

Which solutions will improve query performance? (Select TWO.)

A. Use MySQL Workbench on an Amazon EC2 instance. Connect to Athena by using a JDBC connector. Run the query from MySQL Workbench instead of Athena directly.
B. Configure Athena to use S3 Select to load only the files of the data subset.
C. Create the data subset in Apache Parquet format each day by using the Athena CREATE TABLE AS SELECT (CTAS) statement. Query the Parquet data.
D. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet format and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data each day.
E. Create an S3 gateway endpoint. Configure VPC routing to access Amazon S3 through the gateway endpoint.
Suggested answer: B, C

Explanation:

This solution will improve query performance because:

Apache Parquet is a columnar storage format that is optimized for analytics and supports compression. Parquet files can reduce the amount of data scanned and transferred by Athena, thus improving performance and reducing cost.

The Athena CREATE TABLE AS SELECT (CTAS) statement allows you to create a new table from the results of a SELECT query. You can use this statement to convert the CSV files to Parquet format and store them in a different location in S3. You can also specify partitioning keys for the new table, which can further improve query performance by filtering out irrelevant data.

Querying the Parquet data will be faster and cheaper than querying the CSV data, as Parquet files are more efficient for analytical queries.
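
A minimal sketch of the CTAS approach, submitted through the Athena API with boto3; the database, table, bucket, and column names are illustrative placeholders:

import boto3

# Minimal sketch: create a Parquet, partitioned copy of one day's CSV data
# through the Athena API. Database, table, and bucket names are placeholders.
athena = boto3.client("athena")

ctas_query = """
CREATE TABLE analytics_db.events_parquet_2023_06_01
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    external_location = 's3://example-curated-bucket/events_parquet/dt=2023-06-01/',
    partitioned_by = ARRAY['region']
) AS
SELECT event_id, event_time, payload, region
FROM analytics_db.events_csv
WHERE dt = '2023-06-01'
"""

athena.start_query_execution(
    QueryString=ctas_query,
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

Because the partition column (region here) is listed last in the SELECT and declared in partitioned_by, Athena writes the Parquet output into partitioned prefixes that later queries can prune.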

D) Run a daily AWS Glue ETL job to convert the data files to Apache Parquet format and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data each day.

This solution will improve query performance because:

AWS Glue is a fully managed extract, transform, and load (ETL) service that can be used to prepare and load data for analytics. You can use AWS Glue to create a job that copies the CSV files from the source S3 bucket to a new S3 bucket and converts them to Apache Parquet format. The job can also partition the converted files, and a scheduled AWS Glue crawler can update the AWS Glue Data Catalog each day so that Athena scans only the partitions a query actually needs.
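
A minimal sketch of such a Glue ETL job script, assuming the CSV data has already been cataloged; the database, table, output path, and partition keys are placeholders:

# Minimal sketch of a Glue ETL job script: read the cataloged CSV data and
# write it back to S3 as partitioned Parquet. Names below are placeholders.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: CSV files cataloged by a crawler
source = glue_context.create_dynamic_frame.from_catalog(
    database="analytics_db", table_name="events_csv"
)

# Sink: partitioned Parquet in a curated bucket
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={
        "path": "s3://example-curated-bucket/events_parquet/",
        "partitionKeys": ["dt", "region"],
    },
    format="parquet",
)

job.commit()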

A business intelligence (BI) engineer must create a dashboard to visualize how often certain keywords are used in relation to others in social media posts about a public figure. The BI engineer extracts the keywords from the posts and loads them into an Amazon Redshift table. The table displays the keywords and the count corresponding to each keyword.

The BI engineer needs to display the top keywords with more emphasis on the most frequently used keywords.

Which visual type in Amazon QuickSight meets these requirements?

A. Bar charts
B. Word clouds
C. Circle packing
D. Heat maps
Suggested answer: B

A company uses Amazon Redshift for its data warehouse. The company is running an ETL process that receives data in data parts from five third-party providers. The data parts contain independent records that are related to one specific job. The company receives the data parts at various times throughout each day.

A data analytics specialist must implement a solution that loads the data into Amazon Redshift only after the company receives all five data parts.

Which solution will meet these requirements?

A. Create an Amazon S3 bucket to receive the data. Use S3 multipart upload to collect the data from the different sources and to form a single object before loading the data into Amazon Redshift.
B. Use an AWS Lambda function that is scheduled by cron to load the data into a temporary table in Amazon Redshift. Use Amazon Redshift database triggers to consolidate the final data when all five data parts are ready.
C. Create an Amazon S3 bucket to receive the data. Create an AWS Lambda function that is invoked by S3 upload events. Configure the function to validate that all five data parts are gathered before the function loads the data into Amazon Redshift.
D. Create an Amazon Kinesis Data Firehose delivery stream. Program a Python condition that will invoke a buffer flush when all five data parts are received.
Suggested answer: D

A financial services company is building a data lake solution on Amazon S3. The company plans to use analytics offerings from AWS to meet user needs for one-time querying and business intelligence reports. A portion of the columns will contain personally identifiable information (PII). Only authorized users should be able to see plaintext PII data.

What is the MOST operationally efficient solution that meets these requirements?

A. Define a bucket policy for each S3 bucket of the data lake to allow access to users who have authorization to see PII data. Catalog the data by using AWS Glue. Create two IAM roles. Attach a permissions policy with access to PII columns to one role. Attach a policy without these permissions to the other role.
B. Register the S3 locations with AWS Lake Formation. Create two IAM roles. Use Lake Formation data permissions to grant Select permissions to all of the columns for one role. Grant Select permissions to only columns that contain non-PII data for the other role.
C. Register the S3 locations with AWS Lake Formation. Create an AWS Glue job to create an ETL workflow that removes the PII columns from the data and creates a separate copy of the data in another data lake S3 bucket. Register the new S3 locations with Lake Formation. Grant users permissions to each data lake based on whether the users are authorized to see PII data.
D. Register the S3 locations with AWS Lake Formation. Create two IAM roles. Attach a permissions policy with access to PII columns to one role. Attach a policy without these permissions to the other role. For each downstream analytics service, use its native security functionality and the IAM roles to secure the PII data.
Suggested answer: B

Explanation:

This solution meets the requirements because:

AWS Lake Formation is a fully managed service that allows you to build, secure, and manage data lakes on AWS. You can use Lake Formation to register your S3 locations as data sources and catalog your data using AWS Glue.

AWS Lake Formation provides fine-grained data permissions that enable you to control access to your data at the column or row level. You can use Lake Formation to create two IAM roles and grant them different Select permissions based on the PII status of the columns.

AWS Lake Formation integrates with various analytics services from AWS, such as Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight. You can use these services to query and visualize your data in S3 using the IAM roles and permissions defined by Lake Formation.
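
A minimal sketch of the two grants with boto3; the role ARNs, database, table, and PII column names are placeholders:

import boto3

# Minimal sketch of the column-level grants described above.
lf = boto3.client("lakeformation")

# Role 1: SELECT on every column of the table
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/PiiAuthorizedRole"},
    Resource={"Table": {"DatabaseName": "datalake_db", "Name": "customers"}},
    Permissions=["SELECT"],
)

# Role 2: SELECT on non-PII columns only, by excluding the PII columns
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/GeneralAnalystRole"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "datalake_db",
            "Name": "customers",
            "ColumnWildcard": {"ExcludedColumnNames": ["ssn", "email", "phone"]},
        }
    },
    Permissions=["SELECT"],
)

The second grant uses a column wildcard with excluded columns, so the analyst role can query the table from Athena or Redshift Spectrum but never sees the PII columns in plaintext.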

A company developed a new voting results reporting website that uses Amazon Kinesis Data Firehose to deliver full logs from AWS WAF to an Amazon S3 bucket. The company is now seeking a solution to perform this infrequent data analysis with data visualization capabilities in a way that requires minimal development effort.

Which solution MOST cost-effectively meets these requirements?

A. Use an AWS Glue crawler to create and update a table in the AWS Glue Data Catalog from the logs. Use Amazon Athena to perform ad-hoc analyses. Develop data visualizations by using Amazon QuickSight.
B. Configure Kinesis Data Firehose to deliver the logs to an Amazon OpenSearch Service cluster. Use OpenSearch Service REST APIs to analyze the data. Visualize the data by building an OpenSearch Service dashboard.
C. Create an AWS Lambda function to convert the logs to CSV format. Add the Lambda function to the Kinesis Data Firehose transformation configuration. Use Amazon Redshift to perform a one-time analysis of the logs by using SQL queries. Develop data visualizations by using Amazon QuickSight.
D. Create an Amazon EMR cluster and use Amazon S3 as the data source. Create an Apache Spark job to perform a one-time analysis of the logs. Develop data visualizations by using Amazon QuickSight.
Suggested answer: A

Explanation:

This solution meets the requirements because:

AWS Glue is a fully managed extract, transform, and load (ETL) service that can be used to prepare and load data for analytics. You can use AWS Glue to create a crawler that automatically scans your logs in S3 and infers their schema and format. The crawler can also update the AWS Glue Data Catalog, which is a central metadata repository that Athena uses to access your data in S3.

Amazon Athena is an interactive query service that allows you to analyze data in S3 using standard SQL. You can use Athena to perform ad-hoc analyses on your logs without having to load them into a database or data warehouse. Athena is serverless, so you only pay for the queries you run and the amount of data scanned by each query.

Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence service that can create interactive dashboards. You can use QuickSight to develop data visualizations from your Athena queries and share them with others. QuickSight also supports live analytics, which means you can see the latest data without having to refresh your dashboards.
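
A minimal sketch of this flow with boto3; the crawler name, IAM role, bucket paths, database name, and the example WAF query are illustrative placeholders:

import boto3

# Minimal sketch: crawl the delivered WAF logs, then run an ad-hoc Athena query.
glue = boto3.client("glue")
athena = boto3.client("athena")

glue.create_crawler(
    Name="waf-logs-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="waf_logs_db",
    Targets={"S3Targets": [{"Path": "s3://example-waf-logs-bucket/firehose/"}]},
    Schedule="cron(0 1 * * ? *)",  # crawl daily to pick up new data
)
glue.start_crawler(Name="waf-logs-crawler")

# Ad-hoc Athena query over the cataloged logs (field names depend on the crawled schema)
athena.start_query_execution(
    QueryString="""
        SELECT httprequest.clientip, count(*) AS requests
        FROM waf_logs_db.firehose
        WHERE action = 'BLOCK'
        GROUP BY httprequest.clientip
        ORDER BY requests DESC
        LIMIT 20
    """,
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)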

A data analyst notices the following error message while loading data to an Amazon Redshift cluster:

'The bucket you are attempting to access must be addressed using the specified endpoint.'

What should the data analyst do to resolve this issue?

A. Specify the correct AWS Region for the Amazon S3 bucket by using the REGION option with the COPY command.
B. Change the Amazon S3 object's ACL to grant the S3 bucket owner full control of the object.
C. Launch the Redshift cluster in a VPC.
D. Configure the timeout settings according to the operating system used to connect to the Redshift cluster.
Suggested answer: A

Explanation:

The correct answer is A: Specify the correct AWS Region for the Amazon S3 bucket by using the REGION option with the COPY command.

The error message indicates that the Amazon S3 bucket and the Redshift cluster are not in the same region. To load data from a different region, the COPY command needs to specify the source region using the REGION option. For example, if the Redshift cluster is in US East (N. Virginia) and the S3 bucket is in Asia Pacific (Mumbai), the COPY command should include REGION 'ap-south-1'. This option tells Redshift to use the appropriate endpoint to access the S3 bucket. For more information, see Copy command options and COPY - Amazon Redshift.
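
A minimal sketch of the corrected load, issued through the Amazon Redshift Data API with boto3; the cluster, database, user, table, IAM role, and bucket names are placeholders, and 'ap-south-1' matches the Mumbai example above:

import boto3

# Minimal sketch: run a COPY command with an explicit REGION option.
redshift_data = boto3.client("redshift-data")

copy_sql = """
COPY sales_staging
FROM 's3://example-mumbai-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
REGION 'ap-south-1'
"""

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="analyst",
    Sql=copy_sql,
)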

A company has a fitness tracker application that generates data from subscribers. The company needs real-time reporting on this data. The data is sent immediately, and the processing latency must be less than 1 second. The company wants to perform anomaly detection on the data as the data is collected. The company also requires a solution that minimizes operational overhead.

Which solution meets these requirements?

A. Amazon EMR cluster with Apache Spark streaming, Spark SQL, and Spark's machine learning library (MLlib)
B. Amazon Kinesis Data Firehose with Amazon S3 and Amazon Athena
C. Amazon Kinesis Data Firehose with Amazon QuickSight
D. Amazon Kinesis Data Streams with Amazon Kinesis Data Analytics
Suggested answer: D

A large company has several independent business units. Each business unit is responsible for its own data, but needs to share data with other units for collaboration.

Each unit stores data in an Amazon S3 data lake created with AWS Lake Formation. To create dashboard reports, the marketing team wants to join its data stored in an Amazon Redshift cluster with the sales team customer table stored in the data lake. The sales team has a large number of tables and schemas, but the marketing team should only have access to the customer table. The solution must be secure and scalable.

Which set of actions meets these requirements?

A. The sales team shares the AWS Glue Data Catalog customer table with the marketing team in read-only mode using the named resource method. The marketing team accepts the datashare using AWS Resource Access Manager (AWS RAM) and creates a resource link to the shared customer table. The marketing team joins its data with the customer table using Amazon Redshift Spectrum.
B. The marketing team creates an S3 cross-account replication between the sales team's S3 bucket as the source and the marketing team's S3 bucket as the destination. The marketing team runs an AWS Glue crawler on the replicated data in its AWS account to create an AWS Glue Data Catalog customer table.
C. The marketing team creates an AWS Lambda function in the sales team's account to replicate data between the sales team's S3 bucket as the source and the marketing team's S3 bucket as the destination. The marketing team runs an AWS Glue crawler on the replicated data in its AWS account to create an AWS Glue Data Catalog customer table. The marketing team joins its data with the customer table using Amazon Redshift Spectrum.
D. The sales team shares the AWS Glue Data Catalog customer table with the marketing team in read-only mode using the Lake Formation tag-based access control (LF-TBAC) method. The sales team updates the AWS Glue Data Catalog resource policy to add relevant permissions for the marketing team. The marketing team creates a resource link to the shared customer table. The marketing team joins its data with the customer table using Amazon Redshift Spectrum.
Suggested answer: D

A company collects and transforms data files from third-party providers by using an on-premises SFTP server. The company uses a Python script to transform the data.

The company wants to reduce the overhead of maintaining the SFTP server and storing large amounts of data on premises. However, the company does not want to change the existing upload process for the third-party providers.

Which solution will meet these requirements with the LEAST development effort?

A. Deploy the Python script on an Amazon EC2 instance. Install a third-party SFTP server on the EC2 instance. Schedule the script to run periodically on the EC2 instance to perform a data transformation on new files. Copy the transformed files to Amazon S3.
B. Create an Amazon S3 bucket that includes a separate prefix for each provider. Provide the S3 URL to each provider for its respective prefix. Instruct the providers to use the S3 COPY command to upload data. Configure an AWS Lambda function that transforms the data when new files are uploaded.
C. Use AWS Transfer Family to create an SFTP server that includes a publicly accessible endpoint. Configure the new server to use Amazon S3 storage. Change the server name to match the name of the on-premises SFTP server. Schedule a Python shell job in AWS Glue to use the existing Python script to run periodically and transform the uploaded files.
D. Use AWS Transfer Family to create an SFTP server that includes a publicly accessible endpoint. Configure the new server to use Amazon S3 storage. Change the server name to match the name of the on-premises SFTP server. Use AWS Data Pipeline to schedule a transient Amazon EMR cluster with an Apache Spark step to periodically transform the files.
Suggested answer: C

Explanation:

This solution meets the requirements because:

AWS Transfer Family is a fully managed service that enables secure file transfers to and from Amazon S3 or Amazon EFS using standard protocols such as SFTP, FTPS, and FTP. By using AWS Transfer Family, the company can reduce the overhead of maintaining the on-premises SFTP server and storing large amounts of data on premises.

The company can create an SFTP-enabled server with a publicly accessible endpoint using AWS Transfer Family. This endpoint can be accessed by the third-party providers over the internet using their existing SFTP clients. The company can also change the server name to match the name of the on-premises SFTP server, so that the existing upload process for the third-party providers does not change. For more information, see Create an SFTP-enabled server.

The company can configure the new SFTP server to use Amazon S3 as the storage service. This way, the data files uploaded by the third-party providers will be stored in an Amazon S3 bucket. The company can also use AWS Identity and Access Management (IAM) roles and policies to control access to the S3 bucket and its objects. For more information, see Using Amazon S3 as your storage service.

The company can schedule a Python shell job in AWS Glue to use the existing Python script to run periodically and transform the uploaded files. AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics. A Python shell job runs Python scripts in a managed, serverless environment without requiring an Apache Spark cluster, which makes it a good fit for reusing an existing script. The company can use AWS Glue triggers to schedule the Python shell job based on time or events. For more information, see Working with Python shell jobs.
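
A minimal sketch of this setup with boto3; the role ARNs, bucket names, user name, script location, and schedule are placeholders:

import boto3

# Minimal sketch: an AWS Transfer Family SFTP endpoint backed by S3, plus a
# scheduled AWS Glue Python shell job that runs the existing transformation script.
transfer = boto3.client("transfer")
glue = boto3.client("glue")

server = transfer.create_server(
    Protocols=["SFTP"],
    Domain="S3",
    IdentityProviderType="SERVICE_MANAGED",
    EndpointType="PUBLIC",
)

transfer.create_user(
    ServerId=server["ServerId"],
    UserName="provider-a",
    Role="arn:aws:iam::123456789012:role/TransferS3AccessRole",
    HomeDirectory="/example-ingest-bucket/provider-a",
)

glue.create_job(
    Name="transform-provider-files",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    Command={
        "Name": "pythonshell",  # Python shell job, no Spark cluster required
        "ScriptLocation": "s3://example-scripts-bucket/transform.py",
        "PythonVersion": "3.9",
    },
    MaxCapacity=1.0,
)

glue.create_trigger(
    Name="transform-hourly",
    Type="SCHEDULED",
    Schedule="cron(0 * * * ? *)",
    Actions=[{"JobName": "transform-provider-files"}],
    StartOnCreation=True,
)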
