
Amazon DAS-C01 Practice Test - Questions Answers, Page 9

A healthcare company ingests patient data from multiple data sources and stores it in an Amazon S3 staging bucket. An AWS Glue ETL job transforms the data, which is written to an S3-based data lake to be queried using Amazon Athena. The company wants to match patient records even when the records do not have a common unique identifier. Which solution meets this requirement?

A. Use Amazon Macie pattern matching as part of the ETL job
B. Train and use the AWS Glue PySpark filter class in the ETL job
C. Partition tables and use the ETL job to partition the data on patient name
D. Train and use the AWS Glue FindMatches ML transform in the ETL job
Suggested answer: D

Explanation:


The FindMatches transform enables you to identify duplicate or matching records in your dataset, even when the records do not have a common unique identifier and no fields match exactly. Reference: https://docs.aws.amazon.com/glue/latest/dg/machine-learning.html
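
As a rough illustration of the suggested answer, the sketch below shows how a trained FindMatches ML transform might be invoked from a Glue PySpark ETL job. The transform ID, Data Catalog database and table names, and S3 output path are placeholders, not values taken from the question.

```python
# Minimal sketch: apply a previously trained FindMatches ML transform inside a
# Glue PySpark ETL job. All identifiers below are hypothetical.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglueml.transforms import FindMatches  # Glue ML transforms module
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the staged patient records from the Data Catalog (placeholder names).
patients = glue_context.create_dynamic_frame.from_catalog(
    database="healthcare_staging", table_name="patient_records"
)

# Apply the trained FindMatches transform; it adds a match_id column that
# groups records believed to refer to the same patient, even without a
# common unique identifier.
matched = FindMatches.apply(frame=patients, transformId="tfm-0123456789abcdef")

# Write the matched output to the S3 data lake for querying with Athena.
glue_context.write_dynamic_frame.from_options(
    frame=matched,
    connection_type="s3",
    connection_options={"path": "s3://example-data-lake/patients/"},
    format="parquet",
)
job.commit()
```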

A retail company stores order invoices in an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. Indices on the cluster are created monthly. Once a new month begins, no new writes are made to any of the indices from the previous months. The company has been expanding the storage on the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster to avoid running out of space, but the company wants to reduce costs. Most searches on the cluster are on the most recent 3 months of data, while the audit team requires infrequent access to older data to generate periodic reports. The most recent 3 months of data must be quickly available for queries, but the audit team can tolerate slower queries if the solution saves on cluster costs. Which of the following is the MOST operationally efficient solution to meet these requirements?

A. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to store the indices in Amazon S3 Glacier. When the audit team requires the archived data, restore the archived indices back to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.
B. Archive indices that are older than 3 months by taking manual snapshots and storing the snapshots in Amazon S3. When the audit team requires the archived data, restore the archived indices back to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.
C. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to migrate the indices to Amazon OpenSearch Service (Amazon Elasticsearch Service) UltraWarm storage.
D. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to migrate the indices to Amazon OpenSearch Service (Amazon Elasticsearch Service) UltraWarm storage. When the audit team requires the older data, migrate the indices in UltraWarm storage back to hot storage.
Suggested answer: D

Explanation:


Reference: https://docs.aws.amazon.com/da_pv/opensearch-service/latest/developerguide/opensearch-service-dg.pdf
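
For reference, a minimal sketch of the kind of ISM policy that options C and D describe is shown below, assuming an OpenSearch domain with fine-grained access control. The domain endpoint, credentials, index pattern, and policy name are placeholders.

```python
# Sketch: register an ISM policy that moves monthly indices to UltraWarm after
# about 3 months. Older Elasticsearch-based domains use the _opendistro/_ism
# path instead of _plugins/_ism.
import json
import requests

policy = {
    "policy": {
        "description": "Move monthly invoice indices to UltraWarm after 90 days",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    {"state_name": "warm", "conditions": {"min_index_age": "90d"}}
                ],
            },
            {
                "name": "warm",
                # warm_migration is the ISM action that moves an index to UltraWarm.
                "actions": [{"warm_migration": {}}],
                "transitions": [],
            },
        ],
        "ism_template": [{"index_patterns": ["invoices-*"], "priority": 100}],
    }
}

endpoint = "https://search-example-domain.us-east-1.es.amazonaws.com"  # placeholder
resp = requests.put(
    f"{endpoint}/_plugins/_ism/policies/invoice-warm-policy",
    auth=("admin_user", "admin_password"),  # placeholder credentials
    headers={"Content-Type": "application/json"},
    data=json.dumps(policy),
)
resp.raise_for_status()
```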

An analytics software as a service (SaaS) provider wants to offer its customers business intelligence (BI) reporting capabilities that are self-service. The provider is using Amazon QuickSight to build these reports. The data for the reports resides in a multi-tenant database, but each customer should only be able to access their own data.

The provider wants to give customers two user role options:

Read-only users for individuals who only need to view dashboards.

Power users for individuals who are allowed to create and share new dashboards with other users.

Which QuickSight feature allows the provider to meet these requirements?

A. Embedded dashboards
B. Table calculations
C. Isolated namespaces
D. SPICE
Suggested answer: D

Explanation:


Reference: https://docs.aws.amazon.com/quicksight/latest/user/provisioning-users.html

A medical company has a system with sensor devices that read metrics and send them in real time to an Amazon Kinesis data stream. The Kinesis data stream has multiple shards. The company needs to calculate the average value of a numeric metric every second and set an alarm for whenever the value is above one threshold or below another threshold. The alarm must be sent to Amazon Simple Notification Service (Amazon SNS) in less than 30 seconds. Which architecture meets these requirements?

A. Use an Amazon Kinesis Data Firehose delivery stream to read the data from the Kinesis data stream with an AWS Lambda transformation function that calculates the average per second and sends the alarm to Amazon SNS.
B. Use an AWS Lambda function to read from the Kinesis data stream to calculate the average per second and send the alarm to Amazon SNS.
C. Use an Amazon Kinesis Data Firehose delivery stream to read the data from the Kinesis data stream and store it on Amazon S3. Have Amazon S3 trigger an AWS Lambda function that calculates the average per second and sends the alarm to Amazon SNS.
D. Use an Amazon Kinesis Data Analytics application to read from the Kinesis data stream and calculate the average per second. Send the results to an AWS Lambda function that sends the alarm to Amazon SNS.
Suggested answer: C

Explanation:


Reference: https://docs.aws.amazon.com/firehose/latest/dev/firehose-dg.pdf
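
To make the alarm logic concrete, here is a rough sketch of the Lambda piece shared by several of these options: it computes a per-second average and publishes to Amazon SNS when a threshold is breached. It assumes an S3 trigger (as in the suggested answer), newline-delimited JSON records with timestamp and value fields, and placeholder thresholds and topic ARN.

```python
# Sketch: Lambda handler triggered by an S3 put of delivered metric records.
import json
from collections import defaultdict

import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:metric-alarms"  # placeholder
HIGH_THRESHOLD = 100.0  # placeholder upper bound
LOW_THRESHOLD = 10.0    # placeholder lower bound


def handler(event, context):
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Assume newline-delimited JSON with epoch-second timestamps.
        sums, counts = defaultdict(float), defaultdict(int)
        for line in body.splitlines():
            if not line.strip():
                continue
            metric = json.loads(line)
            second = int(metric["timestamp"])
            sums[second] += float(metric["value"])
            counts[second] += 1

        # Publish an alarm for any second whose average is out of bounds.
        for second, total in sums.items():
            avg = total / counts[second]
            if avg > HIGH_THRESHOLD or avg < LOW_THRESHOLD:
                sns.publish(
                    TopicArn=TOPIC_ARN,
                    Subject="Metric threshold breached",
                    Message=f"Average {avg:.2f} at second {second} is out of bounds",
                )
```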

A banking company is currently using Amazon Redshift for sensitive data. An audit found that the current cluster is unencrypted. Compliance requires that a database with sensitive data must be encrypted using a hardware security module (HSM) with customer managed keys.

Which modifications are required in the cluster to ensure compliance?

A. Create a new HSM-encrypted Amazon Redshift cluster and migrate the data to the new cluster.
B. Modify the DB parameter group with the appropriate encryption settings and then restart the cluster.
C. Enable HSM encryption in Amazon Redshift using the command line.
D. Modify the Amazon Redshift cluster from the console and enable encryption using the HSM option.
Suggested answer: A

Explanation:


Unlike AWS KMS encryption, which you can enable by modifying an existing cluster (Amazon Redshift then migrates the data to a new encrypted cluster automatically), HSM-based encryption cannot be turned on by modifying an existing cluster. To use a hardware security module with customer managed keys, you must create a new HSM-encrypted cluster and migrate the data to it. Reference: https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-db-encryption.html
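
A hedged sketch of creating the new HSM-encrypted cluster that answer A calls for is shown below, assuming the HSM client certificate and HSM configuration have already been registered with Amazon Redshift; all identifiers and credentials are placeholders.

```python
# Sketch: create an HSM-encrypted Redshift cluster with boto3.
import boto3

redshift = boto3.client("redshift")

redshift.create_cluster(
    ClusterIdentifier="analytics-encrypted",   # placeholder cluster name
    NodeType="ra3.4xlarge",
    NumberOfNodes=2,
    MasterUsername="awsuser",
    MasterUserPassword="ExamplePassw0rd!",     # placeholder; keep real credentials in Secrets Manager
    DBName="dev",
    Encrypted=True,
    # These two identifiers tie the cluster's encryption to the customer-managed HSM.
    HsmClientCertificateIdentifier="example-hsm-client-cert",
    HsmConfigurationIdentifier="example-hsm-configuration",
)
# The data from the unencrypted cluster is then migrated to this new cluster,
# for example by unloading to Amazon S3 and copying it back in.
```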

An ecommerce company is migrating its business intelligence environment from on premises to the AWS Cloud. The company will use Amazon Redshift in a public subnet and Amazon QuickSight. The tables already are loaded into Amazon Redshift and can be accessed by a SQL tool.

The company starts QuickSight for the first time. During the creation of the data source, a data analytics specialist enters all the information and tries to validate the connection. An error with the following message occurs: “Creating a connection to your data source timed out.” How should the data analytics specialist resolve this error?

A. Grant the SELECT permission on Amazon Redshift tables.
B. Add the QuickSight IP address range into the Amazon Redshift security group.
C. Create an IAM role for QuickSight to access Amazon Redshift.
D. Use a QuickSight admin user for creating the dataset.
Suggested answer: A

Explanation:


Connection to the database times out

Your client connection to the database appears to hang or time out when running long queries, such as a COPY command.

In this case, you might observe that the Amazon Redshift console displays that the query has completed, but the client tool itself still appears to be running the query. The results of the query might be missing or incomplete depending on when the connection stopped.

Reference: https://docs.aws.amazon.com/redshift/latest/dg/queries-troubleshooting.html

A company has a business unit uploading .csv files to an Amazon S3 bucket. The company’s data platform team has set up an AWS Glue crawler to perform discovery and to create tables and schemas. An AWS Glue job writes processed data from the created tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift table appropriately. When the AWS Glue job is rerun for any reason in a day, duplicate records are introduced into the Amazon Redshift table.

Which solution will update the Redshift table without duplicates when jobs are rerun?

A. Modify the AWS Glue job to copy the rows into a staging table. Add SQL commands to replace the existing rows in the main table as postactions in the DynamicFrameWriter class.
B. Load the previously inserted data into a MySQL database in the AWS Glue job. Perform an upsert operation in MySQL, and copy the results to the Amazon Redshift table.
C. Use Apache Spark’s DataFrame dropDuplicates() API to eliminate duplicates and then write the data to Amazon Redshift.
D. Use the AWS Glue ResolveChoice built-in transform to select the most recent value of the column.
Suggested answer: B
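
For comparison, option A refers to the staging-table merge pattern, in which the Glue job writes to a staging table and runs SQL postactions that replace matching rows in the main table. A rough sketch of that pattern follows; the table names, Glue connection name, join key, and temporary S3 directory are hypothetical.

```python
# Sketch: write to a Redshift staging table and merge into the main table via
# "postactions" on the DynamicFrame JDBC writer.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Placeholder source: the processed data produced earlier in the job.
processed = glue_context.create_dynamic_frame.from_catalog(
    database="sales_catalog", table_name="orders_csv"
)

post_actions = """
    BEGIN;
    DELETE FROM public.orders USING public.orders_staging
        WHERE public.orders.order_id = public.orders_staging.order_id;
    INSERT INTO public.orders SELECT * FROM public.orders_staging;
    DROP TABLE public.orders_staging;
    END;
"""

glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=processed,
    catalog_connection="redshift-connection",   # placeholder Glue connection
    connection_options={
        "dbtable": "public.orders_staging",     # staging table, merged by postactions
        "database": "dev",
        "postactions": post_actions,
    },
    redshift_tmp_dir="s3://example-glue-temp/redshift/",
)
```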

An online retail company is using Amazon Redshift to run queries and perform analytics on customer shopping behavior.

When multiple queries are running on the cluster, runtime for small queries increases significantly. The company’s data analytics team wants to decrease the runtime of these small queries by prioritizing them ahead of large queries. Which solution will meet these requirements?

A. Use Amazon Redshift Spectrum for small queries
B. Increase the concurrency limit in workload management (WLM)
C. Configure short query acceleration in workload management (WLM)
D. Add a dedicated compute node for small queries
Suggested answer: C

Explanation:


Short query acceleration (SQA) prioritizes selected short-running queries ahead of longer-running queries. SQA executes short-running queries in a dedicated space, so that SQA queries aren't forced to wait in queues behind longer queries.

Reference: https://docs.aws.amazon.com/redshift/latest/mgmt/workload-mgmt-config.html
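
As an illustration, SQA is turned on through the cluster’s WLM JSON configuration. The sketch below updates that parameter with boto3, assuming a manual WLM setup and a placeholder parameter group name.

```python
# Sketch: enable short query acceleration (SQA) in the wlm_json_configuration
# parameter of the cluster's parameter group.
import json
import boto3

redshift = boto3.client("redshift")

wlm_config = [
    {   # placeholder manual WLM queue settings
        "query_group": [],
        "user_group": [],
        "query_concurrency": 5,
    },
    {"short_query_queue": True},  # this entry turns on SQA
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="custom-redshift-params",  # placeholder parameter group
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
        }
    ],
)
# The cluster must be associated with this parameter group for the change to apply.
```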

An online retail company is migrating its reporting system to AWS. The company’s legacy system runs data processing on online transactions using a complex series of nested Apache Hive queries. Transactional data is exported from the online system to the reporting system several times a day. Schemas in the files are stable between updates.

A data analyst wants to quickly migrate the data processing to AWS, so any code changes should be minimized. To keep storage costs low, the data analyst decides to store the data in Amazon S3. It is vital that the data from the reports and associated analytics is completely up to date based on the data in Amazon S3. Which solution meets these requirements?

A. Create an AWS Glue Data Catalog to manage the Hive metadata. Create an AWS Glue crawler over Amazon S3 that runs when data is refreshed to ensure that data changes are updated. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
B. Create an AWS Glue Data Catalog to manage the Hive metadata. Create an Amazon EMR cluster with consistent view enabled. Run emrfs sync before each analytics step to ensure data changes are updated. Create an EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
C. Create an Amazon Athena table with CREATE TABLE AS SELECT (CTAS) to ensure data is refreshed from underlying queries against the raw dataset. Create an AWS Glue Data Catalog to manage the Hive metadata over the CTAS table. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
D. Use an S3 Select query to ensure that the data is properly updated. Create an AWS Glue Data Catalog to manage the Hive metadata over the S3 Select table. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
Suggested answer: A
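
To illustrate the key piece of option A, the sketch below launches an EMR cluster with the AWS Glue Data Catalog configured as the Hive metastore via a hive-site classification. Cluster sizing, roles, and the log bucket are placeholders.

```python
# Sketch: launch an EMR cluster whose Hive metastore is the Glue Data Catalog.
import boto3

emr = boto3.client("emr")

emr.run_job_flow(
    Name="reporting-hive-cluster",        # placeholder cluster name
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Hive"}],
    LogUri="s3://example-emr-logs/",      # placeholder log bucket
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    Configurations=[
        {
            "Classification": "hive-site",
            "Properties": {
                # Points Hive at the AWS Glue Data Catalog instead of a local metastore.
                "hive.metastore.client.factory.class": (
                    "com.amazonaws.glue.catalog.metastore."
                    "AWSGlueDataCatalogHiveClientFactory"
                )
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```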

A company uses Amazon Redshift as its data warehouse. A new table includes some columns that contain sensitive data and some columns that contain non-sensitive data. The data in the table eventually will be referenced by several existing queries that run many times each day.

A data analytics specialist must ensure that only members of the company’s auditing team can read the columns that contain sensitive data. All other users must have read-only access to the columns that contain non-sensitive data. Which solution will meet these requirements with the LEAST operational overhead?

A. Grant the auditing team permission to read from the table. Load the columns that contain non-sensitive data into a second table. Grant the appropriate users read-only permissions to the second table.
B. Grant all users read-only permissions to the columns that contain non-sensitive data. Use the GRANT SELECT command to allow the auditing team to access the columns that contain sensitive data.
C. Grant all users read-only permissions to the columns that contain non-sensitive data. Attach an IAM policy to the auditing team with an explicit Allow action that grants access to the columns that contain sensitive data.
D. Grant the auditing team permission to read from the table. Create a view of the table that includes the columns that contain non-sensitive data. Grant the appropriate users read-only permissions to that view.
Suggested answer: D

Explanation:


A view defined over only the non-sensitive columns lets the appropriate users receive read-only access to that subset with a single GRANT SELECT on the view, while the auditing team is granted SELECT on the underlying table. No data is duplicated and no changes to the table or to IAM are required, so this approach carries the least operational overhead.

Reference: https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_VIEW.html
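
A hedged sketch of the view-based approach, issued through the Redshift Data API, is shown below. The table, view, column, group, and cluster names are invented for illustration.

```python
# Sketch: create a view over the non-sensitive columns and grant access via the
# Redshift Data API.
import boto3

rsd = boto3.client("redshift-data")

statements = [
    # View exposing only the non-sensitive columns of a hypothetical table.
    """CREATE VIEW public.customer_orders_public AS
       SELECT order_id, order_date, order_total
       FROM public.customer_orders;""",
    # The auditing team can read everything in the underlying table.
    "GRANT SELECT ON public.customer_orders TO GROUP auditing_team;",
    # Everyone else gets read-only access to the view only.
    "GRANT SELECT ON public.customer_orders_public TO GROUP analysts;",
]

for sql in statements:
    rsd.execute_statement(
        ClusterIdentifier="analytics-cluster",  # placeholder cluster
        Database="dev",
        DbUser="awsuser",                       # placeholder admin user
        Sql=sql,
    )
```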
