Amazon DAS-C01 Practice Test - Questions & Answers, Page 21
A company is using an AWS Lambda function to run Amazon Athena queries against a cross-account AWS Glue Data Catalog. A query returns the following error:
HIVE METASTORE ERROR
The error message states that the response payload size exceeds the maximum allowed payload size. The queried table is already partitioned, and the data is stored in an Amazon S3 bucket in the Apache Hive partition format.
Which solution will resolve this error?
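For context, the queried data sits in a Hive-style partition layout (for example, s3://bucket/year=2023/month=06/...). Below is a minimal boto3 sketch of such a query, with hypothetical database, table, and bucket names; filtering on the partition keys lets Athena prune partitions instead of requesting metadata for all of them.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Hypothetical names throughout. The WHERE clause references the Hive
# partition keys (year, month), so Athena only asks the Glue Data Catalog
# for the matching partitions' metadata.
response = athena.start_query_execution(
    QueryString=(
        "SELECT device_id, reading "
        "FROM sensor_db.readings "
        "WHERE year = '2023' AND month = '06'"
    ),
    QueryExecutionContext={"Database": "sensor_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])
```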
A company's data science team is designing a shared dataset repository on a Windows server. The data repository will store a large amount of training data that the data science team commonly uses in its machine learning models. The data scientists create a varying number of new datasets each day.
The company needs a solution that provides persistent, scalable file storage and high levels of throughput and IOPS. The solution also must be highly available and must integrate with Active Directory for access control.
Which solution will meet these requirements with the LEAST development effort?
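If the chosen option is Amazon FSx for Windows File Server (the managed service that combines SMB file storage, Multi-AZ availability, and native Active Directory integration), a minimal boto3 sketch might look like the following; all IDs are placeholders.

```python
import boto3

fsx = boto3.client("fsx")

# Placeholder IDs throughout. MULTI_AZ_1 provides high availability, and
# ActiveDirectoryId joins the file system to AWS Managed Microsoft AD so
# existing AD users and groups govern access.
response = fsx.create_file_system(
    FileSystemType="WINDOWS",
    StorageCapacity=2048,           # GiB
    StorageType="SSD",
    SubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],
    SecurityGroupIds=["sg-cccc3333"],
    WindowsConfiguration={
        "ActiveDirectoryId": "d-1234567890",
        "DeploymentType": "MULTI_AZ_1",
        "PreferredSubnetId": "subnet-aaaa1111",
        "ThroughputCapacity": 512,  # MB/s
    },
)
print(response["FileSystem"]["FileSystemId"])
```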
A financial services firm is processing a stream of real-time data from an application by using Apache Kafka and Kafka MirrorMaker. These tools run on premises and stream data to Amazon Managed Streaming for Apache Kafka (Amazon MSK) in the us-east-1 Region. An Apache Flink consumer running on Amazon EMR enriches the data in real time and transfers the output files to an Amazon S3 bucket. The company wants to ensure that the streaming application is highly available across AWS Regions with an RTO of less than 2 minutes.
Which solution meets these requirements?
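One common building block for this kind of cross-Region setup is replicating topics to a warm standby MSK cluster with MirrorMaker 2. Here is a sketch of generating the MM2 configuration from Python, assuming a standby cluster already exists in a second Region; broker addresses are placeholders.

```python
# Assumption: a standby MSK cluster exists in us-west-2 and the MirrorMaker 2
# hosts can reach both clusters. Continuous replication keeps the standby
# warm so consumers can fail over within a small RTO.
mm2_properties = """
clusters = primary, standby
primary.bootstrap.servers = b-1.primary.example.us-east-1.amazonaws.com:9092
standby.bootstrap.servers = b-1.standby.example.us-west-2.amazonaws.com:9092

primary->standby.enabled = true
primary->standby.topics = .*

standby->primary.enabled = false
"""

with open("mm2.properties", "w", encoding="utf-8") as config_file:
    config_file.write(mm2_properties)
```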
A company ingests a large set of sensor data in nested JSON format from different sources and stores it in an Amazon S3 bucket. The sensor data must be joined with performance data currently stored in an Amazon Redshift cluster.
A business analyst with basic SQL skills must build dashboards and analyze this data in Amazon QuickSight. A data engineer needs to build a solution to prepare the data for use by the business analyst. The data engineer does not know the structure of the JSON file. The company requires a solution with the least possible implementation effort.
Which combination of steps will create a solution that meets these requirements? (Select THREE.)
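Because the JSON structure is unknown, a schema-discovery step is usually part of any low-effort answer here. Below is a boto3 sketch of creating and starting an AWS Glue crawler over the raw data (names, role ARN, and paths are hypothetical).

```python
import boto3

glue = boto3.client("glue")

# Hypothetical names. The crawler infers the nested JSON schema and
# registers it in the Glue Data Catalog, so nobody hand-writes a schema.
glue.create_crawler(
    Name="sensor-json-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="sensor_db",
    Targets={"S3Targets": [{"Path": "s3://example-sensor-data/raw/"}]},
)
glue.start_crawler(Name="sensor-json-crawler")
```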
A network administrator needs to create a dashboard to visualize continuous network patterns over time in a company's AWS account. Currently, the company has VPC Flow Logs enabled and is publishing this data to Amazon CloudWatch Logs. To troubleshoot networking issues quickly, the dashboard needs to display the new data in near-real time.
Which solution meets these requirements?
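Whatever destination the dashboard uses, near-real-time delivery out of CloudWatch Logs is typically done with a subscription filter rather than scheduled exports. A boto3 sketch follows (log group name, ARNs, and role are placeholders).

```python
import boto3

logs = boto3.client("logs")

# Placeholder names/ARNs. A subscription filter pushes each new flow log
# event to the destination as it arrives, instead of on a polling schedule.
logs.put_subscription_filter(
    logGroupName="/vpc/flow-logs",
    filterName="flow-logs-to-firehose",
    filterPattern="",  # an empty pattern forwards every event
    destinationArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/flow-logs",
    roleArn="arn:aws:iam::123456789012:role/CWLtoFirehoseRole",
)
```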
A company wants to ingest clickstream data from its website into an Amazon S3 bucket. The streaming data is in JSON format. The data in the S3 bucket must be partitioned by product_id.
Which solution will meet these requirements MOST cost-effectively?
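If the answer under consideration is Kinesis Data Firehose with dynamic partitioning (which partitions on a JSON key without running any ETL cluster), a boto3 sketch might look like this; the stream name, bucket, and role ARN are hypothetical.

```python
import boto3

firehose = boto3.client("firehose")

# Hypothetical names/ARNs. Dynamic partitioning extracts product_id from
# each JSON record with a jq expression and substitutes it into the S3
# prefix, producing product_id=.../ partitions directly on delivery.
firehose.create_delivery_stream(
    DeliveryStreamName="clickstream-to-s3",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/FirehoseS3Role",
        "BucketARN": "arn:aws:s3:::example-clickstream-bucket",
        "Prefix": "product_id=!{partitionKeyFromQuery:product_id}/",
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        {"ParameterName": "MetadataExtractionQuery",
                         "ParameterValue": "{product_id: .product_id}"},
                        {"ParameterName": "JsonParsingEngine",
                         "ParameterValue": "JQ-1.6"},
                    ],
                }
            ],
        },
    },
)
```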
A company uses an Amazon Redshift provisioned cluster for data analysis. The data is not encrypted at rest. A data analytics specialist must implement a solution to encrypt the data at rest.
Which solution will meet this requirement with the LEAST operational overhead?
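The lowest-overhead path is usually to modify the existing cluster in place: Redshift then migrates the data to an encrypted version of the cluster in the background, with no unload and reload. A boto3 sketch with placeholder identifiers:

```python
import boto3

redshift = boto3.client("redshift")

# Placeholder identifiers. Setting Encrypted=True on an existing cluster
# triggers an in-place migration to KMS-encrypted storage.
redshift.modify_cluster(
    ClusterIdentifier="analytics-cluster",
    Encrypted=True,
    KmsKeyId="arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab",
)
```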
A company uses Amazon EC2 instances to receive files from external vendors throughout each day. At the end of each day, the EC2 instances combine the files into a single file, perform gzip compression, and upload the single file to an Amazon S3 bucket. The total size of all the files is approximately 100 GB each day.
When the files are uploaded to Amazon S3, an AWS Batch job runs a COPY command to load the files into an Amazon Redshift cluster.
Which solution will MOST accelerate the COPY process?
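COPY parallelizes across input files, so a single 100 GB gzip file serializes the load onto one slice. Below is a sketch of splitting the daily file into multiple compressed parts (ideally a multiple of the cluster's slice count) before upload; file names and the part count are illustrative.

```python
import gzip

# Illustrative values. Writing N gzip parts lets every Redshift slice
# load in parallel instead of one slice ingesting a single large file.
NUM_PARTS = 16
writers = [
    gzip.open(f"part_{i:02d}.txt.gz", "wt", encoding="utf-8")
    for i in range(NUM_PARTS)
]
with open("combined.txt", "rt", encoding="utf-8") as source:
    for line_number, line in enumerate(source):
        writers[line_number % NUM_PARTS].write(line)
for writer in writers:
    writer.close()

# After uploading the parts under one S3 prefix, a single COPY against
# that prefix loads all parts in parallel (hypothetical table and role):
copy_sql = """
COPY sales FROM 's3://example-ingest-bucket/parts/part_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
GZIP;
"""
```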
A company wants to use a data lake that is hosted on Amazon S3 to provide analytics services for historical data. The data lake consists of 800 tables but is expected to grow to thousands of tables. More than 50 departments use the tables, and each department has hundreds of users. Different departments need access to specific tables and columns.
Which solution will meet these requirements with the LEAST operational overhead?
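Table- and column-level authorization at this scale is the kind of problem AWS Lake Formation grants are designed for. A boto3 sketch of a column-scoped grant (the principal, database, table, and columns are hypothetical):

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Hypothetical names. A TableWithColumns grant limits a department's role
# to specific columns, with no per-bucket or per-object IAM policies.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/MarketingAnalysts"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales_db",
            "Name": "orders",
            "ColumnNames": ["order_id", "order_date", "region"],
        }
    },
    Permissions=["SELECT"],
)
```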
A company has an application that ingests streaming data. The company needs to analyze this stream over a 5-minute timeframe to evaluate the stream for anomalies with Random Cut Forest (RCF) and summarize the current count of status codes. The source and summarized data should be persisted for future use.
Which approach would enable the desired outcome while keeping data persistence costs low?
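Here is a sketch of the kind of Kinesis Data Analytics SQL this scenario implies, held in a Python string: a 5-minute tumbling window for status-code counts plus the built-in RANDOM_CUT_FOREST function for anomaly scores. Stream and column names are placeholders.

```python
# Placeholder stream/column names; the RCF parameters (number of trees,
# subsample size, time decay, shingle size) are illustrative defaults.
application_code = """
CREATE OR REPLACE STREAM "STATUS_COUNTS" (STATUS_CODE INTEGER, STATUS_COUNT INTEGER);

CREATE OR REPLACE PUMP "COUNT_PUMP" AS
INSERT INTO "STATUS_COUNTS"
SELECT STREAM STATUS_CODE, COUNT(*) AS STATUS_COUNT
FROM "SOURCE_SQL_STREAM_001"
GROUP BY STATUS_CODE,
         STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '5' MINUTE);

CREATE OR REPLACE STREAM "ANOMALIES" (STATUS_CODE INTEGER, ANOMALY_SCORE DOUBLE);

CREATE OR REPLACE PUMP "RCF_PUMP" AS
INSERT INTO "ANOMALIES"
SELECT STREAM STATUS_CODE, ANOMALY_SCORE
FROM TABLE(RANDOM_CUT_FOREST(
    CURSOR(SELECT STREAM STATUS_CODE FROM "SOURCE_SQL_STREAM_001"),
    100, 256, 100000, 1
));
"""
```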