Question 190 - DAS-C01 discussion


A company collects and transforms data files from third-party providers by using an on-premises SFTP server. The company uses a Python script to transform the data.

The company wants to reduce the overhead of maintaining the SFTP server and storing large amounts of data on premises. However, the company does not want to change the existing upload process for the third-party providers.

Which solution will meet these requirements with the LEAST development effort?

A.
Deploy the Python script on an Amazon EC2 instance. Install a third-party SFTP server on the EC2 instance. Schedule the script to run periodically on the EC2 instance to perform a data transformation on new files. Copy the transformed files to Amazon S3.
B.
Create an Amazon S3 bucket that includes a separate prefix for each provider. Provide the S3 URL to each provider for its respective prefix. Instruct the providers to use the S3 COPY command to upload data. Configure an AWS Lambda function that transforms the data when new files are uploaded.
C.
Use AWS Transfer Family to create an SFTP server that includes a publicly accessible endpoint. Configure the new server to use Amazon S3 storage. Change the server name to match the name of the on-premises SFTP server. Schedule a Python shell job in AWS Glue to use the existing Python script to run periodically and transform the uploaded files.
D.
Use AWS Transfer Family to create an SFTP server that includes a publicly accessible endpoint. Configure the new server to use Amazon S3 storage. Change the server name to match the name of the on-premises SFTP server. Use AWS Data Pipeline to schedule a transient Amazon EMR cluster with an Apache Spark step to periodically transform the files.
Suggested answer: C

Explanation:

This solution meets the requirements because:

AWS Transfer Family is a fully managed service that enables secure file transfers to and from Amazon S3 or Amazon EFS using standard protocols such as SFTP, FTPS, and FTP. By using AWS Transfer Family, the company can reduce the overhead of maintaining the on-premises SFTP server and storing large amounts of data on premises.

The company can create an SFTP-enabled server with a publicly accessible endpoint using AWS Transfer Family. This endpoint can be accessed by the third-party providers over the internet using their existing SFTP clients. The company can also keep the existing server name by pointing that hostname at the new endpoint, so that the upload process for the third-party providers does not change. For more information, see Create an SFTP-enabled server.
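As a rough illustration, the server could be created with the AWS SDK for Python (boto3). This is only a sketch: the logging role ARN and tag values below are hypothetical placeholders, not details from the question.

```python
import boto3

transfer = boto3.client("transfer")

# Create a publicly accessible, S3-backed SFTP endpoint with
# service-managed users. All identifiers are hypothetical.
response = transfer.create_server(
    Protocols=["SFTP"],
    EndpointType="PUBLIC",
    Domain="S3",                       # store uploaded files in Amazon S3
    IdentityProviderType="SERVICE_MANAGED",
    LoggingRole="arn:aws:iam::123456789012:role/transfer-logging-role",
    Tags=[{"Key": "Purpose", "Value": "third-party-sftp-uploads"}],
)

print("Server ID:", response["ServerId"])
```

The returned server ID is what the DNS alias for the existing hostname would point at, and it is also needed when adding provider users.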

The company can configure the new SFTP server to use Amazon S3 as the storage service. This way, the data files uploaded by the third-party providers will be stored in an Amazon S3 bucket. The company can also use AWS Identity and Access Management (IAM) roles and policies to control access to the S3 bucket and its objects. For more information, see Using Amazon S3 as your storage service.
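As a sketch of that setup, a service-managed user can be attached to the server with an IAM role that scopes access to the provider's prefix. The server ID, role ARN, bucket name, and public key below are assumptions for illustration only.

```python
import boto3

transfer = boto3.client("transfer")

# Attach a service-managed SFTP user whose home directory maps to the
# provider's prefix in the S3 bucket. All identifiers are hypothetical.
transfer.create_user(
    ServerId="s-1234567890abcdef0",
    UserName="provider-a",
    Role="arn:aws:iam::123456789012:role/transfer-provider-a-access",
    HomeDirectory="/example-upload-bucket/provider-a",
    SshPublicKeyBody="ssh-rsa AAAAB3... provider-a-key",
)
```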

The company can schedule a Python shell job in AWS Glue to use the existing Python script to run periodically and transform the uploaded files. AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics. A Python shell job runs plain Python scripts in a managed environment and does not require an Apache Spark cluster, so the existing script can be reused with little change. The company can use AWS Glue triggers to schedule the Python shell job based on time or events. For more information, see Working with Python shell jobs.
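A minimal sketch of that scheduling step with boto3 might look as follows; the job name, role ARN, script location, and cron expression are assumptions, not values given in the question.

```python
import boto3

glue = boto3.client("glue")

# Register the existing transformation script as a Glue Python shell job.
# The S3 script path and IAM role ARN are hypothetical placeholders.
glue.create_job(
    Name="transform-provider-files",
    Role="arn:aws:iam::123456789012:role/glue-transform-role",
    Command={
        "Name": "pythonshell",
        "ScriptLocation": "s3://example-scripts-bucket/transform.py",
        "PythonVersion": "3.9",
    },
    MaxCapacity=1.0,  # Python shell jobs accept 0.0625 or 1 DPU
)

# Run the job on a time-based schedule, e.g. once per hour.
glue.create_trigger(
    Name="transform-provider-files-hourly",
    Type="SCHEDULED",
    Schedule="cron(0 * * * ? *)",
    Actions=[{"JobName": "transform-provider-files"}],
    StartOnCreation=True,
)
```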

asked 16/09/2024 by Muhammad Atif Tasneem