A company analyzes data in a data lake every quarter to perform inventory assessments. A data engineer uses AWS Glue DataBrew to detect any personally identifiable information (PII) about customers within the data. The company's privacy policy considers some custom categories of information to be PII. However, the categories are not included in standard DataBrew data quality rules.
The data engineer needs to modify the current process to scan for the custom PII categories across multiple datasets within the data lake.
Which solution will meet these requirements with the LEAST operational overhead?

Tillmon, Quinton · Accepted Answer

Implement custom data quality rules in DataBrew. Apply the custom rules across datasets.

Tillmon, Quinton · Answer

Manually review the data for custom PII categories.

Tillmon, Quinton · Answer

Develop custom Python scripts to detect the custom PII categories. Call the scripts from DataBrew.

Tillmon, Quinton · Answer

Implement regex patterns to extract PII from fields during extract, transform, and load (ETL) operations into the data lake.
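
To make the idea of a "custom PII category" concrete, here is a minimal sketch in plain Python of pattern-based detection over column values. The category names (`loyalty_id`, `internal_customer_ref`), their patterns, and the sample rows are hypothetical assumptions for illustration. This is not DataBrew's API, and it is not meant to indicate which option is correct; it only shows the kind of check each option would have to express in some form.

```python
import re

# Hypothetical custom PII categories. Names and patterns are illustrative
# assumptions, not taken from the question or from any AWS service.
CUSTOM_PII_PATTERNS = {
    "loyalty_id": re.compile(r"\bLOY-\d{8}\b"),
    "internal_customer_ref": re.compile(r"\bCUST[A-Z]{2}\d{6}\b"),
}


def scan_rows(rows):
    """Return {column: set of matched categories} for columns with a match."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            # Only string values are scanned; numbers and None are skipped.
            if not isinstance(value, str):
                continue
            for category, pattern in CUSTOM_PII_PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(column, set()).add(category)
    return findings


# Made-up inventory rows used only to exercise the function.
sample = [
    {"sku": "A-100", "notes": "returned by LOY-12345678", "qty": 3},
    {"sku": "B-200", "notes": "restock", "qty": 10},
    {"sku": "C-300", "notes": "hold for CUSTGB004217", "qty": 1},
]
findings = scan_rows(sample)
```

With the sample rows above, only the `notes` column is flagged, and it is flagged for both hypothetical categories. Running a script like this across many datasets means maintaining the scheduling, access, and reporting yourself, which is the operational overhead the question asks you to weigh.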
