
Question 32 - DSA-C02 discussion

Which of the following processes best covers all of the following characteristics?

* Collecting descriptive statistics like min, max, count and sum.

* Collecting data types, length and recurring patterns.

* Tagging data with keywords, descriptions or categories.

* Performing data quality assessment, including the risk of performing joins on the data.

* Discovering metadata and assessing its accuracy.

* Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.

A. Data Visualization
B. Data Virtualization
C. Data Profiling
D. Data Collection
Suggested answer: C

Explanation:

Data processing and analysis cannot happen without data profiling, that is, reviewing source data for content and quality. As data volumes grow and infrastructure moves to the cloud, data profiling becomes increasingly important.

What is data profiling?

Data profiling is the process of reviewing source data, understanding its structure, content and interrelationships, and identifying its potential for data projects.

Data profiling is a crucial part of:

* Data warehouse and business intelligence (DW/BI) projects: data profiling can uncover data quality issues in data sources, and what needs to be corrected in ETL.

* Data conversion and migration projects: data profiling can identify data quality issues, which you can handle in scripts and data integration tools copying data from source to target. It can also uncover new requirements for the target system.

* Source system data quality projects: data profiling can highlight data which suffers from serious or numerous quality issues, and the source of the issues (e.g. user inputs, errors in interfaces, data corruption).

Data profiling involves:

* Collecting descriptive statistics like min, max, count and sum.

* Collecting data types, length and recurring patterns.

* Tagging data with keywords, descriptions or categories.

* Performing data quality assessment, including the risk of performing joins on the data.

* Discovering metadata and assessing its accuracy.

* Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.
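
To make the activities listed above more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not how any particular profiling tool works: the sample tables `customers` and `orders` and the helper names `pattern_of`, `profile_column`, `key_candidates` and `is_foreign_key_candidate` are all hypothetical and were invented for this example. It collects descriptive statistics, data types, lengths and recurring patterns for a column, then checks for key candidates and a foreign-key candidate across two tables.

```python
import re
from collections import Counter

# Hypothetical sample tables, invented purely for illustration.
customers = [
    {"id": 1, "zip": "90210", "country": "US"},
    {"id": 2, "zip": "10001", "country": "US"},
    {"id": 3, "zip": "SW1A", "country": "UK"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 3, "amount": 40.5},
    {"order_id": 12, "customer_id": 3, "amount": None},
]


def pattern_of(value):
    """Reduce a string to its shape: digits become 9, ASCII letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile_column(values):
    """Collect count, nulls, distinct values, types and basic statistics."""
    non_null = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "types": sorted({type(v).__name__ for v in non_null}),
    }
    numeric = [v for v in non_null
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric and len(numeric) == len(non_null):
        # Descriptive statistics only make sense for all-numeric columns.
        profile.update(min=min(numeric), max=max(numeric), sum=sum(numeric))
    strings = [v for v in non_null if isinstance(v, str)]
    if strings:
        lengths = [len(s) for s in strings]
        profile["min_length"] = min(lengths)
        profile["max_length"] = max(lengths)
        # Recurring patterns: the most common value shapes.
        profile["top_patterns"] = Counter(
            pattern_of(s) for s in strings).most_common(3)
    return profile


def key_candidates(rows, columns):
    """Return columns whose values are all non-null and unique."""
    result = []
    for col in columns:
        vals = [row[col] for row in rows]
        if None not in vals and len(set(vals)) == len(vals):
            result.append(col)
    return result


def is_foreign_key_candidate(child_rows, child_col, parent_rows, parent_col):
    """True when every non-null child value also appears in the parent column."""
    parent_vals = {row[parent_col] for row in parent_rows}
    child_vals = {row[child_col] for row in child_rows
                  if row[child_col] is not None}
    return child_vals <= parent_vals


amount_profile = profile_column([row["amount"] for row in orders])
zip_profile = profile_column([row["zip"] for row in customers])
customer_keys = key_candidates(customers, ["id", "zip", "country"])
fk_ok = is_foreign_key_candidate(orders, "customer_id", customers, "id")

print(amount_profile)
print(zip_profile)
print(customer_keys, fk_ok)
```

On a sample this small, `zip` shows up as a key candidate only by coincidence, which is why profiling output is treated as a set of candidates to review rather than as confirmed constraints.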

asked 23/09/2024
Jonaid Alam
36 questions