ExamGecko
Question 74 - CBDA discussion


A data scientist at a consumer goods company has been asked to do a detailed analysis of customer profiles. The data scientist has identified an external data source that carries valuable additional information on the company's customers, and has also identified the address column as the most reliable column for joining the internal data source with the external data source. Addresses may appear in different formats, for example:

File A = '13 Smith St'

File B = 'Unit 7, 13 Smith Street'

Which of the following techniques would be useful in this situation?

A. Deterministic linkage
B. Probabilistic linkage
C. Genetic linkage
D. Cuff linkage
Suggested answer: B

Explanation:

Probabilistic linkage is a technique that uses statistical methods to match records from different data sources based on the similarity of key variables such as name, address, or date of birth. It can handle variations, errors, or missing values in the data, and it assigns a score or probability to each potential match. Probabilistic linkage is useful in this situation because the address column uses different formats, spellings, and abbreviations in the internal and external data sources ('13 Smith St' versus 'Unit 7, 13 Smith Street'), so a deterministic linkage, which requires exact matches, would miss valid matches.
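To make the scoring idea concrete, here is a minimal Python sketch of a Fellegi-Sunter style match weight applied to the two example addresses. Everything in it is an assumption made for illustration: the abbreviation map, the field split, the helper names `parse_address` and `match_weight`, and especially the m and u probabilities, which in practice would be estimated from the data rather than typed in by hand.

```python
import math
import re

# Hypothetical abbreviation map; a real project would need a fuller list.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}

# Assumed illustrative values, not estimated from any data.
# m = P(field agrees | records truly match)
# u = P(field agrees | records do not match)
FIELD_PARAMS = {
    "number": (0.95, 0.05),
    "street_name": (0.90, 0.01),
    "street_type": (0.90, 0.30),
}


def parse_address(raw):
    """Split a raw address into house number, street name, and street type."""
    text = raw.lower()
    # Drop a leading unit designation such as "Unit 7," (simplifying assumption).
    text = re.sub(r"^(unit|apt|flat)\s+\w+\s*,?\s*", "", text)
    tokens = [ABBREVIATIONS.get(t, t) for t in re.findall(r"[a-z0-9]+", text)]
    number = tokens[0] if tokens and tokens[0].isdigit() else ""
    rest = tokens[1:] if number else tokens
    street_type = rest[-1] if len(rest) > 1 else ""
    street_name = " ".join(rest[:-1]) if len(rest) > 1 else " ".join(rest)
    return {"number": number, "street_name": street_name, "street_type": street_type}


def match_weight(addr_a, addr_b):
    """Sum log2 agreement/disagreement weights over the parsed fields."""
    a, b = parse_address(addr_a), parse_address(addr_b)
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if a[field] and a[field] == b[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


print(match_weight("13 Smith St", "Unit 7, 13 Smith Street"))  # positive weight
print(match_weight("13 Smith St", "99 Jones Rd"))              # negative weight
```

Pairs whose total weight exceeds a chosen threshold would be treated as links, and borderline pairs could be sent for manual review. Note that this sketch discards the unit number, so two different units at the same street address would score as a match; whether that is acceptable depends on the analysis.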

Deterministic linkage is a technique that uses predefined rules or criteria to match records from different data sources based on the exact agreement of key variables, such as identifiers, codes, or hashes. Deterministic linkage is not suitable in this situation because the address values are not formatted consistently across the internal and external data sources, so records that refer to the same address would fail an exact comparison.
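The limitation can be seen in a short Python sketch of an exact-key join. The record ids, the extra record `b2`, the abbreviation map, and the helper names `normalize` and `deterministic_link` are all hypothetical and exist only to illustrate the point.

```python
# Hypothetical abbreviation map used only for this illustration.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(address):
    """Lowercase, drop commas, and expand known abbreviations."""
    tokens = address.lower().replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def deterministic_link(records_a, records_b, key=lambda s: s):
    """Link records whose keys agree exactly; anything else is left unlinked."""
    index = {key(addr): rec_id for rec_id, addr in records_b.items()}
    return {
        a_id: index[key(addr)]
        for a_id, addr in records_a.items()
        if key(addr) in index
    }


# Illustrative records: a1 and b1 come from the question, b2 is made up.
file_a = {"a1": "13 Smith St"}
file_b = {"b1": "Unit 7, 13 Smith Street", "b2": "13 smith street"}

print(deterministic_link(file_a, file_b))                 # raw strings: no links
print(deterministic_link(file_a, file_b, key=normalize))  # links a1 to b2 only
```

Even after normalization, the exact comparison still cannot link `a1` to `b1`, because the unit prefix makes the two keys different.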

Genetic linkage is a term used in genetics to describe the tendency of genes or DNA sequences that are located close together on a chromosome to be inherited together. It is not relevant to this situation, as it has nothing to do with matching records from different data sources.

Cuff linkage is not a record linkage technique. The explanation associates the term with sewing, where a cuff is attached to a sleeve by stitching or fastening, so it has nothing to do with matching records from different data sources.

asked 18/09/2024
Danilo Paolucci
42 questions