On Customer Data Deduplication: Lessons Learned from an R&D Project in the Financial Sector
[ 1 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ 2 ] Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ P ] employee | [ DW ] applied doctorate PhD student
2022
chapter in monograph / paper
english
- data quality
- data cleaning
- data deduplication pipeline
EN Although financial institutions (FIs) apply data governance strategies and use state-of-the-art data management and data engineering software and systems to support their day-to-day business, their databases are not free from faulty (dirty and duplicated) data. In this paper, we report conclusions from an ongoing research and development project for an FI. The goal of this project is to integrate customer data from multiple data sources: to clean, homogenize, and deduplicate them. This paper focuses, in particular, on findings from developing the customer data deduplication process.
CC BY (attribution alone)
publisher's website
final published version
at the time of publication
5