On Customer Data Deduplication: Lessons Learned from a R&D Project in the Financial Sector


[ 1 ] Institute of Computing Science, Faculty of Computing and Telecommunications, Poznań University of Technology | [ 2 ] Faculty of Computing and Telecommunications, Poznań University of Technology | [ P ] employee | [ DW ] applied doctorate PhD student

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2022

Chapter type

chapter in monograph / paper

Publication language

EN

Keywords

  • data quality
  • data cleaning
  • data deduplication pipeline

EN Although financial institutions (FIs) apply data governance strategies and use state-of-the-art data management and data engineering software and systems to support their day-to-day business, their databases are not free from faulty (dirty and duplicated) data. In this paper, we report conclusions from an ongoing research and development project for an FI. The goal of this project is to integrate customer data from multiple data sources: to clean, homogenize, and deduplicate them. This paper focuses, in particular, on findings from developing the customer data deduplication process.
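The clean–homogenize–deduplicate flow described in the abstract can be illustrated with a minimal sketch: normalize record attributes, group candidate records into blocks so only plausible duplicates are compared pairwise, and flag pairs whose string similarity exceeds a threshold. All field names, the blocking key, and the threshold below are illustrative assumptions, not the project's actual pipeline.

```python
# Hypothetical customer-record deduplication sketch (assumed fields and
# thresholds; not the pipeline from the paper).
from difflib import SequenceMatcher

def normalize(record):
    """Homogenize a raw record: lowercase and collapse whitespace."""
    return {k: " ".join(str(v).lower().split()) for k, v in record.items()}

def blocking_key(record):
    """Cheap key so pairwise comparison stays within small blocks."""
    return (record["last_name"][:3], record["zip"])

def similarity(a, b):
    """Average string similarity over a few shared fields."""
    fields = ["first_name", "last_name", "street"]
    return sum(SequenceMatcher(None, a[f], b[f]).ratio() for f in fields) / len(fields)

def find_duplicates(records, threshold=0.85):
    """Return candidate duplicate pairs found within each block."""
    blocks = {}
    for r in map(normalize, records):
        blocks.setdefault(blocking_key(r), []).append(r)
    pairs = []
    for block in blocks.values():
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                if similarity(block[i], block[j]) >= threshold:
                    pairs.append((block[i], block[j]))
    return pairs

customers = [
    {"first_name": "Jan", "last_name": "Kowalski", "street": "Polna 5", "zip": "61-001"},
    {"first_name": "JAN", "last_name": "Kowalski ", "street": "Polna 5", "zip": "61-001"},
    {"first_name": "Anna", "last_name": "Nowak", "street": "Lesna 2", "zip": "60-002"},
]
print(len(find_duplicates(customers)))  # the two Kowalski records form one pair
```

Blocking trades recall for speed: records landing in different blocks are never compared, which is why real pipelines tune the blocking key carefully.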

Published in

Proceedings of the Workshops of the EDBT/ICDT 2022 Joint Conference, Edinburgh, UK, March 29, 2022

Presented on

Workshops of the EDBT/ICDT 2022 Joint Conference, 29.03.2022, Edinburgh, United Kingdom

License type

CC BY (attribution alone)

Open Access Mode

publisher's website

Open Access Text Version

final published version

Date of Open Access to the publication

at the time of publication

Points of MNiSW / chapter

