On Customer Data Deduplication: Lessons Learned from an R&D Project in the Financial Sector
[ 1 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ 2 ] Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ P ] employee | [ DW ] applied doctorate PhD student
2022
chapter in monograph / paper
english
- data quality
- data cleaning
- data deduplication pipeline
EN Although financial institutions (FIs) apply data governance strategies and use state-of-the-art data management and data engineering software and systems to support their day-to-day business, their databases are not free from faulty (dirty and duplicated) data. In this paper, we report conclusions from an ongoing research and development project for an FI. The goal of this project is to integrate customer data from multiple data sources: to clean, homogenize, and deduplicate them. This paper focuses, in particular, on findings from developing the customer data deduplication process.
CC BY (attribution alone)
publisher's website
final published version
at the time of publication
5