Chapter

Title

On evaluating text similarity measures for customer data deduplication

Authors

[ 1 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ 2 ] Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ P ] employee | [ SzD ] doctoral school student

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2023

Chapter type

chapter in monograph / paper

Publication language

English

Keywords
EN
  • data quality
  • entity resolution
  • data deduplication
  • text similarity measures
Abstract

EN In this paper, we summarize the results of evaluating 44 similarity measures for text values that represent real institutional customer data. The data come from a project conducted for a large financial institution in Poland. The similarity measures were assessed on the similarity values they returned and on their execution times. To the best of our knowledge, this is the first report to evaluate such a large selection of different similarity measures.

Pages (from - to)

297 - 300

DOI

10.1145/3555776.3578724

Book

Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing SAC '23, March 27 - March 31, 2023, Tallinn, Estonia

Presented on

38th ACM/SIGAPP Symposium on Applied Computing (SAC '23), 27-31.03.2023, Tallinn, Estonia

Ministry points / chapter

20

Ministry points / conference (CORE)

20
