On evaluating text similarity measures for customer data deduplication
[ 1 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ 2 ] Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ P ] employee | [ SzD ] doctoral school student
2023
chapter in monograph / paper
English
- data quality
- entity resolution
- data deduplication
- text similarity measures
EN In this paper, we summarize the results of evaluating 44 similarity measures on text values that represent real institutional customer data. The data come from a project conducted for a large financial institution in Poland. The similarity measures were assessed based on the similarity values they returned and on their execution times. To the best of our knowledge, this is the first report to evaluate such a large selection of different similarity measures.
297 - 300
20
20