Web search results clustering in Polish : experimental evaluation of Carrot

Dawid Weiss; Jerzy Stefanowski

doi:10.1007/978-3-540-36562-4_22

Chapter

Authors

Dawid Weiss ^{[ 1 ][ P ]}
Jerzy Stefanowski ^{[ 1 ][ P ]}

^{[ 1 ]} Institute of Computing Science (II), Faculty of Computing Science and Management, Poznań University of Technology | ^{[ P ]} employee

Year of publication

2003

Chapter type

conference paper

Publication language

English

Abstract

In this paper we consider the problem of web search results clustering in the Polish language, supporting our analysis with results acquired from an experimental system named Carrot. The algorithm we consider, Suffix Tree Clustering (STC), has been acknowledged as very efficient when applied to English. We present conclusions from its experimental application to Polish, demonstrating fragile areas of the algorithm related to rich inflection and certain properties of the input language. Our results indicate that the characteristics of the produced clusters (number, distinctiveness) depend strongly on the pre-processing phase. We also investigate the influence of two primary STC parameters, the merge threshold and the minimum base cluster score, on the number and quality of results. Finally, we introduce two approaches to efficient, approximate conflation of Polish words: a quasi-stemmer and an automaton-based lemmatization method.
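The role of the two STC parameters named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from Carrot: it assumes base clusters are already available as a mapping from phrase to a set of document ids, it uses a simplified, hypothetical scoring function (`base_cluster_score`), and it applies the commonly described STC merge rule in which two base clusters are linked when their overlap exceeds the threshold relative to both document sets. All function and parameter names are illustrative.

```python
from itertools import combinations


def base_cluster_score(phrase, docs, max_words=6):
    # Hypothetical stand-in for the STC scoring function: document count
    # times phrase length in words, with the length capped at max_words.
    return len(docs) * min(len(phrase.split()), max_words)


def cluster(base_clusters, merge_threshold, min_base_cluster_score):
    # 1. Drop base clusters that score below the minimum.
    kept = {p: d for p, d in base_clusters.items()
            if base_cluster_score(p, d) >= min_base_cluster_score}

    # 2. Link two base clusters when their shared documents exceed the
    #    merge threshold as a fraction of BOTH document sets.
    parent = {p: p for p in kept}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for a, b in combinations(kept, 2):
        overlap = len(kept[a] & kept[b])
        if (overlap > merge_threshold * len(kept[a])
                and overlap > merge_threshold * len(kept[b])):
            parent[find(a)] = find(b)

    # 3. Each connected group of base clusters becomes one final cluster.
    groups = {}
    for p in kept:
        groups.setdefault(find(p), []).append(p)
    return [(sorted(ps), set().union(*(kept[p] for p in ps)))
            for ps in groups.values()]
```

In this sketch, raising `merge_threshold` tends to leave more, smaller clusters, while raising `min_base_cluster_score` discards short or rarely occurring phrases before merging. This is one way to see why both settings affect the number of results.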

Pages (from-to)

209-219

DOI

10.1007/978-3-540-36562-4_22

URL

https://link.springer.com/chapter/10.1007/978-3-540-36562-4_22

Book

Intelligent Information Processing and Web Mining : proceedings of the International IIS : IIPWM'03 Conference held in Zakopane, Poland, June 2-5, 2003

Presented at

Intelligent Information Processing and Web Mining IIPWM'03, June 2-5, 2003, Zakopane, Poland
