Conceptual Clustering Using Lingo Algorithm: Evaluation on Open Directory Project Data
[ 1 ] Instytut Informatyki (II), Wydział Informatyki i Zarządzania, Politechnika Poznańska | [ P ] employee
2004
chapter in a scientific monograph / conference paper
English
EN The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a search hits list returned from a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use Open Directory Project as a source of high-quality narrow-topic document references and mix them into several multi-topic test sets for the algorithm. We then compare the clusters acquired from Lingo to the expected set of ODP categories mixed in the input. Finally, we discuss observations from the experiment, highlighting the algorithm’s strengths and weaknesses, and conclude with research directions for the future.
369 - 377
International IIS: IIPWM’04 Conference, 17-20.05.2004, Zakopane, Poland