Chapter

Title

Random Similarity Isolation Forests

Authors

[ 1 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ S ] student | [ P ] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2025

Chapter type

chapter in monograph / paper

Publication language

English

Abstract

EN With predictive models becoming prevalent, companies are expanding the types of data they gather. As a result, the collected datasets consist not only of simple numerical features but also more complex objects such as time series, images, or graphs. Such multi-modal data have the potential to improve performance in predictive tasks like outlier detection, where the goal is to identify objects deviating from the main data distribution. However, current outlier detection algorithms are dedicated to individual types of data. Consequently, working with mixed types of data requires either fusing multiple data-specific models or transforming all of the representations into a single format, both of which can hinder predictive performance. In this paper, we propose a multi-modal outlier detection algorithm called Random Similarity Isolation Forest. Our method combines the notions of isolation and similarity-based projection to handle datasets with mixtures of features of arbitrary data types. Experiments performed on 47 benchmark datasets demonstrate that Random Similarity Isolation Forest outperforms five state-of-the-art competitors. Our study shows that the use of multiple modalities can indeed improve the detection of anomalies and highlights the need for new outlier detection benchmarks tailored for multi-modal algorithms.
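
The abstract names two ingredients, isolation and similarity-based projection, without giving details. The following is a minimal illustrative sketch of how such a combination might look, not the authors' implementation. Everything in it is an assumption made for illustration: the projection (distance to a single randomly chosen reference object), the per-feature distance functions, the helper names (`build_tree`, `path_length`, `expected_depth`), and the toy data. The leaf adjustment in `expected_depth` follows the usual isolation-forest convention.

```python
import math
import random


def numeric_distance(a, b):
    """Distance for plain numbers."""
    return abs(a - b)


def jaccard_distance(a, b):
    """Distance for set-valued features (assumed stand-in for a complex type)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def expected_depth(n):
    """Average path length still expected for a leaf holding n unsplit objects."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, distances, rng, depth=0, max_depth=10):
    """Recursively split rows using a random feature, reference and threshold."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    f = rng.randrange(len(distances))       # random feature, of any data type
    ref = rng.choice(rows)[f]               # random reference object (assumption)
    # Project every object onto one dimension: its distance to the reference.
    values = [distances[f](row[f], ref) for row in rows]
    threshold = rng.uniform(min(values), max(values))
    left = [r for r, v in zip(rows, values) if v < threshold]
    right = [r for r, v in zip(rows, values) if v >= threshold]
    if not left or not right:               # no usable split, stop here
        return {"size": len(rows)}
    return {
        "feature": f,
        "ref": ref,
        "threshold": threshold,
        "left": build_tree(left, distances, rng, depth + 1, max_depth),
        "right": build_tree(right, distances, rng, depth + 1, max_depth),
    }


def path_length(row, node, distances, depth=0):
    """Number of splits needed to reach the leaf that contains row."""
    if "size" in node:
        return depth + expected_depth(node["size"])
    f = node["feature"]
    value = distances[f](row[f], node["ref"])
    child = node["left"] if value < node["threshold"] else node["right"]
    return path_length(row, child, distances, depth + 1)


# Toy mixed-type data: (number, set of tags); the last row is a planted outlier.
rng = random.Random(0)
rows = [(i * 0.1, {"a", "b"} if i % 2 else {"a", "b", "c"}) for i in range(20)]
rows.append((100.0, {"x", "y"}))
distances = [numeric_distance, jaccard_distance]

trees = [build_tree(rows, distances, rng) for _ in range(100)]
scores = [
    sum(path_length(row, tree, distances) for tree in trees) / len(trees)
    for row in rows
]
```

In this sketch a shorter average path in `scores` means the object was separated from the rest after fewer random splits, so the planted last row should receive the lowest value. Supporting another data type would only require adding a distance function for it to `distances`.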

Date of online publication

15.06.2025

Pages (from - to)

31 - 43

DOI

10.1007/978-981-96-8170-9_3

URL

https://link.springer.com/chapter/10.1007/978-981-96-8170-9_3

Book

Advances in Knowledge Discovery and Data Mining : 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, NSW, Australia, June 10–13, 2025, Proceedings, Part V

Presented on

29th Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD 2025, 10-13.06.2025, Sydney, Australia

Ministry points / chapter

20

Ministry points / conference (CORE)

140
