Chapter

Title

Local Data Characteristics in Learning Classifiers from Imbalanced Data

Authors

[ 1 ] Instytut Informatyki, Wydział Informatyki, Politechnika Poznańska | [ P ] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2018

Chapter type

chapter in monograph

Publication language

English

Abstract

Learning classifiers from imbalanced data is still one of the challenging tasks in machine learning and data mining. Data difficulty factors referring to internal and local characteristics of class distributions deteriorate the performance of standard classifiers. Many of these factors may be approximated by analyzing the neighbourhood of the learning examples and identifying different types of examples from the minority class. In this paper, we follow recent research on developing such methods for assessing the types of examples, which exploit either k-nearest neighbours or kernels. We discuss the approaches to tuning the size of both kinds of neighbourhoods depending on the data set characteristics and evaluate their usefulness in a series of experiments with real-world and synthetic data sets. Furthermore, we claim that the proper analysis of these neighbourhoods could be the basis for developing new specialized algorithms for imbalanced data. To illustrate this, we study generalizations of over-sampling in pre-processing methods and neighbourhood-based ensembles.
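
The neighbourhood analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter: the function name, the type labels (safe, borderline, rare, outlier), the choice of k = 5 and the thresholds are all assumptions made for illustration. Each minority example is labelled by the share of its k nearest neighbours that also belong to the minority class.

```python
from math import dist


def label_minority_examples(points, labels, minority, k=5):
    """Assign an illustrative difficulty type to every minority example.

    Hypothetical helper: the labels and thresholds below are assumptions
    chosen for k = 5, not values taken from the chapter.
    """
    types = {}
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != minority:
            continue  # only minority examples are typed
        # Indices of the k nearest other examples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )[:k]
        n = len(neighbours)
        same = sum(labels[j] == minority for j in neighbours)
        # Integer comparisons avoid floating-point rounding on the ratio.
        if same * 5 >= 4 * n:      # at least 80% minority neighbours
            types[i] = "safe"
        elif same * 5 >= 2 * n:    # at least 40%
            types[i] = "borderline"
        elif same > 0:             # a few minority neighbours
            types[i] = "rare"
        else:                      # no minority neighbours at all
            types[i] = "outlier"
    return types
```

The function returns a dictionary keyed by example index, so majority examples are simply absent from the result. A kernel-based variant would replace the fixed k with a distance-weighted neighbourhood; that is not sketched here.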

Pages (from - to)

51 - 85

DOI

10.1007/978-3-319-67946-4_2

URL

https://link.springer.com/chapter/10.1007/978-3-319-67946-4_2

Book

Advances in Data Analysis with Computational Intelligence Methods

Ministry points / chapter

20