Application of Preprocessing Methods to Imbalanced Clinical Data: An Experimental Study

Szymon Wilk; Jerzy Stefanowski; Szymon Wojciechowski; Ken J. Farion; Wojtek Michalowski

doi:10.1007/978-3-319-39796-2_41

Scientific Information System of the Poznań University of Technology

Chapter

Title

Application of Preprocessing Methods to Imbalanced Clinical Data: An Experimental Study

Authors

Szymon Wilk (WI) [1][P]
Jerzy Stefanowski (WI) [1][P]
Szymon Wojciechowski (WBMiZ) [2][P]
Ken J. Farion
Wojtek Michalowski

[1] Institute of Computing Science, Faculty of Computing, Poznań University of Technology | [2] Institute of Mechanical Technology, Faculty of Mechanical Engineering and Management, Poznań University of Technology | [P] employee

Year of publication

2016

Chapter type

paper

Publication language

English

Keywords

clinical data
class imbalance
data difficulty factors
preprocessing methods
classification performance

Abstract

In this paper we describe an experimental study in which we analyzed data difficulty factors encountered in imbalanced clinical data sets and examined how well selected data preprocessing methods addressed these factors. We considered five data sets describing various pediatric acute conditions. In all of these data sets the minority class was sparse and overlapped with the majority classes, which made it difficult to learn. We studied five preprocessing methods (random undersampling, random oversampling, SMOTE, the neighborhood cleaning rule, and SPIDER2) combined with the following classifiers: k-nearest neighbors, decision trees and rules, naive Bayes, neural networks, and support vector machines. Preprocessing always improved classification performance, and the largest improvement was observed for random undersampling. Moreover, naive Bayes was the best-performing classifier regardless of the preprocessing method used.
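For readers unfamiliar with the two random resampling methods named in the abstract, the following is a minimal sketch of the general idea, not the authors' implementation. This record does not state which toolkit, balancing ratio, or random seed the study used, so balancing every class to exactly equal size, the function names `random_undersample` and `random_oversample`, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def _indices_by_class(y):
    """Group example indices by their class label."""
    groups = defaultdict(list)
    for idx, label in enumerate(y):
        groups[label].append(idx)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly discard examples from larger classes until every class
    has as many examples as the smallest one (assumed 1:1 balance)."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = min(len(ids) for ids in groups.values())
    kept = []
    for ids in groups.values():
        kept.extend(rng.sample(ids, target))  # sample without replacement
    kept.sort()  # preserve the original example order
    return [X[i] for i in kept], [y[i] for i in kept]


def random_oversample(X, y, seed=0):
    """Randomly duplicate examples from smaller classes until every class
    has as many examples as the largest one (assumed 1:1 balance)."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = max(len(ids) for ids in groups.values())
    chosen = []
    for ids in groups.values():
        extra = [rng.choice(ids) for _ in range(target - len(ids))]
        chosen.extend(ids + extra)  # originals plus random duplicates
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Hypothetical toy data: 8 majority examples (label 0), 2 minority (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print(Counter(y_under))  # class counts after undersampling
print(Counter(y_over))   # class counts after oversampling
```

Undersampling shrinks the training set and discards majority examples, whereas oversampling keeps all original examples and adds exact duplicates of minority ones. SMOTE, the neighborhood cleaning rule, and SPIDER2 are more selective and are not covered by this sketch.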

Pages (from - to)

503 - 515

DOI

10.1007/978-3-319-39796-2_41

URL

https://link.springer.com/chapter/10.1007/978-3-319-39796-2_41

Book

Information Technologies in Medicine : 5th International Conference, ITIB 2016 Kamień Śląski, Poland, June 20 - 22, 2016 : Proceedings, Volume 1

Presented on

5th International Conference on Information Technologies in Biomedicine, ITIB 2016, June 20-22, 2016, Kamień Śląski, Poland

Publication indexed in

WoS (15)
