Comparision of Models Built Using AutoML and Data Fusion

Anam Haq; Szymon Wilk; Alberto Abelló

doi:10.1007/978-3-031-15740-0_22

Title

Comparision of Models Built Using AutoML and Data Fusion

Authors

Anam Haq (WIiT) ^{[ 1 ][ D ]}
Szymon Wilk (WIiT) ^{[ 1 ][ 2.3 ][ P ]}
Alberto Abelló

^{[ 1 ]} Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | ^{[ D ]} doctoral student | ^{[ P ]} employee

Scientific discipline (Law 2.0)

[2.3] Technical computer science and telecommunications

Publication year

2022

Chapter type

chapter in a scientific monograph / conference paper

Publication language

English

Keywords

Automated machine learning
AutoML tools
Auto-sklearn
Hyperparameter optimization
Data fusion
Combination of interpretation
Prediction models

Abstract

Automated machine learning (AutoML) has made life easier for data analysts and scientists by providing quick insights into data through building machine learning (ML) models. AutoML techniques are applied in a wide range of areas, including image processing, speech recognition, natural language processing, and reinforcement learning. However, there is still room for improvement. AutoML techniques focus only on problems related to predictive modeling, and most of them are designed to work with structured data. AutoML techniques are also time-consuming, as they require time to select the appropriate ML pipeline. This paper presents an alternative, time-efficient approach for mixed data (both categorical and numerical features, obtained from the UCI and Kaggle repositories) using a data fusion process, which provides high macro average accuracy in less time than AutoML. The AutoML tool considered here is auto-sklearn. This library is built in Python using scikit-learn. The implementation of data fusion is also done in Python using scikit-learn. We conclude from the experimental analysis that the constructed pipeline provides better results than auto-sklearn. This conclusion is supported by a statistical test (Wilcoxon signed-rank test) based on the macro average accuracy obtained for both approaches.
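
The statistical comparison named in the abstract can be illustrated with a minimal sketch of a Wilcoxon signed-rank test on paired per-dataset scores. This is not the authors' code: the function name `signed_rank_test` and the two accuracy lists are hypothetical, the numbers are made up for illustration and are not results from the paper, and the exact enumeration used for the p-value is only practical for a small number of datasets. In practice a library routine such as `scipy.stats.wilcoxon` would probably be used instead.

```python
from itertools import product


def signed_rank_test(a, b):
    """Wilcoxon signed-rank test for paired samples (exact, small n).

    Returns (W, p), where W = min(W+, W-) and p is the exact two-sided
    p-value obtained by enumerating every assignment of signs to ranks.
    """
    # Paired differences; zero differences are discarded.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Under the null hypothesis each rank is positive or negative with
    # probability 1/2, so count the sign patterns at least as extreme as w.
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            hits += 1
    return w, hits / 2 ** n


# Hypothetical macro average accuracies on eight datasets (illustrative only).
fusion_acc = [0.91, 0.85, 0.78, 0.88, 0.93, 0.81, 0.76, 0.90]
automl_acc = [0.89, 0.80, 0.79, 0.84, 0.90, 0.74, 0.70, 0.82]

w, p = signed_rank_test(fusion_acc, automl_acc)
print(f"W = {w}, p = {p:.4f}")
```

A small p-value here would indicate that the paired differences between the two approaches are unlikely under the hypothesis of no systematic difference. The test is paired, so both lists must score the same datasets in the same order.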

Date made available online

29.08.2022

Pages (from-to)

301 - 314

DOI

10.1007/978-3-031-15740-0_22

URL

https://link.springer.com/chapter/10.1007/978-3-031-15740-0_22

Book

Advances in Databases and Information Systems : 26th European Conference, ADBIS 2022, Turin, Italy, September 5–8, 2022, Proceedings

Presented at

26th European Conference on Advances in Databases and Information Systems ADBIS 2022, 5-8.09.2022, Turin, Italy