Comparison of machine learning algorithms used to classify the asteroids observed by all-sky surveys
[ 1 ] Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ 2 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ P ] employee
2022
scientific article
English
- minor planets
- asteroids: general
- methods: data analysis
- methods: statistical
- surveys
EN Context. Multifilter photometry from large sky surveys is commonly used to assign asteroid taxonomic types and study various problems in planetary science. To maximize the science output of those surveys, it is important to use methods that best link the spectro-photometric measurements to asteroid taxonomy. Aims. We aim to determine which machine learning methods are the most suitable for taxonomic classification in various sky surveys. Methods. We used five supervised machine learning classifiers: logistic regression, naive Bayes, support vector machines (SVMs), gradient boosting, and multilayer perceptrons (MLPs). Those methods were found to reproduce the Bus-DeMeo taxonomy at various rates depending on the set of filters used by each survey. We report several evaluation metrics for a comprehensive comparison (prediction accuracy, balanced accuracy, F1 score, and the Matthews correlation coefficient) for 11 surveys and space missions. Results. Among the methods analyzed, multilayer perceptrons and gradient boosting achieved the highest accuracy and naive Bayes achieved the lowest accuracy in taxonomic prediction across all surveys. We found that selecting the right machine learning algorithm can improve the success rate by a factor of >2. The best balanced accuracy (~85% for taxonomic type prediction) was found for the Visible and Infrared Survey Telescope for Astronomy (VISTA) and the ESA Euclid mission surveys, where broadband filters best map the 1 µm and 2 µm olivine and pyroxene absorption bands. Conclusions. To achieve the highest accuracy in taxonomic type prediction based on multifilter photometric measurements, we recommend the use of gradient boosting and MLPs optimized for each survey. This can improve the overall success rate even when compared with naive Bayes. A merger of different datasets can further boost the prediction accuracy. For the combination of the Legacy Survey of Space and Time and the VISTA survey, we achieved 90% for taxonomic type prediction.
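The four evaluation metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function name `classification_metrics`, the toy labels, and the choice of macro averaging for the F1 score are assumptions made for illustration (the abstract does not state which averaging was used). The Matthews correlation coefficient uses the standard multiclass generalization.

```python
from collections import Counter
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Return accuracy, balanced accuracy, macro F1, and multiclass MCC.

    Hypothetical helper for illustration; macro averaging of F1 is an
    assumption, not something stated in the abstract.
    """
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    pairs = Counter(zip(y_true, y_pred))   # (true, predicted) -> count
    true_counts = Counter(y_true)          # samples per true class
    pred_counts = Counter(y_pred)          # samples per predicted class
    correct = sum(pairs[(k, k)] for k in labels)

    # Plain accuracy: fraction of correctly classified samples.
    accuracy = correct / n

    # Balanced accuracy: mean per-class recall over classes present in y_true.
    recalls = [pairs[(k, k)] / true_counts[k] for k in labels if true_counts[k]]
    balanced_accuracy = sum(recalls) / len(recalls)

    # Macro F1: unweighted mean of per-class F1 = 2TP / (2TP + FP + FN),
    # where 2TP + FP + FN equals true_counts[k] + pred_counts[k].
    f1_scores = [
        2 * pairs[(k, k)] / (true_counts[k] + pred_counts[k]) for k in labels
    ]
    macro_f1 = sum(f1_scores) / len(f1_scores)

    # Multiclass Matthews correlation coefficient.
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(true_counts[k] ** 2 for k in labels)
    cov_pp = n * n - sum(pred_counts[k] ** 2 for k in labels)
    mcc = cov_tp / sqrt(cov_tt * cov_pp) if cov_tt and cov_pp else 0.0

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "macro_f1": macro_f1,
        "mcc": mcc,
    }


# Made-up toy labels using three taxonomic class letters (S, C, V).
y_true = ["S", "S", "S", "S", "C", "C", "V", "V"]
y_pred = ["S", "S", "S", "C", "C", "C", "V", "S"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Balanced accuracy and the Matthews correlation coefficient are less dominated by the most populous class than plain accuracy, which is presumably why several metrics are reported side by side for class-imbalanced taxonomic samples.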
31.10.2023
A10-1 - A10-15
Article Number: A10
CC BY (Attribution)
hybrid journal
final published version
at the time of publication
140
6,5