Automatic Task Classification of Software Projects for Planning and Simulation
[ 1 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ P ] employee
2025
chapter in monograph / paper
English
Background: Information about project tasks stored in issue tracking systems (ITS) can be used for project analytics or process simulation. However, the issues must first be classified, and given the number of tasks stored in an ITS, this classification should be automated.
Aims: Our research aims to build a model that automatically classifies recurring Jira issues based on their types and textual descriptions, enabling its practical application to software project planning and management.
Method: We study a dataset of 9.6K tasks from six industrial projects and augment it with an additional dataset of 91K task descriptions from other industrial projects, which is used to up-sample minority classes during training. We labeled the data using a semi-manual, active-learning-based method. We perform ten runs of 10-fold cross-validation for each project and evaluate the classifiers using standard prediction quality metrics: Accuracy, Precision, Recall, F1-score, and MCC. Our machine-learning pipeline combines a Transformer-based sentence embedder (‘mxbai-embed-large-v1’) with an XGBoost classifier. We also study the impact of task-classification errors on project staffing.
Results: The model automatically classifies software process tasks into 14 classes with MCC values ranging from 0.69 to 0.88. A confusion matrix revealed the most frequently confused task classes, and we analyzed the consequences of these classification errors.
Conclusions: The study’s results enable the practical application of the software process model to analyze, plan, and manage software development projects.
29.03.2025
30 - 63
20
70
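The abstract reports MCC values for a 14-class problem. A common multi-class generalization of MCC is Gorodkin's R_K statistic, computed directly from the K x K confusion matrix (it is also what scikit-learn's `matthews_corrcoef` uses for multi-class input). The abstract does not state which variant the authors used, so the following is a minimal sketch under that assumption; the function name `multiclass_mcc` and the example matrix are hypothetical.

```python
import math
from typing import Sequence


def multiclass_mcc(confusion: Sequence[Sequence[int]]) -> float:
    """Multi-class MCC (Gorodkin's R_K) from a K x K confusion matrix.

    confusion[i][j] is the number of tasks whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    if any(len(row) != k for row in confusion):
        raise ValueError("confusion matrix must be square")
    t = [sum(row) for row in confusion]                              # true count per class
    p = [sum(confusion[i][j] for i in range(k)) for j in range(k)]   # predicted count per class
    c = sum(confusion[i][i] for i in range(k))                       # correctly classified tasks
    s = sum(t)                                                       # total number of tasks
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t))
    )
    # By convention MCC is 0 when a marginal is degenerate (e.g. one class predicted for everything).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
    cm = [[40, 5, 5], [4, 30, 6], [2, 3, 25]]
    print(f"MCC = {multiclass_mcc(cm):.3f}")
```

For two classes this reduces to the familiar binary formula (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which makes it a convenient sanity check. Unlike Accuracy, it stays near 0 for a classifier that simply predicts the majority class, which matters when minority task classes are rare.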
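The evaluation protocol in the Method section (ten runs of 10-fold cross-validation per project, with minority classes up-sampled from an auxiliary pool of task descriptions during training only) can be sketched with the standard library as follows. This is an illustrative reading, not the authors' code: stratified fold assignment, the round-robin dealing of indices, the `min_per_class` threshold, and all function names are hypothetical assumptions, since the abstract does not say how folds were built or how many auxiliary samples were drawn. The embedding and XGBoost steps are omitted.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str], k: int, rng: random.Random) -> List[List[int]]:
    """Assign each sample index to one of k folds, spreading every class evenly."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):  # fixed class order keeps runs reproducible
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(offset + pos) % k].append(idx)  # deal indices round-robin over folds
        offset += len(members)  # next class continues where this one stopped
    return folds


def repeated_cv(labels: Sequence[str], k: int = 10, repeats: int = 10,
                seed: int = 0) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (run, train_indices, test_indices) for `repeats` runs of k-fold CV."""
    rng = random.Random(seed)
    for run in range(repeats):
        folds = stratified_folds(labels, k, rng)  # fresh shuffle for every run
        for i, test in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield run, sorted(train), sorted(test)


def upsample_from_pool(train_labels: Sequence[str],
                       pool: Dict[str, List[str]],
                       min_per_class: int,
                       rng: random.Random) -> List[Tuple[str, str]]:
    """Draw extra (text, label) pairs from an auxiliary pool for under-represented classes.

    Only classes present in the training fold are topped up, and the result is
    meant to be added to the training fold only, so test folds stay unaugmented.
    """
    counts = Counter(train_labels)
    extra: List[Tuple[str, str]] = []
    for label in sorted(counts):
        deficit = min_per_class - counts[label]
        candidates = pool.get(label, [])
        if deficit > 0 and candidates:
            # Sample without replacement; take everything if the pool is too small.
            chosen = rng.sample(candidates, min(deficit, len(candidates)))
            extra.extend((text, label) for text in chosen)
    return extra
```

In a full pipeline, each training fold (plus the pairs returned by `upsample_from_pool`) would be embedded, for example with `SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1").encode(texts)` from the sentence-transformers library, and used to fit an `xgboost.XGBClassifier`; the resulting predictions on the test fold feed the metrics, including MCC. Keeping augmentation out of the test folds avoids inflating scores with auxiliary descriptions the project itself never contained.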