On Identifying Similarities in Git Commit Trends - A Comparison Between Clustering and SimSAX

Mirosław Ochodek; Miroslaw Staron; Wilhelm Meding

doi:10.1007/978-3-030-35510-4_7

System Informacji Naukowej Politechniki Poznańskiej

Chapter

Title

On Identifying Similarities in Git Commit Trends - A Comparison Between Clustering and SimSAX

Authors

Mirosław Ochodek (WIiT) [1] [2.3] [P]
Miroslaw Staron
Wilhelm Meding

[1] Institute of Computing Science, Faculty of Computing and Telecommunications, Poznan University of Technology | [P] employee

Scientific discipline (Act 2.0)

[2.3] Information and communication technology

Publication year

2020

Chapter type

chapter in a scientific monograph / conference paper

Publication language

English

Abstract

Software products evolve increasingly fast as markets continuously demand new features and agility in response to customers' needs. This evolution of products triggers an evolution of software development practices in a different way. Compared to classical methods, where products were developed in projects, contemporary methods for continuous integration, delivery, and deployment develop products as part of continuous programs. In this context, software architects, designers, and quality engineers need to understand how the processes evolve over time since there is no natural start and stop of projects. For example, they need to know how similar two iterations of the same program are, or how similar two development programs are. In this paper, we compare three methods for calculating the degree of similarity between projects by comparing their Git commit series. We test three approaches: the DNA-motifs-inspired SimSAX measure and two forms of subsequence clustering (k-Means and hierarchical clustering). Our results show that the clustering algorithms are much more sensitive to parameters and often find similarities that are not correct. SimSAX, on the other hand, can be calibrated to find fewer similarities between the projects; the similarities are also more consistent for SimSAX than they are for the clustering. We conclude that it is better to use DNA-inspired motifs as they provide more accurate results.

Date made available online

09.12.2019

Pages (from-to)

109 - 120

DOI

10.1007/978-3-030-35510-4_7

URL

https://link.springer.com/chapter/10.1007/978-3-030-35510-4_7

Book

Software Quality: Quality Intelligence in Software and Systems Engineering : 12th International Conference, SWQD 2020, Vienna, Austria, January 14–17, 2020 : Proceedings

Presented at

12th International Conference on Software Quality SWQD 2020, 14-17.01.2020, Vienna, Austria