Article

Title

Design pattern recognition: a study of large language models

Authors

[1] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2025

Published in

Empirical Software Engineering

Journal year: 2025 | Journal volume: vol. 30 | Journal number: iss. 3

Article type

scientific article

Publication language

English

Keywords
EN
  • Large language model
  • Design pattern recognition
  • Software reengineering
  • Deep learning
Abstract

EN Context: As Software Engineering (SE) practices evolve due to extensive increases in software size and complexity, the importance of tools to analyze and understand source code grows significantly. Objective: This study aims to evaluate the abilities of Large Language Models (LLMs) in identifying design patterns (DPs) in source code, which can facilitate the development of better Design Pattern Recognition (DPR) tools. We compare the effectiveness of different LLMs in capturing semantic information relevant to the DPR task. Methods: We studied Gang of Four (GoF) DPs from the P-MARt repository of curated Java projects. State-of-the-art language models, including Code2Vec, CodeBERT, CodeGPT, CodeT5, and RoBERTa, are used to generate embeddings from source code. These embeddings are then used for DPR via a k-nearest neighbors prediction. Precision, recall, and F1-score metrics are computed to evaluate performance. Results: RoBERTa is the top performer, followed by CodeGPT and CodeBERT, which showed mean F1 scores of 0.91, 0.79, and 0.77, respectively. The results show that LLMs without explicit pre-training can effectively store semantic and syntactic information, which can be used in building better DPR tools. Conclusion: The performance of LLMs in DPR is comparable to existing state-of-the-art methods but with less effort in identifying pattern-specific rules and pre-training. Factors influencing prediction performance in Java files/programs are analyzed. These findings can advance software engineering practices and show the importance and abilities of LLMs for effective DPR in source code.
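The pipeline the abstract describes (code embeddings classified with k-nearest neighbors, evaluated by precision, recall, and F1) can be sketched as follows. This is a minimal illustration, not the paper's implementation: real embeddings would come from a model such as RoBERTa or CodeBERT applied to Java source files, whereas here synthetic random vectors stand in for them, and the pattern labels and cluster parameters are hypothetical.

```python
# Sketch of an embedding + k-NN design-pattern-recognition pipeline.
# Synthetic 768-dim vectors stand in for real code embeddings (hypothetical data).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(42)
patterns = ["Singleton", "Factory", "Observer"]  # example GoF pattern labels

# One well-separated Gaussian cluster of embeddings per pattern, 30 samples each.
X = np.vstack([rng.normal(loc=i * 5.0, scale=1.0, size=(30, 768))
               for i in range(len(patterns))])
y = np.repeat(patterns, 30)

# Hold out the first 5 samples of each class for evaluation.
test_idx = np.concatenate([np.arange(i * 30, i * 30 + 5)
                           for i in range(len(patterns))])
train_mask = np.ones(len(y), dtype=bool)
train_mask[test_idx] = False

# Classify each held-out embedding by its 5 nearest training neighbors.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X[train_mask], y[train_mask])
pred = knn.predict(X[test_idx])

# Macro-averaged metrics, as the study reports per-pattern aggregates.
prec, rec, f1, _ = precision_recall_fscore_support(
    y[test_idx], pred, average="macro", zero_division=0)
print(f"precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

With clearly separated synthetic clusters the classifier scores near 1.0; on real embeddings of pattern instances the separation, and hence the F1 scores the paper reports, depends on how much pattern-relevant semantics the language model captures.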

Date of online publication

18.02.2025

Pages (from - to)

69-1 - 69-45

DOI

10.1007/s10664-025-10625-1

URL

https://link.springer.com/article/10.1007/s10664-025-10625-1

Comments

Article Number: 69

License type

CC BY (attribution alone)

Open Access Mode

hybrid journal

Open Access Text Version

final published version

Date of Open Access to the publication

in press

Ministry points / journal

140

Impact Factor

3.5 [2023 list]