Depending on the amount of data to process, file generation may take longer.

If it takes too long to generate, you can limit the data by, for example, reducing the range of years.

Chapter

Download BibTeX

Title

Three Ways of Using Large Language Models to Evaluate Chat

Authors

[ 1 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ P ] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2023

Chapter type

chapter in monograph / paper

Publication language

english

Abstract

EN This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition. We present three different approaches to predicting turn-level qualities of chatbot responses based on large language models (LLMs). We report improvement over the baseline using dynamic few-shot examples from a vector store for the prompts for ChatGPT. We also analyze the performance of the other two approaches and report needed improvements for future work. We developed the three systems over just two weeks, showing the potential of LLMs for this task. An ablation study conducted after the challenge deadline shows that the new Llama 2 models are closing the performance gap between ChatGPT and open-source LLMs. However, we find that the Llama 2 models do not benefit from few-shot examples in the same way as ChatGPT.

Pages (from - to)

113 - 122

URL

https://aclanthology.org/2023.dstc-1.14/

Book

Proceedings of the Eleventh Dialog System Technology Challenge

Presented on

11th Dialog System Technology Challenge DSTC11 (at the SIGdial-INLG 2023 joint conference), 11.09.2023, Prague, Czech Republic

License type

CC BY (attribution alone)

Open Access Mode

publisher's website

Open Access Text Version

final published version

Date of Open Access to the publication

at the time of publication

Ministry points / chapter

5

Ministry points / conference (CORE)

70

This website uses cookies to remember the authenticated session of the user. For more information, read about Cookies and Privacy Policy.