FAME-MT Dataset: Formality Awareness Made Easy for Machine Translation Purposes

Dawid Wiśniewski; Zofia Rostek; Artur Nowakowski



Chapter

Authors

Dawid Wiśniewski (WIiT) [1] [2.3] [P]
Zofia Rostek
Artur Nowakowski

[1] Institute of Computing Science, Faculty of Computing and Telecommunications, Poznań University of Technology | [P] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2024

Chapter type

chapter in monograph / paper

Publication language

English

Keywords

machine translation
natural language processing

Abstract

People use language for various purposes. Apart from sharing information, individuals may use it to express emotions or to show respect for another person. In this paper, we focus on the formality level of machine-generated translations and present FAME-MT, a dataset consisting of 11.2 million translations between 15 European source languages and 8 European target languages, classified into formal and informal classes according to the formality of the target sentence. This dataset can be used to fine-tune machine translation models to ensure a given formality level for each of the 8 European target languages considered. We describe the dataset creation procedure and an analysis of the dataset's quality showing that FAME-MT is a reliable source of language register information, and we construct a publicly available proof-of-concept machine translation model that uses the dataset to steer the formality level of the translation. FAME-MT is currently the largest dataset of formality annotations, with examples in 112 European language pairs. The dataset is made available online.
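The abstract notes that FAME-MT's formal/informal labels can be used to fine-tune a translation model so that it produces a requested register. One common way to do this, not necessarily the approach taken by the authors, is to prepend a control token encoding the target sentence's formality to the source sentence during fine-tuning, then choose that token at inference time. The minimal Python sketch below assumes a hypothetical record layout (`FormalityPair` with `src_lang`, `tgt_lang`, `source`, `target`, `label`) and hypothetical control tokens `<formal>` and `<informal>`; the actual dataset files may be structured differently.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


# Hypothetical record layout; the real FAME-MT files may use other field names.
@dataclass(frozen=True)
class FormalityPair:
    src_lang: str
    tgt_lang: str
    source: str
    target: str
    label: str  # assumed to be either "formal" or "informal"


# Hypothetical control tokens; any reserved strings added to the model's
# vocabulary would serve the same purpose.
CONTROL_TOKENS = {"formal": "<formal>", "informal": "<informal>"}


def to_tagged_examples(pairs: Iterable[FormalityPair]) -> Iterator[Tuple[str, str]]:
    """Yield (tagged_source, target) pairs for fine-tuning.

    The formality label of the target sentence is prepended to the source
    sentence as a control token, so the model learns to associate the token
    with the register of its output.
    """
    for pair in pairs:
        if pair.label not in CONTROL_TOKENS:
            raise ValueError(f"unexpected formality label: {pair.label!r}")
        yield f"{CONTROL_TOKENS[pair.label]} {pair.source}", pair.target


def request_register(source: str, register: str) -> str:
    """Build an inference-time input that asks for the given register."""
    return f"{CONTROL_TOKENS[register]} {source}"
```

For example, the English source "Can you help me?" paired with the German targets "Können Sie mir helfen?" (formal) and "Kannst du mir helfen?" (informal) yields two training inputs that differ only in their leading token. A multilingual model covering several target languages would typically also need a target-language indicator, which this sketch omits for brevity.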

Pages (from - to)

164–180

URL

https://aclanthology.org/2024.eamt-1.16/

Book

Proceedings of the 25th Annual Conference of the European Association for Machine Translation. Volume 1: Research and Implementations & Case Studies, June 24-27, 2024, Sheffield, United Kingdom

Presented on

25th Annual Conference of the European Association for Machine Translation (EAMT 2024), 24–27 June 2024, Sheffield, United Kingdom

License type

CC BY-NC-ND (Attribution-NonCommercial-NoDerivatives)

Open Access Mode

publisher's website