Biomedical Semantic Textual Similarity: Evaluation of Sentence Representations Enhanced with Principal Component Reduction and Word Frequency Weighting

Klaudia Kantor; Mikołaj Morzy

doi:10.1007/978-3-031-09342-5_39

Chapter

Title

Biomedical Semantic Textual Similarity: Evaluation of Sentence Representations Enhanced with Principal Component Reduction and Word Frequency Weighting

Authors

Klaudia Kantor (WIiT) [1] [2.3] [SzD]
Mikołaj Morzy (WIiT) [2] [2.3] [P]

[1] Faculty of Computing and Telecommunications, Poznan University of Technology
[2] Institute of Computing Science, Faculty of Computing and Telecommunications, Poznan University of Technology
[SzD] doctoral school student
[P] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2022

Chapter type

chapter in monograph / paper

Publication language

English

Abstract

Biomedical texts encode semantics in domain vocabulary, extensive use of acronyms, proper nouns, named entities, and numerical values with implied meaning. This information is absent from the surface text form, making semantic textual similarity challenging for models trained on the general English language. This paper evaluates different techniques of sentence embedding in semantic textual similarity search in the biomedical domain. We compare static embeddings, transformer-based representations (focusing on models fine-tuned to the biomedical domain), and sentence transformers. We also introduce two auxiliary techniques: principal component reduction and word frequency embedding weighting. To gain better insights into the latent properties of sentence embeddings, we perform directional expectation tests. We conduct our experiments on two benchmark datasets: the BIOSSES and the Clinical Outcomes. We find that sentence transformers are surprisingly effective, outperforming fine-tuned transformer-based models. Initial experiments also suggest the efficacy of principal component reduction and embedding weighting by word frequency.
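
The abstract names principal component reduction and word frequency weighting without giving their details. As a rough illustration only, the sketch below assumes a common formulation of these ideas: word vectors are averaged with weight a / (a + p(w)), where p(w) is the relative word frequency, and the dominant (uncentered) principal direction of the sentence embedding matrix is then projected out. All function names, the smoothing constant `a`, and the use of power iteration are assumptions made for this sketch, not the authors' implementation.

```python
import math


def weighted_sentence_embedding(tokens, word_vectors, word_freq, a=1e-3):
    """Average word vectors with weight a / (a + p(w)).

    Assumption: word_freq maps a token to its relative corpus frequency p(w),
    so frequent words contribute less. Unknown tokens are skipped.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    used = 0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue
        weight = a / (a + word_freq.get(tok, 0.0))
        for i, x in enumerate(vec):
            total[i] += weight * x
        used += 1
    return [x / used for x in total] if used else total


def first_principal_direction(rows, iterations=100):
    """Approximate the dominant right singular vector of `rows` (uncentered)
    by power iteration on X^T X, without forming the matrix explicitly.

    Note: the uniform start vector could in principle be orthogonal to the
    dominant direction; a random start would avoid that corner case.
    """
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        proj = [sum(r[i] * v[i] for i in range(dim)) for r in rows]
        w = [sum(p * r[i] for p, r in zip(proj, rows)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def remove_first_component(rows):
    """Subtract from every sentence embedding its projection onto the
    dominant direction shared by the whole set."""
    u = first_principal_direction(rows)
    cleaned = []
    for r in rows:
        p = sum(x * y for x, y in zip(r, u))
        cleaned.append([x - p * y for x, y in zip(r, u)])
    return cleaned


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)
```

Under these assumptions, a similarity score for a sentence pair would be obtained by embedding all sentences with `weighted_sentence_embedding`, passing the resulting list through `remove_first_component`, and comparing the two cleaned vectors with `cosine`.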

Date of online publication

09.07.2022

Pages (from - to)

393 - 403

DOI

10.1007/978-3-031-09342-5_39

URL

https://link.springer.com/chapter/10.1007/978-3-031-09342-5_39

Book

Artificial Intelligence in Medicine : 20th International Conference on Artificial Intelligence in Medicine, AIME 2022, Halifax, NS, Canada, June 14–17, 2022 : Proceedings

Presented on

20th International Conference on Artificial Intelligence in Medicine AIME 2022, 14-17.06.2022, Halifax, Canada