Acoustic model for the classification of Polish vowels

Karolina Pondel-Sycz

doi:10.21008/j.0860-6897.2024.1.01

Scientific Information System of Poznan University of Technology

Article

Title

Acoustic model for the classification of Polish vowels

Authors

Karolina Pondel-Sycz

Year of publication

2024

Published in

Vibrations in Physical Systems

Year: 2024 | Volume: 35 | Issue: 1

Article type

research article

Publication language

English

Keywords

ASR
MFCC
PNCC
HMM
SVM
ANN
k-NN

Abstract

The study examines vowel recognition using an acoustic model built on Audio Fingerprint techniques [1]. It compares Support Vector Machine (SVM), Hidden Markov Model (HMM), Artificial Neural Network (ANN) and k-Nearest Neighbours (k-NN) classifiers in recognising isolated and within-word vowels, and investigates the importance of different types of acoustic speech features in this task. Temporal, spectral, cepstral, formant, LPC and perceptual features of speech were examined. Feature importance was assessed using a random forest classifier, and vowel classification was tested at three confidence levels for feature importance: 90%, 95% and 99%. Two databases compiled by the author were used, comprising a total of 1,200 samples from 20 speakers recorded under household conditions. The classifiers were evaluated using the confusion matrix, accuracy, precision, sensitivity and F1 score. Words were segmented into speech sounds using a tool based on BiLSTM recurrent neural networks and the Bayesian Information Criterion (BIC). The three most important features were power spectral density, spectral cut-off and Power-Normalised Cepstral Coefficients (PNCC). In the isolated vowel recognition task, the SVM classifier was the most effective, at a feature importance confidence level of 95%, achieving accuracy = 81%, precision = 81%, sensitivity = 81% and F1 score = 80%. In the within-word task, the evaluation checked whether the algorithm detected a vowel in the correct segment and whether it recognised the correct vowel within that segment. The best results were obtained by the k-NN classifier (feature importance confidence level of 99.9%). However, these results were low; correct recognition of the vowel in the word was A, E, U: 20%; I, O: 7%; Y: 23%. This indicates that neighbouring speech sounds strongly influence the acoustic model of vowels and their recognition.
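The evaluation measures named in the abstract (confusion matrix, accuracy, precision, sensitivity and F1 score) can be illustrated with a minimal Python sketch. It is not the author's code: the toy labels are invented for illustration, the helper names `confusion_matrix` and `macro_scores` are hypothetical, and macro-averaging over the six vowel classes is an assumption, since the abstract does not say how per-class scores were combined.

```python
from collections import Counter

# The six vowel classes named in the abstract.
VOWELS = ["A", "E", "I", "O", "U", "Y"]


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j] = count of samples with true label i predicted as j."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def macro_scores(matrix):
    """Accuracy plus macro-averaged precision, sensitivity (recall) and F1.

    Macro-averaging is an assumption made for this sketch.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "sensitivity": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Invented toy labels (not data from the study): two samples per vowel.
y_true = ["A", "A", "E", "E", "I", "I", "O", "O", "U", "U", "Y", "Y"]
y_pred = ["A", "A", "E", "I", "I", "I", "O", "O", "U", "O", "Y", "Y"]

matrix = confusion_matrix(y_true, y_pred, VOWELS)
scores = macro_scores(matrix)
for name, value in scores.items():
    print(f"{name}: {value:.2%}")
```

Rows of the matrix are true vowels and columns are predicted vowels, so off-diagonal cells show which vowels are confused with each other.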

Pages (from-to)

2024101-1 - 2024101-11

DOI

10.21008/j.0860-6897.2024.1.01

URL

https://vibsys.put.poznan.pl/_journal/2024-35-1/articles/vps_2024101.pdf

Notes

Article number: 2024101

License type

CC BY (Attribution)

Open access mode

open access journal