Article

Title

Acoustic model for the classification of Polish vowels

Authors

Year of publication

2024

Published in

Vibrations in Physical Systems

Journal year: 2024 | Journal volume: vol. 35 | Journal number: no. 1

Article type

scientific article

Publication language

English

Keywords
EN
  • ASR
  • MFCC
  • PNCC
  • HMM
  • SVM
  • ANN
  • k-NN
Abstract

EN The study explored the performance of vowel recognition using an acoustic model built on Audio Fingerprint techniques [1]. It compares Support Vector Machine (SVM), Hidden Markov Model (HMM), Artificial Neural Network (ANN) and k-Nearest Neighbours (k-NN) classifiers in the recognition of isolated and within-word vowels, and investigates the importance of different types of acoustic speech features in this process. Temporal, spectral, cepstral, formant, LPC and perceptual features of speech were examined. Feature importance was tested using a random forest classifier, and vowel classification was tested at three confidence levels for feature importance: 90%, 95% and 99%. Two databases created by the authors, comprising a total of 1,200 samples from 20 speakers recorded under household conditions, were used. The classifiers were evaluated using the confusion matrix, accuracy, precision, sensitivity and F1 score. Words were segmented into speech sounds using a tool based on BiLSTM recurrent neural networks and the BIC criterion. The three most important features were power spectral density, spectral cut-off and Power-Normalised Cepstral Coefficients. In the isolated vowel recognition task, the SVM classifier was the most effective at a feature-importance confidence level of 95%, obtaining accuracy = 81%, precision = 81%, sensitivity = 81% and F1 score = 80%. In the within-word task, it was verified whether the algorithm detected the presence of a vowel in the correct segment and whether it recognised the correct vowel within it. The best results were obtained by the k-NN classifier (statistical confidence level of feature importance of 99.9%). However, these results were low: correct recognition of the vowel in the word was 20% for A, E and U, 7% for I and O, and 23% for Y. This indicates a strong influence of neighbouring speech sounds on the acoustic model of vowels and their recognition.
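
The evaluation measures named in the abstract (accuracy, precision, sensitivity and F1 score) can all be derived from a confusion matrix. The following is a minimal sketch of one common way to do this, assuming macro-averaging over classes; the abstract does not state which averaging the authors used. The function name `macro_metrics` and the toy matrix are hypothetical and are not data from the article.

```python
from typing import Dict, List


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, sensitivity and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. Macro-averaging is an assumption
    made for this sketch, not a detail taken from the article.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    precisions, sensitivities, f1_scores = [], [], []
    for k in range(n):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        sensitivity = true_pos / actual_k if actual_k else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        precisions.append(precision)
        sensitivities.append(sensitivity)
        f1_scores.append(f1)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "sensitivity": sum(sensitivities) / n,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical three-class confusion matrix (illustrative values only).
toy_confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
toy_result = macro_metrics(toy_confusion)
```

Sensitivity here is the per-class recall, so a class that is never predicted contributes a precision of zero rather than raising an error.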

Pages (from - to)

2024101-1 - 2024101-11

DOI

10.21008/j.0860-6897.2024.1.01

URL

https://vibsys.put.poznan.pl/_journal/2024-35-1/articles/vps_2024101.pdf

Comments

Article number: 2024101

License type

CC BY (attribution alone)

Open Access Mode

open journal

Open Access Text Version

final published version

Access level to full text

public

Ministry points / journal

70
