Article

Title

Acoustic model for the classification of Polish vowels

Authors

Publication year

2024

Published in

Vibrations in Physical Systems

Year: 2024 | Volume: vol. 35 | Issue: no. 1

Article type

research article

Publication language

English

Keywords
EN
  • ASR
  • MFCC
  • PNCC
  • HMM
  • SVM
  • ANN
  • k-NN
Abstract

EN The study explored the performance of vowel recognition using an acoustic model built on Audio Fingerprint techniques [1]. The research compared the performance of Support Vector Machine (SVM), Hidden Markov Model (HMM), Artificial Neural Network (ANN) and k-Nearest Neighbours (k-NN) classifiers in the recognition of isolated and within-word vowels, and investigated the importance of different types of acoustic speech features in this process. Temporal, spectral, cepstral, formant, LPC and perceptual features of speech were examined. Feature importance was assessed using a random forest classifier. Vowel classification was tested at three confidence levels for feature importance: 90%, 95% and 99%. Two databases compiled by the authors, consisting of a total of 1,200 samples from 20 speakers recorded under household conditions, were used. The classifiers were evaluated by confusion matrix, accuracy, precision, sensitivity and F1 score. Words were segmented into speech sounds using a tool based on BiLSTM recurrent neural networks and the BIC criterion. The three most important features were identified as power spectral density, spectral cut-off, and Power-Normalised Cepstral Coefficients. In the isolated vowel recognition task, the SVM classifier was the most effective at a feature-importance confidence level of 95%, obtaining accuracy = 81%, precision = 81%, sensitivity = 81% and F1 score = 80%. In the task of recognising a vowel within a word, it was verified whether the algorithm detected the presence of vowels in the correct segment and whether it recognised the correct vowel within it. The best results were obtained by the k-NN classifier (statistical confidence level of feature importance of 99.9%). However, these results were low; correct recognition of the vowel in the word was A, E, U: 20%; I, O: 7%; Y: 23%. This indicates a strong influence of neighbouring speech sounds on the acoustic model of vowels and their recognition.
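
The abstract lists confusion matrix, accuracy, precision, sensitivity and F1 score as the evaluation measures. The following is a minimal sketch of how such measures can be derived from a multi-class confusion matrix. It is not the authors' code: the function name `macro_metrics`, the macro-averaging over classes, and the 3-class example matrix are all assumptions made for illustration, since the record does not state how the per-class values were averaged.

```python
from typing import Dict, List


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, sensitivity and F1.

    confusion[i][j] counts samples of true class i predicted as class j.
    Macro-averaging is an assumption; the abstract does not specify it.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, sensitivities, f1_scores = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        sensitivity = tp / actual_k if actual_k else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        precisions.append(precision)
        sensitivities.append(sensitivity)
        f1_scores.append(f1)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "sensitivity": sum(sensitivities) / n,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical 3-class confusion matrix (not data from the article).
example = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
metrics = macro_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a balanced test set, macro-averaged sensitivity equals accuracy, which would be consistent with the identical accuracy and sensitivity values quoted for the SVM classifier, although the record does not confirm that this averaging was used.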

Pages (from-to)

2024101-1 - 2024101-11

DOI

10.21008/j.0860-6897.2024.1.01

URL

https://vibsys.put.poznan.pl/_journal/2024-35-1/articles/vps_2024101.pdf

Notes

Article number: 2024101

License type

CC BY (Attribution)

Open access mode

open access journal

Open access text version

final published version

Full-text access level

public

Ministry score / journal

70
