Speech enhancement using U-nets with wide-context units

Tomasz Grzywalski; Szymon Drgas

doi:10.1007/s11042-022-12632-6

System Informacji Naukowej Politechniki Poznańskiej


Article


Title

Speech enhancement using U-nets with wide-context units

Authors

Tomasz Grzywalski
Szymon Drgas (WARiE) [1][2.2][P]

[1] Institute of Automatic Control and Robotics, Faculty of Automatic Control, Robotics and Electrical Engineering, Poznan University of Technology | [P] employee

Scientific discipline (Act 2.0)

[2.2] Automation, electronics, electrical engineering and space technologies

Year of publication

2022

Published in

Multimedia Tools and Applications

Year: 2022 | Volume: 81 | Issue: 13

Article type

scientific article

Publication language

English

Keywords

EN

speech enhancement
U-nets
DNN

Abstract

EN In this article, a new neural network for speech enhancement is proposed, in which single-channel noisy speech is processed to improve its intelligibility and quality. It is based on the U-net architecture, i.e. it is composed of two main blocks, an encoder and a decoder, and some of the corresponding layers in the encoder and decoder are linked by skip connections. In most encoder-decoder neural networks for speech enhancement known from the literature, the time-frequency resolution of the hidden feature maps is reduced. The main strategy in the presented approach is to maintain the time-frequency resolution of the feature maps at all levels of the network while still having a large receptive field. To obtain features that depend on a wide context, we propose neural network units based on recurrent cells or dilated convolutions. The proposed neural network was evaluated using WSJ0 and TIMIT speech data mixed with noises from Noisex, DCASE, and field recordings from the Freesound online database. The results showed improvement over baseline networks based on gated dilated convolutions or long short-term memory (LSTM) in terms of the scale-invariant signal-to-distortion ratio (SI-SDR), short-time objective intelligibility (STOI), and perceptual evaluation of speech quality (PESQ) measures.
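The abstract's central idea, keeping the feature-map resolution unchanged while still covering a wide context, can be illustrated with simple receptive-field arithmetic for stacked dilated convolutions. The following is a minimal sketch only: the function names, the kernel size of 3, and the dilation schedule 1, 2, 4, 8 are illustrative assumptions and are not taken from the article.

```python
from typing import Sequence


def receptive_field(kernel_sizes: Sequence[int], dilations: Sequence[int]) -> int:
    """Receptive field (in frames or bins) of stacked stride-1 dilated convolutions."""
    if len(kernel_sizes) != len(dilations):
        raise ValueError("kernel_sizes and dilations must have the same length")
    field = 1
    for k, d in zip(kernel_sizes, dilations):
        # Each stride-1 layer widens the field by (k - 1) * d positions.
        field += (k - 1) * d
    return field


def same_padding(kernel_size: int, dilation: int) -> int:
    """Per-side zero padding that keeps output length equal to input length (odd kernels)."""
    return (kernel_size - 1) * dilation // 2


# Hypothetical stack (an assumption, not the paper's configuration):
# four layers, kernel size 3, dilation doubling at each layer.
kernels = [3, 3, 3, 3]
dilations = [1, 2, 4, 8]

print("dilated receptive field:", receptive_field(kernels, dilations))
print("undilated receptive field:", receptive_field(kernels, [1, 1, 1, 1]))
print("per-layer padding:", [same_padding(k, d) for k, d in zip(kernels, dilations)])
```

Because every layer uses stride 1 with matching padding, the output has as many positions as the input, yet the context seen by each output position grows with the sum of the dilations rather than with the number of layers alone.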

Date made available online

9 March 2022

Pages (from-to)

18617 - 18639

DOI

10.1007/s11042-022-12632-6

URL

https://link.springer.com/article/10.1007/s11042-022-12632-6

Ministry score / journal

70

Impact Factor

3.6
