Abstract
This paper presents the development of our systems for the Interspeech 2020 Far-Field Speaker Verification Challenge (FFSVC). We focus on Task 2 of the challenge, which is far-field text-independent speaker verification using a single microphone array. The FFSVC training set provided by the challenge is augmented by pre-processing the far-field data with beamforming, voice channel switching, and a combination of weighted prediction error (WPE) and beamforming. Two open-access corpora, CHData in Mandarin and VoxCeleb2 in English, are augmented using multiple methods and mixed with the augmented FFSVC data to form the final training data. Four different model structures are used to model speaker characteristics: ResNet, extended time-delay neural network (ETDNN), Transformer, and factorized TDNN (FTDNN). Their outputs are pooled across time using the self-attentive structure, the statistics pooling structure, and the GVLAD structure. The final results are obtained by fusing the adaptively normalized scores of the four systems with a two-stage fusion method, which achieves a minimum detection cost function (minDCF) of 0.3407 and an equal error rate (EER) of 2.67% on the development set of the challenge.
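The abstract does not state which adaptive score normalization variant is used. As an illustration only, the sketch below shows one widely used form, adaptive symmetric normalization (AS-norm), in which a raw trial score is standardized against the top-k cohort scores of the enrollment side and of the test side, and the two results are averaged. The function names, the cohort lists, and the choice of `k` are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, stdev


def top_k_stats(cohort_scores, k):
    """Return mean and standard deviation of the k highest cohort scores.

    Requires k >= 2, because a sample standard deviation needs two points.
    """
    top = sorted(cohort_scores, reverse=True)[:k]
    # Floor the deviation so a degenerate cohort cannot cause division by zero.
    return mean(top), max(stdev(top), 1e-8)


def adaptive_s_norm(raw_score, enroll_cohort_scores, test_cohort_scores, k=2):
    """Hypothetical AS-norm helper (illustrative, not the paper's code).

    enroll_cohort_scores: scores of the enrollment utterance against a cohort.
    test_cohort_scores:   scores of the test utterance against the same cohort.
    """
    mu_e, sd_e = top_k_stats(enroll_cohort_scores, k)
    mu_t, sd_t = top_k_stats(test_cohort_scores, k)
    # Average the two one-sided z-normalized scores.
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)


# Toy values chosen for illustration only.
normalized = adaptive_s_norm(
    raw_score=0.8,
    enroll_cohort_scores=[0.1, 0.5, 0.3, 0.4],
    test_cohort_scores=[0.2, 0.6, 0.0, 0.4],
    k=2,
)
```

In practice the cohort would hold hundreds or thousands of impostor scores and `k` would be tuned on development data; the tiny lists here only keep the arithmetic easy to follow.
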
| Original language | English |
|---|---|
| Title of host publication | Interspeech 2020 |
| Publisher | International Speech Communication Association |
| Pages | 3476-3480 |
| Number of pages | 5 |
| ISBN (Print) | 9781713820697 |
| Publication status | Published - 2020 |
| Externally published | Yes |
| Event | 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China; Duration: 25 Oct 2020 → 29 Oct 2020 |
Publication series
| Name | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
|---|---|
| Volume | 2020-October |
| ISSN (Print) | 2308-457X |
| ISSN (Electronic) | 1990-9772 |
Conference
| Conference | 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 |
|---|---|
| Country/Territory | China |
| City | Shanghai |
| Period | 25/10/20 → 29/10/20 |
Bibliographical note
Publisher Copyright: © 2020 ISCA
Keywords
- Data augmentation
- Deep neural network
- Score normalization
- Speaker verification