Speaker verification is the task of examining a speech signal to accept or reject the identity claimed by a speaker. To handle utterances of different lengths and to accumulate information along the time dimension, several temporal aggregators have been proposed for speaker verification pipelines. In this paper we investigate the behavior of five state-of-the-art temporal aggregators, namely Temporal Average Pooling (TAP), Global Statistical Pooling (GSP), Self-Attentive Pooling (SAP), Attentive Statistical Pooling (ASP), and Vector of Locally Aggregated Descriptors (VLAD), as the lengths of the two compared utterances vary. Starting from a state-of-the-art speaker verification method, our experiments on the VoxCeleb2 dataset show that there is a sweet spot for utterance length where speaker verification performance is higher, regardless of the temporal aggregator used.
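
As a rough illustration of what a temporal aggregator does, the sketch below implements the two simplest ones named in the abstract, TAP and GSP, as they are commonly defined (a mean over time, and a concatenated mean and standard deviation over time). It is not taken from the paper: the feature dimension, the frame counts, the random stand-in features, and the helper names are all assumptions made for the example.

```python
import numpy as np

def temporal_average_pooling(frames: np.ndarray) -> np.ndarray:
    """TAP: average the frame-level features over time. frames has shape (T, D)."""
    return frames.mean(axis=0)

def global_statistical_pooling(frames: np.ndarray) -> np.ndarray:
    """GSP: concatenate the per-dimension mean and standard deviation over time."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two utterance-level embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical frame-level features for two utterances of different lengths.
# Random values stand in for the output of a real feature extractor.
rng = np.random.default_rng(0)
short_utt = rng.normal(size=(120, 64))  # 120 frames, 64-dimensional features
long_utt = rng.normal(size=(450, 64))   # 450 frames, 64-dimensional features

# Both utterances map to fixed-size vectors regardless of their length.
tap_short = temporal_average_pooling(short_utt)
tap_long = temporal_average_pooling(long_utt)
gsp_short = global_statistical_pooling(short_utt)
gsp_long = global_statistical_pooling(long_utt)

# A verification decision would compare this score against a tuned threshold.
score = cosine_score(gsp_short, gsp_long)
```

Because the pooled vectors have a fixed size, two utterances of any length can be compared with a single similarity score. SAP, ASP, and VLAD replace the plain mean with learned weighting or cluster assignments and are not sketched here.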

Piccoli, F., Olearo, L., Bianco, S. (2022). A comparison of temporal aggregators for speaker verification. In IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin (pp.1-6). IEEE Computer Society [10.1109/ICCE-Berlin56473.2022.9937132].

A comparison of temporal aggregators for speaker verification

Piccoli F.; Olearo L.; Bianco S.
2022

slide + paper
Speaker verification; temporal aggregation
English
12th IEEE International Conference on Consumer Electronics, ICCE-Berlin 2022 - 2 September 2022 through 6 September 2022
IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin
978-1-6654-5676-0

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/398215