Celona, L., Bianco, S., Schettini, R. (2018). Fine-grained face annotation using deep Multi-Task CNN. SENSORS, 18(8) [10.3390/s18082666].
Fine-grained face annotation using deep Multi-Task CNN
Celona, L.; Bianco, S.; Schettini, R.
2018
Abstract
We present a multi-task learning-based convolutional neural network (MTL-CNN) able to estimate multiple tags describing face images simultaneously. In total, the model is able to estimate up to 74 different face attributes belonging to three distinct recognition tasks: age group, gender and visual attributes (such as hair color, face shape and the presence of makeup). The proposed model shares all the CNN’s parameters among tasks and deals with task-specific estimation through the introduction of two components: (i) a gating mechanism to control activations’ sharing and to adaptively route them across different face attributes; (ii) a module to post-process the predictions in order to take into account the correlation among face attributes. The model is trained by fusing multiple databases for increasing the number of face attributes that can be estimated and using a center loss for disentangling representations among face attributes in the embedding space. Extensive experiments validate the effectiveness of the proposed approach.
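The gating component described in the abstract (shared CNN activations modulated per attribute before each attribute-specific prediction) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the feature dimension, the single-layer "backbone", and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

FEAT_DIM = 128      # dimensionality of the shared embedding (assumed)
N_ATTRIBUTES = 74   # number of face attributes, as stated in the abstract

# Stand-in for the shared CNN backbone: a single random projection + ReLU.
W_shared = rng.standard_normal((64, FEAT_DIM)) * 0.01

# One learned gate vector and one binary head per attribute (illustrative).
gates = rng.standard_normal((N_ATTRIBUTES, FEAT_DIM)) * 0.01
heads = rng.standard_normal((N_ATTRIBUTES, FEAT_DIM)) * 0.01

def predict(x):
    """Return per-attribute probabilities for one input vector x of shape (64,)."""
    shared = np.maximum(x @ W_shared, 0.0)   # shared ReLU features, shape (FEAT_DIM,)
    gated = sigmoid(gates) * shared          # each attribute gates the shared activations
    logits = (gated * heads).sum(axis=1)     # one logit per attribute
    return sigmoid(logits)

probs = predict(rng.standard_normal(64))
print(probs.shape)  # (74,)
```

In this toy version the sigmoid gate plays the routing role described in the paper: each attribute learns which shared activations to pass through to its own head, so parameters stay fully shared while predictions remain task-specific.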
| File | Access | Attachment type | Size | Format |
|---|---|---|---|---|
| 10281-204618.pdf | Open access | Publisher's Version (Version of Record, VoR) | 1.43 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.