Learning Representations for Facial Actions From Unlabeled Videos

Facial actions are usually encoded as anatomy-based action units (AUs), whose labelling demands expertise and is therefore time-consuming and expensive. To alleviate this labelling demand, we propose a twin-cycle autoencoder (TAE) that leverages large numbers of unlabelled videos to learn discriminative representations for facial actions. TAE is inspired by the fact that facial actions are embedded in the pixel-wise displacements between two sequential face images (hereinafter, source and target) in a video. Learning representations of facial actions can therefore be achieved by learning representations of these displacements. However, the displacements induced by facial actions are entangled with those induced by head motions. TAE is thus trained to disentangle the two kinds of movement by evaluating the quality of the images synthesized when either the facial actions or the head pose is changed, with the aim of reconstructing the target image. Experiments on AU detection show that TAE achieves accuracy comparable to existing AU detection methods, including some supervised ones, validating the discriminative capacity of the learned representations. TAE's ability to decouple action-induced from pose-induced movements is further validated by visualizing the generated images and analyzing facial image retrieval results both qualitatively and quantitatively.
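The abstract above describes TAE only at a high level; the full architecture and losses are in the paper itself (DOI 10.1109/TPAMI.2020.3011063). For orientation, a minimal PyTorch sketch of the core idea follows: predict two dense displacement fields from a (source, target) frame pair, one attributed to facial actions and one to head motion, and supervise them by warping the source toward the target. All names, shapes, and loss weights below (TwinFlowEncoder, warp, tae_step) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the twin-cycle autoencoder (TAE) idea from the abstract.
# The paper's exact architecture and objectives are not given in this record;
# the layer sizes, two-flow split, and loss weights here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwinFlowEncoder(nn.Module):
    """Encodes a (source, target) pair into two dense displacement fields:
    one attributed to facial actions (AUs), one to head motion."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Two heads: a 2-channel (dx, dy) flow for AUs and one for head pose.
        self.au_head = nn.Conv2d(64, 2, 3, padding=1)
        self.pose_head = nn.Conv2d(64, 2, 3, padding=1)

    def forward(self, source, target):
        h = self.features(torch.cat([source, target], dim=1))
        size = source.shape[-2:]
        au_flow = F.interpolate(self.au_head(h), size=size,
                                mode='bilinear', align_corners=False)
        pose_flow = F.interpolate(self.pose_head(h), size=size,
                                  mode='bilinear', align_corners=False)
        return au_flow, pose_flow

def warp(image, flow):
    """Backward-warps `image` by a dense displacement `flow` of shape
    (B, 2, H, W), channel 0 = dx, channel 1 = dy."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack([xs, ys], dim=0).float().to(image.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack([coords_x, coords_y], dim=-1)  # (B, H, W, 2)
    return F.grid_sample(image, grid, align_corners=False)

def tae_step(encoder, source, target):
    """One training objective: both flows applied together must reconstruct
    the target, which pressures the model to separate action-induced from
    pose-induced movement (weights below are assumed, not the paper's)."""
    au_flow, pose_flow = encoder(source, target)
    recon_full = warp(source, au_flow + pose_flow)  # both movements applied
    recon_pose = warp(source, pose_flow)            # pose change only
    loss_full = F.l1_loss(recon_full, target)
    # Penalizing large AU flow pushes rigid head motion into the pose branch,
    # leaving the AU branch to explain the residual facial-action movement.
    loss_reg = au_flow.abs().mean()
    return loss_full + 0.01 * loss_reg, recon_pose
```

Note that the disentangling pressure here is only schematic: the paper evaluates images synthesized under action-only and pose-only changes, whereas this sketch simply regularizes the AU branch so that rigid motion is absorbed by the pose flow.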

Detailed Description

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), issue 1, 15 Jan., pages 302-317
First author: Li, Yong (author)
Other authors: Zeng, Jiabei; Shan, Shiguang
Format: Online article
Language: English
Published: 2022
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Keywords: Journal Article; Research Support, Non-U.S. Gov't
LEADER 01000naa a22002652 4500
001 NLM313260680
003 DE-627
005 20231225150210.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2020.3011063  |2 doi 
028 5 2 |a pubmed24n1044.xml 
035 |a (DE-627)NLM313260680 
035 |a (NLM)32750828 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Li, Yong  |e verfasserin  |4 aut 
245 1 0 |a Learning Representations for Facial Actions From Unlabeled Videos 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 10.01.2022 
500 |a Date Revised 10.01.2022 
500 |a published: Print-Electronic 
500 |a Citation Status MEDLINE 
520 |a Facial actions are usually encoded as anatomy-based action units (AUs), whose labelling demands expertise and is therefore time-consuming and expensive. To alleviate this labelling demand, we propose a twin-cycle autoencoder (TAE) that leverages large numbers of unlabelled videos to learn discriminative representations for facial actions. TAE is inspired by the fact that facial actions are embedded in the pixel-wise displacements between two sequential face images (hereinafter, source and target) in a video. Learning representations of facial actions can therefore be achieved by learning representations of these displacements. However, the displacements induced by facial actions are entangled with those induced by head motions. TAE is thus trained to disentangle the two kinds of movement by evaluating the quality of the images synthesized when either the facial actions or the head pose is changed, with the aim of reconstructing the target image. Experiments on AU detection show that TAE achieves accuracy comparable to existing AU detection methods, including some supervised ones, validating the discriminative capacity of the learned representations. TAE's ability to decouple action-induced from pose-induced movements is further validated by visualizing the generated images and analyzing facial image retrieval results both qualitatively and quantitatively.
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Zeng, Jiabei  |e verfasserin  |4 aut 
700 1 |a Shan, Shiguang  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 44(2022), 1 vom: 15. Jan., Seite 302-317  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:44  |g year:2022  |g number:1  |g day:15  |g month:01  |g pages:302-317 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2020.3011063  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 44  |j 2022  |e 1  |b 15  |c 01  |h 302-317