Multi-dimensional Edge Feature-based AU Relation Graph
for Facial Action Unit Recognition
International Joint Conference on Artificial Intelligence (IJCAI)
Cheng Luo1,2,3 Siyang Song4 Weicheng Xie1,2,3* Linlin Shen1,2,3 Hatice Gunes4
3Shenzhen Institute of Artificial Intelligence and Robotics for Society
4Guangdong Key Laboratory of Intelligent Information Processing
Figure 1: Comparison between our approach and existing AU graph-based approaches: (a) pre-defined AU graphs that use a single topology to define AU associations for all facial displays; (b) facial display-specific AU graphs that assign a unique topology to define AU associations for each facial display. Both (a) and (b) use a single scalar value as an edge feature; (c) our approach encodes a unique AU association pattern for each facial display in node features, and additionally describes the relationship between each pair of AUs using a pair of multi-dimensional edge features.
The activations of Facial Action Units (AUs) mutually influence one another. While the relationship between a pair of AUs can be complex and unique, existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display. This paper proposes an AU relationship modelling approach that deep-learns a unique graph to explicitly describe the relationship between each pair of AUs of the target facial display. Our approach first encodes each AU's activation status and its association with other AUs into a node feature. Then, it learns a pair of multi-dimensional edge features to describe multiple task-specific relationship cues between each pair of AUs. During both node and edge feature learning, our approach also considers the influence of the unique facial display on the AUs' relationships by taking the full face representation as an input. Experimental results on the BP4D and DISFA datasets show that both node and edge feature learning modules provide large performance improvements for CNN- and transformer-based backbones, with our best systems achieving state-of-the-art AU recognition results. Our approach not only has a strong capability in modelling relationship cues for AU recognition but can also be easily incorporated into various backbones. Our PyTorch code is made available at https://github.com/CVI-SZU/ME-GraphAU.
Figure 2: The pipeline of the proposed AU relationship modelling approach. It takes the full face representation X as the input, and the AFG block, which is jointly trained with the FGG block, first provides a vector as a node feature to describe each AU's activation as well as its association with other AUs (Sec. 2.1). Then, the MEFL module learns a pair of vectors as multi-dimensional edge features to describe task-specific relationship cues between each pair of AUs (Sec. 2.2). The AU relation graph produced by our approach is then fed to a GatedGCN for AU recognition. Only the modules and blocks contained within the blue dashed lines are used at the inference stage.
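The output of this pipeline can be pictured as a directed graph over the AUs: one feature vector per AU node, plus a separate edge vector for each ordered AU pair. The following NumPy sketch only illustrates the data layout consumed by a GatedGCN-style recogniser; all sizes (N, D, De) are illustrative placeholders, not the paper's actual dimensions.

```python
import numpy as np

# Hypothetical sizes: N AUs, D-dim node features, De-dim edge features.
N, D, De = 12, 16, 8
rng = np.random.default_rng(0)

# Node features: one D-dim vector per AU, encoding its activation
# status and its association with the other AUs (the AFG/FGG output).
node_feats = rng.standard_normal((N, D))

# Edge features: a *pair* of De-dim vectors per AU pair -- edge_feats[i, j]
# and edge_feats[j, i] are learned separately, so the relation graph is
# directed and each direction can carry different relationship cues.
edge_feats = rng.standard_normal((N, N, De))

# (node_feats, edge_feats) is the AU relation graph a GatedGCN-style
# layer would consume for final AU recognition.
print(node_feats.shape, edge_feats.shape)
```

Representing each edge as a vector rather than a scalar is what distinguishes this graph from the pre-defined and facial display-specific graphs in Figure 1.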
Figure 3: Illustration of the MEFL module. The FAM first independently locates activation cues related to the i-th and j-th AU-specific feature maps Ui and Uj in the full face representation X (activated face areas are depicted in red and yellow). Then, the ARM further extracts cues related to both Ui and Uj (depicted in white), based on which the multi-dimensional edge features ei,j and ej,i are produced.
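A minimal sketch of this idea, not the paper's actual MEFL implementation: the jointly relevant region is approximated by multiplying gated AU maps, the attended full-face features are pooled, and two separate projections stand in for the learned heads that produce the directed pair e_ij and e_ji. All shapes and the random projections W_ij, W_ji are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def edge_feature_pair(X, U_i, U_j, edge_dim=8):
    """Toy MEFL-style sketch: pool the face region relevant to *both*
    AU i and AU j, then project it into a pair of edge vectors.

    X: (C, H, W) full-face representation.
    U_i, U_j: (H, W) AU-specific activation maps (the FAM output).
    """
    joint = sigmoid(U_i) * sigmoid(U_j)        # region relevant to both AUs
    pooled = (X * joint).mean(axis=(1, 2))     # (C,) jointly attended cues
    # In the real model e_ij and e_ji come from separately learned
    # projections; fixed random matrices stand in here for illustration.
    rng = np.random.default_rng(0)
    W_ij = rng.standard_normal((edge_dim, X.shape[0]))
    W_ji = rng.standard_normal((edge_dim, X.shape[0]))
    return W_ij @ pooled, W_ji @ pooled        # e_ij, e_ji

X = np.ones((3, 4, 4))
e_ij, e_ji = edge_feature_pair(X, np.zeros((4, 4)), np.zeros((4, 4)))
print(e_ij.shape, e_ji.shape)
```

Because the two projections differ, the cues flowing from AU i to AU j need not equal those flowing back, matching the directed edge pair in Figure 3.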
Figure 4: Visualization of association cues encoded in node features (only the systems in the last two columns encode such cues). We connect each node to its K nearest neighbours, where nodes of activated AUs usually have more connections than nodes of inactivated AUs. Systems that used such relationship cues achieved enhanced AU recognition results (the predictions in column 3 are better than those in column 2).
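The K-nearest-neighbour connectivity used for this visualisation can be sketched as follows; cosine similarity over node features is an assumption here, chosen as one common choice for comparing learned feature vectors.

```python
import numpy as np

def knn_edges(node_feats, K):
    """Connect each AU node to its K nearest neighbours by cosine
    similarity of its node feature (as in a Figure 4-style view)."""
    F = node_feats / np.linalg.norm(node_feats, axis=1, keepdims=True)
    sim = F @ F.T                              # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)             # exclude self-loops
    nbrs = np.argsort(-sim, axis=1)[:, :K]     # top-K neighbours per node
    return {i: sorted(nbrs[i].tolist()) for i in range(len(F))}

# Three toy nodes: the first two have similar features, the third differs.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(knn_edges(feats, K=1))  # {0: [1], 1: [0], 2: [1]}
```

Nodes whose features encode strong associations with many AUs end up with denser neighbourhoods, which is why activated AUs show more connections in the figure.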
Table 1: F1 scores (in %) achieved for 12 AUs on the BP4D dataset, where the three methods listed in the middle of the table (SRERL, UGN-B and HMP-PS) are also graph-based. The best, second best, and third best results in each column are indicated with brackets and bold font, brackets alone, and underline, respectively.