RegularFace: Deep Face Recognition via Exclusive Regularization

## Proposed method

In this paper, we propose "exclusive regularization", which enlarges the distance between samples of different classes in order to improve feature discriminability.

Suppose $W \in \mathbb{R}^{D\times C}$ is the weight matrix of the classification layer, which maps $D$-dimensional features to $C$-dimensional class confidence scores, and let $W_i$ and $W_j$ denote the $i$-th and $j$-th columns of $W$. The exclusive loss is defined as:

$$\mathcal{L}_{r}(W) = \frac{1}{C}\sum_i \max_{j\neq i} \frac{W_i \cdot W_j}{\|W_i\|_2 \cdot \|W_j\|_2}. \label{eq:l-exc}$$

Each term is the cosine similarity between two class weight vectors, so minimizing $\mathcal{L}_r$ pushes every class away from its most similar neighbour.
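As a quick sanity check of the definition, the sketch below evaluates the exclusive loss on two toy sets of class weight vectors in plain Python. The helper names `cosine` and `exclusive_loss` and the toy values are illustrative assumptions, not part of the original code. Each term is taken to be the cosine similarity between two columns, as in the PyTorch implementation in this post.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exclusive_loss(columns):
    """Mean over classes of the largest cosine similarity to any other class."""
    num_class = len(columns)
    total = 0.0
    for i in range(num_class):
        total += max(
            cosine(columns[i], columns[j]) for j in range(num_class) if j != i
        )
    return total / num_class


# Hypothetical toy weights: three classes crowded into one quadrant.
# The differing vector lengths do not matter, only the directions do.
crowded = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

# The same number of classes spread evenly, 120 degrees apart.
spread = [[1.0, 0.0], [-0.5, math.sqrt(3) / 2], [-0.5, -math.sqrt(3) / 2]]

loss_crowded = exclusive_loss(crowded)
loss_spread = exclusive_loss(spread)
print(loss_crowded, loss_spread)
```

The evenly spread configuration yields the lower loss, which is the behaviour the regularizer is meant to encourage.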

Interestingly, the same idea has been adopted in two other concurrent works at CVPR 2019:

• UniformFace: Learning Deep Equidistributed Representation for Face Recognition
• Unequal-Training for Deep Face Recognition With Long-Tailed Noisy Data

## Illustration:

As depicted in the figure above, our method "pushes" the representations of different identities away from each other, improving inter-class separability.

## An example implementation in PyTorch:

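```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ExclusiveLinear(nn.Module):

    def __init__(self, feat_dim=512, num_class=10572, norm_data=True, radius=20):
        # `radius` is accepted but not used in the part of the listing shown here.
        super(ExclusiveLinear, self).__init__()
        self.num_class = num_class
        self.feat_dim = feat_dim
        self.norm_data = norm_data
        self.weight = nn.Parameter(torch.randn(self.num_class, self.feat_dim))
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)

    def forward(self, x):
        # Pairwise cosine similarity between class weights: a C x C matrix.
        weight_norm = F.normalize(self.weight, p=2, dim=1)
        cos = torch.mm(weight_norm, weight_norm.t()).clamp(-1, 1)

        # Work on a detached copy and mask the diagonal so that a class
        # is never selected as its own nearest neighbour.
        cos1 = cos.detach().clone()
        diag = torch.arange(self.num_class, device=cos1.device).view(-1, 1)
        cos1.scatter_(1, diag, -100)

        # Index of the most similar other class, for every class.
        _, indices = torch.max(cos1, dim=1)

        if self.norm_data:
            x = F.normalize(x, p=2, dim=1)

        # NOTE: the listing above ends here. The remaining lines are an assumed
        # completion that follows the loss definition; they are not taken from
        # the original code.
        exclusive_loss = cos.gather(1, indices.view(-1, 1)).mean()
        return F.linear(x, weight_norm), exclusive_loss
```
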

## Merit of our method:

• Easily improves inter-class separability and feature discriminability without hyper-parameter tuning.
• Computationally light when the number of identities is small. On CASIA-WebFace, the extra overhead our method introduces is negligible.
• Improves performance on top of SphereFace [2] and center loss [1].
• Easy to implement and straightforward to interpret.

## Weakness of our method:

• Inefficient and memory-hungry on datasets with a large number of identities. Computing the exclusive loss requires maintaining a $C\times C$ cosine similarity matrix ("cos" in the code above), where $cos_{i,j}$ is the cosine similarity between $W_i$ and $W_j$. When the number of identities $C$ is large, this matrix becomes very large, so the computation is slow and consumes a lot of memory.
• Brings only insignificant improvement on top of ArcFace [3]. One possible reason is that ArcFace introduces several hyper-parameters (additive and multiplicative margins, $m_1$, $m_2$ and $m_3$ in Eq. 4 of the original paper) to control the decision margin between identities. A well-tuned decision margin also leads to inter-class separability, especially when the number of classes (identities) is large (see the figure above). In contrast, SphereFace has only a single multiplicative parameter $m$ to set the margin between classes, and $m$ must be an integer.
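One possible way to reduce the memory cost, sketched here as an assumption rather than as part of the original method, is to search for each class's nearest neighbour block by block without gradients, and then recompute only the $C$ selected cosines with gradients enabled. The names `chunked_exclusive_loss` and `chunk_size` are hypothetical. Peak memory for the similarity values drops from $C\times C$ to `chunk_size` $\times C$, while the amount of arithmetic stays roughly the same.

```python
import torch
import torch.nn.functional as F


def chunked_exclusive_loss(weight, chunk_size=1024):
    """Exclusive loss for a (C, D) class-weight matrix without a full C x C matrix."""
    w = F.normalize(weight, p=2, dim=1)
    num_class = w.size(0)

    # Step 1: find each class's most similar other class, one block of rows at a time.
    nearest = []
    with torch.no_grad():
        for start in range(0, num_class, chunk_size):
            stop = min(start + chunk_size, num_class)
            cos = torch.mm(w[start:stop], w.t())  # shape: (stop - start, C)
            rows = torch.arange(stop - start, device=w.device)
            cols = torch.arange(start, stop, device=w.device)
            cos[rows, cols] = -100.0  # exclude each class's similarity to itself
            nearest.append(cos.argmax(dim=1))
    nearest = torch.cat(nearest)

    # Step 2: recompute only the C selected cosines, this time tracking gradients.
    return (w * w[nearest]).sum(dim=1).mean()


# Compare against the full-matrix computation on small random weights.
torch.manual_seed(0)
W = torch.randn(50, 16, requires_grad=True)
loss = chunked_exclusive_loss(W, chunk_size=7)
loss.backward()

with torch.no_grad():
    w_full = F.normalize(W, p=2, dim=1)
    full = torch.mm(w_full, w_full.t())
    full.fill_diagonal_(-100.0)
    reference = full.max(dim=1).values.mean()
```

This only addresses memory: the nearest-neighbour search still compares every pair of classes, so the cost in time remains quadratic in $C$.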

## Citation:

@InProceedings{zhao2019regularface,
author = {Zhao, Kai and Xu, Jingyi and Cheng, Ming-Ming},
title = {RegularFace: Deep Face Recognition via Exclusive Regularization},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}

## Reference:

[1] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, pages 499–515. Springer, 2016.

[2] Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. SphereFace: Deep hypersphere embedding for face recognition. In IEEE Conf. Comput. Vis. Pattern Recog., volume 1, 2017.

[3] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. In IEEE Conf. Comput. Vis. Pattern Recog., volume 1, 2019.
