RegularFace: Deep Face Recognition via Exclusive Regularization

Proposed method

In this paper, we propose ‘exclusive regularization’, which enlarges the angular distance between different classes in order to improve feature discriminability.

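Suppose $W \in \mathbb{R}^{D\times C}$ denotes the weights of the classification layer that maps $D$-dimensional features to $C$-dimensional class confidence scores, and let $W_i$ and $W_j$ be the $i$-th and $j$-th columns of $W$. The exclusive regularization is the average, over all classes, of the largest cosine similarity between a class weight and any other class weight: $$ \begin{equation} \mathcal{L}_{r}(W) = \frac{1}{C}\sum_{i} \max_{j\neq i} \frac{W_i \cdot W_j}{\|W_i\|_2 \, \|W_j\|_2}. \label{eq:l-exc} \end{equation} $$ Minimizing $\mathcal{L}_{r}$ therefore pushes every class weight away from its nearest neighboring class.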
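To make the definition concrete, here is a minimal pure-Python sketch that evaluates $\mathcal{L}_r$ directly from the equation. It assumes the columns $W_i$ are passed as a list of equal-length vectors. The function name `exclusive_loss` is ours and is chosen only for illustration.

```python
import math


def exclusive_loss(class_weights):
    """L_r(W) = (1/C) * sum_i max_{j != i} cos(W_i, W_j).

    class_weights: list of C vectors, the columns W_i of W (each of length D).
    Illustrative helper, not part of the original code.
    """
    def norm(v):
        return math.sqrt(sum(a * a for a in v))

    def cos(u, v):
        return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

    C = len(class_weights)
    total = 0.0
    for i in range(C):
        # Largest cosine between class i and any *other* class.
        total += max(cos(class_weights[i], class_weights[j])
                     for j in range(C) if j != i)
    return total / C


crowded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spread = [[1.0, 0.0], [-0.5, math.sqrt(3) / 2], [-0.5, -math.sqrt(3) / 2]]
print(exclusive_loss(crowded))  # ≈ 0.7071 (= 1/sqrt(2))
print(exclusive_loss(spread))   # ≈ -0.5
```

Three directions spaced evenly at 120° score -0.5. That is lower than the 1/√2 of the crowded configuration, and this lower value is what minimizing $\mathcal{L}_r$ rewards.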

Interestingly, the same idea has been adopted in two other concurrent works in CVPR2019:


Figure: Illustration of face embeddings trained under various loss functions; points of different colors indicate different identities. (a) Softmax loss learns separable decision boundaries. (b) Angular softmax loss learns angularly separable decision boundaries. (c) Center loss [1] ‘pulls’ embeddings of the same identity towards their center to obtain compact and discriminative representations. (d) SphereFace [2] (A-Softmax loss) introduces an ‘angular margin’ that clamps representations within a narrow angle. (e) Our proposed RegularFace introduces an ‘inter-class push force’ that explicitly ‘pushes’ representations of different identities far away from each other.
As depicted in the figure above, our method "pushes" representations of different identities away from each other, improving inter-class separability.
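The direction of this "push" can be read off the gradient of one cosine term in $\mathcal{L}_r$. With respect to $W_i$, the gradient is $\frac{\partial \cos(W_i, W_j)}{\partial W_i} = \frac{W_j}{\|W_i\|_2\,\|W_j\|_2} - \cos(W_i, W_j)\,\frac{W_i}{\|W_i\|_2^2}$. This gradient is orthogonal to $W_i$, so a gradient-descent step rotates $W_i$ away from its nearest neighbor $W_j$. The pure-Python sketch below demonstrates that rotation. It moves only $W_i$, and the helper names and learning rate are hypothetical. Note also that the regularizer acts on the class weights $W$; the features are affected only indirectly, through the classification loss.

```python
import math


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def cosine_grad(u, v):
    """Gradient of cos(u, v) w.r.t. u: v/(|u||v|) - cos(u, v) * u/|u|^2."""
    nu, nv = norm(u), norm(v)
    c = cosine(u, v)
    return [b / (nu * nv) - c * a / (nu * nu) for a, b in zip(u, v)]


def push_step(w_i, w_j, lr=0.1):
    """One gradient-descent step on cos(w_i, w_j), updating only w_i (hypothetical helper)."""
    g = cosine_grad(w_i, w_j)
    return [a - lr * ga for a, ga in zip(w_i, g)]


w_i, w_j = [1.0, 0.0], [1.0, 1.0]
print(cosine(w_i, w_j))                          # ≈ 0.7071 before the step
print(cosine(push_step(w_i, w_j, lr=0.5), w_j))  # ≈ 0.4310 after: W_i rotated away from W_j
```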

A demonstrative implementation in PyTorch:

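```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ExclusiveLinear(nn.Module):
  """Cosine classification layer that also returns the exclusive regularization loss."""

  def __init__(self, feat_dim=512, num_class=10572, norm_data=True, radius=20):
    super(ExclusiveLinear, self).__init__()
    self.num_class = num_class
    self.feat_dim = feat_dim
    self.norm_data = norm_data
    self.radius = float(radius)
    # One row per class, i.e. weight = W^T with shape (num_class, feat_dim).
    self.weight = nn.Parameter(torch.randn(self.num_class, self.feat_dim))
    self.reset_parameters()

  def reset_parameters(self):
    stdv = 1. / math.sqrt(self.weight.size(1))
    self.weight.data.uniform_(-stdv, stdv)

  def forward(self, x):
    # L2-normalize class weights so that their pairwise dot products are cosines.
    weight_norm = F.normalize(self.weight, p=2, dim=1)
    cos = torch.mm(weight_norm, weight_norm.t())
    cos = cos.clamp(-1, 1)  # clamp() is not in-place, so keep its result

    # Detached copy with the diagonal (self-similarity) set to -100,
    # so that the row-wise argmax picks the nearest *other* class.
    cos1 = cos.detach().clone()
    diag = torch.arange(self.num_class, device=cos1.device).view(-1, 1)
    cos1.scatter_(1, diag, -100)
    _, indices = torch.max(cos1, dim=1)

    # One-hot mask selecting max_{j != i} cos(W_i, W_j) in every row i.
    mask = torch.zeros_like(cos)
    mask.scatter_(1, indices.view(-1, 1), 1)
    exclusive_loss = torch.dot(cos.flatten(), mask.flatten()) / self.num_class

    if self.norm_data:
      # Project features onto a hypersphere of the given radius.
      x = F.normalize(x, p=2, dim=1)
      x = x * self.radius

    # Logits are (scaled) cosines between features and class weights.
    return F.linear(x, weight_norm), exclusive_loss
```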
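The one-hot mask in `forward()` is a vectorized way of computing $\max_{j\neq i}$. The pure-Python sketch below mirrors those steps on plain lists, without PyTorch: overwrite the diagonal, take the row-wise argmax, build a one-hot mask, and sum the masked cosines. The helper names `cosine_matrix` and `exclusive_loss_masked` are ours, for illustration only. In the tensor version the mask is built from a detached copy, so the gradient of the loss reaches only the selected (nearest-neighbor) entry of each row.

```python
import math


def cosine_matrix(ws):
    """Pairwise cosine similarities between the class vectors in ws."""
    norms = [math.sqrt(sum(a * a for a in w)) for w in ws]
    return [[sum(a * b for a, b in zip(u, v)) / (nu * nv)
             for v, nv in zip(ws, norms)]
            for u, nu in zip(ws, norms)]


def exclusive_loss_masked(ws):
    """Mask-based L_r, mirroring ExclusiveLinear.forward (illustrative helper)."""
    C = len(ws)
    cos = cosine_matrix(ws)
    # Copy, then overwrite the diagonal so a class never selects itself.
    cos1 = [row[:] for row in cos]
    for i in range(C):
        cos1[i][i] = -100.0
    # Row-wise argmax: index of each class's nearest other class.
    indices = [max(range(C), key=lambda j: cos1[i][j]) for i in range(C)]
    # One-hot mask with a single 1 per row.
    mask = [[1.0 if j == indices[i] else 0.0 for j in range(C)] for i in range(C)]
    # Sum of the elementwise product, i.e. the dot product of the flattened matrices.
    loss = sum(cos[i][j] * mask[i][j] for i in range(C) for j in range(C)) / C
    return loss, indices


loss, nearest = exclusive_loss_masked([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(nearest)  # [2, 2, 0]
print(loss)     # ≈ 0.7071, the same value as the max-based formula
```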

Merit of our method:

Weakness of our method:


If our method is helpful to your research, please consider citing:
  author = {Zhao, Kai and Xu, Jingyi and Cheng, Ming-Ming},
  title = {RegularFace: Deep Face Recognition via Exclusive Regularization},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2019}


[1] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision (ECCV), pages 499–515. Springer, 2016.

[2] Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. SphereFace: Deep hypersphere embedding for face recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, 2017.

[3] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, 2019.
