## 1. Abstract

Label distribution learning (LDL) is a general learning framework, which assigns
to an instance a distribution over a set of labels rather than a single label or multiple
labels. Current LDL methods have either restricted assumptions on the expression
form of the label distribution or limitations in representation learning, e.g., to
learn deep features in an end-to-end manner. This paper presents label distribution
learning forests (LDLFs) - a novel label distribution learning algorithm based on
differentiable decision trees, which have several advantages: 1) Decision trees
have the potential to model any general form of label distributions by a mixture
of leaf node predictions. 2) The learning of differentiable decision trees can be
combined with representation learning. We define a distribution-based loss function
for a forest, enabling all the trees to be learned jointly, and show that an update
function for leaf node predictions, which guarantees a strict decrease of the loss
function, can be derived by variational bounding. The effectiveness of the proposed
LDLFs is verified on several LDL tasks and a computer vision application, showing
significant improvements to the state-of-the-art LDL methods.

## 2. The Algorithm

### 2.1 Label Distribution Learning

Label distribution learning is a learning framework to deal with problems of label ambuguity. Unlike single-label learning and multi-label learning, which assume an instance is assigned to a single labek or multiple labels, LDL aims at learning the relative importance of each label involved in the description of an instance, i.e., a distribution over the set of labels.

The real-world data which are suitable to be modeled by label distribution learning: 1)Estimated facial ages (a unimodal distribution). 2) Rating distribution of crowd opinion on a movie (a multimodal distribution).

### 2.2 Observation and Motivation

Decision trees have the potential to model any general form of label distributions by mixture of the leaf node predictions, which avoid making strong assumption on the form of the label distributions. Convolutional Neural Networks(CNNs) have the strong ability of learning representation. The split node parameters in differentiable decision trees can be learned by back-propagation(BP), which enables a combination of tree learning and representation learning in an end-to-end manner.

### 2.3 The Proposed Method

Illustration of proposed LDLForests, top red circles denote the output of function $\mathbf{f}(\cdot,\Theta)$ parameterized by $\Theta$, $\mathbf{f}$ is a convolution neural network. Green circles on the bottom represent leaf nodes of decision trees, each of which contains a distribution.

## 3. Experimental Results

### 3.1 Comparison of LDLFs to Stand-alone LDL Methods

We compare our proposed LDLFs with other state-of-art LDL methods on 3 popular LDL datasets, as we can see bellow, our method perform best on all of the six measures:

Comparison results on three LDL datasets, $\uparrow$ and $\downarrow$ indicate the larger and the smaller
the better, respectively.

### 3.2 Evaluation of LDLFs on Facial Age Estimation

We conduct facial age estimation experiments on Morph dataset, we compare LDLFs with other deeplearning-based and non-deeplearning facial age estimation methods, results are summarized as bellow, which shows LDLFs achieve the state-of-the-art
performance on Morph

MAE of age estimation comparison on Morph.

## 4. Code and Data

### 4.1 Code and data

### 4.2 Pretrained models

## 5. Citation and Contact

### If you use our code, please consider citing following paper:

@article{shen2017label,
title={Label Distribution Learning Forests},
author={Shen, Wei and Zhao, Kai and Guo, Yilu and Yuille, Alan},
journal={Proceedings of Advances in neural information processing systems},
year={2017}
}

### For any questions, please leave comments bellow!