DeepSkeleton: Learning Multi-task Scale-associated Deep Side Outputs for Object Skeleton Extraction in Natural Images

Wei Shen1,2, Kai Zhao1, Yuan Jiang1, Yan Wang2, Xiang Bai3, Alan Yuille2

1 Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Shanghai University.

2 Department of Computer Science, Johns Hopkins University.

3 School of Electronic Information and Communications, Huazhong University of Science and Technology.


In this paper (arXiv:1609.03659) we present a novel fully convolutional network with multiple scale-associated side outputs to address the skeleton detection problem. By observing the relationship between the receptive field sizes of the different layers in the network and the skeleton scales they can capture, we introduce two scale-associated side outputs to each stage of the network. The network is trained by multi-task learning, where one task is skeleton localization, which classifies whether a pixel is a skeleton pixel or not, and the other is skeleton scale prediction, which regresses the scale of each skeleton pixel. Our method achieves promising results on two skeleton extraction datasets and significantly outperforms other competitors.

The Algorithm:

The proposed network architecture for skeleton extraction is converted from the VGG16 net. Our network has 4 stages with Scale-associated Side-Output (SSO) layers connected to the convolutional layers. Each of these SSOs can simultaneously detect object skeletons and regress the skeleton scales.

The proposed algorithm is inspired by a simple observation: a neuron can only detect skeletons whose scale is smaller than its receptive field. We designed the SSO (Scale-associated Side-Output) to detect object skeletons of various scales at different convolution stages. Furthermore, we developed a multi-task learning paradigm to detect object skeletons and predict skeleton scales at the same time.
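To make the receptive-field observation concrete, the following minimal sketch computes the receptive field of each layer in the standard VGG16 layout (3x3 convolutions with stride 1, 2x2 pooling with stride 2). The layer table and the helper names `receptive_fields` and `can_capture` are illustrative assumptions, not part of the released code, and this page does not state which layers the four stages attach to.

```python
# Standard VGG16 layout as (name, kernel size, stride). This table is an
# assumption about the backbone; it is not read from the released prototxt.
VGG16_LAYERS = [
    ("conv1_1", 3, 1), ("conv1_2", 3, 1), ("pool1", 2, 2),
    ("conv2_1", 3, 1), ("conv2_2", 3, 1), ("pool2", 2, 2),
    ("conv3_1", 3, 1), ("conv3_2", 3, 1), ("conv3_3", 3, 1), ("pool3", 2, 2),
    ("conv4_1", 3, 1), ("conv4_2", 3, 1), ("conv4_3", 3, 1), ("pool4", 2, 2),
    ("conv5_1", 3, 1), ("conv5_2", 3, 1), ("conv5_3", 3, 1),
]


def receptive_fields(layers):
    """Return {layer name: receptive field size in input pixels}."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    sizes = {}
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes[name] = rf
    return sizes


def can_capture(scale, receptive_field):
    """The observation from the text: a scale is detectable only if it is
    smaller than the receptive field of the neuron."""
    return scale < receptive_field


if __name__ == "__main__":
    for name, size in receptive_fields(VGG16_LAYERS).items():
        print(name, size)
```

Under this layout, deeper layers see larger regions, which is why side outputs at later stages are suited to skeletons of larger scale.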

The figure below illustrates the multi-task SSO at stage 2: the left blocks represent the skeleton detection SSO, and the right block represents the scale regression ScalePred-SSO. $a_{jk}^{(i)}$ indicates how likely pixel $j$ is to belong to skeleton type $k$ at stage $i$, where the skeleton types are defined according to their scales; $\hat{s}_j^{(i)}$ is the predicted skeleton scale of pixel $j$ at stage $i$.
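One plausible way to define skeleton types by scale is to assign each skeleton pixel to the first stage whose receptive field exceeds its scale, with type 0 reserved for non-skeleton pixels. The sketch below assumes this rule; the stage receptive fields (14, 40, 92, 196), the `margin` factor, and the function name `quantize_scale` are illustrative assumptions rather than values taken from this page.

```python
# Assumed receptive field sizes of the four stages (hypothetical values
# chosen for illustration; check the paper or prototxt for the real ones).
STAGE_RECEPTIVE_FIELDS = [14, 40, 92, 196]


def quantize_scale(scale, stage_rfs=STAGE_RECEPTIVE_FIELDS, margin=1.0):
    """Map a ground-truth skeleton scale to a discrete skeleton type.

    Returns 0 for non-skeleton pixels (scale <= 0), otherwise the 1-based
    index of the first stage whose receptive field is larger than
    margin * scale. Scales too large for every stage fall into the last type.
    """
    if scale <= 0:
        return 0
    for k, rf in enumerate(stage_rfs, start=1):
        if rf > margin * scale:
            return k
    return len(stage_rfs)


if __name__ == "__main__":
    for s in (0, 5, 14, 60, 150, 500):
        print(s, quantize_scale(s))
```

With such a labeling, the classification side output at stage $i$ only needs to distinguish the types that its receptive field can cover.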

Performance Evaluation:

1. Skeleton detection

a): Qualitative illustration:

We show detection results on SK-LARGE for several selected images, which show that our method outperforms other competitors by a significant margin.

Illustration of skeleton extraction results on SK-LARGE for several selected images. The groundtruth skeletons are in yellow and the thresholded extraction results are in red. Thresholds were optimized over the whole dataset.

b): Quantitative evaluation:

We evaluate the skeleton detection results with the widely used F-measure $= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$.
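As a small worked example, the F-measure above can be computed as follows. This is a minimal sketch and not the released evaluation code; the function names and the convention of returning 0 when both precision and recall are 0 are assumptions.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are 0 to avoid dividing by zero
    (a convention assumed here, not taken from the evaluation tool).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_f_measure(pr_points):
    """Highest F-measure over (precision, recall) pairs, for example one
    pair per detection threshold on a precision-recall curve."""
    return max(f_measure(p, r) for p, r in pr_points)


if __name__ == "__main__":
    curve = [(0.9, 0.3), (0.7, 0.6), (0.4, 0.8)]  # made-up points
    print(best_f_measure(curve))
```

Taking the maximum over thresholds mirrors the practice of reporting a single score for a dataset-wide optimal threshold.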

Performance comparison on the SK-LARGE dataset. LMSDS is our newest method, the multi-task multi-side-output network described in our TIP paper; FSDS is the algorithm described in our CVPR paper.

2. Object segmentation:

Once we have the detected skeleton and the associated scales, we can easily recover the object segmentation by drawing inscribed circles centered at the skeleton points, using the skeleton scale as the circle radius. This procedure can be formulated as follows: $$ \text{seg}_i = \begin{cases} 1 & \text{if } \text{distance}(x_i, \text{sk}_i) < \hat{s}_i \\ 0 & \text{otherwise} \end{cases} $$

Where $\text{sk}_i$ is the nearest skeleton point to pixel $x_i$, $\text{distance}(x_i, x_j)$ is the Euclidean distance between two points, and $\hat{s}_i$ is the predicted skeleton scale of pixel $x_i$.
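The formula above can be sketched in a few lines of plain Python. This is a literal, brute-force reading of the rule, not the released implementation: the function name `recover_segmentation`, the dictionary representation of the skeleton, and the choice to use the scale predicted at the nearest skeleton point as $\hat{s}_i$ are assumptions made for illustration.

```python
import math


def recover_segmentation(height, width, skeleton):
    """Recover a binary mask from skeleton points and their scales.

    skeleton: dict mapping (row, col) of each detected skeleton pixel to its
    predicted scale (used as the circle radius).
    Returns a height x width list of lists with 1 inside the object, 0 outside.
    """
    seg = [[0] * width for _ in range(height)]
    if not skeleton:
        return seg
    for r in range(height):
        for c in range(width):
            # Nearest skeleton point sk_i to the pixel x_i = (r, c).
            (sr, sc), scale = min(
                skeleton.items(),
                key=lambda item: math.hypot(r - item[0][0], c - item[0][1]),
            )
            # seg_i = 1 if distance(x_i, sk_i) < s_i, else 0.
            if math.hypot(r - sr, c - sc) < scale:
                seg[r][c] = 1
    return seg


if __name__ == "__main__":
    mask = recover_segmentation(5, 5, {(2, 2): 1.5})
    for row in mask:
        print("".join(str(v) for v in row))
```

The brute-force nearest-point search is quadratic in practice; for real images a distance transform would be the usual replacement, but the thresholding rule stays the same.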

Recover object segmentation from detected skeleton and skeleton scale.

The figure below shows some object segmentation results on the SK-LARGE dataset.

Illustration of object segmentation on SK-LARGE for several selected images.


We've released the full tool chain for skeleton detection and performance evaluation; you can access it on GitHub:

How to use the code:

  1. Download the SK-LARGE dataset and perform data augmentation; refer to SK-LARGE for detailed steps.
  2. Clone the source code with 'git clone' and customize your own Makefile.config to build it. Do not use the official Caffe, because this code contains newly implemented layers that official Caffe does not include. The code is based on an old version of Caffe, so you may run into problems building it; I suggest disabling cuDNN to avoid compatibility issues.
  3. Start training by running ''. Make sure the augmented data is placed in the proper folder; see 'train_val.prototxt' for details.

See FAQ for Installation and Compilation issues.

Data & Datasets:

1: Datasets:

To the best of our knowledge, the datasets related to skeleton detection are listed below:

Please consider citing our paper if you use our released datasets.

2: PR-CURVE data

All the PR-curve data used to plot the PR curves in our paper are available here (md5:6bf393652023c8d260d44bdc6597e06).

3: Pretrained model

  1. Pretrained model on SK-LARGE dataset (MD5SUM: 0e29ddaba2af86feaa155097e50d54fa).
  2. Pretrained model on WH-SYMMAX dataset.
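After downloading, a file can be checked against the checksums listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5sum` and `verify_md5` and any local file name you pass in are hypothetical.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    large model files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5sum(path) == expected.strip().lower()


# Example call (the file name is a placeholder for your downloaded model):
# verify_md5("path/to/downloaded.caffemodel", "0e29ddaba2af86feaa155097e50d54fa")
```

A mismatch usually means an incomplete or corrupted download, so fetching the file again is the first thing to try.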


If our method is helpful to your research, please consider citing:
  title={Object Skeleton Extraction in Natural Images by Fusing Scale-associated Deep Side Outputs},
  author={Shen, Wei and Zhao, Kai and Jiang, Yuan and Wang, Yan and Zhang, Zhijiang and Bai, Xiang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  howpublished = "\url{}"
  title={DeepSkeleton: Learning Multi-task Scale-associated Deep Side Outputs for Object Skeleton Extraction in Natural Images},
  author={Shen, Wei and Zhao, Kai and Jiang, Yuan and Wang, Yan and Bai, Xiang and Yuille, Alan},
  journal={IEEE Transactions on Image Processing},
  howpublished = "\url{}"

Frequently Asked Questions:

  1. How to install and compile the code:

    The associated code is a modified version of official Caffe with customized layers. The dependencies are the same as those of Caffe; please refer to the official Caffe installation instructions for how to install and compile. I suggest compiling official Caffe first. If you can compile official Caffe successfully, you should be able to compile this code easily; if you cannot, this code will not compile either.

  2. Unknown Layer Type: "ImageLabelmapDataLayer"

    You should use the Caffe associated with DeepSkeleton rather than the official repository BVLC/caffe. 'ImageLabelmapDataLayer' is a customized layer that official Caffe does not contain.

  3. How to evaluate the detected results?

    Use skeval to quantitatively evaluate the detections. It will benchmark the detections in terms of F-measure and precision-recall curve.