Abstract:
In this paper (arXiv:1609.03659), we present a novel fully convolutional network with multiple scale-associated side outputs to address the skeleton detection problem. By observing the relationship between the receptive field sizes of the different layers in the network and the skeleton scales they can capture, we introduce two scale-associated side outputs to each stage of the network. The network is trained by multi-task learning: one task is skeleton localization, which classifies whether a pixel is a skeleton pixel or not, and the other is skeleton scale prediction, which regresses the scale of each skeleton pixel. Our method achieves promising results on two skeleton extraction datasets and significantly outperforms other competitors.
The Algorithm:
The proposed algorithm is inspired by a simple observation: a neuron can only detect a skeleton whose scale is smaller than its receptive field. We designed the SSO (Scale-associated Side Output) to detect object skeletons of various scales at different convolution stages. Furthermore, we developed a multi-task learning paradigm that detects object skeletons and predicts skeleton scales at the same time.
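The receptive-field observation can be made concrete with a few lines of code. Below is a minimal Python sketch, assuming a VGG-16 backbone (standard 3x3 convolutions and 2x2 stride-2 pooling) with side outputs attached after conv2_2, conv3_3, conv4_3 and conv5_3. It computes each stage's receptive field with the usual recurrence, then maps a ground-truth skeleton scale to a skeleton type. The quantization rule (the shallowest stage whose receptive field exceeds `lam` times the scale, with `lam=1.2`) and all helper names are illustrative assumptions, not the released DeepSkeleton code.

```python
# (name, kernel_size, stride) for VGG-16 up to conv5_3.
# Padding does not change the receptive field size, so it is omitted.
VGG16_LAYERS = [
    ("conv1_1", 3, 1), ("conv1_2", 3, 1), ("pool1", 2, 2),
    ("conv2_1", 3, 1), ("conv2_2", 3, 1), ("pool2", 2, 2),
    ("conv3_1", 3, 1), ("conv3_2", 3, 1), ("conv3_3", 3, 1), ("pool3", 2, 2),
    ("conv4_1", 3, 1), ("conv4_2", 3, 1), ("conv4_3", 3, 1), ("pool4", 2, 2),
    ("conv5_1", 3, 1), ("conv5_2", 3, 1), ("conv5_3", 3, 1),
]

# Assumption: layers that feed the scale-associated side outputs.
SIDE_OUTPUT_LAYERS = ("conv2_2", "conv3_3", "conv4_3", "conv5_3")


def receptive_fields(layers):
    """Receptive field size (in input pixels) after each layer."""
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent units
    sizes = {}
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes[name] = rf
    return sizes


def skeleton_type(scale, stage_rfs, lam=1.2):
    """Hypothetical quantization of a skeleton scale into a type index.

    Returns 0 for background (scale <= 0); otherwise the 1-based index of the
    shallowest stage whose receptive field exceeds lam * scale. Scales too large
    for every stage are clamped to the deepest stage (an assumption of this sketch).
    """
    if scale <= 0:
        return 0
    for k, rf in enumerate(stage_rfs, start=1):
        if rf > lam * scale:
            return k
    return len(stage_rfs)


rfs = receptive_fields(VGG16_LAYERS)
stage_rfs = [rfs[name] for name in SIDE_OUTPUT_LAYERS]
print(stage_rfs)                      # [14, 40, 92, 196]
print(skeleton_type(20, stage_rfs))   # 2: 14 <= 24 but 40 > 24
```

Requiring `rf > lam * scale` rather than just `rf > scale` leaves a margin so that a neuron also sees some context around the skeleton; the margin used in the paper may differ.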
The figure below illustrates the multi-task SSO at stage 2: the left blocks represent the skeleton detection SSO, and the right block represents the scale regression ScalePred-SSO. $a_{jk}^{(i)}$ indicates how likely pixel $j$ belongs to skeleton type $k$ at stage $i$, where the skeleton types are defined according to their scales; $\hat{S}_j^{(i)}$ is the predicted skeleton scale of pixel $j$ at stage $i$.
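To illustrate the multi-task objective at a single stage $i$, here is a hedged NumPy sketch. A softmax over per-pixel type scores gives probabilities that play the role of $a_{jk}^{(i)}$ and are trained with cross-entropy against quantized skeleton types; a squared-error term regresses the predicted scale $\hat{S}_j^{(i)}$ on skeleton pixels only. The function names, the unweighted cross-entropy and the `reg_weight` balance are assumptions for illustration; the paper's actual losses (for example class balancing or scale normalization) may differ.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multitask_loss(type_logits, scale_pred, type_labels, scale_gt, reg_weight=1.0):
    """Hypothetical per-stage multi-task loss.

    type_logits: (N, K+1) scores for background (type 0) and K skeleton types.
    scale_pred:  (N,) predicted scale per pixel.
    type_labels: (N,) integer skeleton type per pixel, 0 = background.
    scale_gt:    (N,) ground-truth scale (only used where type_labels > 0).
    Returns (total, classification_loss, regression_loss).
    """
    a = softmax(type_logits, axis=1)  # a[j, k]: probability pixel j has type k
    n = type_logits.shape[0]
    cls = -np.mean(np.log(a[np.arange(n), type_labels] + 1e-12))

    # Scale regression is only meaningful on skeleton pixels.
    mask = type_labels > 0
    reg = float(np.mean((scale_pred[mask] - scale_gt[mask]) ** 2)) if mask.any() else 0.0
    return cls + reg_weight * reg, cls, reg


total, cls, reg = multitask_loss(
    type_logits=np.zeros((2, 3)),      # uniform scores over 3 types
    scale_pred=np.array([5.0, 3.0]),
    type_labels=np.array([0, 2]),      # pixel 0 background, pixel 1 type 2
    scale_gt=np.array([0.0, 1.0]),
)
```

Masking the regression term keeps background pixels, which have no meaningful scale, from pulling the scale predictions toward zero.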
Performance Evaluation:
1. Skeleton detection
a): Qualitative illustration:
We show detection results on SK-LARGE for several selected images; they show that our method outperforms other competitors by a significant margin.
b): Quantitative evaluation:
We evaluate the skeleton detection results with the widely applied F-measure, $F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$.
2. Object segmentation:
Once we have the detected skeleton and the associated scales, we can easily recover the object segmentation by drawing inscribed circles centered at the skeleton points, using the skeleton scale as the circle radius. This procedure can be formulated as follows:
$$ \text{seg}_i = \begin{cases} 1 & \text{if } \text{distance}(x_i, \text{sk}_i) < \hat{s}_i \\ 0 & \text{otherwise} \end{cases} $$
where $\text{sk}_i$ is the nearest skeleton point to pixel $x_i$, $\text{distance}(x_i, x_j)$ is the Euclidean distance between two points, and $\hat{s}_i$ is the predicted skeleton scale for pixel $x_i$, i.e. the scale predicted at its nearest skeleton point $\text{sk}_i$.
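The reconstruction formula translates almost directly into code. This is a brute-force NumPy sketch under stated assumptions: `skeleton` is a boolean map of detected skeleton pixels, `scale` holds the predicted scale (used as the radius) at each skeleton pixel, and every pixel is tested against its nearest skeleton point exactly as in the formula. The function name is hypothetical.

```python
import numpy as np


def skeleton_to_segmentation(skeleton, scale):
    """Hypothetical helper: recover a binary mask from a skeleton and its scales.

    skeleton: (H, W) bool array of skeleton pixels.
    scale:    (H, W) float array; only values at skeleton pixels are read.
    Pixel x is foreground iff distance(x, nearest skeleton point) < that point's scale.
    """
    h, w = skeleton.shape
    sk = np.argwhere(skeleton)          # (K, 2) skeleton coordinates, row-major order
    if len(sk) == 0:
        return np.zeros((h, w), dtype=np.uint8)
    radii = scale[skeleton]             # same row-major order as np.argwhere

    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([ys.ravel(), xs.ravel()], axis=1)              # (H*W, 2)
    # Pairwise Euclidean distances from every pixel to every skeleton point.
    d = np.sqrt(((pix[:, None, :] - sk[None, :, :]) ** 2).sum(axis=-1))
    nearest = d.argmin(axis=1)                                   # index of sk_i
    seg = d[np.arange(len(pix)), nearest] < radii[nearest]
    return seg.reshape(h, w).astype(np.uint8)


skeleton = np.zeros((7, 7), dtype=bool)
skeleton[3, 3] = True
scale = np.zeros((7, 7))
scale[3, 3] = 2.0
mask = skeleton_to_segmentation(skeleton, scale)   # a 3x3 block around (3, 3)
```

The pairwise-distance matrix needs O(HW·K) memory, so for full-size images a distance transform such as `scipy.ndimage.distance_transform_edt(~skeleton, return_indices=True)` is a more practical way to find each pixel's nearest skeleton point. Testing only the nearest skeleton point, as the formula does, can differ slightly from taking the union of all disks.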
The figure below shows some object segmentation results on the SK-LARGE dataset.
Code:
We have released the full toolchain for skeleton detection and performance evaluation on GitHub:
- DeepSkeleton: skeleton detection (both FSDS and LMSDS);
- skeval: evaluation protocol.
How to use the code:
- Download the SK-LARGE dataset and perform data augmentation; refer to SK-LARGE for detailed steps;
- Clone the source code with 'git clone https://github.com/zeakey/DeepSkeleton' and customize your own Makefile.config to build it. Do not use the official Caffe, because this code contains newly implemented layers that the official Caffe does not include. The code is based on an old version of Caffe, so you may run into problems building it; I suggest turning off cuDNN due to compatibility issues;
- Start training by running 'solve.py'; make sure the augmented data is placed in the proper folder (see 'train_val.prototxt' for details).
See the FAQ for installation and compilation issues.
Data & Datasets:
1: Datasets:
To the best of our knowledge, the following datasets are available for skeleton detection:
- Our released SK-LARGE and SK-SMALL;
- SymPASCAL (arXiv:1703.02243), selected from VOC2011. The original SymPASCAL doesn't contain edge ground truth, which makes it hard to obtain the skeleton scales; here is our extended version with extra edge ground truth;
- WH-SYMMAX dataset, originally available from Wei Shen's homepage, or alternatively as a direct download;
- *BMAX500 dataset mentioned in arXiv:1703.08626;
- *SYMMAX300 dataset, GitHub;
Please consider citing our paper if you use our released datasets.
2: PR-CURVE data
All the data used to plot the PR curves in our paper are available here (md5: 6bf393652023c8d260d44bdc6597e06).
3: Pretrained model
- Pretrained model on SK-LARGE dataset (MD5SUM: 0e29ddaba2af86feaa155097e50d54fa).
- Pretrained model on WH-SYMMAX dataset.
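The MD5 checksums listed above can be used to verify a download before use. A small sketch using Python's standard-library `hashlib`; the file name in the usage line is a placeholder, not the actual name of the released model.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Usage (placeholder file name):
# assert md5sum("sk-large-pretrained.caffemodel") == "0e29ddaba2af86feaa155097e50d54fa"
```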
Citation:
If our method is helpful to your research, please consider citing:
@InProceedings{shen2016object,
title={Object Skeleton Extraction in Natural Images by Fusing Scale-associated Deep Side Outputs},
author={Shen, Wei and Zhao, Kai and Jiang, Yuan and Wang, Yan and Zhang, Zhijiang and Bai, Xiang},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2016},
pages={222-230},
publisher={IEEE},
howpublished = "\url{http://kaizhao.net/deepsk}"
}
@article{shen2017deepskeleton,
title={DeepSkeleton: Learning Multi-task Scale-associated Deep Side Outputs for Object Skeleton Extraction in Natural Images},
author={Shen, Wei and Zhao, Kai and Jiang, Yuan and Wang, Yan and Bai, Xiang and Yuille, Alan},
journal={IEEE Transactions on Image Processing},
volume={26},
number={11},
pages={5298-5311},
year={2017},
publisher={IEEE},
howpublished = "\url{http://kaizhao.net/deepsk}"
}
Frequently Asked Questions:
- How to install and compile the code:
The associated code is a modified version of the official Caffe with customized layers. The dependencies are the same as those of Caffe; please refer to http://caffe.berkeleyvision.org/installation.html for installation and compilation instructions. I suggest compiling the official Caffe first: if you succeed, you should be able to compile this code easily, but if you fail to compile the official Caffe, you will not be able to compile this code either.
- Unknown Layer Type: "ImageLabelmapDataLayer"
You should use the DeepSkeleton-associated Caffe rather than the official repository BVLC/caffe: 'ImageLabelmapDataLayer' is a customized layer that the official Caffe does not contain.
- How to evaluate the detected results?
Use skeval to quantitatively evaluate the detections. It will benchmark the detections in terms of F-measure and precision-recall curve.
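For a quick sanity check before running skeval, the F-measure defined in the evaluation section can be approximated in a few lines. This is a simplified, hypothetical stand-in, not skeval: it matches pixels within a distance tolerance without enforcing a one-to-one assignment (BSDS-style benchmarks do enforce one), so it will not reproduce skeval's numbers. The function names and the default `tol=1.0` are assumptions.

```python
import numpy as np


def f_measure(precision, recall, eps=1e-12):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2.0 * precision * recall / (precision + recall + eps)


def precision_recall(pred, gt, tol=1.0):
    """Simplified tolerance matching between two boolean HxW skeleton maps.

    A predicted pixel is a hit if some ground-truth pixel lies within `tol` pixels;
    a ground-truth pixel is recalled if some predicted pixel lies within `tol`.
    No one-to-one assignment is enforced.
    """
    p = np.argwhere(pred)
    g = np.argwhere(gt)
    if len(p) == 0 or len(g) == 0:
        return 0.0, 0.0
    # Pairwise Euclidean distances, shape (num_pred, num_gt); fine for small maps only.
    d = np.sqrt(((p[:, None, :] - g[None, :, :]) ** 2).sum(axis=-1))
    precision = float((d.min(axis=1) <= tol).mean())
    recall = float((d.min(axis=0) <= tol).mean())
    return precision, recall


def best_f(prob, gt, thresholds, tol=1.0):
    """Sweep thresholds over a skeleton probability map; return (best F, threshold)."""
    best_score, best_t = 0.0, None
    for t in thresholds:
        p, r = precision_recall(prob >= t, gt, tol)
        score = f_measure(p, r)
        if score > best_score:
            best_score, best_t = score, t
    return best_score, best_t
```

Each threshold yields one precision/recall point, so the same loop also traces a PR curve if the `(p, r)` pairs are collected instead of only the best score.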