Pre-training without Natural Images


Asian Conference on Computer Vision (ACCV) 2020
Best Paper Honorable Mention Award
Oral Presentation (the paper received three strong accepts)


Hirokatsu Kataoka1   Kazushige Okayasu1,2   Asato Matsumoto1,3   Eisuke Yamagata4
Ryosuke Yamada1,2   Nakamasa Inoue4   Akio Nakamura2   Yutaka Satoh1,3
1: AIST   2: TDU   3: Univ. of Tsukuba   4: TITech

Paper | Code | Dataset | Oral | Poster | Supp. Mat. | Related Work


Abstract

Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding? The paper proposes a novel concept, Formula-driven Supervised Learning. We automatically generate image patterns and their category labels by assigning fractals, which are based on a natural law existing in the background knowledge of the real world. Theoretically, the use of automatically generated images instead of natural images in the pre-training phase allows us to generate an infinite-scale dataset of labeled images. Although the models pre-trained with the proposed Fractal DataBase (FractalDB), a database without natural images, do not necessarily outperform models pre-trained with human-annotated datasets in all settings, we are able to partially surpass the accuracy of ImageNet/Places pre-trained models. The image representation with the proposed FractalDB captures a unique feature in the visualization of convolutional layers and attentions.


Framework

We propose pre-training without natural images based on fractals, a natural formula existing in the real world (Formula-driven Supervised Learning). We automatically generate a large-scale labeled image dataset based on an iterated function system (IFS); a minimal rendering sketch is shown below. (Bottom-left image) The pre-training framework with fractal geometry for feature representation learning: we can enhance natural image recognition by pre-training without natural images. (Bottom-right image) Accuracy transition among ImageNet-1k, FractalDB-1k, and training from scratch.
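As a rough illustration of how one FractalDB-style image could be rendered, the sketch below defines a fractal category by a randomly sampled set of 2D affine maps and draws an image by iterating them with the chaos game. This is a minimal sketch, not the released generation code: the function names sample_ifs and render_fractal and all parameter values are illustrative.

import numpy as np
from PIL import Image

def sample_ifs(n_transforms, rng):
    # One fractal "category": a set of random 2D affine maps (a, b, c, d, e, f),
    # selected during rendering with probabilities proportional to their determinants.
    params = rng.uniform(-1.0, 1.0, size=(n_transforms, 6))
    dets = np.abs(params[:, 0] * params[:, 3] - params[:, 1] * params[:, 2])
    probs = dets / dets.sum() if dets.sum() > 0 else np.full(n_transforms, 1.0 / n_transforms)
    return params, probs

def render_fractal(params, probs, n_points=100000, size=256, seed=0):
    # Chaos game: repeatedly apply a randomly chosen affine map to the current
    # point and plot the orbit. Note: no contractivity check is done here, so
    # some random IFSs diverge; degenerate categories would need to be filtered.
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    pts = np.empty((n_points, 2))
    for i in range(n_points):
        a, b, c, d, e, f = params[rng.choice(len(params), p=probs)]
        x, y = a * x + b * y + e, c * x + d * y + f
        pts[i] = (x, y)
    canvas = np.zeros((size, size), dtype=np.uint8)
    pts = pts[100:]                                  # drop burn-in points
    pts = pts[np.all(np.isfinite(pts), axis=1)]      # guard against divergence
    if len(pts) == 0:
        return Image.fromarray(canvas)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    scale = np.where(hi - lo > 0, hi - lo, 1.0)
    ij = np.clip(((pts - lo) / scale * (size - 1)).astype(int), 0, size - 1)
    canvas[ij[:, 1], ij[:, 0]] = 255
    return Image.fromarray(canvas)

rng = np.random.default_rng(0)
params, probs = sample_ifs(n_transforms=4, rng=rng)
render_fractal(params, probs).save("fractal_example.png")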





Experimental Results

We compared training from scratch (random initialization), Places-30/365, ImageNet-100/1k (ILSVRC'12), and FractalDB-1k/10k in the following table. Since our implementation is not exactly the same as the representative learning configurations, we re-implemented the framework with identical parameters so that the proposed method (FractalDB-1k/10k) and the baselines (Scratch, DeepCluster-10k, Places-30/365, and ImageNet-100/1k) are compared fairly. The FractalDB pre-trained models achieved strong results in several settings. Below, we compare our Formula-driven Supervised Learning with training from scratch, self-supervised learning, and supervised learning, respectively.
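For reference, a minimal sketch of this comparison protocol, assuming a PyTorch/torchvision setup: every method differs only in how the ResNet-50 backbone is initialized, and fine-tuning on the target dataset then uses identical hyperparameters. The checkpoint path, class count, and function name below are placeholders, not the released files or the exact training script.

import torch
import torch.nn as nn
from torchvision import models

def build_finetune_model(num_classes, pretrained_ckpt=None):
    # ResNet-50 backbone; pretrained_ckpt=None corresponds to the Scratch baseline.
    model = models.resnet50(weights=None)
    if pretrained_ckpt is not None:
        state = torch.load(pretrained_ckpt, map_location="cpu")
        # Drop the old classification head so the class count need not match.
        state = {k: v for k, v in state.items() if not k.startswith("fc.")}
        model.load_state_dict(state, strict=False)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# Hypothetical usage: only the initialization changes between compared methods.
model = build_finetune_model(num_classes=100, pretrained_ckpt="fractaldb1k_resnet50.pth")
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()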



Visual Results

The figures show the activations of the first convolutional layer of ResNet-50 for each pre-trained model.
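A minimal sketch of how such a figure could be produced with a forward hook, assuming PyTorch/torchvision: capture the output of conv1 for one input image and inspect its 64 per-channel activation maps. The checkpoint and image paths are placeholders.

import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=None)
# Placeholder: load a FractalDB- or ImageNet-pre-trained checkpoint here, e.g.
# model.load_state_dict(torch.load("pretrained_resnet50.pth", map_location="cpu"), strict=False)
model.eval()

activations = {}
def hook(module, inputs, output):
    # Output of conv1 for a 224x224 input: shape (1, 64, 112, 112).
    activations["conv1"] = output.detach()

model.conv1.register_forward_hook(hook)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    model(image)

# 64 single-channel activation maps, to be tiled into a grid for inspection.
maps = activations["conv1"][0]          # shape: (64, 112, 112)
print(maps.shape)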

Citation

@inproceedings{KataokaACCV2020,
 author = {Kataoka, Hirokatsu and Okayasu, Kazushige and Matsumoto, Asato and Yamagata, Eisuke and Yamada, Ryosuke and Inoue, Nakamasa and Nakamura, Akio and Satoh, Yutaka},
 title = {Pre-training without Natural Images},
 booktitle = {Asian Conference on Computer Vision (ACCV)},
 year = {2020}
}

Dataset Download



Acknowledgement