Saving data by adding visual knowledge priors to Deep Learning
Call for papers
We invite researchers to submit their recent work on data-efficient computer vision.
Accepted works are included in the program.
Present a poster
We invite researchers to present their recent papers on data-efficient computer vision.
Accepted posters are included in the program.
VIPriors challenges
Including data-efficient action recognition, classification, detection and segmentation.
Final rankings are out now.
Watch the recording
Thanks to all for attending
We enjoyed two interesting workshop sessions at ECCV 2020. We thank all presenters for their efforts and all participants for their attention. This website will keep a record of all presented materials. We hope to see you all next year (venue TBD) for the next workshop!
About the workshop
This workshop focuses on how to pre-wire deep networks with generic visual innate knowledge structures, which allows hard-won existing generic knowledge from physics, such as light reflection or geometry, to be incorporated. Visual inductive priors are data efficient: what is built in no longer has to be learned, saving valuable training data.
Data fuels deep learning, yet data is costly to gather and expensive to annotate. Training on massive datasets also consumes huge amounts of energy, adding to our carbon footprint. In addition, only a select few deep learning behemoths have billions of data points and thousands of expensive GPUs at their disposal. This workshop looks beyond these few very large companies to the long tail of smaller companies and universities with smaller datasets and smaller hardware clusters. We focus on data efficiency through visual inductive priors.
Excellent recent research investigates data efficiency in deep networks by exploiting other data sources, such as unsupervised learning, re-using existing datasets, or synthesizing artificial training data. Too little attention is given to overcoming the data dependency by adding prior knowledge to deep nets. As a consequence, all knowledge has to be (re-)learned implicitly from data, making deep networks hard-to-understand black boxes that are susceptible to dataset bias and require huge data and compute resources. This workshop aims to remedy this gap by investigating how to flexibly pre-wire deep networks with generic visual innate knowledge structures, which allows hard-won existing knowledge from physics, such as light reflection or geometry, to be incorporated.
The great power of deep neural networks is their incredible flexibility to learn. The direct consequence of this power is that small datasets can simply be memorized, and the network will likely not generalize to unseen data. Regularization aims to prevent such over-fitting by adding constraints to the learning process. Much work has been done on regularizing internal network properties and architectures. In this workshop we focus on regularization methods based on innate priors. There is strong evidence that an innate prior benefits deep nets: adding convolution to deep networks yields the convolutional neural network (CNN), which has been hugely successful and has permeated the entire field. While convolution was initially applied to images, it has since been generalized to graph networks, speech, language, 3D data, video, and more. Convolution models translation invariance in images: an object may occur anywhere in the image, so instead of learning separate parameters at each image location, convolution considers only local relations while sharing parameters over all locations. This saves a huge number of parameters to learn and thereby strongly reduces the number of examples needed to learn from, as the sketch below illustrates. This workshop aims to build on the great success of convolution by exploiting innate regularizing structures that yield a significant reduction in training data.
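To make the parameter saving concrete, here is a minimal PyTorch sketch (our own illustration, not part of the workshop materials) contrasting a fully connected layer with a convolutional layer on the same input. The input size (32x32 RGB), number of output channels (16), and kernel size (3x3) are arbitrary assumptions chosen only for illustration.

```python
# Minimal sketch: count the learnable parameters of a fully connected layer
# versus a convolutional layer that both map a 32x32 RGB image to 16 channels.
# Layer sizes are assumptions for illustration, not from the workshop materials.
import torch.nn as nn

in_channels, out_channels, height, width = 3, 16, 32, 32

# Fully connected: one weight for every (input pixel, output unit) pair.
fc = nn.Linear(in_channels * height * width, out_channels * height * width)

# Convolution: one 3x3 kernel per (input channel, output channel) pair,
# shared across all spatial locations.
conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)


def n_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


print(f"fully connected: {n_params(fc):,} parameters")  # ~50 million
print(f"convolutional:   {n_params(conv):,} parameters")  # 448
```

Weight sharing over spatial locations is what collapses the roughly 50 million parameters of the fully connected mapping down to a few hundred, which is why convolution needs far fewer training examples.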
Workshop program
Our live program featured a panel discussion with our invited speakers, as well as playback of recorded talks for all presentations and live Q&A. All keynotes, papers and presentations were made available through the ECCV Workshops and Tutorials website.
Time (UTC+1) | Session | Description
---|---|---
8:00 / 18:00 | Keynote session | Panel discussion with invited speakers + Q&A
8:40 / 18:40 | Break |
8:45 / 18:45 | Oral session | Oral presentations
9:10 / 19:10 | Oral session | Q&A
9:25 / 19:25 | Challenges | Awards & winners presentations
9:35 / 19:35 | Poster session | Poster presentations
9:45 / 19:45 | Poster session | Q&A (for posters & challenges)
9:50 / 19:50 | Poster session | External poster presentations
9:55 / 19:55 | Closing |
Oral session
- Lightweight Action Recognition in Compressed Videos. Yuqi Huo, Xiaoli Xu, Yao Lu, Yulei Niu, Mingyu Ding, Zhiwu Lu, Tao Xiang, Ji-Rong Wen
- On sparse connectivity, adversarial robustness, and a novel model of the artificial neuron. Sergey Bochkanov
- Injecting Prior Knowledge into Image Caption Generation. Arushi Goel, Basura Fernando, Thanh-Son Nguyen, Hakan Bilen
- Learning Temporally Invariant and Localizable Features via Data Augmentation for Video Recognition. Taeoh Kim, Hyeongmin Lee, MyeongAh Cho, Hoseong Lee, Dong heon Cho, Sangyoun Lee
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering. Pavel Tokmakov, Martial Hebert, Cordelia Schmid
Poster session
- Distilling Visual Priors from Self-Supervised Learning. Bingchen Zhao, Xin Wen
- Unsupervised Image Classification for Deep Representation Learning. Weijie Chen, Shiliang Pu, Di Xie, Shicai Yang, Yilu Guo, Luojun Lin
- TDMPNet: Prototype Network with Recurrent Top-Down Modulation for Robust Object Classification under Partial Occlusion. Mingqing Xiao, Adam Kortylewski, Ruihai Wu, Siyuan Qiao, Wei Shen, Alan Yuille
- What leads to generalization of object proposals? Rui Wang, Dhruv Mahajan, Vignesh Ramanathan
- A Self-Supervised Framework for Human Instance Segmentation. Yalong Jiang, Wenrui Ding, Hongguang Li, Hua Yang, Xu Wang
- Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering. Tuong Do, Binh Nguyen, Huy Tran, Erman Tjiputra, Quang Tran, Thanh Toan Do
- A visual inductive priors framework for data-efficient image classification. Pengfei Sun, Xuan Jin, Wei Su, Yuan He, Hui Xue’, Quan Lu
External poster session
- Select to Better Learn: Fast and Accurate Deep Learning using Data Selection from Nonlinear Manifolds. Mohsen Joneidi, Saeed Vahidian, Ashkan Esmaeili, Weijia Wang, Nazanin Rahnavard, Bill Lin, Mubarak Shah
- Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion. Adam Kortylewski, Ju He, Qing Liu, Alan Yuille
- On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location. Osman Semih Kayhan, Jan C. van Gemert
VIPriors Challenges
We presented the “Visual Inductive Priors for Data-Efficient Computer Vision” challenges: four challenges in which models must be trained from scratch on a reduced number of training samples, a fraction of the full set.
Please see the challenges page for the results of the challenges.
Invited speakers
Organizers
Attila Lengyel
Delft University of Technology
Osman Semih Kayhan
Delft University of Technology
Marcos Baptista Ríos
University of Alcala
Contact
Email us at vipriors-ewi AT tudelft DOT nl