Learning Semantic Correspondence with Sparse Annotations
ECCV 2022

1 University of Maryland, College Park   2 Shanghai AI Laboratory   3 ShanghaiTech University, China

Abstract

Finding dense semantic correspondence is a fundamental problem in computer vision that remains challenging in complex scenes due to background clutter, extreme intra-class variation, and a severe lack of ground truth. In this paper, we aim to address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations. To this end, we first propose a teacher-student learning paradigm for generating dense pseudo-labels and then develop two novel strategies for denoising them. In particular, we use spatial priors around the sparse annotations to suppress noisy pseudo-labels. In addition, we introduce a loss-driven dynamic label selection strategy for label denoising. We instantiate our paradigm with two learning strategies: a single offline teacher setting and a mutual online teachers setting. Our approach achieves notable improvements on three challenging benchmarks for semantic correspondence and establishes a new state of the art. Project page: https://shuaiyihuang.github.io/publications/SCorrSAN.

Motivation

Due to the high cost of dense annotation, the semantic correspondence task provides only sparse keypoint annotations in the supervised setting. In this paper, we ask how to make better use of this limited supervision. Specifically, we explore techniques for generating pseudo-labels. However, pseudo-labels are inevitably noisy, and filtering out the noisy ones remains a challenging problem. Our key observation is that sparse keypoint annotations and their neighborhoods encode rich semantic information. By exploiting this spatial prior, one can select reliable pseudo-labels that are more likely to lie in the foreground region of interest.

Method overview

We introduce a novel teacher-student learning paradigm that enriches supervision when only sparse annotations are available. Its two key techniques for generating high-quality pseudo-labels are spatial-prior-based label filtering and a loss-driven dynamic label selection strategy.
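The two pseudo-label filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a pseudo-label is taken to be a teacher-predicted match (source point, target point) with a per-match loss; the spatial prior is assumed to be a fixed pixel `radius` around each annotated source keypoint; and the loss-driven selection is modelled as small-loss selection with a linearly shrinking keep ratio, in the style of co-teaching. All function names, parameters, and the schedule are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]   # (x, y) in source-image pixels
Match = Tuple[Point, Point]   # (source point, teacher-predicted target point)


def near_keypoint(p: Point, keypoints: Sequence[Point], radius: float) -> bool:
    """Spatial prior (ASSUMPTION: a fixed radius): True if p lies within
    `radius` pixels of any annotated source keypoint."""
    return any(math.hypot(p[0] - kx, p[1] - ky) <= radius for kx, ky in keypoints)


def keep_ratio(epoch: int, final_ratio: float, ramp_epochs: int) -> float:
    """Hypothetical schedule: keep every label at epoch 0, then shrink the
    kept fraction linearly to `final_ratio` over `ramp_epochs` epochs."""
    t = min(epoch / ramp_epochs, 1.0) if ramp_epochs > 0 else 1.0
    return 1.0 - t * (1.0 - final_ratio)


def filter_pseudo_labels(
    matches: Sequence[Match],
    losses: Sequence[float],
    keypoints: Sequence[Point],
    radius: float,
    ratio: float,
) -> List[int]:
    """Return the indices of pseudo-labels that pass both filters.

    1. Spatial prior: drop matches whose source point is far from every keypoint.
    2. Loss-driven selection (ASSUMPTION: small-loss trick): among the
       survivors, keep the `ratio` fraction with the smallest loss.
    """
    candidates = [
        i for i, (src, _) in enumerate(matches) if near_keypoint(src, keypoints, radius)
    ]
    n_keep = int(round(ratio * len(candidates)))
    candidates.sort(key=lambda i: losses[i])  # smallest loss first
    return sorted(candidates[:n_keep])
```

For example, with one keypoint at (10, 10), a radius of 5 and a keep ratio of 0.67, a match whose source point is at (50, 50) is discarded by the spatial prior, and the highest-loss match among the remaining three is discarded by the selection step, so only indices 0 and 3 survive in the test data below.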

Model overview

We instantiate our learning strategy on a simple yet effective network architecture for semantic correspondence that uses no transformer or 4D convolution for correlation refinement. Its key ingredients are an efficient spatial context encoder and a high-resolution loss. The network comprises three modules:

  • a feature extractor equipped with our efficient spatial context encoder
  • a parameter-free correlation map module
  • a flow estimator with high-resolution loss
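The page does not spell out the internals of these modules, so the following is only a rough sketch of how the last two could fit together. It assumes cosine similarity for the parameter-free correlation map, a soft-argmax readout for the flow estimator, and it interprets the high-resolution loss as keypoint end-point error measured at full image resolution after nearest-neighbour upsampling of the coarse flow. The spatial context encoder is not modelled; features are given directly. All function names and the `temperature` and `scale` parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]
FeatureMap = List[List[Vec]]  # indexed [row][col][channel]


def _normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / n for a in v]


def correlation(src: FeatureMap, tgt: FeatureMap) -> List[List[float]]:
    """Parameter-free correlation (ASSUMPTION: cosine similarity).

    Row i is source pixel i and column j is target pixel j, both row-major.
    """
    s = [_normalize(v) for row in src for v in row]
    t = [_normalize(v) for row in tgt for v in row]
    return [[sum(a * b for a, b in zip(si, tj)) for tj in t] for si in s]


def soft_argmax(scores: Sequence[float], width: int, temperature: float) -> Tuple[float, float]:
    """Expected (x, y) on the target grid under softmax(scores / temperature)."""
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp((c - m) / temperature) for c in scores]
    z = sum(w)
    x = sum(wj * (j % width) for j, wj in enumerate(w)) / z
    y = sum(wj * (j // width) for j, wj in enumerate(w)) / z
    return x, y


def estimate_flow(src: FeatureMap, tgt: FeatureMap, temperature: float = 0.05):
    """Coarse flow [H][W] of (dx, dy): soft-argmax target position minus source position."""
    h, w = len(src), len(src[0])
    tw = len(tgt[0])
    flow = [[(0.0, 0.0)] * w for _ in range(h)]
    for i, row in enumerate(correlation(src, tgt)):
        x, y = soft_argmax(row, tw, temperature)
        flow[i // w][i % w] = (x - i % w, y - i // w)
    return flow


def high_res_keypoint_loss(flow, src_kps, tgt_kps, scale: int) -> float:
    """Mean end-point error at full resolution (ASSUMED reading of 'high-resolution loss').

    Each full-resolution source keypoint reads the coarse flow of its cell
    (nearest-neighbour upsampling), the flow is multiplied by `scale`, and the
    warped point is compared with the full-resolution target keypoint.
    """
    errs = []
    for (sx, sy), (tx, ty) in zip(src_kps, tgt_kps):
        dx, dy = flow[int(sy // scale)][int(sx // scale)]
        errs.append(math.hypot(sx + scale * dx - tx, sy + scale * dy - ty))
    return sum(errs) / len(errs)
```

Measuring the error on the full-resolution grid, rather than rounding keypoints down to the coarse feature grid, avoids the quantization error that a coarse-resolution loss would introduce; the paper may realize this differently.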

Experiments

SPair-71k Dataset

PF-PASCAL Dataset & PF-WILLOW Dataset

Qualitative


Citation

@inproceedings{huang2022learning,
  title = {Learning Semantic Correspondence with Sparse Annotations},
  author = {Huang, Shuaiyi and Yang, Luyu and He, Bo and Zhang, Songyang and He, Xuming and Shrivastava, Abhinav},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year = {2022},
}

The website template was borrowed from Ben Mildenhall.