Shuaiyi Huang

I'm a PhD student in the Department of Computer Science at the University of Maryland, College Park, advised by Prof. Abhinav Shrivastava.

I obtained my M.S. degree in Computer Science from ShanghaiTech University in 2020. I received my B.Eng. degree in Software Engineering from Tongji University in 2017.

My research interests lie in Computer Vision and Autonomous Agents, focusing on solving problems with limited or noisy supervision. I aim to enable AI agents to understand the visual world better and to develop multi-modal AI systems that integrate vision, language, and action understanding.

Email  /  CV  /  Scholar  /  LinkedIn  /  GitHub


Publications

TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations
Shuaiyi Huang, Mara Levy, Anubhav Gupta, Daniel Ekpo, Ruijie Zheng, Abhinav Shrivastava
Under Review, 2024

Preference feedback collected by human or VLM annotators is often noisy, presenting a significant challenge for preference-based reinforcement learning. To address this challenge, we propose TREND, a novel framework that integrates few-shot expert demonstrations with a tri-teaching strategy for effective noise mitigation.

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Ruijie Zheng*, Yongyuan Liang*, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, Jianwei Yang
Under Review, 2024

In this work, we introduce visual trace prompting, a simple yet effective approach to enhance the spatial-temporal awareness of Vision-Language-Action models for action prediction.

AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
Xiyang Wu*, Tianrui Guan*, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha
EMNLP Findings, 2024
arxiv / code

A benchmark framework that automatically generates hallucination cases in Vision-Language models to evaluate their robustness and accuracy.

ARDuP: Active Region Video Diffusion for Universal Policies
Shuaiyi Huang, Mara Levy, Zhenyu Jiang, Anima Anandkumar, Yuke Zhu, Linxi Fan, De-An Huang, Abhinav Shrivastava
IROS, 2024   (Oral Presentation)
arxiv / code

We propose a novel method for universal policy learning via active region video diffusion models, focusing on task-critical regions in videos.

What is Point Supervision Worth in Video Instance Segmentation?
Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar
CVPR Workshop on Learning With Limited Labelled Data for Image and Video Understanding, 2024
arxiv

This work explores the impact of point-level supervision in the context of video instance segmentation, offering insights into its effectiveness.

UVIS: Unsupervised Video Instance Segmentation
Shuaiyi Huang, Saksham Suri, Kamal Gupta, Sai Saketh Rambhatla, Ser-nam Lim, Abhinav Shrivastava
CVPR Workshop on Learning With Limited Labelled Data for Image and Video Understanding, 2024
arxiv

We propose an unsupervised approach for video instance segmentation, leveraging self-supervised learning methods to improve object instance tracking and segmentation across video frames.

Towards Scalable Neural Representation for Diverse Videos
Bo He, Xitong Yang, Hanyu Wang, Zuxuan Wu, Hao Chen, Shuaiyi Huang, Yixuan Ren, Ser-Nam Lim, Abhinav Shrivastava
CVPR, 2023
project page / arxiv / code

We propose D-NeRV, a novel implicit neural representation framework designed to encode large-scale and diverse videos. It achieves state-of-the-art performance on video compression.

Learning Semantic Correspondence with Sparse Annotations
Shuaiyi Huang, Luyu Yang, Bo He, Songyang Zhang, Xuming He, Abhinav Shrivastava
ECCV, 2022
project page / arxiv / code

We address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations. We first propose a teacher-student learning paradigm for generating dense pseudo-labels and then develop two novel strategies for denoising pseudo-labels.

Confidence-aware Adversarial Learning for Self-supervised Semantic Matching
Shuaiyi Huang, Qiuyue Wang, Xuming He
PRCV, 2020
arxiv / code

This paper explores a confidence-aware adversarial learning framework to enhance self-supervised semantic matching with improved robustness and accuracy.

Dehazing Evaluation: Real-World Benchmark Datasets, Criteria, and Baselines
Shiyu Zhao, Lin Zhang, Shuaiyi Huang, Ying Shen, Shengjie Zhao
TIP, 2020
paper / code

This work presents real-world benchmark datasets, evaluation criteria, and baseline approaches for assessing dehazing methods in image processing.

Dynamic Context Correspondence Network for Semantic Alignment
Shuaiyi Huang, Qiuyue Wang, Songyang Zhang, Shipeng Yan, Xuming He
ICCV, 2019
arxiv / code

We introduce a Dynamic Context Correspondence Network (DCCN) to improve semantic alignment by leveraging dynamic feature contexts across images.

Evaluation of Defogging: A Real-world Benchmark Dataset, A New Criterion and Baselines
Shiyu Zhao, Lin Zhang, Shuaiyi Huang, Ying Shen, Shengjie Zhao, Yukai Yang
ICME, 2019
paper / code

This paper provides a real-world benchmark dataset for defogging, a new evaluation criterion, and baseline approaches to assess defogging techniques.

Structured Attentions for Visual Question Answering
Chen Zhu, Yanpeng Zhao, Shuaiyi Huang, Kewei Tu, Yi Ma
ICCV, 2017
paper / code

This work proposes structured attention mechanisms for visual question answering, enabling more precise reasoning over complex visual scenes.

Services

  • Reviewer: CVPR, ECCV, ICCV, IJCV, TIP, WACV, ACCV, ICRA

Experiences


Thanks to Dr. Jon Barron for sharing the source code of his personal page.