Siteng Huang 黄思腾

Hi! I am Siteng Huang (黄思腾 in Chinese). I am a fifth-year Ph.D. candidate at Zhejiang University, advised by Prof. Donglin Wang. I am also part of a joint program with Westlake University, where I am a member of the Machine Intelligence Laboratory (MiLAB). Currently, I am a research intern at Alibaba Group. Before starting my Ph.D., I received my B.Eng. degree from the School of Computer Science, Wuhan University, in 2019.

I expect to obtain my Ph.D. degree in July 2024.

Research Interests

Currently, my research centers on multi-modal large models, especially vision-language models (VLMs), including:

  • Generation/AIGC: text-to-image/video (T2I/V) generation [ADI, SimM], customized & controllable generation [ADI, SimM], test-time diffusion intervention [SimM, VGDiffZero], multi-modal large language models (MLLMs) [Cobra]
  • Understanding: text-video retrieval (TVR) [VoP], compositional zero-shot learning (CZSL) [Troika], few-shot learning (FSL) [AGAM, HTS], visual grounding [VGDiffZero, DARA]
  • Transfer: parameter-efficient fine-tuning (PEFT/PETL) [VoP, DARA, Sparse-Tuning], meta-learning [MRN], domain adaptation [PDA]
  • Embodied AI: foundation models for robotics [QUAR-VLA]

I am always open to related collaborations, and most of my past collaborations have led to publications at top venues. Feel free to drop me an email if you are interested!

News

  • [June 4, 2024] I have passed my Ph.D. defense. Thanks to all the members of the defense committee.
  • [May 5, 2024] Our Cobra was selected for VALSE 2024 Annual Progress Representation. Thanks to the committee for the approval!
  • [March 29, 2024] Troika was accepted as a VALSE 2024 poster!
  • [March 21, 2024] Cobra, an efficient multi-modal large language model, was released. The project page is available, the paper has been featured by Hugging Face Daily Papers, and a demo is also available!
  • [March 13, 2024] One paper about parameter-efficient tuning for visual grounding got accepted for ICME 2024 (Oral).
  • [February 27, 2024] Named a Zhejiang University 2024 Outstanding Graduate!
  • [February 27, 2024] Three papers (ADI, Troika, SimM) as first/co-first author got accepted for CVPR 2024. Congratulations to all collaborators!
  • [December 13, 2023] The paper of VGDiffZero on diffusion model-based zero-shot visual grounding got accepted for ICASSP 2024. Congratulations to all collaborators!
  • [December 9, 2023] One paper on VLM-based unsupervised domain adaptation got accepted for AAAI 2024.
  • [July 24, 2023] Google Scholar released its 2023 Scholar Metrics. Our paper “DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting” ranked 8th among CIKM 2019 papers by citation count, and 26th among CIKM papers within five years.
  • [April 2, 2023] The paper of RL-CZSL about reference-limited compositional learning got accepted for ICMR 2023. Congratulations to all collaborators!
  • [February 28, 2023] The paper of VoP about parameter-efficient text-video retrieval got accepted for CVPR 2023. Congratulations to all collaborators!

Publications

Google Scholar

†: Equal contribution

Peer-reviewed Conference

Ting Liu†, Xuyang Liu†, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu, "DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding". In Proceedings of the IEEE International Conference on Multimedia and Expo 2024. [arXiv] [github]

Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang, "Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation". In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024. [arXiv] [project page] [poster (CVPR 2024)]

Biao Gong†, Siteng Huang†, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu, "Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation". In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024. [arXiv] [project page] [poster (CVPR 2024)]

Siteng Huang, Biao Gong, Yutong Feng, Min Zhang, Yiliang Lv, Donglin Wang, "Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning". In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024. [arXiv] [project page] [github] [poster (CVPR 2024)] [poster (VALSE 2024)]

Xuyang Liu†, Siteng Huang†, Yachen Kang, Honggang Chen, Donglin Wang, "VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders". In Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing. [arXiv] [code] [poster]

Shuanghao Bai, Min Zhang, Wanqi Zhou, Siteng Huang, Zhirong Luan, Donglin Wang, Badong Chen, "Prompt-based Distribution Alignment for Unsupervised Domain Adaptation". In Proceedings of the 38th AAAI Conference on Artificial Intelligence. [arXiv]

Siteng Huang, Biao Gong, Yulin Pan, Jianwen Jiang, Yiliang Lv, Yuyuan Li, Donglin Wang, "VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval". In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023. [project page] [arXiv] [open access] [video (Youtube)] [github] [ModelScope] [poster] [slide]

Siteng Huang, Qiyao Wei, Donglin Wang, "Reference-Limited Compositional Zero-Shot Learning". In Proceedings of the 2023 ACM International Conference on Multimedia Retrieval. [project page] [arXiv] [video (Google Drive)] [github] [slide]

Min Zhang, Siteng Huang, Wenbin Li, Donglin Wang, "Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation". In Proceedings of the European Conference on Computer Vision 2022. [arXiv] [Chinese intro] [github]

Min Zhang, Siteng Huang, Donglin Wang, "Domain Generalized Few-shot Image Classification via Meta Regularization Network". In Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. [pdf] [github]

Zifeng Zhuang, Xintao Xiang, Siteng Huang, Donglin Wang, "HINFShot: A Challenge Dataset for Few-Shot Node Classification in Heterogeneous Information Network". In Proceedings of the 2021 ACM International Conference on Multimedia Retrieval. [pdf]

Zhengyu Chen, Jixie Ge, Heshen Zhan, Siteng Huang, Donglin Wang, "Pareto Self-Supervised Training for Few-Shot Learning". In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. [arXiv] [open access]

Siteng Huang, Min Zhang, Yachen Kang, Donglin Wang, "Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition". In Proceedings of the 35th AAAI Conference on Artificial Intelligence. [project page] [arXiv] [code] [poster] [slide]

Siteng Huang, Donglin Wang, Xuehan Wu, Ao Tang, "DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting". In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. [project page] [pdf] [code] [poster] [slide]

Preprints & Under Submission

Ting Liu, Xuyang Liu, Liangtao Shi, Zunnan Xu, Siteng Huang, Yi Xin, Quanjun Yin, "Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference". arXiv preprint arXiv:2405.14700. [pdf] [github]

Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang, "Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference". arXiv preprint arXiv:2403.14520. [pdf] [project page] [Chinese intro (Zhihu)] [github] [demo] [video (Youtube)] [机器之心] [Twitter@AK]

Experience

Services

Journal Reviewer

  • IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
  • ACM Transactions on Intelligent Systems and Technology (ACM TIST)
  • Concurrency and Computation: Practice and Experience (CPE)

Conference Reviewer

  • IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • IEEE/CVF International Conference on Computer Vision (ICCV)
  • European Conference on Computer Vision (ECCV)
  • AAAI Conference on Artificial Intelligence (AAAI)
  • International Joint Conference on Artificial Intelligence (IJCAI)
  • IEEE International Conference on Multimedia and Expo (ICME)
  • ACM International Conference on Multimedia Retrieval (ICMR)
  • Asian Conference on Computer Vision (ACCV)
  • International Conference on Pattern Recognition (ICPR)

Program Committee for Conferences and Workshops

  • Session Chair, The First Westlake Robot Learning Symposium

Misc

Feel free to follow me on Zhihu and my Chinese blog.