🔬 Research Interests

My research currently centers on multi-modal large models, especially vision-language models (VLMs), including:

  • Generation/AIGC: text-to-image/video (T2I/V) generation (ADI, SimM), customized & controllable generation (ADI, SimM), test-time diffusion intervention (SimM, VGDiffZero), multi-modal large language models (MLLMs) (Cobra, PiTe)
  • Understanding: text-video retrieval (TVR) (VoP), compositional zero-shot learning (CZSL) (Troika), few-shot learning (FSL) (AGAM, HTS), visual grounding (VGDiffZero, DARA)
  • Transfer: parameter-efficient fine-tuning (PEFT/PETL) (VoP, DARA, Sparse-Tuning), meta-learning (MRN), domain adaptation (PDA)
  • Embodied AI: vision-language-action models (VLAs) (QUAR-VLA), foundation models for robotics

🤝 I am always open to collaborations in these areas, many of which have led to top-tier publications. Feel free to drop me an email if you are interested!