🔬 Research Interests

My research currently centers on multi-modal large models, especially vision-language models (VLMs), including:

  • Generation/AIGC: text-to-image/video (T2I/V) generation (ADI, SimM), customized & controllable generation (ADI, SimM), test-time diffusion intervention (SimM, VGDiffZero), multi-modal large language models (MLLMs) (Cobra, PiTe)
  • Understanding: text-video retrieval (TVR) (VoP), compositional zero-shot learning (CZSL) (Troika), few-shot learning (FSL) (AGAM, HTS), visual grounding (VGDiffZero, DARA)
  • Transfer: parameter-efficient fine-tuning (PEFT/PETL) (VoP, DARA, Sparse-Tuning), meta-learning (MRN), domain adaptation (PDA)
  • Embodied AI: vision-language-action models (VLAs) (QUAR-VLA), foundation models for robotics

🤝 I am always open to collaborations in these areas, many of which have led to top-tier publications. Feel free to drop me an email if you are interested!