📢 News

  • 2025/11/08 [AAAI’26] 4 papers got accepted for AAAI 2026! They include the training-free MLLM inference acceleration methods FiCoCo and GlobalCom2, the dexterous grasping policy AffordDex, and the tiny-scale VLA model VLA-Adapter.
  • 2025/10/14 [Preprint] We released RoboSimGS, a novel Real2Sim2Real framework that converts multi-view real-world images into scalable, high-fidelity, and physically interactive simulation environments for robotic manipulation! An overview video can be found on the Project page!
  • 2025/09/19 [NeurIPS’25] SSR got accepted for NeurIPS 2025! The work transforms raw depth data into structured, interpretable textual CoT, enhancing the spatial reasoning capabilities of MLLMs. See the Project page and GitHub!
  • 2025/09/12 [Preprint] We released VLA-Adapter, which reduces reliance on large-scale VLMs and extensive pre-training by using a lightweight Policy module with Bridge Attention, achieving state-of-the-art performance and fast inference with minimal computational resources! The checkpoint is now available! See the Project page for more details. It reached #1 Paper of the Day on Hugging Face Papers! 2025/11/08 VLA-Adapter got accepted for AAAI 2026 Oral!
  • 2025/08/13 [Preprint] We released AffordDex, a universal grasping policy for dexterous hands with an inherent understanding of both motion priors and object affordances! Grasping videos can be found on the Project page! 2025/11/08 AffordDex got accepted for AAAI 2026!
  • 2025/08/08 [DAMO RynnBot] We open-sourced RynnEC, a video MLLM for embodied cognition tasks; RynnVLA-001, a VLA model built on a pretrained video generation model; and RynnRCP, a complete set of robot service protocols and frameworks! 2025/08/11 We released the technical blog for RynnVLA-001! 2025/09/19 We released the technical report for RynnVLA-001!
  • 2025/08/02 [CoRL’25] Long-VLA, a novel framework designed to enhance VLA models for challenging long-horizon robotic manipulation tasks, got accepted for CoRL 2025!
  • 2025/07/24 [DAMO RynnBot] We released RynnBot PlayGround Beta, a platform that provides data management, SOTA VLA models, model training and validation, cloud-edge collaborative deployment, and more! Stay tuned for our continued progress!
  • 2025/06/27 [Preprint] We released WorldVLA, an autoregressive action world model that unifies action and image understanding and generation! Code is now available!
  • 2025/06/26 [ICCV’25] CARP, Coarse-to-fine AutoRegressive Prediction for visuomotor policy learning, got accepted for ICCV 2025! The approach produces highly accurate and smooth robot actions, achieving up to a 10% improvement in success rate and 10x faster inference compared to state-of-the-art policies. Paper, code, and cool videos can be found on the Project page!
  • 2025/05/22 [Preprint] We released VARD, a novel RL fine-tuning method for diffusion-based generative models, applied to both protein structure and text-to-image synthesis, which enhances sample quality with improved efficiency, effective mitigation of reward hacking, and broad applicability.
  • 2025/05/07 [Preprint] We released OpenHelix, a low-cost open-source dual-system VLA with systematic empirical evaluations of its core design elements. Code and a list of papers are now available!
  • 2025/03/31 [Preprint] We released Unicorn to explore the question: can high-quality multimodal training data be synthesized purely from text?
  • 2025/03/28 [Survey Preprint] We released Exploring the Evolution of Physics Cognition in Video Generation: A Survey, which dives deep into the development of physics cognition in video generation, from basic perception to active cognition! A list of papers is now available!
  • 2025/03/11 [TCSVT’25] M2IST, a novel Multi-Modal Interactive Side-Tuning method that effectively addresses the challenges of insufficient multi-modal interaction and high GPU memory consumption, got accepted for IEEE Transactions on Circuits and Systems for Video Technology! Code is now available!
  • 2025/02/24 [Preprint] We released Humanoid-VLA, a novel framework that integrates language understanding, egocentric scene perception, and motion control, enabling universal humanoid control!
  • 2025/01/28 [ICRA’25] QUART-Online, a novel latency-free quadruped MLLM that achieves real-time inference while boosting the success rate across various tasks by 65%, got accepted for ICRA 2025! See the Project page.
  • 2025/01/23 [ICLR’25] ToCa, a token-wise feature caching method that achieves a 2x acceleration for PixArt-α, OpenSora, and DiT while maintaining nearly lossless generation quality, got accepted for ICLR 2025! Code is now available!
  • 2025/01/10 [Preprint] We released GlobalCom2, a “global-to-local” approach for training-free acceleration of high-resolution MLLMs with the AnyRes strategy. Code is now available! 2025/11/08 GlobalCom2 got accepted for AAAI 2026!