Wei Zhan is a Co-Director of Berkeley DeepDrive, one of the leading research centers in AI for autonomy and mobility, which involves many Berkeley faculty and industrial partners. He is an Assistant Professional Researcher at UC Berkeley, leading a team of Ph.D. students and postdocs. His research focuses on AI for autonomous systems, leveraging control, robotics, computer vision, and machine learning techniques to tackle challenges involving sophisticated dynamics, interactive human behavior, and complex scenes in a scalable way. He also teaches AI for Autonomy at UC Berkeley.

He is also Chief Scientist of Applied Intuition, leading AI research efforts towards next-generation autonomy and its development toolchain. He is actively hiring Research Scientists, Research Engineers and Research Interns.

He received his Ph.D. degree from UC Berkeley. His publications received the Best Student Paper Award at IV’18, the Best Paper Award – Honorable Mention of IEEE Robotics and Automation Letters, and the Best Paper Award at ICRA’24. One of his publications was also selected as an ICLR’23 notable-top-5% oral presentation. He led the construction of the INTERACTION dataset and the organization of its prediction challenges at NeurIPS’20 and ICCV’21.

Research

Current

Policy Customization and Multi-Agent Reinforcement Learning

  • Residual Q-Learning – offline and online policy customization without value: NeurIPS’23, arxiv, Website, Code
  • Residual-MPPI – Online Policy Customization for Continuous Control: ICLR’25, arxiv, Website
  • Cost-efficient generalization for multi-agent RL: RLC’24, arxiv

Generative Data Synthesis and Generalizable Gaussian

  • X-Drive – Cross-modality Consistent Data Generation with Diffusion: ICLR’25, arxiv
  • DrivingRecon – Feed-forward 4D Gaussian generation: arxiv
  • PixelGaussian – Generalizable feed-forward Gaussian: arxiv

3D Surface Reconstruction with Unsupervised Decomposition

  • DeSiRe-GS – 4D Gaussians for Decomposition and Mesh: CVPR’25, arxiv
  • Q-SLAM – quadric representations for monocular SLAM: CoRL’24, RA-Letters’23
  • S3 Gaussian – Self-Supervised Street Gaussian: arxiv, Code

Interaction-Aware 3D Generation and Manipulation

  • CompGS – Compositional Text-to-3D Gaussians: CVPR’25, arxiv
  • DexHandDiff – Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation: CVPR’25, arxiv
  • PhyGrasp – generalizing grasping with physics-informed large multimodal models: arxiv, Website

Cross-Embodiment and Generalization for Robot Learning

  • Representation learning from general human demonstrations to robot manipulation: RSS’24, arxiv, Website
  • Open X-Embodiment – Robotic Learning Datasets and RT-X Models: ICRA’24 (Best Paper Award), arxiv, Blog, Dataset, Website, Code
  • Embodiment-Agnostic Action Planning: ICRA’25, arxiv

Efficient Diffusion Policy for Robot Learning and Autonomy

  • Efficient Diffusion Models for Prediction and Controllable Generation: ECCV’24, arxiv
  • Sparse Diffusion Policy – A Sparse, Reusable, and Flexible Policy for Robot Learning with Mixture of Experts (MoE): CoRL’24, arxiv

Language Modality and Reasoning for Autonomy

  • WOMD-Reasoning – A Large-Scale Language Dataset for Interaction and Driving Intentions Reasoning: arxiv, Website
  • Code diagnosis and repair of motion planners by LLM: RA-Letters’24, arxiv
  • LanguageMPC – LLMs as decision makers: arxiv, Website

Autonomous Racing – Learning to Plan and Control at the Limits

Recent and Continued

3D Perception with Temporal, Multi-View, and Multi-Modal Fusion

Efficient, Automated Data Engine and Training Pipeline

Self-Supervised Learning for Differentiable Autonomy Stack

  • Cohere3D – temporal coherence for self-supervision of perception, prediction and planning: ICRA’25, arxiv
  • Prediction with synthetic data pretraining: IROS’24, arxiv
  • PreTraM – self-supervision connecting trajectory and map: ECCV’22, arxiv, Code

Behavior and Scenario Generation for Closed-Loop Simulation

Past

INTERACTION Dataset and Benchmark

  • INTERACTION dataset with critical scenes and densely interactive behavior: Website, arxiv, IROS’19
  • INTERPRET challenge benchmarking conditional, multi-agent prediction: Website

LiDAR-based and 2D Perception

Generalizable, Multi-Agent, Interactive Prediction

Decision, Planning, Control and Behavior Design