Jieneng Chen

Pronunciation of Jieneng: jee-eh-nung
email: jienengc [at] stanford.edu

I am a postdoctoral researcher at Stanford SVL. I received my PhD in Computer Science from Johns Hopkins University in 2026, working with Alan L. Yuille and Rama Chellappa.

I study intelligence in the physical world by structuring raw observations into spatial code. My work lays foundations for this goal, spanning vision encoders and world models. I am best known for TransUNet, which unifies local and global context for medical image segmentation and has over 10,000 citations.

I am a Siebel Scholar for contributions to bioengineering, a Kempner Fellow, and a recipient of several best paper and young investigator awards.

Email · CV · Scholar · GitHub · LinkedIn

News

New paper: Thinking with Spatial Code, setting a new state of the art on VSI-Bench for 3D/4D reasoning.
Co-organizing the 1st CVPR 2026 workshop on World Models Meet Active Sensing and Embodied Planning. Try our benchmark World-in-World (ICLR 2026 Oral).
Co-organizing the 4th CVPR 2026 workshop on Generative Models for Computer Vision.
Research opportunities: feel free to email me. I host students through CCVL as well as SVL.

Awards and Honors

Siebel Scholar Award, 2025
MICCAI Best Paper Award (runner-up, 2 / 1,027 accepted papers), 2025
MICCAI Doctoral Consortium Thesis Award, 2025
Young Investigator Best Paper Award — KDD Health Day and CCC, 2025
Visionary Award, Large Language Model Hackathon for Material Science, 2025
CVPR Doctoral Consortium, 2025
JHU Provost Thesis Award, 2026
RSNA Certificate of Merit Award (16 / 1,951), 2025
Kempner Research Fellowship, 2026
NVIDIA 2025 Academic Grant, 2025
DAAD AInet Fellowship, 2022
#1 most downloaded article on ScienceDirect; among the most cited in MedIA, 2026
#1 most cited among all ECCV publications in the past five years (Google Scholar Metrics), 2026

Recent Projects

Publications from the past two years. Full list on Google Scholar.

Research Highlights

Closed-loop world model — generation, perception & action in the physical world.

Turn a single image into an explorable 3D world. Agents navigate generated environments.

Object-centric causal spatial reasoning benchmark with physics-aware world model evaluation.

Medical World Model — generative tumor evolution simulation for personalised treatment planning.

CVPR'25 Highlight

Compound 3D-informed design for spatially-intelligent large multimodal models.

Freely reconstructing animatable 3D animals from monocular video.

	World-in-World: World Models in a Closed-Loop World Jiahan Zhang, Muqing Jiang, Nanru Dai, TaiMing Lu, Arda Uzunoglu, Shunchi Zhang, Yana Wei, Jiahao Wang, Vishal Patel, Paul Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, Jieneng Chen. ICLR, 2026. Oral (top 1%). World models live and die by their closed-loop success, not flawless generated visuals. Paper | OpenReview | Project | Leaderboard | Demo | Code
	Fast Generative DeOcclusion for Visual Geometry and Robotics Jieneng Chen, Tiezheng Zhang, Xiwei Xuan, Ju He, Yifan Yin, Haojun Shi, Suyu Ye, Xinyi Li, Ruisheng Yuan, Tianmin Shu, Alan Yuille. CVPR, Findings, 2026
	GenEx: Generating an Explorable World TaiMing Lu, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen. ICLR, 2025. Turn a single image into a 3D world adventure. Embodied agents refine beliefs by predicting unseen parts of the physical world. JHU News | Paper | Blog | Project | Code
	CausalSpatial: A Comprehensive Benchmark for Object-Centric Causal Spatial Reasoning Wenxin Ma, Chenlong Wang, Ruisheng Yuan, Hao Chen, Nanru Dai, S. Kevin Zhou, Yijun Yang, Alan Yuille, Jieneng Chen. ICLR Workshop on World Models, 2026. A benchmark and early causal world model for grounded causal reasoning in space. Paper | HuggingFace Dataset
	EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory Jiahao Wang, Luoxin Ye, TaiMing Lu, Junfei Xiao, Jiahan Zhang, Yuxiang Guo, Xijun Liu, Rama Chellappa, Cheng Peng, Alan Yuille, Jieneng Chen. ICLR Workshop on World Models, 2026. Bridge generative world models with 3D vision. Paper | Code
	Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning Yijun Yang, Zhao-Yang Wang, Qiuping Liu, Shuwen Sun, Kang Wang, Rama Chellappa, Zongwei Zhou, Alan Yuille, Lei Zhu, Yu-Dong Zhang, Jieneng Chen. ICCV, 2025. Precision medicine via generative world modeling. Paper | Code | Project
	Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso Miguel de Melo, Jieneng Chen†, Alan Yuille†. CVPR, 2025. Highlight. Paper | Code | HuggingFace
	SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models Wufei Ma, Luoxin Ye, Nessa McWeeney, Celso Miguel de Melo, Alan Yuille, Jieneng Chen. CVPR, 2025. Highlight. Paper
	4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos Shanshan Zhong, Jiawei Peng, Zehan Zheng, Zhongzhan Huang, Wufei Ma, Guofeng Zhang, Qihao Liu, Alan Yuille, Jieneng Chen. WACV, 2026. Paper | Code
	VM-Gait: Multi-Modal 3D Representation Based on Virtual Marker for Gait Recognition Zhao-Yang Wang, Jiang Liu, Jieneng Chen, Rama Chellappa. WACV, 2025. Paper
	ViTamin: Designing Scalable Vision Models in the Vision-Language Era Jieneng Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen. CVPR, 2024. First vision-centric encoder design for LMMs; SoTA on 60+ benchmarks in 2024. Paper | Code | HuggingFace | timm | open_clip
	LLaVolta: Efficient Large Multi-modal Models via Visual Context Compression Jieneng Chen, Luoxin Ye, Ju He, Zhaoyang Wang, Daniel Khashabi, Alan Yuille. NeurIPS, 2024. Paper | Code | Project
	TransUNet: Rethinking the U-Net Architecture Design for Medical Image Segmentation through the Lens of Transformers Jieneng Chen, Jieru Mei, Xianhang Li, Yongyi Lu, Qihang Yu, Qingyue Wei, Xiangde Luo, Yutong Xie, Ehsan Adeli, Yan Wang, Matthew P Lungren, Shaoting Zhang, Lei Xing, Le Lu, Alan Yuille, Yuyin Zhou. Medical Image Analysis, Oct 2024. Top-15 cited 2021 paper in all AI fields. Most downloaded on ScienceDirect. Most cited in MedIA. ICML-W 2021 | Journal | Code

Talks

Invited talk at NSF IAIFI on physics & AI, Boston.
Lab seminar at Stanford.
Vision seminar at UIUC.
Guest lecture at Rice University.
Lab seminars at Harvard / MIT / HMS / MGH / Dana-Farber.
Talk at ICLR 2025 Workshop on Embodied Intelligence with LLMs in Open City Environment (slides).
Talks at JHU: ChemBE, Cognitive Science, CLSP, MINDS, AIEM.

Teaching

Instructor: Machine Imagination (EN.601.208), JHU, 2025 & 2026.

Service

Reviewer: CVPR, ICCV, ECCV, WACV, NeurIPS, ICML, ICLR, AAAI, IJCV, TPAMI, TMI, MICCAI, CogSci.
Workshop co-organizer: ICCV, CVPR, MICCAI.
JHU CS mentor hours.
Lecture for JHU WSE Pre-College Program 2025.

Mentoring

I am fortunate to have collaborated with talented students at JHU.

TaiMing Lu, JHU Undergraduate → Princeton CS PhD
First author of GenEx. Michael J. Muuss Research Award; finalist for the CRA Outstanding Undergraduate Researcher Award.
Shanshan Zhong, SYSU MS → CMU LTI PhD
First author of 4D-Animal.

Acknowledgement

My doctoral research was made possible through the generous support of ARL, IARPA, NSF, NIH, ONR, Lambda, NVIDIA, Google Cloud, JHU, Stanford, Harvard, the Siebel Foundation, the Patrick J. McGovern Foundation, and the Lustgarten Foundation.