Jieneng Chen

Email: jchen293 [at] jh.edu

I'm a final-year Ph.D. candidate in Computer Science at Johns Hopkins University, advised by Prof. Alan L. Yuille and Prof. Rama Chellappa. I was named a Siebel Scholar, the highest distinction for Ph.D. students in Bioengineering at JHU. I'm best known for the neural architecture TransUNet, which has over 8,000 citations.

I am fascinated by how intelligence can operate in the real world. My research builds scalable, structured world models that connect artificial and natural intelligence, enabling new forms of reasoning and interaction across computer vision, robotics, and healthcare.

I love mentoring and teaching undergraduates—several of my mentees have been recognized with top CS research honors. I also teach Machine Imagination (EN.601.208) at Johns Hopkins in 2025/2026.


Email / CV / Google Scholar / GitHub / LinkedIn

Recent Awards
  • Siebel Scholar Award, Class 2025.
  • MICCAI 2025 Doctoral Thesis Award Runner-Up (one of the top 3 worldwide).
  • MICCAI 2025 Best Paper Award Runner-Up (top 0.1%).
  • KDD 2025 CCC Best Paper Award.
  • NVIDIA 2025 Academic Grant Award.
  • NSF Travel Award, CVPR 2025 Doctoral Consortium.
  • 2025 Visionary Award, LLM for Material Science.
  • An undergraduate mentee received an Honorable Mention for the CRA Outstanding Undergraduate Researcher Award. Congrats, Arda!
  • An undergraduate mentee won the Michael J. Muuss Research Award and was a finalist (1 of 24 nationwide) for the CRA Outstanding Undergraduate Researcher Award. Congrats, TaiMing!

Research Areas

Over the next decade, my research aims to answer a central question: how can we bring intelligence into the real world to meaningfully benefit humanity?

This pursuit is structured across three pillars:

  • Building Foundation Neural Architectures to learn scalable representations from raw sensory data.
  • Establishing Predictive Visual Modeling grounded in human-like mental models to achieve closed-loop embodiment.
  • Developing Proactive Biomedical Systems via medical world models to reduce cancer mortality and enhance human life.

Predictive Modeling for Vision and Embodiment

Human-level 3D Mental Models
▸ Generative worlds (GenEx, ICLR'25)
▸ 4D analysis-by-synthesis (4D-Animal, WACV'26)
▸ 3D/4D spatial reasoning (SpatialLLM, CVPR'25)

Closed-Loop Embodiment
▸ Generation, perception, and action within the physical world (World-in-World, ICLR'26, under review)

Proactive Biomedical Systems

Scalable Early Diagnosis
▸ Scaling AI to eight major cancers (CancerUnit, ICCV'23)
▸ Scaling cancer AI with reports (R-Super, MICCAI'25)

Treatment Discovery
▸ Personalized treatment planning via simulation (Medical World Model, ICCV'25)

Foundation Neural Architecture

Visual Dense Learning
▸ TransUNet: the first scalable Transformer architecture that fuses global attention with U-Net's local comprehension.
▸ Swin-Unet: upgrades TransUNet with a pure-attention design.
▸ TransFG: introduced novel part-level attention.

Multimodal Learning
▸ Visual encoder design (ViTamin, CVPR'24)
▸ Visual representation in language models (LLaVolta, NeurIPS'24)
Recent Projects

Full list on Google Scholar Profile. ☆ denotes visiting undergraduate / graduate mentees.

GenEx: Generating an Explorable World.

TaiMing Lu ☆, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen.

ICLR, 2025

Turn a single image into an explorable 3D world.

Embodied agents refine their beliefs by predicting unseen parts of the physical world.


World-in-World: World Models in a Closed-Loop World.

Jiahan Zhang*☆, Muqing Jiang*☆, Nanru Dai☆, TaiMing Lu☆, Arda Uzunoglu☆, Shunchi Zhang, Yana Wei, Jiahao Wang, Vishal Patel, Paul Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, Jieneng Chen.

Technical report, 2025.

Under review at ICLR 2026 on OpenReview (rated in the top 1.3% in the initial round).

World models live and die by their closed-loop success, not flawless generated visuals.

Paper | Project | Leaderboard | Interactive Demo | Code

Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning.

Yijun Yang ☆, Zhao-Yang Wang, Qiuping Liu, Shuwen Sun, Kang Wang, Rama Chellappa, Zongwei Zhou, Alan Yuille, Lei Zhu, Yu-Dong Zhang, Jieneng Chen.

ICCV, 2025.

Envision precision medicine via generative world modeling.

Paper | Code | Project

EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory.

Jiahao Wang*, Luoxin Ye*, TaiMing Lu, Junfei Xiao, Jiahan Zhang, Yuxiang Guo, Xijun Liu, Rama Chellappa, Cheng Peng, Alan Yuille, Jieneng Chen.

Technical report, 2025.

Bridge generative world models with 3D vision.

Paper | Code

4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos.
Shanshan Zhong ☆, Jiawei Peng, Zehan Zheng, Zhongzhan Huang, Wufei Ma, Guofeng Zhang, Qihao Liu, Alan Yuille, Jieneng Chen. 🐕 🐴

WACV, 2026
Paper | Code
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models.
Tiezheng Zhang, Yitong Li, Yu-Cheng Chou, Jieneng Chen, Alan Yuille, Chen Wei, Junfei Xiao.

NeurIPS, 2025.
Paper | Project | Code | HuggingFace Data Card
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models.
Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso Miguel de Melo, Jieneng Chen†, Alan Yuille†.

CVPR, Highlight (top 3%), 2025.
Paper (camera ready) | Code | HuggingFace Data Card
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models.
Wufei Ma, Luoxin Ye, Nessa McWeeney, Celso Miguel de Melo, Alan Yuille, Jieneng Chen.

CVPR, Highlight (top 3%), 2025.
Paper (camera ready)
LLaVolta: Efficient Large Multi-modal Models via Visual Context Compression.
Jieneng Chen, Luoxin Ye, Ju He, Zhaoyang Wang, Daniel Khashabi, Alan Yuille.

NeurIPS, 2024.
Paper | Code | Project
ViTamin: Designing Scalable Vision Models in the Vision-Language Era.
Jieneng Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen.

CVPR, 2024.
The first vision-centric encoder design for large multimodal models, with SoTA performance on 60+ multimodal tasks in 2024.
Paper | Code | 🤗 HuggingFace | timm | open_clip
TransUNet: Rethinking the U-Net Architecture Design for Medical Image Segmentation through the Lens of Transformers.
Jieneng Chen, Jieru Mei, Xianhang Li, Yongyi Lu, Qihang Yu, Qingyue Wei, Xiangde Luo, Yutong Xie, Ehsan Adeli, Yan Wang, Matthew P Lungren, Shaoting Zhang, Lei Xing, Le Lu, Alan Yuille, Yuyin Zhou.

Medical Image Analysis (MedIA), 2024.

ICML-W 2021 | Journal | Code
Among the most-downloaded ScienceDirect articles of all time 🏆.
Among the top 15 most-cited 2021 papers across all AI fields, with over 6,000 citations.
Talks
Teaching
  • Instructor: I designed and taught the undergraduate course Machine Imagination (EN.601.208) at JHU in 2025, and will teach it again in 2026 (starting Jan. 2026).
Service
  • Invited reviewer: CVPR, ICCV, ECCV, WACV, NeurIPS, ICML, ICLR, AAAI, IJCV, TPAMI, TMI, MICCAI, and CogSci.
  • Workshop co-organizer for ICCV, CVPR and MICCAI.
  • JHU CS mentor hours.
  • Lecture for JHU WSE Pre-College Program 2025.
Mentoring

I am fortunate to have collaborated with exceptionally talented students at JHU.

Acknowledgement
    My doctoral research was made possible through the generous support of ARL, IARPA, NSF, NIH, ONR, Lambda, NVIDIA, Johns Hopkins University, the Siebel Foundation, the Patrick J. McGovern Foundation, and the Lustgarten Foundation. I am deeply grateful for the resources provided to me and my advisors.