Xiaokang Chen / 陈小康
Researcher at DeepSeek-AI
pkucxk@pku.edu.cn


I am currently a researcher at DeepSeek AI. I obtained my Ph.D. degree from Peking University (PKU) in 2024, supervised by Professor Gang Zeng. Before that, I received my Bachelor’s degree from Peking University in July 2019.


My research interests are in Computer Vision and Multi-Modal Learning, including Visual Pretraining, Scene Understanding (Detection and Segmentation), and Multi-Modal Large Language Models.


🔥🔥🔥 Highlighted projects: Unified Multimodal understanding and generation: [🔥 Janus-Pro]; Multimodal Large Language Models: [DeepSeek-VL2]

News
  • 2025.01   Released Janus-Pro for unified multimodal understanding and generation, an advanced version of our previous work Janus. Code and model are available!
  • 2024.12   Released DeepSeek-VL2, a series of Mixture-of-Experts Vision-Language Models. Code and model are available!
  • 2024.10   Released Janus, a simple, unified and flexible model for multimodal understanding and generation. Code and model are available!
  • 2024.05   I successfully defended my Ph.D. thesis!
  • 2024.04   One paper on DETR distillation is accepted by IJCAI 2024.
  • 2024.01   One paper on Responsible AI (BiasNeuron) is accepted by ICLR 2024.
  • 2023.09   Our Multi-Modal Large Language Model VisionLLM and a work on Responsible AI (CodeBias) are accepted by NeurIPS 2023.
  • 2023.07   Our Group DETR and NeRF2Mesh are accepted by ICCV 2023. Code is released.
  • 2022.09   Our Compressible-Composable NeRF (CC-NeRF) is accepted by NeurIPS 2022. Code is available here.
  • 2022.09   I am selected as the freshman representative to participate in the symposium held by Peking University.
  • 2022.07   One paper on Instance Mesh Reconstruction (DIMR) is accepted by ECCV 2022. Code is available here.
  • 2022.02   Please check out our CAE, a novel MIM approach for self-supervised learning.
  • 2021.12   Joined Baidu as a research intern.
  • 2021.12   One paper is accepted by AAAI 2022!
  • 2021.07   Our Conditional DETR is accepted by ICCV 2021! Code is available here.
  • 2021.07   I am selected as "Top 10 Outstanding Researcher" (学术十杰), EECS of Peking University.
  • 2021.07   One paper is accepted by ACM MM 2021!
  • 2021.06   I have released the code and data for our CVPR 2021 paper CPS. Please check here.
  • 2021.03   One paper is accepted by CVPR 2021!
  • 2020.07   One paper is accepted by ECCV 2020!
  • 2020.06   Joined MSRA as a research intern.
  • 2020.05   One paper is accepted by ICIP 2020!
  • 2020.03   One paper is accepted by CVPR 2020!

Representative Works (Google Scholar)

🚀 Janus-Series: Unified Multimodal Understanding and Generation Models
Project lead and core contributor. Work done at DeepSeek AI.
Technical Report 2025
[Paper: Janus-Pro] [Paper: Janus (CVPR 2025)] [Paper: JanusFlow (CVPR 2025)]
[🔥 Code (16k stars)] [Huggingface Model] [Online Demo]
[🔥 Twitter] [机器之心] [量子位] [新智元]
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Core contributor.
Technical Report 2025
[Paper] [Code] [官方介绍]
CAE: Context Autoencoder for Self-Supervised Representation Learning
Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang
International Journal of Computer Vision (IJCV), 2023
[Paper] [Code] [Code2] [中文解读]
Conditional DETR for Fast Training Convergence
Xiaokang Chen*, Depu Meng*, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun and Jingdong Wang (*: Equal Contribution)
International Conference on Computer Vision (ICCV), 2021
[Paper] [Code] [中文解读]
CPS: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
Xiaokang Chen, Yuhui Yuan, Gang Zeng and Jingdong Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[Paper] [Code] [Poster] [Slides] [Video (YouTube)] [中文解读]
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wenhai Wang*, Zhe Chen*, Xiaokang Chen*, Jiannan Wu*, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao and Jifeng Dai
(*: Equal Contribution)
Neural Information Processing Systems (NeurIPS), 2023
[Paper] [Code] [Demo]

Other Publications

2024

D3ETR: Decoder Distillation for Detection Transformer
Xiaokang Chen, Jiahui Chen, Yan Liu, Jiaxiang Tang and Gang Zeng
International Joint Conference on Artificial Intelligence (IJCAI), 2024
[Paper]
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng and Ziwei Liu
Arxiv Preprint, 2024
[Paper] [Code] [Project Page]
The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models
Yan Liu, Yu Liu, Xiaokang Chen, Pin-Yu Chen, Daoguang Zan, Min-Yen Kan and Tsung-Yi Ho
International Conference on Learning Representations (ICLR), 2024
Improving Long Text Understanding with Knowledge Distilled from Summarization Model
Yan Liu, Yazheng Yang, Xiaokang Chen
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
[Paper]

2023

Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
Qiang Chen*, Xiaokang Chen*, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Gang Zeng and Jingdong Wang (*: Equal Contribution)
International Conference on Computer Vision (ICCV), 2023
[Paper] [中文解读]
Uncovering and Quantifying Social Biases in Code Generation
Yan Liu, Xiaokang Chen 💌, Yan Gao, Zhe Su, Fengji Zhang, Daoguang Zan, Jian-Guang LOU, Pin-Yu Chen, Tsung-Yi Ho (💌: Corresponding author)
Neural Information Processing Systems (NeurIPS), 2023
[Paper]
Parallel Sentence-Level Explanation Generation For Real-World Low-Resource Scenarios
Yan Liu, Xiaokang Chen, and Qi Dai
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
[Paper]
Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement
Jiaxiang Tang, Hang Zhou, Xiaokang Chen, Tianshu Hu, Errui Ding, Jingdong Wang and Gang Zeng
International Conference on Computer Vision (ICCV), 2023
[Paper] [Code]
Uncovering and Categorizing Social Biases in Text-to-SQL
Yan Liu, Yan Gao, Zhe Su, Xiaokang Chen, Elliott Ash and Jian-Guang LOU
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
[Paper]
Understanding Self-Supervised Pretraining with Part-Aware Representation Learning
Jie Zhu*, Jiyang Qi*, Mingyu Ding*, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang and Jingdong Wang
Transactions on Machine Learning Research (TMLR), 2023
[Paper] [Code]
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, Jian Wang, Xiaodi Wang, Shumin Han, Xiaokang Chen, Jimin Pi, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
Transactions on Machine Learning Research (TMLR), 2023
[Paper]

2022

Not All Voxels Are Equal: Semantic Scene Completion from the Point-Voxel Perspective
Xiaokang Chen, Jiaxiang Tang, Jingbo Wang and Gang Zeng
AAAI Conference on Artificial Intelligence (AAAI), 2022
[Paper]
Compressible-composable NeRF via Rank-residual Decomposition
Jiaxiang Tang, Xiaokang Chen, Jingbo Wang and Gang Zeng
Neural Information Processing Systems (NeurIPS), 2022
[Paper] [Code]
Point Scene Understanding via Disentangled Instance Mesh Reconstruction
Jiaxiang Tang, Xiaokang Chen, Jingbo Wang and Gang Zeng
European Conference on Computer Vision (ECCV), 2022
[Paper]
MaskGroup: Hierarchical Point Grouping and Masking for 3D Instance Segmentation
Min Zhong, Xinghao Chen, Xiaokang Chen, Gang Zeng, Yunhe Wang
IEEE International Conference on Multimedia and Expo (ICME), 2022
[Paper]

2021

Joint Implicit Image Function for Guided Depth Super-Resolution
Jiaxiang Tang, Xiaokang Chen and Gang Zeng
ACM Multimedia (ACM MM), 2021
[Paper] [Code]

2020

Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation
Xiaokang Chen, Kwan-Yee Lin, Jingbo Wang, Wayne Wu, Chen Qian, Hongsheng Li, and Gang Zeng
European Conference on Computer Vision (ECCV), 2020
[Paper] [Code]
3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior
Xiaokang Chen, Kwan-Yee Lin, Chen Qian, Gang Zeng and Hongsheng Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020
[Paper] [Code] [Supplementary Material] [Demo Video]
Real-time Semantic Scene Completion Via Feature Aggregation and Conditioned Prediction
Xiaokang Chen, Yajie Xing and Gang Zeng
International Conference on Image Processing (ICIP), 2020
[Paper]

2019

2.5D Convolution for RGB-D Semantic Segmentation
Yajie Xing, Jingbo Wang, Xiaokang Chen and Gang Zeng
International Conference on Image Processing (ICIP), 2019
Coupling Two-Stream RGB-D Semantic Segmentation Network by Idempotent Mappings
Yajie Xing, Jingbo Wang, Xiaokang Chen and Gang Zeng
International Conference on Image Processing (ICIP), 2019

Preprints

Interactive Segment Anything NeRF with Feature Imitation
Xiaokang Chen*, Jiaxiang Tang*, Diwen Wan, Jingbo Wang and Gang Zeng
(*: Equal Contribution, Xiaokang is the project leader)
Arxiv Preprint, 2023
[Paper] [Project Page]
Conditional DETR V2: Efficient Detection Transformer with Box Queries
Xiaokang Chen, Fangyun Wei, Gang Zeng and Jingdong Wang
Arxiv Preprint, 2022
[Paper]
Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
Qiang Chen, Jian Wang, Chuchu Han, Shan Zhang, Zexian Li, Xiaokang Chen, Jiahui Chen, Xiaodi Wang, Shuming Han, Gang Zhang, Haocheng Feng, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
Arxiv Preprint, 2022. 64.5 mAP on COCO test set!
[Paper]

Education

  • [2019.09-2024.07]   Ph.D. student at Key Laboratory of Perception (MoE), School of AI, Peking University.
  • [2015.09-2019.07]   Bachelor of Science at School of EECS, Peking University.

Experiences

Selected Honors

  • Outstanding Graduate, Peking University, 2024
  • Top Minds Program (Highest-Tier, Huawei), Ali-Star (Alibaba), RED-Star (RED), Qingyun Plan (Tencent), Beidou Program (Meituan), 2023
  • National Scholarship (Ministry of Education, People's Republic of China), 2021, 2022, 2023
  • Merit Student of Peking University, PKU, 2020, 2021, 2022, 2023
  • Award for Academic Innovation, PKU, 2021
  • Top 10 Outstanding Researcher (学术十杰), EECS of Peking University, 2021
  • Huawei Scholarship, PKU, 2021
  • Schlumberger Scholarship, PKU, 2020
  • Award for Excellent Research, PKU, 2019
  • Award for Academic Excellence, PKU, 2018

Invited Talks

  • Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. Hosted by Huawei, 2023.11
  • Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision. Hosted by Microsoft Research, 2021.05

Academic Activities

  • Conference reviewer for: CVPR (2022, 2023, 2024), ECCV (2022, 2024), ICCV (2021, 2023), NeurIPS (2022, 2023, 2024), ICML (2022), AAAI (2022, 2023)
  • Journal reviewer for: IJCV (2021, 2022, 2023), TPAMI (2021, 2023), TIP (2022), TCSVT (2022), Neurocomputing (2022), CVIU (2022)