Wenhai Wang (王文海)
Affiliation: MMLab, The Chinese University of Hong Kong
Address: Room 703, Ho Sin Hang Engineering Building, The Chinese University of Hong Kong, Shatin, N.T. Hong Kong
Email: wangwenhai362[at]{163.com, gmail.com}, whwang[at]ie.cuhk.edu.hk

About Me ([GitHub] [Google Scholar])

I am currently a Postdoctoral Researcher at MMLab, The Chinese University of Hong Kong, and I also collaborate with Prof. Jifeng Dai and Prof. Yu Qiao at Shanghai AI Laboratory.

Previously, I obtained my Ph.D. degree from the Department of Computer Science and Technology, Nanjing University (NJU) in 2021, advised by Prof. Tong Lu. I received my B.E. degree from Nanjing University of Science and Technology (NUST) in 2016. I work very closely with my friends Dr. Enze Xie and Prof. Xiang Li, and I was fortunate to work with Prof. Ping Luo and Prof. Chunhua Shen.

My recent work mainly focuses on:

News

Experience

Recent Works ([Full List])

(* Equal contribution, † Interns, # Corresponding authors)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai#
CVPR, 2024
[Paper] [Code] [Chinese Interpretation] [BibTex]
Scaling up a ViT to 6B parameters and aligning it with LLMs.
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen#, Yu Qiao, Jifeng Dai, Wenhai Wang#
Technical Report, 2023
[Paper] [Code] [BibTex]
Searching for good solutions on tool-resource graphs.
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Wenhai Wang*, Jiangwei Xie*, ChuanYang Hu*, Haoming Zou*, Jianan Fan*, Wenwen Tong*, Yang Wen*, Silei Wu*, Hanming Deng*, Zhiqi Li*, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai#
Technical Report, 2023
[Paper] [Code] [BibTex]
Multi-modal large language models can be good drivers.
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang*, Min Shi*, Qingyun Li*, Wenhai Wang*, Zhenhang Huang*, Linjie Xing*, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai#, Yu Qiao
ICLR, 2024
[Paper] [Code] [BibTex]
Recognizing and understanding all things in the open world.
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wenhai Wang*, Zhe Chen*, Xiaokang Chen*, Jiannan Wu*, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai#
NeurIPS, 2023
[Paper] [Code] [BibTex]
Now you can customize vision tasks just like language tasks.

Selected Works ([Full List])

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang*, Jifeng Dai*, Zhe Chen*†, Zhenhang Huang*, Zhiqi Li*†, Xizhou Zhu*, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao#
CVPR, 2023 (Highlight Paper (2.5%))
[Paper] [Code] [BibTex]
A strong large-scale CNN-based foundation model.
Vision Transformer Adapter for Dense Predictions
Zhe Chen*†, Yuchen Duan*†, Wenhai Wang#, Junjun He, Tong Lu#, Jifeng Dai, Yu Qiao
ICLR, 2023 (Spotlight Paper (8.0%))
[Paper] [Code] [BibTex]
We design a ViT adapter for dense prediction tasks.
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Zhiqi Li*†, Wenhai Wang*, Hongyang Li*, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai#
ECCV, 2022
[Paper] [Code] [BibTex]
[ECCV 2022' Top-10 Influential Papers]
[100 Most Cited AI Papers in 2022]
A versatile camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang#, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
CVMJ, 2021 (ESI Highly Cited Paper (1%), ESI Hot Paper (0.1%))
[Paper] [Code] [Chinese Interpretation] [Report] [Talk] [BibTex]
[CNKI's Academic Essentials]
[CVMJ 2022 Honorable Mention Award]
A better PVT.
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan#, Kaitao Song, Ding Liang, Tong Lu#, Ping Luo, Ling Shao
ICCV, 2021 (Oral Presentation (3.4%))
[Paper] [Code] [Chinese Translation] [Chinese Interpretation] [Report] [Talk] [BibTex]
[ICCV21' Top-10 Influential Papers]
A pure Transformer backbone for dense prediction, such as object detection and semantic segmentation.
PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond
Enze Xie*, Wenhai Wang*, Mingyu Ding, Ruimao Zhang, Ping Luo#
TPAMI, 2021
[Paper] [Code] [BibTex]
[CVPR 2020 Top-10 Influential Papers]
We extend PolarMask (CVPR 2020 Oral Presentation (5.7%)) to several instance-level detection tasks.
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text
Wenhai Wang*, Enze Xie*, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu#, Chunhua Shen
TPAMI, 2021
[Paper] [Code1] [Code2] [BibTex]
We extend PSENet (CVPR 2019) and PAN (ICCV 2019) into a text spotting system.

Honors and Awards

Invited Talk

Academic Services

Workshop (Co-)Organizer
Associate Editor
Senior Program Committee Member
Journal Reviewer
Program Committee Member/Conference Reviewer