Wenhai Wang (王文海)
Affiliation: MMLab, The Chinese University of Hong Kong
Address: Room 703, Ho Sin Hang Engineering Building, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Email: wangwenhai362[at]{163.com, gmail.com}, whwang[at]ie.cuhk.edu.hk

About Me ([GitHub] [Google Scholar])

I am currently a Postdoctoral Researcher at MMLab, The Chinese University of Hong Kong, and I also collaborate with Prof. Jifeng Dai and Prof. Yu Qiao at Shanghai AI Laboratory.

Previously, I obtained my Ph.D. degree from the Department of Computer Science and Technology, Nanjing University (NJU) in 2021. My academic supervisor is Prof. Tong Lu. I received my B.E. degree from Nanjing University of Science and Technology (NUST) in 2016. I work very closely with my friends Dr. Enze Xie and Prof. Xiang Li. I was fortunate to work with Prof. Ping Luo and Prof. Chunhua Shen.

My recent works are mainly on:

News

Experience

Recent Works ([Full List])

(* Equal contribution, † Interns, # Corresponding authors)
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang*, Min Shi*, Qingyun Li*, Wenhai Wang*, Zhenhang Huang*, Linjie Xing*, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai#, Yu Qiao
Technical Report, 2023
[Paper] [Code] [Demo of AS-1B] [Demo of ASM] [BibTex]
Recognizing and understanding all things in the open world.
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wenhai Wang*, Zhe Chen*, Xiaokang Chen*, Jiannan Wu*, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai#
Technical Report, 2023
[Paper] [Code] [BibTex]
Now you can customize vision tasks just like language tasks.
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
Zhaoyang Liu*, Yinan He*, Wenhai Wang*, Weiyun Wang*, Yi Wang*, Shoufa Chen*, Qinglong Zhang*, Yang Yang*, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, Limin Wang, Ping Luo, Jifeng Dai, Yu Qiao
Technical Report, 2023
[Paper] [Code] [Demo] [BibTex]
InternGPT allows you to interact with ChatGPT by clicking, dragging and drawing using a pointing device.
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang*, Jifeng Dai*, Zhe Chen*†, Zhenhang Huang*, Zhiqi Li*†, Xizhou Zhu*, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao#
CVPR, 2023 (Highlight Paper (2.5%))
[Paper] [Code] [BibTex]
A strong large-scale CNN-based foundation model.
Vision Transformer Adapter for Dense Predictions
Zhe Chen*†, Yuchen Duan*†, Wenhai Wang#, Junjun He, Tong Lu#, Jifeng Dai, Yu Qiao
ICLR, 2023 (Spotlight Paper (8.0%))
[Paper] [Code] [BibTex]
We design a ViT adapter for dense prediction tasks.

Selected Works ([Full List])

BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Zhiqi Li*†, Wenhai Wang*, Hongyang Li*, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai#
ECCV, 2022
[Paper] [Code] [BibTex]
[ECCV 2022 Top-10 Influential Papers]
[100 Most Cited AI Papers in 2022]
A versatile camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang#, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
CVMJ, 2021 (ESI Highly Cited Paper (1%))
[Paper] [Code] [中文解读] [Report] [Talk] [BibTex]
[CVMJ 2022 Honorable Mention Award]
A better PVT.
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan#, Kaitao Song, Ding Liang, Tong Lu#, Ping Luo, Ling Shao
ICCV, 2021 (Oral Presentation (3.4%))
[Paper] [Code] [中译版] [中文解读] [Report] [Talk] [BibTex]
[ICCV 2021 Top-10 Influential Papers]
A pure Transformer backbone for dense prediction, such as object detection and semantic segmentation.
PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond
Enze Xie*, Wenhai Wang*, Mingyu Ding, Ruimao Zhang, Ping Luo#
TPAMI, 2021
[Paper] [Code] [BibTex]
[CVPR 2020 Top-10 Influential Papers]
We extend PolarMask (CVPR 2020 Oral Presentation (5.7%)) to several instance-level detection tasks.
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text
Wenhai Wang*, Enze Xie*, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu#, Chunhua Shen
TPAMI, 2021
[Paper] [Code1] [Code2] [BibTex]
We extend PSENet (CVPR 2019) and PAN (ICCV 2019) to a text spotting system.

Honors and Awards

Invited Talk

Academic Services

Workshop (Co-)Organizer
Senior Program Committee Member
Journal Reviewer
Program Committee Member/Conference Reviewer