Wenhai Wang (王文海)
Affiliation: MMLab, The Chinese University of Hong Kong
Address: Room 703, Ho Sin Hang Engineering Building, The Chinese University of Hong Kong, Shatin, N.T. Hong Kong
Email: wangwenhai362[at]{163.com, gmail.com}, whwang[at]ie.cuhk.edu.hk

About Me ([GitHub] [Google Scholar])

I am currently a Postdoctoral Researcher at MMLab, The Chinese University of Hong Kong, and I also collaborate with Prof. Jifeng Dai and Prof. Yu Qiao at Shanghai AI Laboratory.

Previously, I obtained my Ph.D. degree from the Department of Computer Science and Technology, Nanjing University (NJU) in 2021, advised by Prof. Tong Lu. I received my B.E. degree from Nanjing University of Science and Technology (NUST) in 2016. I work very closely with my friends Dr. Enze Xie and Prof. Xiang Li, and I was fortunate to work with Prof. Ping Luo and Prof. Chunhua Shen.

My recent work mainly focuses on:

News

Experience

Recent Works ([Full List])

(* Equal contribution, † Interns, # Corresponding authors)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai#
CVPR, 2024
[Paper] [Code] [Chinese Interpretation] [BibTex]
Scaling up a ViT to 6B parameters and aligning it with LLMs.
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen#, Yu Qiao, Jifeng Dai, Wenhai Wang#
Technical Report, 2023
[Paper] [Code] [BibTex]
Searching for good solutions on tool-resource graphs.
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Wenhai Wang*, Jiangwei Xie*, ChuanYang Hu*, Haoming Zou*, Jianan Fan*, Wenwen Tong*, Yang Wen*, Silei Wu*, Hanming Deng*, Zhiqi Li*, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai#
Technical Report, 2023
[Paper] [Code] [BibTex]
Multi-modal large language models can be good drivers.
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang*, Min Shi*, Qingyun Li*, Wenhai Wang*, Zhenhang Huang*, Linjie Xing*, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai#, Yu Qiao
ICLR, 2024
[Paper] [Code] [BibTex]
Recognizing and understanding all things in the open world.
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wenhai Wang*, Zhe Chen*, Xiaokang Chen*, Jiannan Wu*, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai#
NeurIPS, 2023
[Paper] [Code] [BibTex]
Now you can customize vision tasks just like language tasks.

Selected Works ([Full List])

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang*, Jifeng Dai*, Zhe Chen*†, Zhenhang Huang*, Zhiqi Li*†, Xizhou Zhu*, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao#
CVPR, 2023 (Highlight Paper (2.5%))
[Paper] [Code] [BibTex]
A strong large-scale CNN-based foundation model.
Vision Transformer Adapter for Dense Predictions
Zhe Chen*†, Yuchen Duan*†, Wenhai Wang#, Junjun He, Tong Lu#, Jifeng Dai, Yu Qiao
ICLR, 2023 (Spotlight Paper (8.0%))
[Paper] [Code] [BibTex]
We design a ViT adapter for dense prediction tasks.
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Zhiqi Li*†, Wenhai Wang*, Hongyang Li*, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai#
ECCV, 2022
[Paper] [Code] [BibTex]
[ECCV 2022' Top-10 Influential Papers]
[100 Most Cited AI Papers in 2022]
A versatile camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang#, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
CVMJ, 2021 (ESI Highly Cited Paper (1%), ESI Hot Paper (0.1%))
[Paper] [Code] [Chinese Interpretation] [Report] [Talk] [BibTex]
[CNKI's Academic Essentials]
[CVMJ 2022 Honorable Mention Award]
A better PVT.
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan#, Kaitao Song, Ding Liang, Tong Lu#, Ping Luo, Ling Shao
ICCV, 2021 (Oral Presentation (3.4%))
[Paper] [Code] [Chinese Translation] [Chinese Interpretation] [Report] [Talk] [BibTex]
[ICCV21' Top-10 Influential Papers]
A pure Transformer backbone for dense prediction, such as object detection and semantic segmentation.
PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond
Enze Xie*, Wenhai Wang*, Mingyu Ding, Ruimao Zhang, Ping Luo#
TPAMI, 2021
[Paper] [Code] [BibTex]
[CVPR 2020 Top-10 Influential Papers]
We extend PolarMask (CVPR 2020 Oral Presentation (5.7%)) to several instance-level detection tasks.
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text
Wenhai Wang*, Enze Xie*, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu#, Chunhua Shen
TPAMI, 2021
[Paper] [Code1] [Code2] [BibTex]
We extend PSENet (CVPR 2019) and PAN (ICCV 2019) into a text spotting system.

Honors and Awards

Invited Talk

Academic Services

Workshop (Co-)Organizer
Associate Editor
Senior Program Committee Member
Journal Reviewer
Program Committee Member/Conference Reviewer