Wenhai Wang (王文海)
Fundamental Vision Department, Shanghai AI Laboratory
Address: 701 Yunjin Road, Xuhui District, Shanghai, China
Email: wangwenhai362[at]{163.com, smail.nju.edu.cn} wangwenhai[at]pjlab.org.cn

About Me ([GitHub] [Google Scholar] [CV])

I am a Research Scientist at Shanghai AI Laboratory, led by Dr. Jifeng Dai and Prof. Yu Qiao.

Previously, I obtained my Ph.D. degree from the Department of Computer Science and Technology, Nanjing University (NJU), in 2021. My academic supervisor was Prof. Tong Lu. I received my M.S. degree from Nanjing University (NJU) in 2018 and my B.E. degree from Nanjing University of Science and Technology (NUST) in 2016.
I work very closely with my friends Enze Xie and Xiang Li. I was fortunate to work with Prof. Ping Luo and Prof. Chunhua Shen.

My recent works are mainly on:
  • CNN/Transformer Backbone
  • Object Detection & Semantic/Instance/Panoptic Segmentation
  • Vision-Language Model
  • Autonomous Driving Perception
  • Scene Text Detection & Recognition

The Fundamental Vision Department at Shanghai AI Laboratory is now hiring. If you are interested in internship or researcher positions related to computer vision, please feel free to contact me by email.



  • Sep. 2021 - Present, Research Scientist at Shanghai AI Laboratory, led by Dr. Jifeng Dai and Prof. Yu Qiao
  • Oct. 2019 - Mar. 2020, Research Assistant at the University of Hong Kong (HKU), led by Prof. Ping Luo
  • Aug. 2019 - Mar. 2020, Research Intern at SenseTime Group Limited, led by Xuebo Liu and Ding Liang
  • Jun. 2018 - Dec. 2018, Research Intern at Momenta, led by Xiang Li

Selected Publications ([Full List])

(* indicates equal contribution, # corresponding author)
PVTv2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang#, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
Technical Report, 2021
[Paper] [Code] [中文解读] [Report] [Talk] [BibTex]
A better PVT.
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan#, Kaitao Song, Ding Liang, Tong Lu#, Ping Luo, Ling Shao
in ICCV, 2021 (oral presentation)
[Paper] [Code] [中文解读] [Report] [Talk] [BibTex]
A pure Transformer backbone for dense prediction, such as object detection and semantic segmentation.
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text
Wenhai Wang*, Enze Xie*, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu#, Chunhua Shen
TPAMI, 2021
[Paper] [Code] [BibTex]
We extend PSENet (CVPR'19) and PAN (ICCV'19) to a text spotting system.
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu#, Chunhua Shen, Ping Luo
in ECCV, 2020
[Paper] [Dataset] [Code] [BibTex]
We introduce linguistic information to eliminate the ambiguity in text detection.
Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Wenhai Wang*, Enze Xie*, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu#, Gang Yu, Chunhua Shen
in ICCV, 2019
[Paper] [Poster] [Code] [BibTex]
We propose an efficient method for arbitrary-shaped text detection.
Shape Robust Text Detection with Progressive Scale Expansion Network
Wenhai Wang*, Enze Xie*, Xiang Li, Wenbo Hou, Tong Lu#, Gang Yu, Shuai Shao
in CVPR, 2019
[Paper] [Poster] [Code] [BibTex]
We propose a segmentation-based text detector that can precisely detect text instances with arbitrary shapes.
Mixed Link Networks
Wenhai Wang*, Xiang Li*, Jian Yang#, Tong Lu#
in IJCAI, 2018
[Paper] [Poster] [Code] [BibTex]
We propose a parameter-efficient convolutional neural network for image classification.
PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond
Enze Xie*, Wenhai Wang*, Mingyu Ding, Ruimao Zhang, Ping Luo#
TPAMI, 2021
[Paper] [Code] [CVPR'20 Top-10 Influential Papers] [BibTex]
We extend PolarMask (CVPR'20) to several instance-level detection tasks.
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Enze Xie, Wenhai Wang, Zhiding Yu#, Anima Anandkumar, Jose M. Alvarez, Ping Luo#
NeurIPS, 2021
[Paper] [Code] [中文解读] [Demo] [BibTex]
A simple and effective Transformer-based semantic segmentation framework.
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang#
in NeurIPS, 2020
[Paper] [Code] [BibTex]
We propose the generalized focal loss for learning improved representations in dense object detectors.
Selective Kernel Networks
Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang#
in CVPR, 2019
[Paper] [Code] [BibTex]
We propose a dynamic selection mechanism for convolutional neural networks.


  • National Artificial Intelligence Challenge (NAIC) 2020, Remote Sensing Semantic Segmentation Task, 1st Place (1,000,000 RMB Bonus).
  • ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text, Task 1, 1st Place.
  • ICDAR2019 Robust Reading Challenge on Large-scale Street View Text with Partial Labeling, Task 1, 2nd Place.
  • AI Challenger 2018 Autonomous Driving Perception Task, 2nd Place (40,000 RMB Bonus).
  • ACM-ICPC Asia Regional Contest, Silver Medal.

Review Services

Journal Reviewer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
IEEE Transactions on Image Processing (TIP)
IEEE Transactions on Multimedia (TMM)
Computational Visual Media Journal (CVM)

(Senior) Program Committee Member/Conference Reviewer
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 2021, 2022
Neural Information Processing Systems (NeurIPS), 2020, 2021
International Conference on Machine Learning (ICML), 2021
International Conference on Learning Representations (ICLR), 2021
IEEE International Conference on Computer Vision (ICCV), 2021
AAAI Conference on Artificial Intelligence (AAAI), 2022
International Joint Conference on Artificial Intelligence (IJCAI), 2021, 2022
IEEE Winter Conference on Applications of Computer Vision (WACV), 2021
Asian Conference on Computer Vision (ACCV), 2020