About Me ([GitHub] [Google Scholar] [CV])

I am a Research Scientist at Shanghai AI Laboratory, led by Prof. Jifeng Dai and Prof. Yu Qiao.

Previously, I obtained my Ph.D. degree from the Department of Computer Science and Technology, Nanjing University (NJU), in 2021. My academic supervisor was Prof. Tong Lu. I received my B.E. degree from Nanjing University of Science and Technology (NUST) in 2016.
I work very closely with my friends Dr. Enze Xie and Prof. Xiang Li. I was fortunate to work with Prof. Ping Luo and Prof. Chunhua Shen.

My recent works are mainly on:
The fundamental vision department at Shanghai AI Laboratory is now hiring. If you are interested in internship or researcher positions related to computer vision, please feel free to contact me by email.



Recent Works ([Full List])

(* Equal contribution, † Interns, # Corresponding authors)
Vision Transformer Adapter for Dense Predictions
Zhe Chen*†, Yuchen Duan*†, Wenhai Wang*, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao#
arXiv, 2022
[Paper] [Code] [BibTex]
We design a ViT adapter for dense prediction tasks.
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Zhiqi Li*†, Wenhai Wang*, Hongyang Li*, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai#
ECCV, 2022
[Paper] [Code] [BibTex]
A versatile camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
Changyao Tian*†, Wenhai Wang*, Xizhou Zhu*, Xiaogang Wang, Jifeng Dai#, Yu Qiao
ECCV, 2022
[Paper] [Code] [BibTex]
We design a vision-language-based framework for long-tailed recognition.
Generalized Focal Loss: Towards Efficient Representation Learning for Dense Object Detection
Xiang Li, Chengqi Lv, Wenhai Wang, Gang Li, Lingfeng Yang, Jian Yang#
TPAMI, 2022
[Paper] [Code] [BibTex]
An extended version of GFLv1/v2.
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
Zhiqi Li†, Wenhai Wang#, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Tong Lu#, Ping Luo
CVPR, 2022
[Paper] [Code] [BibTex]
We proposed a transformer-based panoptic segmentation framework.
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang#, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
CVMJ, 2021
[Paper] [Code] [Chinese Commentary] [Report] [Talk] [BibTex]
A better PVT.
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization
Zhe Chen†, Wenhai Wang#, Enze Xie, Tong Lu#, Ping Luo
AAAI, 2022
[Paper] [Code] [BibTex]
We proposed a neural style transfer framework for arbitrary ultra-resolution images.

Selected Works ([Full List])

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan#, Kaitao Song, Ding Liang, Tong Lu#, Ping Luo, Ling Shao
ICCV, 2021 (oral presentation)
[Paper] [Code] [Chinese Translation] [Chinese Commentary] [Report] [Talk] [BibTex]
[ICCV'21 Top-10 Influential Papers]
A pure Transformer backbone for dense prediction, such as object detection and semantic segmentation.
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text
Wenhai Wang*, Enze Xie*, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu#, Chunhua Shen
TPAMI, 2021
[Paper] [Code] [BibTex]
We extend PSENet (CVPR'19) and PAN (ICCV'19) to a text spotting system.
Shape Robust Text Detection with Progressive Scale Expansion Network
Wenhai Wang*, Enze Xie*, Xiang Li, Wenbo Hou, Tong Lu#, Gang Yu, Shuai Shao
CVPR, 2019
[Paper] [Poster] [Code] [BibTex]
We proposed a segmentation-based text detector that can precisely detect text instances with arbitrary shapes.
PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond
Enze Xie*, Wenhai Wang*, Mingyu Ding, Ruimao Zhang, Ping Luo#
TPAMI, 2021
[Paper] [Code] [BibTex]
[CVPR'20 Top-10 Influential Papers]
We extend PolarMask (CVPR'20) to several instance-level detection tasks.
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Enze Xie, Wenhai Wang, Zhiding Yu#, Anima Anandkumar, Jose M. Alvarez, Ping Luo#
NeurIPS, 2021
[Paper] [Code] [Chinese Commentary] [Demo] [BibTex]
[NeurIPS'21 Top-10 Influential Papers]
A simple and effective Transformer-based semantic segmentation framework.
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang#
NeurIPS, 2020
[Paper] [Code] [BibTex]
We propose the generalized focal loss for learning improved representations in dense object detectors.
Selective Kernel Networks
Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang#
CVPR, 2019
[Paper] [Code] [BibTex]
We proposed a dynamic selection mechanism in convolutional neural networks.

Honors and Awards

Review Services

Senior Program Committee Member
International Joint Conference on Artificial Intelligence (IJCAI), 2021

Journal Reviewer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
International Journal of Computer Vision (IJCV)
IEEE Transactions on Image Processing (TIP)
IEEE Transactions on Multimedia (TMM)
Computational Visual Media Journal (CVMJ)
Pattern Recognition (PR)

Program Committee Member/Conference Reviewer
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 2021, 2022
Neural Information Processing Systems (NeurIPS), 2020, 2021
International Conference on Machine Learning (ICML), 2021, 2022
International Conference on Learning Representations (ICLR), 2021
IEEE International Conference on Computer Vision (ICCV), 2021
European Conference on Computer Vision (ECCV), 2022
AAAI Conference on Artificial Intelligence (AAAI), 2022
International Joint Conference on Artificial Intelligence (IJCAI), 2022
IEEE Winter Conference on Applications of Computer Vision (WACV), 2021
Asian Conference on Computer Vision (ACCV), 2020