Wenhai Wang

whai362

Wenhai Wang (王文海)
Affiliation: MMLab, The Chinese University of Hong Kong
Address: Room 703, Ho Sin Hang Engineering Building, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Email: wangwenhai362[at]{163.com, gmail.com}, whwang@ie.cuhk.edu.hk


(* Equal contribution, † Interns, # Corresponding authors)

Technical Report

48. DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

Wenhai Wang^

47. ControlLLM: Augment Language Models with Tools by Searching on Graphs

Wenhai Wang#^

46. InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

Wenhai Wang*^

45. VideoChat: Chat-Centric Video Understanding

Wenhai Wang

44. Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

Wenhai Wang^

2024

43. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Wenhai Wang

42. Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

Wenhai Wang

41. The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Wenhai Wang*^

40. Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

Wenhai Wang

(Spotlight Paper (5%))

2023

39. VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Wenhai Wang*^

38. EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

Wenhai Wang

(Spotlight Paper (3.1%))

37. Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

Wenhai Wang

36. Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

Wenhai Wang

35. AVSegFormer: Audio-Visual Segmentation with Transformer

Wenhai Wang

34. Feature Selection Based on Intrusive Outliers Rather Than All Instances

Wenhai Wang

33. FB-BEV: BEV Representation from Forward-Backward View Transformations

Wenhai Wang

32. InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang*^

(Highlight Paper (2.5%))

31. Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Wenhai Wang

(Highlight Paper (2.5%))

30. Goal-oriented Autonomous Driving

Wenhai Wang

(Best Paper Award)

29. Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Wenhai Wang

28. Vision Transformer Adapter for Dense Predictions

Wenhai Wang#^

(Spotlight Paper (8.0%))

2022

27. Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

Wenhai Wang

26. On Efficient Reinforcement Learning for Full-Length Game of StarCraft II

Wenhai Wang

25. BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

[ECCV 2022 Top-10 Influential Papers]

[100 Most Cited AI Papers in 2022]

Wenhai Wang*^

24. VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Wenhai Wang*^

23. Incremental Few-Shot Semantic Segmentation via Embedding Adaptive-Update and Hyper-class Representation

Wenhai Wang

22. Generalized Focal Loss: Towards Efficient Representation Learning for Dense Object Detection

Wenhai Wang

(ESI Highly Cited Paper (1%))

21. Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Wenhai Wang#^

20. PVT v2: Improved Baselines with Pyramid Vision Transformer

Wenhai Wang#^

(ESI Highly Cited Paper (1%), ESI Hot Paper (0.1%), CNKI's Academic Essentials, Best Paper Honorable Mention Award)

19. Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Wenhai Wang#^

2018-2021

18. Grid Dividing for Single-Stage Instance Segmentation

Wenhai Wang^

17. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

[NeurIPS 2021 Top-10 Influential Papers]

Wenhai Wang

16. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

[ICCV 2021 Top-10 Influential Papers]

Wenhai Wang^

(Oral Presentation (3.4%), 2023 World Artificial Intelligence Conference Youth Outstanding Paper Award)

15. DetCo: Unsupervised Contrastive Learning for Object Detection

Wenhai Wang

14. PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

Wenhai Wang*^

13. PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

Wenhai Wang*^

12. Segmenting Transparent Object in the Wild with Transformer

Wenhai Wang

11. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Wenhai Wang

10. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

Wenhai Wang

9. AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Wenhai Wang^

8. Segmenting Transparent Objects in the Wild

Wenhai Wang

7. Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

Wenhai Wang

6. Scene Text Image Super-Resolution in the Wild

Wenhai Wang

5. PolarMask: Single Shot Instance Segmentation with Polar Representation

[CVPR 2020 Top-10 Influential Papers]

Wenhai Wang

(Oral Presentation (5.7%))

4. Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Wenhai Wang*^

3. Shape Robust Text Detection with Progressive Scale Expansion Network

Wenhai Wang*^

2. Selective Kernel Networks

Wenhai Wang

1. Mixed Link Networks

Wenhai Wang*^

(Oral Presentation)