Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition

Vision Transformers have recently been the most popular network architecture in visual recognition due to their strong ability to encode global information. However, their high computational cost when processing high-resolution images limits their application in downstream tasks. In this paper, we take a...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 12 of 15 Dec., pages 8274-8283
Main author: Hou, Qibin (Author)
Other authors: Lu, Cheng-Ze; Cheng, Ming-Ming; Feng, Jiashi
Format: Online article
Language: English
Published: 2024
Collection access: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article