Learning transferable visual models from natural language supervision
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …
K-LITE: Learning transferable visual models with external knowledge
The new generation of state-of-the-art computer vision systems is trained from natural
language supervision, ranging from simple object category names to descriptive captions …
ELEVATER: A benchmark and toolkit for evaluating language-augmented visual models
Learning visual representations from natural language supervision has recently shown great
promise in a number of pioneering works. In general, these language-augmented visual …
Transferring vision-language models for visual recognition: A classifier perspective
Transferring knowledge from pre-trained deep models for downstream tasks, particularly
with limited labeled samples, is a fundamental problem in computer vision research. Recent …
SuS-X: Training-free name-only transfer of vision-language models
V Udandarao, A Gupta… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Abstract Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet
effective way to train large-scale vision-language models. CLIP demonstrates impressive …
Generative pretraining from pixels
Inspired by progress in unsupervised representation learning for natural language, we
examine whether similar models can learn useful representations for images. We train a …
Learning vision from models rivals learning vision from data
We introduce SynCLR, a novel approach for learning visual representations exclusively from
synthetic images without any real data. We synthesize a large dataset of image captions …
VILA: On pre-training for visual language models
Visual language models (VLMs) have progressed rapidly with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …
Learning to decompose visual features with latent textual prompts
Recent advances in pre-training vision-language models like CLIP have shown great
potential in learning transferable visual representations. Nonetheless, for downstream …
StableRep: Synthetic images from text-to-image models make strong visual representation learners
We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in light of the excellent …