Learning transferable visual models from natural language supervision
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …
K-LITE: Learning transferable visual models with external knowledge
The new generation of state-of-the-art computer vision systems is trained from natural
language supervision, ranging from simple object category names to descriptive captions …
ELEVATER: A benchmark and toolkit for evaluating language-augmented visual models
Learning visual representations from natural language supervision has recently shown great
promise in a number of pioneering works. In general, these language-augmented visual …
Transferring vision-language models for visual recognition: A classifier perspective
Transferring knowledge from pre-trained deep models for downstream tasks, particularly
with limited labeled samples, is a fundamental problem in computer vision research. Recent …
SuS-X: Training-free name-only transfer of vision-language models
V Udandarao, A Gupta… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Abstract Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet
effective way to train large-scale vision-language models. CLIP demonstrates impressive …
Generative pretraining from pixels
Inspired by progress in unsupervised representation learning for natural language, we
examine whether similar models can learn useful representations for images. We train a …
Learning vision from models rivals learning vision from data
We introduce SynCLR, a novel approach for learning visual representations exclusively from
synthetic images without any real data. We synthesize a large dataset of image captions …
VILA: On pre-training for visual language models
Visual language models (VLMs) have progressed rapidly with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …
Learning to decompose visual features with latent textual prompts
Recent advances in pre-training vision-language models like CLIP have shown great
potential in learning transferable visual representations. Nonetheless, for downstream …
StableRep: Synthetic images from text-to-image models make strong visual representation learners
We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in light of the excellent …