[HBM+21] D. Hendrycks, S. Basart, N. Mu, S. Kadavath, F. Wang, E. Dorundo, R. Desai, T. Zhu, S. Parajuli, M. Guo, et al. "The many faces of robustness: A critical analysis of out-of-distribution generalization". In: Conference on Computer Vision and Pattern Recognition (CVPR). 2021.
[HFW+20] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick. "Momentum contrast for unsupervised visual representation learning". In: Conference on Computer Vision and Pattern Recognition (CVPR). 2020.
[HWG+21] J. Z. HaoChen, C. Wei, A. Gaidon, and T. Ma. "Provable guarantees for self-supervised deep learning with spectral contrastive loss". In: Advances in Neural Information Processing Systems (NeurIPS) (2021).
[HYH13] M. Hodosh, P. Young, and J. Hockenmaier. "Framing image description as a ranking task: Data, models and evaluation metrics". In: Journal of Artificial Intelligence Research (JAIR) (2013).
[HZB+21] D. Hendrycks, K. Zhao, S. Basart, J. Steinhardt, and D. Song. "Natural adversarial examples". In: Conference on Computer Vision and Pattern Recognition (CVPR). 2021.
[HZR+16] K. He, X. Zhang, S. Ren, and J. Sun. "Deep Residual Learning for Image Recognition". In: Conference on Computer Vision and Pattern Recognition (CVPR). 2016.
[IWW+21] G. Ilharco, M. Wortsman, R. Wightman, C. Gordon, N. Carlini, R. Taori, A. Dave, V. Shankar, H. Namkoong, J. Miller, H. Hajishirzi, A. Farhadi, and L. Schmidt. OpenCLIP. Zenodo, 2021.
[JGB+17] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. "Bag of Tricks for Efficient Text Classification". In: European Chapter of the Association for Computational Linguistics (EACL). 2017.
[KBZ+20] A. Kolesnikov, L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, and N. Houlsby. "Big Transfer (BiT): General Visual Representation Learning". In: European Conference on Computer Vision (ECCV). 2020.
[KGP21] E. Kreiss, N. D. Goodman, and C. Potts. "Concadia: Tackling Image Accessibility with Descriptive Texts and Context". In: arXiv preprint arXiv:2104.08376 (2021).
[KSL19] S. Kornblith, J. Shlens, and Q. V. Le. "Do better ImageNet models transfer better?" In: Conference on Computer Vision and Pattern Recognition (CVPR). 2019.
[LBL+20] F. Locatello, S. Bauer, M. Lucic, G. Rätsch, S. Gelly, B. Schölkopf, and O. Bachem. "A sober look at the unsupervised learning of disentangled representations and their evaluation". In: Journal of Machine Learning Research (JMLR) (2020).
[LLX+22] J. Li, D. Li, C. Xiong, and S. Hoi. "BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation". In: arXiv preprint arXiv:2201.12086 (2022).
[LLZ+22] Y. Li, F. Liang, L. Zhao, Y. Cui, W. Ouyang, J. Shao, F. Yu, and J. Yan. "Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm". In: International Conference on Learning Representations (ICLR). 2022.
[LMB+14] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. "Microsoft COCO: Common objects in context". In: European Conference on Computer Vision (ECCV). 2014.
[MKW+21] N. Mu, A. Kirillov, D. Wagner, and S. Xie. "SLIP: Self-supervision meets Language-Image Pre-training". In: arXiv preprint arXiv:2112.12750 (2021).