DOI:10.1145/2072298.2071951
short-paper

Fusing object detection and region appearance for image-text alignment

Published: 28 November 2011

Abstract

    We present a method for automatically aligning words to image regions that integrates specific object classifiers (e.g., "car" detectors) with weak models based on appearance features. Previous strategies have largely focused on the latter, and thus have not exploited progress on object category recognition. Hence, we augment region labeling with object detection, which simplifies the problem by reliably identifying a subset of the labels, and thereby reducing correspondence ambiguity overall. Comprehensive testing on the SAIAPR TC dataset shows that principled integration of object detection improves the region labeling task.
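
    As a rough illustration of the fusion idea described in the abstract, the sketch below blends per-region word scores from a weak appearance model with detector confidences for the subset of words that have a detector. The linear weighting, the `alpha` parameter, the function names, and the toy numbers are all assumptions made for illustration; the abstract does not specify the paper's actual integration scheme.

```python
import numpy as np


def fuse_region_scores(appearance, detector, alpha=0.5):
    """Blend appearance scores with detector scores (hypothetical scheme).

    appearance: (R, W) array, appearance[r, w] is the appearance-model score
                for word w on region r.
    detector:   (R, W) array of detector confidences in [0, 1];
                np.nan marks words that have no detector.
    alpha:      assumed weight given to the detector where one exists.
    """
    appearance = np.asarray(appearance, dtype=float)
    detector = np.asarray(detector, dtype=float)
    has_det = ~np.isnan(detector)

    fused = appearance.copy()
    # Only words with a detector are adjusted; the rest keep appearance scores.
    fused[has_det] = (1 - alpha) * appearance[has_det] + alpha * detector[has_det]
    # Renormalise so each region's scores sum to one.
    fused /= fused.sum(axis=1, keepdims=True)
    return fused


def align_words(scores, words):
    """Assign each region the caption word with the highest score."""
    return [words[i] for i in np.asarray(scores).argmax(axis=1)]


# Toy image with three regions and the caption words "car", "sky", "road".
words = ["car", "sky", "road"]
appearance = [
    [0.30, 0.30, 0.40],  # region 0: ambiguous under appearance alone
    [0.10, 0.70, 0.20],  # region 1
    [0.35, 0.15, 0.50],  # region 2
]
# Only "car" has a detector in this toy setup.
detector = [
    [0.90, np.nan, np.nan],
    [0.05, np.nan, np.nan],
    [0.10, np.nan, np.nan],
]

print(align_words(appearance, words))                                # ['road', 'sky', 'road']
print(align_words(fuse_region_scores(appearance, detector), words))  # ['car', 'sky', 'road']
```

    In this toy case the appearance model alone labels region 0 as "road"; a confident "car" detection on that region changes its label, which in turn leaves "road" attached to a single region. This mirrors, in a simplified way, the reduction in correspondence ambiguity the abstract describes.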


      Published In

      MM '11: Proceedings of the 19th ACM international conference on Multimedia
      November 2011
      944 pages
      ISBN:9781450306164
      DOI:10.1145/2072298

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Conference

      MM '11: ACM Multimedia Conference
      November 28 - December 1, 2011
      Scottsdale, Arizona, USA

      Acceptance Rates

      Overall Acceptance Rate 995 of 4,171 submissions, 24%
