Research Article

BLPSeg: Balance the Label Preference in Scribble-Supervised Semantic Segmentation

Published: 01 January 2023
Abstract

    Scribble-supervised semantic segmentation is an appealing weakly supervised technique with low labeling cost. Existing approaches mainly diffuse the labeled regions of scribbles via low-level feature similarity to narrow the supervision gap between scribble labels and mask labels. In this study, we observe an annotation bias between scribbles and object masks: annotators tend to scribble on spacious regions rather than corners. This label preference makes the model learn well on frequently labeled regions but poorly on rarely labeled pixels. We therefore propose BLPSeg to balance the label preference for complete segmentation. Specifically, BLPSeg first predicts an annotation probability map to evaluate the rarity of labels in each image, then applies a novel BLP loss that balances training by up-weighting those rare annotations. To further alleviate the impact of label preference, we also design a local aggregation module (LAM) that propagates supervision from labeled to unlabeled regions during gradient backpropagation. Extensive experiments illustrate the effectiveness of BLPSeg: our single-stage method even outperforms advanced multi-stage methods and achieves state-of-the-art performance.
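    To make the loss-balancing idea concrete, below is a minimal PyTorch sketch of a rarity-weighted scribble loss in the spirit of the BLP loss described above. The weighting formula (1 - p)^gamma, the `gamma` hyperparameter, and the tensor layout are illustrative assumptions, not the paper's exact formulation.

    ```python
    # Minimal, illustrative sketch of a rarity-weighted scribble loss in the
    # spirit of the BLP loss. The (1 - p)**gamma weighting and `gamma` are
    # assumptions for illustration, not the paper's exact formulation.
    import torch
    import torch.nn.functional as F

    def blp_style_loss(logits, scribble, annot_prob, gamma=2.0, ignore_index=255):
        # logits:     (B, C, H, W) per-pixel class scores
        # scribble:   (B, H, W) sparse labels; unlabeled pixels hold ignore_index
        # annot_prob: (B, H, W) predicted probability that a pixel gets scribbled
        ce = F.cross_entropy(logits, scribble,
                             ignore_index=ignore_index, reduction="none")
        # Up-weight pixels that annotators rarely scribble (low annot_prob).
        weight = (1.0 - annot_prob).clamp(min=0.0) ** gamma
        labeled = (scribble != ignore_index).float()
        return (weight * ce * labeled).sum() / labeled.sum().clamp(min=1.0)
    ```

    In this reading, the annotation probability map comes from an auxiliary head trained to predict where scribbles fall, so rarely annotated locations receive larger gradients during training.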


    Published In

    IEEE Transactions on Image Processing, Volume 32, 2023, 5324 pages

    Publisher

    IEEE Press
