DOI: 10.1145/3350546.3352548
short-paper

Semi-supervised text classification with deep convolutional neural network using feature fusion approach

Published: 14 October 2019

Abstract

Supervised learning algorithms rely on labeled training data for classification, but labeling large datasets is costly and time-consuming. Semi-supervised learning algorithms, in contrast, use a small set of labeled data together with a large set of unlabeled data to improve prediction performance, and thus may be a good alternative to supervised learning for large text datasets. Although many semi-supervised learning algorithms have been proposed in the data science literature, most of them are not feasible for discrete, unstructured text data.
This paper aims to improve the classification accuracy of semi-supervised learning algorithms applied to text data. To achieve this goal, a novel convolutional neural network design is employed within a co-training semi-supervised learning algorithm; the network takes augmented data as a second input when predicting the labels of text data. We also propose a novel approach for partitioning the dataset into independent views via topic modeling in order to train independent classifiers. In so doing, neighbour classifiers are found and confident predictions on unlabeled data are fused into the labeled data. The prediction accuracy of the combined algorithm is then compared with that of state-of-the-art supervised and semi-supervised learning algorithms. Our findings show that the proposed combined algorithm outperforms these supervised and semi-supervised algorithms in terms of prediction accuracy.
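
The general co-training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a tiny naive Bayes classifier stands in for the convolutional neural network, a fixed vocabulary split (the hypothetical `view_a_vocab`) stands in for the topic-model-based view partitioning, and data augmentation and neighbour-classifier selection are omitted. The names `NaiveBayes`, `co_train`, and the `threshold` value are assumptions made for illustration only.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over token lists (a stand-in for the paper's CNN)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, doc):
        n = sum(self.prior.values())
        v = len(self.vocab) or 1
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            lp = math.log(self.prior[c] / n)
            for w in doc:
                if w in self.vocab:  # unseen words are ignored
                    lp += math.log((self.counts[c][w] + 1) / (total + v))
            logp[c] = lp
        # Normalize log scores into probabilities.
        m = max(logp.values())
        z = sum(math.exp(l - m) for l in logp.values())
        return {c: math.exp(l - m) / z for c, l in logp.items()}


def co_train(labeled, unlabeled, view_a_vocab, threshold=0.75, rounds=5):
    """Grow the labeled set with confident predictions from two view-specific classifiers."""
    labeled, pool = list(labeled), list(unlabeled)

    def views(tokens):
        # Assumption: view A is the tokens in view_a_vocab, view B is the rest.
        a = [t for t in tokens if t in view_a_vocab]
        b = [t for t in tokens if t not in view_a_vocab]
        return a, b

    for _ in range(rounds):
        ys = [y for _, y in labeled]
        clf_a = NaiveBayes().fit([views(d)[0] for d, _ in labeled], ys)
        clf_b = NaiveBayes().fit([views(d)[1] for d, _ in labeled], ys)
        remaining, added = [], 0
        for doc in pool:
            a, b = views(doc)
            best = None
            for proba in (clf_a.predict_proba(a), clf_b.predict_proba(b)):
                label, p = max(proba.items(), key=lambda kv: kv[1])
                if p >= threshold and (best is None or p > best[1]):
                    best = (label, p)
            if best:
                labeled.append((doc, best[0]))  # fuse the confident prediction
                added += 1
            else:
                remaining.append(doc)
        pool = remaining
        if not added or not pool:
            break
    return labeled, pool


# Toy data, invented for illustration.
seed = [("goal match team win".split(), "sport"),
        ("stock market price fall".split(), "finance")]
unlabeled_docs = ["goal team win".split(), "stock market fall".split(), ["weather"]]
grown, leftover = co_train(seed, unlabeled_docs, {"goal", "match", "stock", "market"})
```

In this toy run, documents that either view classifies with probability at least `threshold` are moved into the labeled set, while a document that neither view recognizes remains in the unlabeled pool.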

Cited By

  • (2024) A dual-ways feature fusion mechanism enhancing active learning based on TextCNN. Intelligent Data Analysis, 1-23. DOI: 10.3233/IDA-230332. Online publication date: 25-Jan-2024
  • (2023) A review of semi-supervised learning for text classification. Artificial Intelligence Review 56:9, 9401-9469. DOI: 10.1007/s10462-023-10393-8. Online publication date: 31-Jan-2023
  • (2021) Semi-supervised Text Classification with Temporal Ensembling. 2021 International Conference on Computer Communication and Artificial Intelligence (CCAI), 204-208. DOI: 10.1109/CCAI50917.2021.9447486. Online publication date: 7-May-2021

Published In

WI '19: IEEE/WIC/ACM International Conference on Web Intelligence
October 2019
507 pages

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Convolutional neural network
2. Data augmentation
3. Deep learning
4. Semi-supervised learning
5. Text classification

Qualifiers

• Short-paper
• Research
• Refereed limited

Conference

WI '19

Acceptance Rates

Overall Acceptance Rate: 118 of 178 submissions, 66%
