Skip to main content

Towards Automatic Classification of Wikipedia Content

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2010 (IDEAL 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6283))

  • 1637 Accesses

Abstract

Wikipedia – the Free Encyclopedia encounters the problem of proper classification of new articles everyday. The process of assignment of articles to categories is performed manually and it is a time consuming task. It requires knowledge about Wikipedia structure, which is beyond typical editor competence, which leads to human-caused mistakes – omitting or wrong assignments of articles to categories. The article presents application of SVM classifier for automatic classification of documents from The Free Encyclopedia. The classifier application has been tested while using two text representations: inter-documents connections (hyperlinks) and word content. The results of the performed experiments evaluated on hand crafted data show that the Wikipedia classification process can be partially automated. The proposed approach can be used for building a decision support system which suggests editors the best categories that fit new content entered to Wikipedia.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (Canada)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (Canada)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (Canada)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

') var buybox = document.querySelector("[data-id=id_"+ timestamp +"]").parentNode var buyingOptions = buybox.querySelectorAll(".buying-option") ;[].slice.call(buyingOptions).forEach(initCollapsibles) var buyboxMaxSingleColumnWidth = 480 function initCollapsibles(subscription, index) { var toggle = subscription.querySelector(".buying-option-price") subscription.classList.remove("expanded") var form = subscription.querySelector(".buying-option-form") var priceInfo = subscription.querySelector(".price-info") var buyingOption = toggle.parentElement if (toggle && form && priceInfo) { toggle.setAttribute("role", "button") toggle.setAttribute("tabindex", "0") toggle.addEventListener("click", function (event) { var expandedBuyingOptions = buybox.querySelectorAll(".buying-option.expanded") var buyboxWidth = buybox.offsetWidth ;[].slice.call(expandedBuyingOptions).forEach(function(option) { if (buyboxWidth <= buyboxMaxSingleColumnWidth && option != buyingOption) { hideBuyingOption(option) } }) var expanded = toggle.getAttribute("aria-expanded") === "true" || false toggle.setAttribute("aria-expanded", !expanded) form.hidden = expanded if (!expanded) { buyingOption.classList.add("expanded") } else { buyingOption.classList.remove("expanded") } priceInfo.hidden = expanded }, false) } } function hideBuyingOption(buyingOption) { var toggle = buyingOption.querySelector(".buying-option-price") var form = buyingOption.querySelector(".buying-option-form") var priceInfo = buyingOption.querySelector(".price-info") toggle.setAttribute("aria-expanded", false) form.hidden = true buyingOption.classList.remove("expanded") priceInfo.hidden = true } function initKeyControls() { document.addEventListener("keydown", function (event) { if (document.activeElement.classList.contains("buying-option-price") && (event.code === "Space" || event.code === "Enter")) { if (document.activeElement) { event.preventDefault() document.activeElement.click() } } }, false) } function initialStateOpen() { var buyboxWidth = buybox.offsetWidth ;[].slice.call(buybox.querySelectorAll(".buying-option")).forEach(function (option, index) { var toggle = option.querySelector(".buying-option-price") var form = option.querySelector(".buying-option-form") var priceInfo = option.querySelector(".price-info") if (buyboxWidth > buyboxMaxSingleColumnWidth) { toggle.click() } else { if (index === 0) { toggle.click() } else { toggle.setAttribute("aria-expanded", "false") form.hidden = "hidden" priceInfo.hidden = "hidden" } } }) } initialStateOpen() if (window.buyboxInitialised) return window.buyboxInitialised = true initKeyControls() })()

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Sebastiani, F.: Machine learning in automated text categorization. ACM computing surveys (CSUR) 34, 1–47 (2002)

    Article  MathSciNet  Google Scholar 

  2. Voss, J.: Measuring wikipedia. In: Proc. of International Conference of the International Society for Scientometrics and Informetrics (2005)

    Google Scholar 

  3. Liu, N., Zhang, B., Yan, J., Chen, Z., Liu, W., Bai, F., Chien, L.: Text representation: From vector to tensor. In: Proceedings of the Fifth IEEE International Conference on Data Mining, pp. 725–728. IEEE Computer Society, Los Alamitos (2005)

    Google Scholar 

  4. Wong, S.K.M., Ziarko, W., Wong, P.C.N.: Generalized vector spaces model in information retrieval. In: SIGIR ’85: Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 18–25. ACM Press, New York (1985)

    Chapter  Google Scholar 

  5. Hearst, M., Dumais, S., Osman, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intelligent systems 13, 18–28 (1998)

    Google Scholar 

  6. Miller, G.A., Beckitch, R., Fellbaum, C., Gross, D., Miller, K.: Introduction to WordNet: An On-line Lexical Database. In: Cognitive Science Laboratory. Princeton University Press, Princeton (1993)

    Google Scholar 

  7. Voorhees, E.: Using WordNet to disambiguate word senses for text retrieval. In: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 171–180. ACM, New York (1993)

    Chapter  Google Scholar 

  8. Szymański, J., Mizgier, A., Szopiński, M., Lubomski, P.: Ujednoznacznianie słów przy użyciu słownika WordNet. Wydawnictwo Naukowe PG TI 2008 18, 89–195 (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Szymański, J. (2010). Towards Automatic Classification of Wikipedia Content. In: Fyfe, C., Tino, P., Charles, D., Garcia-Osorio, C., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2010. IDEAL 2010. Lecture Notes in Computer Science, vol 6283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15381-5_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-15381-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15380-8

  • Online ISBN: 978-3-642-15381-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Navigation

-