PubMed related articles: a probabilistic topic-based model for content similarity
- PMID: 17971238
- PMCID: PMC2212667
- DOI: 10.1186/1471-2105-8-423
Abstract
Background: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance; rather, our focus is "relatedness", the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process is based on the existence of MeSH in MEDLINE.
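The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of deciding "aboutness" from a term count with Poisson distributions. It assumes a standard two-Poisson mixture: a term occurs at rate `lam` in documents that are about the topic and at rate `mu` in documents that are not, with prior probability `eta` of being about the topic. The function name, the parameter names, and the example values are all hypothetical, and the sketch omits details such as document length that the full pmra model may account for.

```python
import math


def prob_about_topic(k, lam, mu, eta):
    """Posterior probability that a document is about a topic, given that
    the topic's term occurs k times in it.

    Assumed two-Poisson mixture (illustrative, not taken from the abstract):
      - about the topic:     k ~ Poisson(lam)
      - not about the topic: k ~ Poisson(mu)
      - prior probability of being about the topic: eta
    """
    if k < 0:
        raise ValueError("k must be a non-negative count")
    if not (0.0 < eta < 1.0) or lam <= 0.0 or mu <= 0.0:
        raise ValueError("need 0 < eta < 1 and positive rates")

    # Log of prior * Poisson likelihood for each case.
    # The k! term is identical in both and cancels, so it is left out.
    log_about = math.log(eta) + k * math.log(lam) - lam
    log_not_about = math.log(1.0 - eta) + k * math.log(mu) - mu

    # Normalize in a numerically stable way (subtract the larger log value).
    m = max(log_about, log_not_about)
    about = math.exp(log_about - m)
    not_about = math.exp(log_not_about - m)
    return about / (about + not_about)


# Hypothetical parameters: the term is more frequent in on-topic documents.
for count in range(5):
    print(count, round(prob_about_topic(count, lam=2.0, mu=0.5, eta=0.5), 4))
```

With `lam > mu`, the posterior rises as the term count rises, which is the behavior one would expect from a model that infers topic membership from term frequency.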
Results: The pmra retrieval model was compared against bm25, a competitive probabilistic model that shares theoretical similarities. Experiments using the test collection from the TREC 2005 genomics track show a small but statistically significant improvement of pmra over bm25 in terms of precision.
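The abstract reports the comparison "in terms of precision" without naming the exact cutoff. As a rough illustration of how such a measure can be computed for one ranked list, here is a small sketch of precision at a fixed cutoff `k`. The function name, the document identifiers, and the relevance set are made up for the example and are not data from the study.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by k (not len(top_k)) so short result lists are penalized.
    return hits / k


# Hypothetical ranking and relevance judgments.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d9"}
print(precision_at_k(ranking, relevant, 5))  # 0.4
```

Comparing two ranking models then amounts to computing this value per query for each model and testing whether the difference across queries is statistically significant.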
Conclusion: Our experiments suggest that the pmra model provides an effective ranking algorithm for related article search.
Figures
![Figure 1](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2212667/bin/1471-2105-8-423-1.gif)
![Figure 2](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2212667/bin/1471-2105-8-423-2.gif)
![Figure 3](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2212667/bin/1471-2105-8-423-3.gif)
![Figure 4](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2212667/bin/1471-2105-8-423-4.gif)
![Figure 5](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2212667/bin/1471-2105-8-423-5.gif)
![Figure 6](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2212667/bin/1471-2105-8-423-6.gif)