BMC Bioinformatics. 2008 Jun 6;9:270.
doi: 10.1186/1471-2105-9-270.

PageRank without hyperlinks: reranking with PubMed related article networks for biomedical text retrieval

Jimmy Lin. BMC Bioinformatics.

Abstract

Background: Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web.
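
To make the idea of scoring nodes in a content-similarity network concrete, the following is a minimal sketch of PageRank by power iteration. It is not the implementation used in the article: the four-node graph, the node labels, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration. Each key stands for a citation and each list for its related citations.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes."""
    # Collect every node that appears as a source or as a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Mass held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes an equal share of its rank to every node it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical related-article links (labels are placeholders, not real PMIDs).
related = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = pagerank(related)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

In this toy graph, node C receives links from three other nodes and ends up with the highest score, while node D, which nothing links to, ends up with the lowest.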

Results: We conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics.
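
The reranking step can be sketched as a combination of two score lists. The abstract does not give the exact formula, so the version below is an assumption: both score sets are min-max normalized and then mixed by linear interpolation with a weight `lam`. The function names, document labels, and score values are hypothetical and are not results from the article.

```python
def min_max(scores):
    """Rescale a dict of scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (value - lo) / (hi - lo) for doc, value in scores.items()}


def rerank(retrieval, graph, lam=0.8):
    """Order documents by lam * retrieval score + (1 - lam) * graph score."""
    r = min_max(retrieval)
    g = min_max(graph)
    # Documents without a graph score contribute 0 for the graph component.
    combined = {doc: lam * r[doc] + (1.0 - lam) * g.get(doc, 0.0) for doc in r}
    return sorted(combined, key=combined.get, reverse=True)


# Made-up scores: retrieval engine output and PageRank-style graph scores.
retrieval_scores = {"A": 9.1, "B": 8.7, "C": 8.6, "D": 4.0}
graph_scores = {"A": 0.10, "B": 0.15, "C": 0.60, "D": 0.15}

print(rerank(retrieval_scores, graph_scores, lam=1.0))  # retrieval scores only
print(rerank(retrieval_scores, graph_scores, lam=0.7))  # mixed with graph scores
```

With `lam=1.0` the ranking follows the retrieval scores alone. Lowering `lam` lets a document with a strong graph score, such as C in this made-up data, move ahead of documents whose retrieval scores are only slightly higher.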

Conclusion: The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain.

Figures

Figure 1. Screenshot of PubMed showing a MEDLINE abstract. The "Related Articles" panel on the right is populated with titles of articles that may be of interest.
Figure 2. Effectiveness of interpolating Terrier retrieval scores with PageRank scores (MAP20).
Figure 3. Effectiveness of interpolating Terrier retrieval scores with PageRank scores (MAP40).
Figure 4. Effectiveness of interpolating Terrier retrieval scores with PageRank scores (P20).
Figure 5. Effectiveness of interpolating Terrier retrieval scores with HITS authority scores (MAP20).
Figure 6. Effectiveness of interpolating Terrier retrieval scores with HITS authority scores (MAP40).
Figure 7. Effectiveness of interpolating Terrier retrieval scores with HITS authority scores (P20).
Figure 8. Effectiveness of interpolating Terrier retrieval scores with HITS hub scores (MAP20).
Figure 9. Effectiveness of interpolating Terrier retrieval scores with HITS hub scores (MAP40).
Figure 10. Effectiveness of interpolating Terrier retrieval scores with HITS hub scores (P20).
