Database (Oxford). 2009:2009:bap018.
doi: 10.1093/database/bap018. Epub 2009 Nov 27.

Understanding PubMed user search behavior through log analysis

Rezarta Islamaj Dogan et al. Database (Oxford). 2009.

Abstract

This article reports on a detailed investigation of PubMed users' needs and behavior as a step toward improving biomedical information retrieval. PubMed provides researchers with free access to more than 19 million citations for biomedical articles from MEDLINE and life science journals, and it is accessed by millions of users each day. Efficient search tools are crucial for biomedical researchers to keep abreast of the biomedical literature relating to their own research. This study provides insight into PubMed users' needs and their behavior. The investigation was conducted through the analysis of one month of log data, consisting of more than 23 million user sessions and more than 58 million user queries. Multiple aspects of users' interactions with PubMed are characterized in detail with evidence from these logs. Despite having many features in common with general Web searches, biomedical information searches have unique characteristics that are made evident in this study. PubMed users are more persistent in seeking information and they reformulate queries often. The three most frequent types of search are search by author name, search by gene/protein, and search by disease. Use of abbreviations in queries is very frequent. Factors such as result set size influence users' decisions. Analysis of characteristics such as these plays a critical role in identifying users' information needs and their search habits. In turn, such an analysis also provides useful insight for improving biomedical information retrieval.

Database URL: http://www.ncbi.nlm.nih.gov/PubMed

Figures

Figure 1.
An overview of user interactions with PubMed. A user queries PubMed or uses other systems for a particular biomedical information need. Offered a set of retrieved documents, the user can browse the result set and subsequently click to view abstracts or full-text articles, issue a new query or abandon the current search. Solid lines show the basic user action leading to a set of results. Dashed lines show the possible follow-on actions.
Figure 2.
Histogram view of the distribution of users, detailed by number of queries they issue.
Figure 3.
Distribution of number of queries relative to the number of tokens.
Figure 4.
Annotated queries by category. Queries annotated with bibliographic categories (Author Name, Citation, Journal Name and MEDLINE Title) are shown in purple, queries annotated with non-bibliographic categories (Gene/Protein, Disorder, Chemical/Drug, Biological Process, Medical Procedure, Living Being, Research Procedure, Cell Component, Body Part, Device or Tissue) are shown in blue, the percentage of queries containing an abbreviation is shown in yellow, and the queries that could not be fitted in the proposed set of categories are shown in red.
Figure 5.
Distribution of queries according to their returned result set size. One third of queries returned from 1 to 20 citations, which are displayed in a single page.
Figure 6.
Distribution of bibliographic queries (author name, journal name, title and other citation information) and non-bibliographic queries (disorder, gene/protein, research or medical procedure, device, body part, cell, tissue or living being) according to their result set size.
Figure 7.
Distribution of abstract view requests for ordinal positions of the first page of results (the data follow a power law, shown with the red line).
Figure 8.
Distribution of abstract retrievals per ordinal position.
Figure 9.
Distribution of abstract retrievals per ordinal position (ratio is computed per page).
Figure 10.
Distribution of abstract and full-text requests given the number of citations returned per query (the number of returned citations is shown on a log scale).
Figure 11.
Distribution of subsequent queries according to their time difference.
Figure 12.
Distribution of queries subsequent to zero-result queries, detailed by the number of returned citations.
Figure 13.
Distribution of the abandoned queries and subsequent queries according to their returned number of citations.
Figure 14.
Distribution of time to first and last click in minutes.
