Nucleic Acids Res. 2011 Jan;39(Database issue):D225-9.
doi: 10.1093/nar/gkq1189. Epub 2010 Nov 24.

CDD: a Conserved Domain Database for the functional annotation of proteins

Aron Marchler-Bauer et al. Nucleic Acids Res. 2011 Jan.

Abstract

NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.
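As a rough illustration of the default annotation scheme described above (every footprint labeled with its superfamily, and a specific family label added only for a high-confidence assignment), the Python sketch below relies on hypothetical inputs: a precomputed model-to-superfamily mapping, a set of NCBI-curated model accessions, and a simple E-value cutoff standing in for CDD's actual specific-hit criterion, which the abstract does not describe. All model and superfamily names are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Hit:
    model: str       # domain model accession (placeholder names in this sketch)
    start: int       # 1-based start position on the query
    end: int         # inclusive end position on the query
    evalue: float


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    superfamily: str                # default label for every footprint
    specific_model: Optional[str]   # set only for a high-confidence family assignment


def annotate(hits: List[Hit],
             superfamily_of: Dict[str, str],
             curated: Set[str],
             specific_cutoff: float = 1e-10) -> List[Annotation]:
    """Label each hit with its superfamily; add a specific family label only
    when the model is curated and passes a HYPOTHETICAL E-value cutoff."""
    result = []
    for h in hits:
        # Models not clustered into any superfamily stand for themselves.
        sf = superfamily_of.get(h.model, h.model)
        is_specific = h.model in curated and h.evalue <= specific_cutoff
        result.append(Annotation(h.start, h.end, sf, h.model if is_specific else None))
    return result
```

Because the superfamily label is always present and the specific label is optional, a consumer can fall back to the coarser annotation whenever the finer one is missing.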

Figures

Figure 1.
Conserved domain annotation on a well-characterized protein sequence. Shown here is the default concise view generated by the CD-Search tool, using pre-calculated alignment information. The view is divided into two panels: a graphical summary and a table detailing the individual matches. The query sequence coordinates are indicated on a gray bar in the top portion of the graphical summary. ‘Specific hits’ to NCBI-curated domain models are positioned in a separate area below the query sequence, with corresponding balloons rendered in saturated colors. The extent of the best-scoring hit for a region on the query also determines the annotation with the corresponding conserved domain ‘Superfamily’. ‘Superfamilies’ are positioned in the area below the ‘Specific hits’, and together these are enclosed in boxes to indicate superfamily membership of the NCBI-curated models. If the full (detailed) results display is selected, an area summarizing ‘Non-specific hits’ will be shown as well, and the corresponding boxes will be drawn so as to resolve their superfamily relationships; the highest ranked match for each superfamily defines the extents of the corresponding box. ‘Non-specific hits’ and ‘Superfamily’ balloons are rendered in pastel colors, with each superfamily being assigned a separate color. Matches to ‘multi-domain’ models are rendered as gray balloons in a separate area of the summary graph. Only the best-ranked non-overlapping multi-domain models are shown. Functional sites, as annotated on NCBI-curated domain models, are mapped to the query sequence and depicted as triangles. Sites are mapped from the highest ranked model only, and they are colored according to their source. 
Both conserved domain balloons and site annotations are hot-linked, so that moving the mouse over the objects displays additional information, and so that clicking on the objects launches conserved domain summary pages for the particular domain model, embedding the user query sequence in the alignment for further analysis, if applicable. A tabular view below the graphical summary lists E-values, multi-domain status and various identifiers for the conserved domain models identified as matches. The table rows can be expanded to display a detailed pairwise sequence alignment between the query sequence and the domain model’s consensus sequence. An alignment of all sequences comprising a domain model, with or without the query sequence embedded, is accessible by clicking on the domain’s balloon representation in the graphical summary or its unique accession in the tabular summary, respectively.
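Two of the display rules in the caption (the highest-ranked match per superfamily defines that superfamily's box, and only the best-ranked non-overlapping multi-domain models are shown) can be sketched in Python as below. This is a minimal sketch under stated assumptions: "rank" is taken to mean ascending E-value, each superfamily is assumed to occur only once on the query (repeated domains would need per-region grouping), and the greedy pass is one plausible reading of "best-ranked non-overlapping", not necessarily the procedure CD-Search uses.

```python
from typing import Dict, List, NamedTuple, Tuple


class Match(NamedTuple):
    model: str
    superfamily: str
    start: int
    end: int
    evalue: float   # ASSUMPTION: lower E-value means better rank


def superfamily_boxes(matches: List[Match]) -> Dict[str, Tuple[int, int]]:
    """Box extent for each superfamily = interval of its best-ranked match."""
    best: Dict[str, Match] = {}
    for m in matches:
        current = best.get(m.superfamily)
        if current is None or m.evalue < current.evalue:
            best[m.superfamily] = m
    return {sf: (m.start, m.end) for sf, m in best.items()}


def overlaps(a: Match, b: Match) -> bool:
    """True if two inclusive intervals on the query share any position."""
    return a.start <= b.end and b.start <= a.end


def non_overlapping_best(matches: List[Match]) -> List[Match]:
    """Greedy selection: visit matches in rank order and keep each one that
    does not overlap a match already kept; return them in query order."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda x: x.evalue):
        if all(not overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda x: x.start)
```

The greedy pass always keeps the single best-ranked model, then fills remaining uncovered regions with the next-best models that fit.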
Figure 2.
The web interface to Batch CD-Search. An input dialogue lets the user specify a set of protein queries or upload a corresponding file. The preliminary results page (not shown here) provides controls for downloading results in a variety of formats. The sample download format featured here lists one annotation per line, specifying the protein query, the type of domain hit (specific hit, superfamily or multidomain), from–to intervals on the query, E-value, score, and the domain model’s name and accession. The Batch CD-Search help document describes the additional download options and formats available.
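A line-per-annotation download like the one described above is straightforward to load programmatically. The Python sketch below assumes tab-separated fields in a HYPOTHETICAL column order (query, hit type, from, to, E-value, score, name, accession) taken from the order in which the caption lists them, and skips blank lines and lines starting with `#`; the actual column layout, headers and hit-type strings should be checked against the Batch CD-Search help document.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# HYPOTHETICAL column order, following the caption's description.
COLUMNS = ["query", "hit_type", "start", "end", "evalue", "score", "name", "accession"]


@dataclass
class DomainHit:
    query: str
    hit_type: str    # specific hit, superfamily or multidomain (exact strings assumed)
    start: int
    end: int
    evalue: float
    score: float
    name: str
    accession: str


def parse_hits(text: str) -> List[DomainHit]:
    """Parse tab-separated annotation lines into DomainHit records."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}: {row}")
        rec = dict(zip(COLUMNS, row))
        hits.append(DomainHit(
            query=rec["query"], hit_type=rec["hit_type"],
            start=int(rec["start"]), end=int(rec["end"]),
            evalue=float(rec["evalue"]), score=float(rec["score"]),
            name=rec["name"], accession=rec["accession"],
        ))
    return hits


def hits_by_query(hits: List[DomainHit]) -> Dict[str, List[DomainHit]]:
    """Group parsed annotations by protein query, preserving input order."""
    groups: Dict[str, List[DomainHit]] = defaultdict(list)
    for h in hits:
        groups[h.query].append(h)
    return dict(groups)
```

Raising on a wrong field count, rather than silently padding, makes a mismatch between the assumed and actual column layout visible immediately.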

Similar articles

  • CDD: NCBI's conserved domain database.
    Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He J, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Marchler GH, Song JS, Thanki N, Wang Z, Yamashita RA, Zhang D, Zheng C, Bryant SH. Nucleic Acids Res. 2015 Jan;43(Database issue):D222-6. doi: 10.1093/nar/gku1221. Epub 2014 Nov 20. PMID: 25414356. Free PMC article.
  • CDD: conserved domains and protein three-dimensional structure.
    Marchler-Bauer A, Zheng C, Chitsaz F, Derbyshire MK, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Lu S, Marchler GH, Song JS, Thanki N, Yamashita RA, Zhang D, Bryant SH. Nucleic Acids Res. 2013 Jan;41(Database issue):D348-52. doi: 10.1093/nar/gks1243. Epub 2012 Nov 28. PMID: 23197659. Free PMC article.
  • CDD: specific functional annotation with the Conserved Domain Database.
    Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Liebert CA, Liu C, Lu F, Lu S, Marchler GH, Mullokandov M, Song JS, Tasneem A, Thanki N, Yamashita RA, Zhang D, Zhang N, Bryant SH. Nucleic Acids Res. 2009 Jan;37(Database issue):D205-10. doi: 10.1093/nar/gkn845. Epub 2008 Nov 4. PMID: 18984618. Free PMC article.
  • CDD: a conserved domain database for interactive domain family analysis.
    Marchler-Bauer A, Anderson JB, Derbyshire MK, DeWeese-Scott C, Gonzales NR, Gwadz M, Hao L, He S, Hurwitz DI, Jackson JD, Ke Z, Krylov D, Lanczycki CJ, Liebert CA, Liu C, Lu F, Lu S, Marchler GH, Mullokandov M, Song JS, Thanki N, Yamashita RA, Yin JJ, Zhang D, Bryant SH. Nucleic Acids Res. 2007 Jan;35(Database issue):D237-40. doi: 10.1093/nar/gkl951. Epub 2006 Nov 29. PMID: 17135202. Free PMC article.
  • Protein family classification and functional annotation.
    Wu CH, Huang H, Yeh LS, Barker WC. Comput Biol Chem. 2003 Feb;27(1):37-47. doi: 10.1016/s1476-9271(02)00098-1. PMID: 12798038. Review.
