Proteins. 2011;79 Suppl 10(Suppl 10):21-36.
doi: 10.1002/prot.23190. Epub 2011 Oct 14.

CASP9 target classification

Lisa N Kinch et al. Proteins. 2011.

Abstract

The Critical Assessment of protein Structure Prediction round 9 (CASP9) aimed to evaluate predictions for 129 experimentally determined protein structures. To assess tertiary structure predictions, these target structures were divided into domain-based evaluation units that were then classified into two assessment categories: template-based modeling (TBM) and template-free modeling (FM). CASP9 targets were split into domains of structurally compact evolutionary modules. For targets with more than one defined domain, the decision to split structures into domains for evaluation was based on server performance. Target domains were categorized based on their evolutionary relatedness to existing templates as well as their difficulty levels indicated by server performance. Target domains with sequence-related templates and high server prediction performance were classified as TBM, whereas those without identifiable templates and with low server performance were classified as FM. However, using these generalizations for classification resulted in a blurred boundary between the CASP9 assessment categories. Thus, the FM category included domains without sequence-detectable templates (25 target domains) as well as some domains with difficult-to-detect templates whose predictions were as poor as those without templates (five target domains). Several interesting examples are discussed, including targets with sequence-related templates that exhibit unusual structural differences, targets with homologous or analogous structure templates that are not detectable by sequence, and targets with new folds.

Figures

Figure 1. CASP9 domains
CASP9 target structures with defined domains (colored red and blue) are split according to graphs of whole-chain (x axis) vs. weighted sum of domain (y axis) GDT scores. (A) Slope of GDT scores for target T0521 suggests splitting. (B) Target T0521 forms a swapped (slate and white helix) and intertwined dimer (gray second chain) of duplicated EF-hand domains. (C) Template for T0521 (2aao) forms a non-intertwined swapped dimer of duplicated EF-hand domains, colored as above. (D) Slope of GDT scores for target T0515 suggests no split. (E) The domains of target T0515 are arranged in a similar orientation to (F) those of the template (1f3t).
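
The comparison plotted in panels A and D can be sketched roughly as follows. This is only an illustration, not the assessors' procedure: weighting each domain's GDT score by domain length, fitting a least-squares line through the origin, and the `slope_threshold` value are all assumptions, and the function names are hypothetical.

```python
def weighted_domain_gdt(domain_scores, domain_lengths):
    """Length-weighted mean of per-domain GDT scores (weighting scheme is an assumption)."""
    total = sum(domain_lengths)
    return sum(s * n for s, n in zip(domain_scores, domain_lengths)) / total


def slope_through_origin(xs, ys):
    """Least-squares slope of y = k * x, with no intercept term."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def suggests_split(whole_chain, weighted_domain, slope_threshold=1.2):
    """True when domain-based scores exceed whole-chain scores by a wide margin.

    slope_threshold is a hypothetical value chosen for illustration only.
    """
    return slope_through_origin(whole_chain, weighted_domain) > slope_threshold
```

Each point corresponds to one server model: `whole_chain` holds its GDT score on the full target and `weighted_domain` the combined per-domain score. A slope near 1 means evaluating the domains separately changes little, as for T0515; a steeper slope means models capture the individual domains better than their arrangement, as for T0521.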
Figure 2. Gaussian kernel density cutoffs
Gaussian kernel density estimates for various bandwidths (small to large, magenta to blue; densities at representative intermediate bandwidths are shown as thicker red, brown and black curves) built on (A) the GDT-TS scores above random of first server models and (B) the “Number of first models”, respectively. This “Number” was computed by averaging the number of first models above random with the number of first models above a difficulty cutoff of 36, and can be thought of as the number of reasonably good models for a given target. Long ticks on the x axis mark the position of the corresponding score for each target. Green vertical lines mark the data-suggested cutoff.
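
A minimal sketch of the two quantities described in this caption, using only the Python standard library. The `random_cutoff` parameter stands in for the "random" score level, which the caption does not give, and reading the data-suggested cutoff as the lowest-density point between two modes is an assumption made for illustration.

```python
import math


def number_of_first_models(scores, random_cutoff, difficulty_cutoff=36.0):
    """Average of two counts: models above random and models above the difficulty cutoff."""
    above_random = sum(1 for s in scores if s > random_cutoff)
    above_difficulty = sum(1 for s in scores if s > difficulty_cutoff)
    return (above_random + above_difficulty) / 2.0


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def lowest_density_point(density, lo, hi, steps=200):
    """Grid search for the density minimum on [lo, hi] (assumed reading of the cutoff)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=density)
```

Evaluating `gaussian_kde` over a range of `bandwidth` values would give a family of curves like the one in the figure: small bandwidths follow individual targets closely, while large ones smooth the distribution toward a single mode.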
Figure 3. CASP9 target score distributions
(A) A histogram depicts T0571 GDT scores above random for all CASP9 first server models and suggests a difficulty cutoff around GDT score 36. (B) A scatter plot of “Number of first models” vs. average GDT scores depicts the distribution of CASP9 target domains. Positions of FM targets are shown by target number; bolded numbers are for targets with templates detectable by sequence. Positions of TBM targets (templates are readily detectable by sequence methods) are shown as black dots. Gray lines correspond to cutoffs from Gaussian kernel density estimates. (C) A histogram of first principal component scores, which combine four different individual scores that are calculated for each target domain (number of groups scoring above random, number of groups scoring above difficulty cutoff 36, average of GDT scores above random, and highest GDT score between target and closest template as found by the LGA program), shows incomplete separation of FM (black) and TBM (gray) targets.
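
The combined score in panel C can be illustrated with a small pure-Python sketch of a first principal component. Standardizing each score column before the analysis and using power iteration to find the leading axis are assumptions of this sketch; the caption does not say how the component was computed, and the function names are hypothetical.

```python
import math


def standardize(rows):
    """Column-wise z-scores using the population standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance(z):
    """Population covariance matrix of already-centered rows."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector by power iteration, started from a uniform vector.

    This start assumes the scores are positively correlated; it can fail if the
    leading axis is orthogonal to the uniform vector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def first_pc_scores(rows):
    """Project each standardized row (one per target domain) onto the first component."""
    z = standardize(rows)
    axis = first_component(covariance(z))
    return [sum(a * x for a, x in zip(axis, row)) for row in z]
```

Here each row would hold the four per-domain values listed in the caption. A histogram of the returned scores is the kind of plot shown in panel C, where the FM and TBM distributions overlap.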
Figure 4. Unusual structure differences between targets and sequence-related templates
(A) Target T0604 comprises three domains: T0604_1 (blue), T0604_2 (green), and T0604_3 (red). (B) T0604_1 forms a unique ferredoxin-like fold with a longer sheet than (C) the closest template 2w7a. (D) The FAD/NAD(P)-binding domain of 2i0z is the closest template to (E) the FAD/NAD(P)-binding domain of target T0604_2. (F) A more distantly related alternate template for T0604_2 (1kdg) possesses more insertions (white). (G) A six-stranded barrel with an inserted four-helical bundle (HI0933 insert domain-like) is inserted in the closest FAD/NAD(P)-binding domain template 2i0z. (H) The target T0604_3 domain insert forms an α+β sandwich that resembles (I) the FAD-linked reductase C-terminal domain insertion of the more distantly related FAD/NAD(P)-binding domain (1kdg). (J) α-Helices in the N- and C-terminal domains of target T0553 are colored blue (T0553_1) and red (T0553_2), respectively. (K) Four α-helices are labeled 1 to 4 for each of the duplicated domains. The N- and C-terminal loops are colored green. (L) Packed EF hands consist of two α-helices each (labeled 0 and 1 for the first EF-hand, shown in yellow, and 2 and 3 for the second EF-hand, shown in orange) and a loop (colored purple) between them.
Figure 5. Domains with templates not detectable by sequence
(A) The 3-strand β-meander of target T0531 resembles (B) the midkine fold, with a structural alignment that preserves conserved disulfide pairs (magenta) important to the fold. (C) Target T0561 includes a central 3-helical bundle (rainbow) with N-terminal (slate) and C-terminal (salmon) elaborations that are similar to (D) the core helix-turn-helix (HTH) motif of the C-terminal domain of replication initiation factor DnaA, with the core HTH alignment including conserved functional residues.
Figure 6. Mapping unclassified TBM targets to fold space
(A) Target T0557 α+β sandwich identifies (B) the N-terminal domain of a divergent AAA (3lmm_1) as a close homolog with similar N-terminal helical extensions and β-hairpin-like insertions (gray) as compared to (C) the related IF3-like SCOP fold (1udv). (D) Target T0540 β-sandwich belongs to the FAIM1 family, which includes (E) a structure representative (2k2d) described as a 7-stranded β-sandwich. The common β-meander of the two flattened sandwiches resembles (F) the β-meander topology of streptavidin-like barrels (1ei5).
Figure 7. Structure analogs identified for T0603 TBM domains
(A) The N-terminal structural domain of T0603 has a small 3-stranded β-meander (magenta) that could replace (B) the helix (magenta) of α/β folds like the RNase H-like motif (1o13). (C) Three split helices (rainbow) of the T0603 C-terminal helical domain form an endonuclease active site (black) with a positively charged side chain (magenta) positioned similarly to (D) positively charged side chains (magenta) from the O-phosphoseryl-tRNA kinase C-terminal domain (3adb) nucleotide-binding site (black).
Figure 8. New fold includes small subdomain with local contacts
(A) The target T0529_1 includes a small set of helices (rainbow) that display local contacts and form part of the functional site (magenta). A large number of additional N-terminal (slate) and C-terminal (salmon) secondary structural elements decorate this core. (B) ProSMoS and HorA identify an array of helices in the C-terminus of the gamma subunit of dissimilatory sulfite reductase I (3or1_C) that are arranged in a similar topology to the defined target subdomain.
