BMC Bioinformatics. 2008 Jan 23;9:40.
doi: 10.1186/1471-2105-9-40.

I-TASSER server for protein 3D structure prediction

Yang Zhang. BMC Bioinformatics.

Abstract

Background: Predicting 3-dimensional protein structures from amino acid sequences is one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments were designed to provide an objective assessment of the state of the art in the field; I-TASSER was ranked as the best method in the server section of the 7th CASP experiment. Our laboratory has since received numerous requests about the public availability of the I-TASSER algorithm and the usage of I-TASSER predictions.

Results: An on-line version of I-TASSER has been developed at the KU Center for Bioinformatics and has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score), based on the relative clustering structural density and the consensus significance score of multiple threading templates, is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measure with values in [0, 1]) of the first models, with a correlation coefficient of 0.91. Using a C-score cutoff of > -1.5 to identify models of correct topology, both the false positive and false negative rates are below 0.1. Combining the C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD.
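The cutoff rule in the Results can be illustrated with a short Python sketch. It is not the server's code: the TM-score threshold of 0.5 for "correct topology", the `Model` record, and the function names are assumptions introduced here. The abstract also does not say how the two rates are normalized, so the conventional definitions (false positives over actual negatives, false negatives over actual positives) are assumed.

```python
from dataclasses import dataclass

C_SCORE_CUTOFF = -1.5   # cutoff quoted in the abstract
TM_SCORE_CUTOFF = 0.5   # assumption: TM-score above this counts as correct topology


@dataclass
class Model:
    """Hypothetical benchmark record for one predicted model."""
    c_score: float   # confidence score reported for the model
    tm_score: float  # TM-score to the native structure (known only in benchmarks)


def predicted_correct(c_score: float, cutoff: float = C_SCORE_CUTOFF) -> bool:
    """Flag a model as likely having the correct topology."""
    return c_score > cutoff


def error_rates(models: list[Model]) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) for a benchmark set."""
    fp = fn = positives = negatives = 0
    for m in models:
        actual = m.tm_score > TM_SCORE_CUTOFF
        predicted = predicted_correct(m.c_score)
        if actual:
            positives += 1
            if not predicted:
                fn += 1
        else:
            negatives += 1
            if predicted:
                fp += 1
    # Report 0.0 when a class is empty so the sketch never divides by zero.
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr
```

Applied to a benchmark set with known native structures, `error_rates` gives the two quantities that the abstract reports as below 0.1 for the I-TASSER test set.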

Conclusion: The I-TASSER server has been developed to generate automated full-length 3D protein structure predictions, and its benchmarked scoring system helps users obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations. The I-TASSER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/I-TASSER.

Figures

Figure 1
TM-score (a) and RMSD (b) versus C-score of the I-TASSER models for 500 testing proteins. The dashed curve in (a) is from Equation 3, which is fit from the 300 training proteins and used for estimating the TM-score of the I-TASSER models. The solid circles are the root mean squared deviation from the estimated TM-score values (RMSTD). The solid curve is from Equation 4, which is fit from the 300 training proteins. The dotted lines are the TM-score and C-score cutoffs for correct folds.
Figure 2
Two examples of the I-TASSER models, from 1ca4A and 1cmaA. Both models have similar RMSD values but indicate significantly different modeling qualities. In the superposition, the thin backbones are the native structures and the thick backbones are the I-TASSER models. Color runs from blue to red from the N- to the C-terminus.
Figure 3
TM-score (a) and RMSD (b) of the I-TASSER models versus the length of target proteins. The numbers indicate the Pearson correlation coefficients.
Figure 4
RMSD versus C-score-ln(L) of the I-TASSER models for 500 test proteins (open circles). The dashed curve is from Equation 5, which is fit from the 300 training proteins and used for estimating the RMSD of the I-TASSER models. The solid circles are the root mean squared RMSD deviation (RMSRD) from the estimated RMSD values. The solid curve is from Equation 6, which is fit from the 300 training proteins.
