Comparative Study
Biophys J. 2003 Aug;85(2):1145-64.
doi: 10.1016/S0006-3495(03)74551-2.

TOUCHSTONE II: a new approach to ab initio protein structure prediction

Yang Zhang et al. Biophys J. 2003 Aug.

Abstract

We have developed a new combined approach for ab initio protein structure prediction. The protein conformation is described as a lattice chain connecting Cα atoms, with attached Cβ atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of the regularities of protein structures. The combination of these energy terms is optimized by maximizing, over 30 × 60,000 decoys, both the correlation between energy and root mean square deviation (RMSD) to native and the energy gap between the native structure and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulations. With this strategy we successfully fold 41/100 small proteins (36–120 residues), with predicted structures having an RMSD from native below 6.5 Å in the top five cluster centroids. To fold larger proteins and to improve the folding yield of small proteins, we incorporate into the basic force field side-chain contact predictions from our threading program PROSPECTOR, with homologous proteins excluded from the database. With these threading-based restraints, the program can fold 83/125 test proteins (36–174 residues) with structures having an RMSD to native below 6.5 Å in the top five cluster centroids. This shows the significant improvement in folding obtained by using predicted tertiary restraints, especially when the accuracy of side-chain contact prediction is >20%. For native fold selection, we introduce quantities dependent on the cluster density and on the combination of energy and free energy; these show a higher discriminative power for selecting the native structure than the previously used cluster energy or cluster size, and can be used for native structure identification in blind simulations. These procedures are readily automated and are being implemented on a genomic scale.
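
The success criterion quoted in the abstract (an RMSD to native below 6.5 Å among the top five cluster centroids) can be illustrated with a minimal Python sketch. The function names, the data layout (a ranked list of centroid RMSDs per protein), and the protein labels are assumptions made for illustration; they are not taken from the paper or its software.

```python
from typing import Dict, List, Tuple

RMSD_CUTOFF = 6.5  # Å, threshold quoted in the abstract
TOP_N = 5          # number of top-ranked cluster centroids considered


def is_folded(centroid_rmsds: List[float],
              cutoff: float = RMSD_CUTOFF,
              top_n: int = TOP_N) -> bool:
    """Return True if any of the first `top_n` ranked centroids is below `cutoff`."""
    return any(rmsd < cutoff for rmsd in centroid_rmsds[:top_n])


def fold_yield(results: Dict[str, List[float]]) -> Tuple[int, int]:
    """Return (number of folded proteins, total number of proteins)."""
    folded = sum(1 for rmsds in results.values() if is_folded(rmsds))
    return folded, len(results)


# Hypothetical centroid RMSDs (Å), listed in cluster-rank order.
example = {
    "protA": [7.2, 5.9, 8.1, 9.0, 10.3, 3.0],  # 5.9 Å is within the top five
    "protB": [6.5, 7.0, 8.8, 9.1, 7.7, 4.2],   # 4.2 Å is ranked sixth; 6.5 is not < 6.5
    "protC": [3.1],                            # fewer than five clusters is fine
}

print(fold_yield(example))  # (2, 3)
```

Under this reading, a result such as 41/100 is simply the pair returned by `fold_yield` over the benchmark set.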

Figures

FIGURE 1
Schematic representation of a three-residue fragment of polypeptide chain in the CABS model. The Cα trace is confined to the underlying cubic lattice system, whereas the Cβ atoms and side-group rotamers are off-lattice and specified by the positions of three adjacent Cα atoms.
FIGURE 2
Schematic illustration of the virtual Cα-Cα vectors for regular helical and sheet structures. formula image, formula image, formula image, where r_{i,i+1} is the Cα-Cα bond vector from vertex i to vertex i + 1. As demonstrated in the first two terms of Eq. 2, for both helical and sheet structures, l_i and l_{i+4} are oriented in parallel whereas u_i and u_{i+2} are either antiparallel (helix) or parallel (sheet).
FIGURE 3
Energy versus RMSD of decoys to the native structure of protein 1cis_. (a) Decoys generated by Monte Carlo simulations of the SICHO model; the energies of the decoys are evaluated by the SICHO force field. (b) The same decoys as in a, but with the energies evaluated by the CABS force field. (c) Decoys generated by Monte Carlo simulations of the CABS model; the energies of the decoys are evaluated by the SICHO force field. (d) The same decoys as in c, but with the energies evaluated by the CABS force field. (e) A schematic illustration of the energy landscapes of the SICHO and CABS models. Due to differences in the potential energy functions, the important regions of phase space in the two simulations do not match, and the lowest energy state may be nonnative.
FIGURE 4
The energy versus RMSD for the decoy structures of 1fas_ produced by the CABS model. (a) Correlations of 19 subenergy terms with the RMSD to native. (b) Combined energy with w_i = 1. (c) Combined energy with optimized weight parameters.
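
The comparison in panels b and c (uniform versus optimized weights) rests on how well a weighted sum of energy terms correlates with RMSD to native. A minimal Python sketch of that quantity follows. The two toy energy terms, the decoy values, and the second weight vector are hypothetical, and the weights are picked by hand rather than by the optimization procedure used in the paper, which also involves the native-decoy energy gap.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def combined_energy(terms: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted sum of energy terms; `terms[k]` holds the terms of decoy k."""
    return [sum(w * e for w, e in zip(weights, row)) for row in terms]


# Hypothetical decoy set: RMSD to native (Å) and two energy terms per decoy.
rmsd = [2.0, 4.0, 6.0, 8.0, 10.0]
terms = [
    (-10.0, 3.0),   # term 1 tracks RMSD, term 2 is noise
    (-8.0, -3.0),
    (-6.0, 4.0),
    (-4.0, -4.0),
    (-2.0, 0.0),
]

r_uniform = pearson(rmsd, combined_energy(terms, (1.0, 1.0)))    # all w_i = 1
r_weighted = pearson(rmsd, combined_energy(terms, (1.0, 0.25)))  # noisy term down-weighted

print(round(r_uniform, 3), round(r_weighted, 3))
```

In this toy case, down-weighting the noisy term raises the energy-RMSD correlation, which is the qualitative effect the figure describes.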
FIGURE 5
Schematic diagrams of the movements employed in the Monte Carlo simulations. The Cα-traces before and after movements are denoted by the solid and dashed lines, respectively. (a) A basic prefabricated 3-bond update of the fragment [i, i + 3] in the simulations. (b) A 5-bond update of the fragment [i, i + 5] consists of two consecutive 3-bond movements. The first 3-bond movement updates the interval of [i, i + 3], and the second 3-bond movement updates the piece of [i + 2, i + 5]. (c) An 8-bond translation of the fragment in [i, i + 8] over a small distance l. (d) A permutation of a 3-bond piece of [i, i + 3] and a 2-bond piece of [j, j + 2]. The thin arrows denote the shift orientation of the amino acid sequence. (e) Examples of random walks from i to the N-terminus or from j to the C-terminus.
FIGURE 6
(a) RMSD of the best cluster in the top five clusters versus protein length N in the CABS simulations without using protein-specific restraints. The solid circles denote the training proteins that are used in the optimization of the force field. The open circles are the test proteins. All the successfully folded cases are small proteins with N < 120 amino acids. (b) RMSD of the best cluster in the top five clusters versus protein length N in the CABS simulations with threading-based restraints. The large proteins (>120 residues) can be folded only when appropriate restraints are incorporated in the simulations.
FIGURE 7
RMSD improvement on including the threading-based tertiary and secondary restraints versus the accuracy of the restraints. formula image, where RMSD_wo and RMSD_w are the RMSD of the best clusters to native structures in the simulations without and with the threading-based restraints. N is the number of amino acids in the protein, N_cc the number of correct contact restraints, N_cp the total number of predicted contact restraints, N_dc the number of correct short-range distance restraints, and N_dp the total number of predicted distance restraints.
FIGURE 8
Comparison of the folding results of the SICHO and CABS models on the 60-nonhomologous-protein set. The data shown are the number of proteins that have their best cluster below a given RMSD threshold, plotted against that threshold.
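
The curve described in this caption is a cumulative count: for each RMSD threshold, the number of proteins whose best cluster falls below it. A minimal Python sketch, assuming each model is summarized by one best-cluster RMSD per protein; the values and the labels `model_a` and `model_b` are hypothetical and are not the SICHO or CABS results.

```python
from typing import List, Sequence


def cumulative_counts(best_rmsds: Sequence[float],
                      thresholds: Sequence[float]) -> List[int]:
    """For each threshold, count proteins whose best-cluster RMSD is below it."""
    return [sum(1 for r in best_rmsds if r < t) for t in thresholds]


# Hypothetical best-cluster RMSDs (Å) for the same five proteins under two models.
model_a = [3.2, 5.8, 6.4, 7.9, 11.0]
model_b = [2.9, 4.1, 6.0, 6.3, 9.5]
thresholds = [4.0, 6.5, 10.0]

print(cumulative_counts(model_a, thresholds))  # [1, 3, 4]
print(cumulative_counts(model_b, thresholds))  # [1, 4, 5]
```

A model whose curve lies above the other at every threshold folds at least as many proteins at every accuracy level.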
FIGURE 9
RMSD to native of all cluster centroids for 125 proteins versus the normalized structure density. The solid circles denote the best cluster (lowest RMSD to native) for each of the 125 proteins.
FIGURE 10
RMSD of the best cluster to native versus different funneling parameters of the energy landscape. (a) The maximum structure density D_max. (b) The maximum multiplicity R_max. (c) L-score of the energy landscape (defined in Eq. 23).
FIGURE 11
(a) Rate of successful folds (best RMSD < 6.5 Å) versus the cutoff of maximum density (D_max > D_cut). (b) Average RMSD versus the cutoff of maximum density.
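
Both panels can be read as statistics over the subset of proteins whose maximum cluster density exceeds a cutoff. A minimal Python sketch of that filtering step follows, assuming each protein is summarized by a (D_max, best RMSD) pair. The pairs below are hypothetical, and how D_max itself is computed is defined in the paper, not here.

```python
from typing import Optional, Sequence, Tuple

RMSD_CUTOFF = 6.5  # Å, success threshold from the caption


def stats_above_cutoff(proteins: Sequence[Tuple[float, float]],
                       d_cut: float,
                       rmsd_cutoff: float = RMSD_CUTOFF
                       ) -> Optional[Tuple[float, float]]:
    """Return (success rate, average RMSD) over proteins with D_max > d_cut.

    Returns None when no protein passes the density cutoff.
    """
    selected = [rmsd for d_max, rmsd in proteins if d_max > d_cut]
    if not selected:
        return None
    rate = sum(1 for r in selected if r < rmsd_cutoff) / len(selected)
    average = sum(selected) / len(selected)
    return rate, average


# Hypothetical (D_max, best RMSD in Å) pairs, one per protein.
proteins = [(0.5, 9.0), (1.0, 7.0), (1.5, 6.0), (2.0, 4.0), (3.0, 3.0)]

for d_cut in (0.0, 1.2, 5.0):
    print(d_cut, stats_above_cutoff(proteins, d_cut))
```

In this toy set, raising the cutoff from 0.0 to 1.2 keeps only the three densest cases and raises the success rate from 0.6 to 1.0, mirroring the use of D_max as a confidence indicator in blind prediction.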
