Better prediction of protein contact number using a support vector regression analysis of amino acid sequence
- PMID: 16221309
- PMCID: PMC1277819
- DOI: 10.1186/1471-2105-6-248
Abstract
Background: Protein tertiary structure can be partly characterized by each amino acid's contact number, which measures how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of Cβ atoms in other residues within a sphere around the Cβ atom of the residue of interest. Contact number is partly conserved between protein folds and is thus useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from the primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate method for predicting contact number from protein primary sequence.
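The definition above can be illustrated with a minimal sketch. The abstract does not state the sphere radius, so the `radius` default here is an assumption, as are the function name, the inclusive distance test, and the toy coordinates. Some definitions also exclude sequence neighbours; this sketch does not.

```python
import math

def contact_numbers(cb_coords, radius=10.0):
    """For each residue, count the Cβ atoms of other residues within `radius` Å.

    `radius` is an assumed value; the abstract does not specify it.
    """
    counts = []
    for i, ci in enumerate(cb_coords):
        n = 0
        for j, cj in enumerate(cb_coords):
            # Skip the residue itself; count Cβ atoms inside the sphere.
            if i != j and math.dist(ci, cj) <= radius:
                n += 1
        counts.append(n)
    return counts

# Hypothetical Cβ coordinates in Å, not taken from a real structure.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(contact_numbers(toy, radius=5.0))  # → [1, 2, 1, 0]
```

The double loop is quadratic in chain length, which is adequate for a single protein; a spatial index would be the usual choice for large-scale processing.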
Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we obtain a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition significantly improves prediction accuracy further, raising the correlation coefficient to 0.73. If residues are classified as either "contacted" or "non-contacted", the prediction accuracies are all greater than 77%, regardless of the choice of classification threshold.
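The two evaluation measures named above can be sketched as follows. This is an illustration only: the function names, the `>=` threshold convention, and the toy numbers are assumptions, and the data are not results from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def two_state_accuracy(observed, predicted, threshold):
    """Fraction of residues placed in the same class by both value lists.

    A residue counts as "contacted" when its contact number is >= threshold
    (an assumed convention).
    """
    hits = sum((o >= threshold) == (p >= threshold)
               for o, p in zip(observed, predicted))
    return hits / len(observed)

# Hypothetical observed and predicted contact numbers for five residues.
observed = [2, 5, 9, 12, 7]
predicted = [3, 4, 10, 11, 9]
print(pearson(observed, predicted))
print(two_state_accuracy(observed, predicted, threshold=8))  # → 0.8
```

Sweeping `threshold` over a range of values would reproduce the kind of threshold-independence check described in the Results.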
Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary protein sequence and higher order consecutive protein structural and functional properties.
Figures
![Figure 1](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/1277819/bin/1471-2105-6-248-1.gif)
![Figure 2](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/1277819/bin/1471-2105-6-248-2.gif)
![Figure 3](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/1277819/bin/1471-2105-6-248-3.gif)
![Figure 4](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/1277819/bin/1471-2105-6-248-4.gif)
![Figure 5](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/1277819/bin/1471-2105-6-248-5.gif)
![Figure 6](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/1277819/bin/1471-2105-6-248-6.gif)
Similar articles
- HSEpred: predict half-sphere exposure from protein sequences. Bioinformatics. 2008 Jul 1;24(13):1489-97. doi: 10.1093/bioinformatics/btn222. Epub 2008 May 8. PMID: 18467349
- Predicting residue-wise contact orders in proteins by support vector regression. BMC Bioinformatics. 2006 Oct 3;7:425. doi: 10.1186/1471-2105-7-425. PMID: 17014735 Free PMC article.
- Large-scale prediction of protein structure and function from sequence. Curr Pharm Des. 2006;12(17):2067-86. doi: 10.2174/138161206777585238. PMID: 16796556 Review.
- The family feud: do proteins with similar structures fold via the same pathway? Curr Opin Struct Biol. 2005 Feb;15(1):42-9. doi: 10.1016/j.sbi.2005.01.011. PMID: 15718132 Review.
- Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins. 2005 Jan 1;58(1):158-65. doi: 10.1002/prot.20300. PMID: 15523668
Cited by
- Prediction of protein-protein interaction sites in intrinsically disordered proteins. Front Mol Biosci. 2022 Sep 30;9:985022. doi: 10.3389/fmolb.2022.985022. eCollection 2022. PMID: 36250006 Free PMC article. Review.
- Deep learning methods in protein structure prediction. Comput Struct Biotechnol J. 2020 Jan 22;18:1301-1310. doi: 10.1016/j.csbj.2019.12.011. eCollection 2020. PMID: 32612753 Free PMC article. Review.
- Predicting protein inter-residue contacts using composite likelihood maximization and deep learning. BMC Bioinformatics. 2019 Oct 29;20(1):537. doi: 10.1186/s12859-019-3051-7. PMID: 31664895 Free PMC article.
- A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction. BMC Bioinformatics. 2017 Dec 28;18(Suppl 16):569. doi: 10.1186/s12859-017-1971-7. PMID: 29297299 Free PMC article.
- 3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures. Molecules. 2017 Dec 15;22(12):2230. doi: 10.3390/molecules22122230. PMID: 29244774 Free PMC article.