Comparative Study
BMC Bioinformatics. 2008 Jan 28:9:62.
doi: 10.1186/1471-2105-9-62.

VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens

Aarti Garg et al. BMC Bioinformatics.

Abstract

Background: Prediction of bacterial virulent protein sequences has implications for identification and characterization of novel virulence-associated factors, finding novel drug/vaccine targets against proteins indispensable to pathogenicity, and understanding the complex virulence mechanism in pathogens.

Results: In the present study we propose a bacterial virulent protein prediction method based on a bi-layer cascade Support Vector Machine (SVM). The first-layer SVM classifiers were trained and optimized with different individual protein sequence features: amino acid composition, dipeptide composition (occurrences of the possible pairs of residues at positions i and i+1), higher-order dipeptide composition (pairs of residues at positions i and i+2), and Position Specific Scoring Matrices (PSSM) generated by Position Specific Iterated BLAST (PSI-BLAST). In addition, a similarity-search based module was developed using a dataset of virulent and non-virulent proteins as the BLAST database. Five-fold cross-validation was used to evaluate the various prediction strategies in this study. The results from the first layer (SVM scores and PSI-BLAST result) were cascaded to the second-layer SVM classifier to train and generate the final classifier. The cascade SVM classifier achieved an accuracy of 81.8%, covering 86% of the area in the Receiver Operating Characteristic (ROC) plot, better than any of the first-layer SVM classifiers based on single or multiple sequence features.
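The composition features named in the Results can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the residue ordering, and the choice to normalize counts to fractions are assumptions made for illustration. A `gap` of 0 gives the dipeptide composition (positions i and i+1) and a `gap` of 1 gives the higher-order variant (positions i and i+2).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is an assumption)


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard residue, for a protein sequence."""
    seq = seq.upper()
    total = sum(seq.count(aa) for aa in AMINO_ACIDS)
    return [seq.count(aa) / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq, gap=0):
    """Return 400 fractions for residue pairs at positions i and i+gap+1."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    step = gap + 1
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in pairs]
```

Each function returns a fixed-length vector (20 or 400 values) regardless of sequence length, which is what allows proteins of different lengths to be fed to the same SVM.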

Conclusion: VirulentPred is an SVM-based method for predicting bacterial virulent protein sequences, which can be used to screen for virulent proteins in proteomes. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences were predicted to be high-scoring virulent proteins by the method. VirulentPred is freely accessible as a web server at http://bioinfo.icgeb.res.in/virulent/.

Figures

Figure 1
VirulentPred web server. The bi-layer Cascade SVM is used as default method for VirulentPred predictions (at the default threshold value of 0.0) as it was found to be most accurate after evaluation of different SVM classifiers developed in the study.
Figure 2
VirulentPred predictions in the various ranges of SVM scores. To generate the plot, the 367 sequences of both independent datasets (181 virulent and 186 non-virulent proteins) were classified using VirulentPred. The number of false positive predictions was very high for SVM scores in the range 0.8 to 1.1. Hence, to reduce false positive predictions, a stringent threshold of ≥1 was used for the annotation of complete proteomes of pathogens.
Figure 3
VirulentPred predictions for different proteomes. The plot depicts the number of proteins predicted to be virulent (at a higher threshold value, ≥1) in proteomes of 7 different bacteria.
Figure 4
Conversion of PSSM into training vectors. The steps used to convert PSSM profiles generated by PSI-BLAST into a training vector of 400 dimensions.
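The caption does not spell out the conversion steps, so the following Python sketch shows only one plausible way to reduce an L x 20 PSSM to 400 dimensions. The scheme is an assumption: each score is squashed with a logistic function, rows are summed according to the residue type at that position, and the resulting 20 x 20 matrix is flattened. The function name and the column order are likewise illustrative.

```python
import math

# Column order assumed to follow the usual PSI-BLAST ASCII PSSM layout.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_to_vector(sequence, pssm):
    """Collapse an L x 20 PSSM into a 400-dimensional vector.

    Assumed scheme (not taken from the abstract): logistic scaling of each
    score, row sums grouped by the residue at each position, then a
    row-by-row flattening of the 20 x 20 result.
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence.upper(), pssm):
        if len(row) != 20:
            raise ValueError("each PSSM row must have 20 scores")
        if residue not in index:
            continue  # skip non-standard residues such as X
        target = matrix[index[residue]]
        for col, score in enumerate(row):
            target[col] += 1.0 / (1.0 + math.exp(-score))
    return [value for row in matrix for value in row]
```

As with the composition features, the output length is independent of the protein length, so every sequence yields a vector of the same size.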
Figure 5
Schema of the bi-layer cascade SVM module. The cascade SVM classifier was the most efficient classifier developed in the study.
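The cascade idea can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the published implementation: the numeric encoding of the BLAST result, the use of four first-layer scores, and the `toy_second_layer` function with its made-up weights are all hypothetical stand-ins. Only the thresholds (0.0 as the web server default, ≥1 as the stringent cut-off for proteome annotation) come from the captions above.

```python
def encode_blast(hit_label):
    """Map a similarity-search outcome to a number (assumed encoding)."""
    return {"virulent": 1.0, "non-virulent": -1.0, None: 0.0}[hit_label]


def second_layer_input(first_layer_scores, blast_hit):
    """Concatenate first-layer SVM scores with the encoded BLAST result."""
    return list(first_layer_scores) + [encode_blast(blast_hit)]


def classify(score, threshold=0.0):
    """Label a final score; 0.0 is the default threshold, 1.0 the stringent one."""
    return "virulent" if score >= threshold else "non-virulent"


def toy_second_layer(vector, weights=(0.3, 0.3, 0.2, 0.4, 0.5), bias=0.0):
    """Hypothetical stand-in for the trained second-layer SVM.

    A plain linear decision function with made-up weights, present only so
    the sketch runs end to end.
    """
    return sum(w * x for w, x in zip(weights, vector)) + bias


# Made-up scores from four first-layer SVMs (composition, dipeptide,
# higher-order dipeptide, PSSM) plus a BLAST hit against a virulent protein.
scores = [0.9, 0.4, -0.1, 1.2]
vector = second_layer_input(scores, "virulent")
final_score = toy_second_layer(vector)
label_default = classify(final_score)          # default threshold 0.0
label_stringent = classify(final_score, 1.0)   # stringent threshold for proteomes
```

Raising the threshold trades sensitivity for fewer false positives, which is the rationale the Figure 2 caption gives for using ≥1 when annotating complete proteomes.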
