PRC, the Profile Comparer

by Martin Madera of the Gough group at the University of Bristol.



DESCRIPTION

PRC is a stand-alone program for aligning and scoring two profile hidden Markov models. This can be used to detect remote relationships between profiles more effectively than by doing simple profile-sequence comparisons. PRC takes into account all transition and emission probabilities in both hidden Markov models. The fundamental algorithm is symmetric, so prc HMM1 HMM2 gives essentially the same result as prc HMM2 HMM1. The exceptions are the reverse null model and, to a much larger extent, the E-value fitting process, both of which are currently asymmetric.

PRC can read SAM, HMMER, PSI-BLAST and FASTA files. If you have a multiple sequence alignment and want to estimate a profile HMM from it, I recommend the SAM package, in particular the script w0.5.

PRC is available under the GNU General Public Licence.



PRC UPDATES

If you would like to receive a short email every time there is a release of PRC, please send me an email to martin.madera@gmail.com and I will include you on my list.



DOCUMENTATION

Starting with release 1.5.0, there is a short README file that describes the command line options. If you are interested in the internals of PRC, have a look at the draft of my thesis chapter that deals with PRC.



RELEASES

Version 1.5.6 - 7th July 2009

This release fixes an important memory leak in the handling of SAM binary models that was spotted by Alejandro Ochoa.

The ability to handle SAM binary models was added in version 1.5.5. The bug does not affect the correctness of the output, but it may cause PRC to crash without generating a scores file when run on large libraries of SAM binary models.

Version 1.5.5 - 27th August 2008

Added merge_aligns.pl, a script for combining three alignments, HMM1-HMM2 (created using PRC), HMM1-seq1 and HMM2-seq2 (created e.g. using SAM) into seq1-seq2, a pairwise sequence alignment.

PRC can now read binary SAM models. If you have a library of binary SAM models, there is no need to convert to the PRC format.

Version 1.5.3 - 16th September 2005

Tweaked the HMMER parser to fix problems with TIGRFAM models.

Version 1.5.2 - 28th February 2005

A number of minor bugfixes and improvements. I also added a draft of a chapter from my PhD thesis that describes the internals of PRC.

Bugfixes: the two main bugs fixed in this release were in the E-value estimation procedure (which occasionally resulted in a near-infinite loop and nonsensical results) and in the FASTA parser.

On the improvement side, I changed the default scoring function for aligning two match states to dot2. There does not seem to be much difference between dot1 and dot2 in terms of performance, but I like the theoretical properties of dot2. I also tweaked the reverse null model and slightly modified the E-value distribution function. Finally, I changed the default configuration from SPACE9 to SPACE5 (see the thesis chapter). SPACE5 is faster and there does not seem to be any drop in performance.

Version 1.5.0 - 19th October 2004

This release fixes a number of bugs and adds E-values and multiple hits.

The two most important bugs were in the HMMER parser and in the backward routine. The HMMER bug was silly -- I wasn't parsing the header correctly -- and should now be fixed.

The backward bug was more serious: some of the forward/backward alignments (also known as EM, Holmes/Durbin or Maximum Accuracy alignments) generated by PRC 1.4.0 are likely to be wrong. Please upgrade to 1.5.0 at the earliest opportunity.

On a more pleasant note, I have finally implemented E-values for local-local comparisons and added code for reporting multiple hits between a pair of models. There is even some rudimentary documentation in the form of the README file.

Version 1.4.0 - 28th July 2004

Added HMMER and PSI-BLAST parsers. Please note that the PSI-BLAST "checkpoint files" are generally platform-dependent, as are PRC binary models.

Following Bob Edgar's suggestions, I have expanded the set of allowed transitions to basically all transitions, so DI->ID etc. are now all allowed. I still don't allow insert-insert alignments, because they're nasty, but that's now the only exception.

Fixed a considerable number of bugs in the handling of global and local alignments (basically rewrote the code), and changed the format in which alignments are reported to a more sensible one (credit: Julian Gough, who pointed out the problem).

Added a logspace implementation of all algorithms. The idea is that the algorithm works in linear space (which is quicker) until it sees an underflow or overflow, in which case it redoes the calculation in logspace. It seems to work but there could still be some glitches.
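The linear-then-logspace idea can be illustrated with a minimal Python sketch. This is not PRC's code: it runs the ordinary forward algorithm on a toy HMM against a sequence of symbols, rather than PRC's comparison of two profile HMMs, and all function names, the toy parameters and the underflow test are assumptions made for illustration only.

```python
import math
import sys


def safe_log(x):
    """log(x), with log(0) mapped to -inf instead of raising an error."""
    return math.log(x) if x > 0.0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_linear(init, trans, emit, obs):
    """Forward algorithm in linear space.

    Returns log P(obs), or None if the result under- or overflowed.
    """
    n = len(init)
    f = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        f = [sum(f[r] * trans[r][s] for r in range(n)) * emit[s][o]
             for s in range(n)]
    total = sum(f)
    # Anything below the smallest normal double (including 0.0), or any
    # inf/nan, is treated as a failure of the linear-space calculation.
    if not (sys.float_info.min <= total < math.inf):
        return None
    return math.log(total)


def forward_log(init, trans, emit, obs):
    """The same recursion carried out entirely in log space (slower, safe)."""
    n = len(init)
    f = [safe_log(init[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        f = [logsumexp([f[r] + safe_log(trans[r][s]) for r in range(n)])
             + safe_log(emit[s][o])
             for s in range(n)]
    return logsumexp(f)


def forward(init, trans, emit, obs):
    """Try linear space first; redo the calculation in log space on failure."""
    result = forward_linear(init, trans, emit, obs)
    if result is None:
        result = forward_log(init, trans, emit, obs)
    return result


# Toy two-state HMM over a two-letter alphabet (hypothetical numbers).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.3], [0.4, 0.6]]
short_obs = [0, 1, 0, 0, 1]
long_obs = [0, 1] * 2500  # long enough to underflow in linear space
```

In this sketch a genuinely zero probability is also sent to the logspace routine, which then returns -inf; that costs time but does not change the answer.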

Version 1.3.1 - 13th January 2003

Added EM/forward-backward scoring and implemented the Holmes/Durbin optimal alignment accuracy algorithm. Added an option that prints out alignments. Can now do global-global, global-local and local-global scoring as well as local-local, but in all cases still only reports the top hit. Switches from floats to doubles internally when it is about to overflow.

Version 1.3.0 - 7th December 2002

The first public release. Can read profile HMMs in the SAM format and assign them a log-odds score. The score is calculated using the Viterbi algorithm with local-local scoring. No alignments are provided.



FEATURES YET TO BE IMPLEMENTED

  • Use of information from secondary structure predictions

  • Linear memory code

  • More documentation



DOWNLOAD

The download directory includes the source tarballs as well as Linux i686, x86_64 and Mac OS X binaries. PRC is distributed under the GNU General Public Licence.

convert_to_prc is a small program that converts from other formats to an internal PRC binary format, which is faster to read. Starting with release 1.5.0, it can also dump any PSI-BLAST, HMMER and SAM files in a human-readable format.

merge_aligns.pl is a Perl script for combining three alignments, HMM1-HMM2 (created using PRC), HMM1-seq1 and HMM2-seq2 (created e.g. using SAM) into seq1-seq2, a pairwise sequence alignment.
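The idea of combining the three alignments can be sketched in a few lines of Python. This is only an illustration of the composition step, not a port of merge_aligns.pl: the representation of each alignment as a list of index pairs, and the function name compose_alignments, are assumptions and do not reflect the file formats the script actually reads or writes.

```python
def compose_alignments(hmm1_hmm2, hmm1_seq1, hmm2_seq2):
    """Compose three alignments into a seq1-seq2 pairwise alignment.

    Each argument is a list of (left, right) index pairs:
      hmm1_hmm2: (HMM1 match state, HMM2 match state)
      hmm1_seq1: (HMM1 match state, seq1 residue index)
      hmm2_seq2: (HMM2 match state, seq2 residue index)
    Returns a list of (seq1 residue index, seq2 residue index) pairs.
    """
    state1_to_res1 = dict(hmm1_seq1)
    state2_to_res2 = dict(hmm2_seq2)
    pairs = []
    for state1, state2 in hmm1_hmm2:
        # A residue pair exists only if both aligned states are occupied
        # by a residue in their respective sequences.
        if state1 in state1_to_res1 and state2 in state2_to_res2:
            pairs.append((state1_to_res1[state1], state2_to_res2[state2]))
    return pairs


# Hypothetical example data.
hmm1_hmm2 = [(1, 3), (2, 4), (3, 5), (5, 6)]
hmm1_seq1 = [(1, 10), (2, 11), (4, 12), (5, 13)]
hmm2_seq2 = [(3, 0), (4, 1), (5, 2), (6, 4)]
seq1_seq2 = compose_alignments(hmm1_hmm2, hmm1_seq1, hmm2_seq2)
# seq1_seq2 == [(10, 0), (11, 1), (13, 4)]
```

HMM1 state 3 has no residue in seq1 in this example, so the state pair (3, 5) contributes nothing to the final alignment.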

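The current version is 1.5.6. If you are using 1.4.0, you should upgrade because of two quite serious bugs. If you are still using 1.3.x, you should DEFINITELY upgrade. If you are using 1.5.x and are happy with it, there is no need to upgrade, unless you run 1.5.5 on large libraries of SAM binary models, where the memory leak fixed in 1.5.6 may cause a crash.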



RELATED LINKS

  • The UC Santa Cruz profile HMM package SAM, and our SUPERFAMILY library of profile HMMs that uses it

  • Sean Eddy's HMMER, and the PFAM library of profile HMMs that uses it

  • The NCBI PSI-BLAST, currently the most commonly used profile method




This page is maintained by Martin Madera. If you have any questions or suggestions, please feel free to email me.