Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2005 Feb 15:6:31.
doi: 10.1186/1471-2105-6-31.

Automated generation of heuristics for biological sequence comparison

Affiliations

Automated generation of heuristics for biological sequence comparison

Guy St C Slater et al. BMC Bioinformatics. .

Abstract

Background: Exhaustive methods of sequence alignment are accurate but slow, whereas heuristic approaches run quickly, but their complexity makes them more difficult to implement. We introduce bounded sparse dynamic programming (BSDP) to allow rapid approximation to exhaustive alignment. This is used within a framework whereby the alignment algorithms are described in terms of their underlying model, to allow automated development of efficient heuristic implementations which may be applied to a general set of sequence comparison problems.

Results: The speed and accuracy of this approach compares favourably with existing methods. Examples of its use in the context of genome annotation are given.

Conclusions: This system allows rapid implementation of heuristics approximating to many complex alignment models, and has been incorporated into the freely available sequence alignment program, exonerate.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Example of joining two HSPs. A single point on each HSP is chosen to be be included in the alignment. This is the centre of the HSP according to the scores, such that equivalenced bases on either side contribute to half of the HSP score. As shown in this example, if the centre of the HSPs according to their length had been chosen instead, it would not have been possible to join them to form an alignment.
Figure 2
Figure 2
The five types of sub-alignment region around HSPs in which dynamic programming may be applied. The start of an HSP (A), handling small gaps by joining two adjacent ends of HSPs (B), handling large gaps by joining two distant HSPs via a span (C,D) and the end of an HSP (E). Other features, such as splice sites are incorporated into the alignment within the SARs.
Figure 3
Figure 3
A toy example of bounded sparse dynamic programming. In this example the bounds indicate that if the overall alignment threshold is greater than 280, no sub-alignment DP is required. Otherwise, the region between A and B is confirmed first, and the bounds obviate DP between A and C. For clarity, terminal bounds are not included in this example.
Figure 4
Figure 4
Models for alignment of an EST to genomic sequnce. The exhaustive model for aligning ESTs to genome DNA, showing portal states and span states is shown in Figure 4 (a). The portal states are match forward and match reverse, and these share a set of HSPs. The span states are intron forward and intron reverse, and may accommodate large gaps. These portal and span states are represented in the heuristic model shown in Figure 4 (b), with each transition represented by a different derived model, labelled A to E, corresponding to the SAR types show in Figure 2.
Figure 5
Figure 5
Example of generation of a derived model. The original model on the left (for Smith-Waterman-Gotoh) is used to generate the derived model on the right for joining two adjacent HSPs. Extra transitions are allowed from the START state to the INSERT and DELETE states and onto the END state, as the derived model must allow gaps to open directly from the HSPs (in contrast to Smith-Waterman-Gotoh where an alignment cannot start with a gap). Additionally, the original model is local, but in this case the derived model is global, as it must start and end on the HSP.
Figure 6
Figure 6
Example of BSDP during alignment of a protein sequence to genomic DNA. Figure 6 (a) shows 912 candidate SARs positioned on the HSPs. Figure 6 (b) shows the 109 SARs to which DP is applied during BSDP, and Figure 6 (c) shows the 18 SARs which appear in the final alignment.

Similar articles

Cited by

References

    1. Box GE. Robustness in the Strategy of Scientific Model Building. In: Launer R, Wilkinson G, editor. Robustness in Statistics. Academic Press New York; 1979.
    1. Smith T, Waterman M. Identification of Common Molecular Subsequences. Journal of Molecular Biology. 1981;147:195–197. - PubMed
    1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990;215:403–410. - PubMed
    1. Searls DB, Murphy KP. Proceedings of the Third International Conference On Intelligent Systems for Molecular Biology. The AAAI Press; 1995. Automata-Theoretic Models of Mutation and Alignment; pp. 341–349. - PubMed
    1. Searls DB. Sequence alignment through pictures. Trends in Genetics. 1996;12:35–37. - PubMed

LinkOut - more resources

-