Abstract

Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).

It is clear that models of nucleotide substitution (henceforth models of evolution) play a significant role in molecular phylogenetics, particularly in the context of distance, maximum likelihood (ML), and Bayesian estimation. We know that the use of one or other model affects many, if not all, stages of phylogenetic inference. For example, estimates of phylogeny, substitution rates, bootstrap values, posterior probabilities, or tests of the molecular clock are clearly influenced by the model of evolution used in the analysis (Buckley, 2002; Buckley and Cunningham, 2002; Buckley et al., 2001; Kelsey et al., 1999; Pupko et al., 2002; Sullivan and Swofford, 1997, 2001; Suzuki et al., 2002; Tamura, 1994; Yang et al., 1995; Zhang, 1999). We can argue, in general, that phylogenetic methods are less accurate (that is, they recover an incorrect phylogeny more often), or become inconsistent (converging to an incorrect tree with increasing number of characters) when the model of evolution assumed is wrong (Bruno and Halpern, 1999; Felsenstein, 1978; Huelsenbeck and Hillis, 1993; Penny et al., 1994). It is evident that the use of appropriate models is essential if we are to be confident in the results of a phylogenetic analysis, and indeed, several strategies for model choice have been proposed in the context of phylogenetics. We refer the reader to Johnson and Omland (2003), Posada and Crandall (2001b) and Posada (2001) for a detailed introduction, and for an evaluation of the performance of these methods to recover the model generating the data. Computer programs exist that implement these methods (Adachi and Hasegawa, 1996; Posada and Crandall, 1998). Among the available methods for model selection in phylogenetics, hierarchical likelihood ratio tests (hLRTs) are the most popular. 
However, here we argue that the hLRTs approach is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two allow for assessment of model selection uncertainty and model averaging.

Model Selection

Before proceeding further, it is worth reiterating that any model of evolution we can construct is never going to be the “true model” that generated the data we observe. In other words, the set of models is misspecified. All models are wrong but some are useful (Box, 1976), and model selection is best seen as a way of approximating, rather than identifying, full reality (Burnham and Anderson, 2003, pp. 20–23). Statistical model selection is commonly based on William of Occam's (ca. 1320) parsimony principle, by which hypotheses should be kept as simple as possible. In statistical terms, this is a trade-off between bias (distance between the average estimate and truth) and variance (spread of the estimates around the truth) (Fig. 1). The idea is that by adding parameters to a model we obtain some improvement in fit (see below), but at the same time parameter estimates are “worse” because we have less data (i.e., information) per parameter. In addition, the computations typically require more time. So the question is how complex the model should be for a given problem.

Figure 1

The principle of parsimony. Model selection is more or less based on the trade-off between bias and variance versus the number of estimable parameters in the model. The principle of parsimony tells us that as we increase the number of parameters in a model the bias decreases but the variance increases. This principle underlies all model selection approaches.

The Likelihood Function

We referred above to the fit of a model to the data, but we have not yet explained how we measure this fit. In most cases, the fit of a model is measured by the likelihood function (see Edwards, 1972; Fisher, 1921), and in phylogenetics (see Felsenstein, 1981a; Goldman, 1990) we define the likelihood (L) as (proportional to) the probability of the data (D) given a model of evolution (M), a vector of K model parameters θ = (θ1, θ2, …, θK), a tree topology (τ), and a vector of S branch lengths, ν = (ν1, ν2, …, νS):

$$L = P(D \mid M, \theta, \tau, \nu)$$

If the goal is to compute the likelihood of a given model, then θ, τ, and ν are nuisance parameters (they affect the likelihood calculation but they are not really what we want to infer) and they should somehow be eliminated from the inference. A common strategy to remove nuisance parameters is to assume that they take those values that maximize the overall likelihood, thus reducing the likelihood to a function of the parameters of interest. What is usually done in practice is to estimate a tree (topology and branch lengths) from the data and then, implicitly assuming that this tree is the maximum likelihood tree for every candidate model, calculate maximum likelihood estimates of all model parameters, including the branch lengths, for every model given this tree. In this way we obtain the maximized (log) likelihood under model M:

$$\ell = \ln P(D \mid M, \hat{\theta}, \hat{\tau}, \hat{\nu})$$

where ^ means “estimate of” ($\hat{\theta}$ is an estimate of θ). The strategy just described is sometimes called joint estimation. A different strategy to remove nuisance parameters is to assign them prior probabilities and integrate them out to obtain the marginal probability of the data given only the model, that is, the model likelihood (also called integrative, marginal, or predictive likelihood):

$$P(D \mid M) = \iiint P(D \mid M, \theta, \tau, \nu)\, P(\theta, \tau, \nu \mid M)\, d\theta\, d\tau\, d\nu$$

However, this multidimensional integral can be very difficult to compute, and it is typically approximated using computationally intensive techniques like Markov chain Monte Carlo (MCMC) (Gilks et al., 1996; Hastings, 1970; Metropolis et al., 1953). Steel and Penny (2000) and Holder and Lewis (2003) provide an instructive discussion on joint and marginal estimation in the context of phylogenetics.
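The contrast between joint (maximized) and marginal (integrated) likelihoods can be made concrete with a deliberately small case. The sketch below is not the procedure used for full phylogenies: it assumes only two aligned sequences under JC69, so that the single nuisance parameter is their distance d, and it assumes an exponential prior on d with a hypothetical rate (`prior_rate`). The site counts (`n_same`, `n_diff`) are likewise invented for illustration. With one parameter the integral can be approximated by a simple Riemann sum instead of MCMC.

```python
import math

def jc69_pair_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of two aligned sequences under JC69 at distance d > 0."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * decay)  # one specific matching site pattern
    p_diff = 0.25 * (0.25 - 0.25 * decay)  # one specific mismatching site pattern
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)

def maximized_log_likelihood(n_same, n_diff):
    """Joint estimation: plug in the ML distance (needs 0 < n_diff / n < 0.75)."""
    p = n_diff / (n_same + n_diff)
    d_hat = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
    return d_hat, jc69_pair_log_likelihood(d_hat, n_same, n_diff)

def log_marginal_likelihood(n_same, n_diff, prior_rate=10.0, d_max=5.0, steps=20000):
    """Marginal estimation: integrate d out under an Exponential(prior_rate) prior.

    Assumption: a right Riemann sum on (0, d_max] is accurate enough here;
    terms are combined on the log scale to avoid underflow.
    """
    h = d_max / steps
    log_terms = []
    for i in range(1, steps + 1):
        d = i * h
        log_prior = math.log(prior_rate) - prior_rate * d
        log_terms.append(jc69_pair_log_likelihood(d, n_same, n_diff) + log_prior)
    shift = max(log_terms)
    return shift + math.log(h * sum(math.exp(t - shift) for t in log_terms))

# Hypothetical data: 100 sites, 10 of which differ between the two sequences.
d_hat, max_ll = maximized_log_likelihood(n_same=90, n_diff=10)
marg_ll = log_marginal_likelihood(n_same=90, n_diff=10)
print(f"ML distance:              {d_hat:.4f}")
print(f"maximized log-likelihood: {max_ll:.3f}")
print(f"log marginal likelihood:  {marg_ll:.3f}")
```

Because the marginal likelihood averages the likelihood over the prior rather than evaluating it at its peak, it cannot exceed the maximized likelihood; the size of the gap depends on how much prior mass falls on poorly fitting parameter values.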

Hierarchical Likelihood Ratio Tests

The most popular strategy for model selection in phylogenetics is the use of hierarchical likelihood ratio tests (hLRTs) (Frati et al., 1997; Huelsenbeck and Crandall, 1997; Posada and Crandall, 1998) (Fig. 2). This method usually consists of performing pairwise likelihood ratio tests in a specific sequence until the procedure converges on a final model that cannot be rejected. By means of the LRTs, we compare the maximized log-likelihoods of the null (ℓ0) and the alternative (ℓ1) models, and if the associated P-value is smaller than the predefined threshold (the significance level, usually 0.05), we say that the alternative model fits the data significantly better than the null model (i.e., we reject the null model), and vice versa.
Figure 2

Hierarchical likelihood ratio tests (hLRTs). This figure illustrates an arbitrary hierarchy of LRTs for six different models. Within each LRT, the null model is depicted above the alternative model. When the LRT is not significant, the null model (above) is accepted (A), and it becomes the null model of the next LRT. When the LRT is significant, the null model is rejected (R) and the alternative model (below) becomes the null model of the next LRT. There are six possible paths depending on the outcome of the individual LRTs, and each path results in the selection of a different model. JC69: Jukes-Cantor model (Jukes and Cantor, 1969); K80: Kimura 1980 model (Kimura, 1980), also known as K2P; F81: Felsenstein 81 model (Felsenstein, 1981b); HKY85: Hasegawa-Kishino-Yano model (Hasegawa et al., 1985); SYM, symmetrical model (Zharkikh, 1994); GTR: general-time reversible model (Tavaré, 1986), also known as REV.

The approximation of this P-value is straightforward for nested models, using a standard or mixed χ2 distribution (Goldman, 1993; Goldman and Whelan, 2000; Kendall and Stuart, 1979; Ota et al., 2000). Two models are nested when one of them, the null model, is a special case of the other, the alternative model. For example, the Jukes-Cantor model (Jukes and Cantor, 1969) (JC69) is nested within the Kimura two-parameter model (Kimura, 1980) (K80), because if we assume that transitions and transversions occur at the same rate (i.e., κ = 1), K80 collapses to JC69. However, obtaining correct P-values for the LRT statistics can be difficult. LRTs implicitly assume that at least one of the models compared is correct, and when the models are misspecified these tests can often be incorrect (Foutz and Srivastava, 1977; Golden, 1995; Kent, 1982). Although proper LRTs can be constructed when models are wrong (Vuong, 1989), standard LRTs in phylogenetics are not robust to model misspecification (Zhang, 1999). When the models are non-nested, the χ2 approximation is no longer valid, and more computationally intensive Monte Carlo methods are needed (Goldman, 1993; Whelan and Goldman, 1999). In addition, when sample size is small the usual asymptotic approximation on which P-values are based no longer applies.
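As an illustration of a single nested LRT, the sketch below compares JC69 (null) with K80 (alternative) using the maximized log-likelihoods reported for these two models in Table 1. It assumes the standard χ2 approximation with one degree of freedom (the single extra parameter κ, which is not on a boundary under the null), and it uses the closed-form tail probability available for that case. The function names are illustrative and do not correspond to any existing program.

```python
import math

def lrt_statistic(neg_lnl_null, neg_lnl_alt):
    """Return 2 * (l1 - l0), given the *negative* maximized log-likelihoods."""
    return 2.0 * (neg_lnl_null - neg_lnl_alt)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Negative maximized log-likelihoods taken from Table 1.
neg_lnl_jc69 = 6411.5161  # null model
neg_lnl_k80 = 6230.2100   # alternative model (adds kappa)

statistic = lrt_statistic(neg_lnl_jc69, neg_lnl_k80)
p_value = chi2_sf_df1(statistic)
reject_null = p_value < 0.05

print(f"LRT statistic: {statistic:.4f}")
print(f"P-value: {p_value:.3e}")
print("JC69 rejected in favor of K80" if reject_null else "JC69 not rejected")
```

For comparisons with more than one degree of freedom, or for parameters fixed at a boundary, the tail probability would need a general χ2 (or mixed χ2) routine instead of the one-degree-of-freedom shortcut used here.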

Furthermore, LRTs were designed for hypothesis testing, and although classical hypothesis testing is commonly used as a model selection strategy, it has been argued that hypothesis testing and model selection are distinct issues (Burnham and Anderson, 2003, pp. 132–134). A stepwise procedure like the hLRTs, in which we sequentially decide whether to add (or remove) certain parameters, is analogous to forward and backward selections in best-subset linear regression (Miller, 2002, pp. 39–46), which do not guarantee finding the optimal model. As pointed out by Sanderson and Kim (2000), we can identify several potential problems with the use of hLRTs for model selection in phylogenetics. There exist situations in which an optimal model may not exist for the hLRTs procedure. This kind of situation occurs, for example, if the general time-reversible model (Tavaré, 1986) (GTR) is not significantly better than the Hasegawa et al. model (1985) (HKY85), HKY85 is not significantly better than JC69, but GTR is significantly better than JC69. Even if an optimal model exists, it will always be a function of the significance level, and the outcome of the model choice procedure may vary accordingly. In addition, the hLRTs approach performs multiple tests with the same data, and this will increase the rate of false positives (that is, rejecting the null hypothesis when it is true): the probability of falsely rejecting the null hypothesis at least once in n tests is 1 − (1 − α)^n. Although there are statistical procedures to correct for this effect, like the Bonferroni correction (see Hochberg, 1988), here the tests are nonindependent, and the appropriate adjustment can be very complex (see also Shimodaira, 1998, 2001; Shimodaira and Hasegawa, 1999). The outcome of the hLRTs might also be affected by the starting model (for the hLRTs procedure we need to select a starting point, usually represented by the simplest or the most complex model in the set of candidate models).
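The inflation of the false-positive rate can be seen with a few lines of arithmetic. The sketch below evaluates 1 − (1 − α)^n, which holds for independent tests; as noted above, the tests in an hLRT hierarchy are not independent, so the numbers should be read as an indication of the size of the effect and not as an exact rate. The Bonferroni-adjusted level is included only for comparison, and the function names are illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false rejection in n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_level(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests

for n in (1, 3, 5, 10):
    rate = familywise_error_rate(0.05, n)
    level = bonferroni_level(0.05, n)
    print(f"n = {n:2d}: familywise rate = {rate:.4f}, Bonferroni level = {level:.4f}")
```

With α = 0.05, a hierarchy of five tests already carries a chance of more than one in five of at least one false rejection under the independence assumption.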
In addition, there are cases in which the hLRTs will not select the best model, according to its own criteria, among the candidate models.

Indeed, these problems can have an impact on the analysis of real data sets, and we have analyzed a set of HIV sequences (Posada and Crandall, 2001a) for illustrative purposes (Fig. 3) (Pol, in press). In Figure 3a we can see a case in which an optimal model does not exist, as all of the three models are rejected when compared with one of the other two. However, we will select HKY85 as the best fit (because we did not compare HKY85 and GTR). Also, note that increasing the significance level (Fig. 3b) changes the outcome, as GTR now becomes the best fit model. With a different set of candidate models, and if we start with HKY85, the model selected will be HKY85 (Fig. 3c), which is a suboptimal choice, whereas if we start with GTR the model selected will be GTR (Fig. 3d), which is actually the optimal model. We cannot devise a hierarchy of hLRTs that overcomes all these problems at once, but better approaches exist than simply forward and backward selection (Miller, 2002).

Figure 3

Problems of hLRTs with a real data set. See text for further details. The data set analyzed is an alignment of 12 HIV-1 subtype D sequences of a fragment of 1462 nucleotides from the gag region (Posada and Crandall, 2001a). K81uf is the Kimura 1981 model (Kimura, 1981) with unequal base frequencies. TN93 is the Tamura-Nei model (Tamura and Nei, 1993). Solid arrows indicate the outcome of the LRT performed, whereas discontinuous arrows indicate the outcome of a potential LRT not performed. P is the associated P-value of the LRTs. The underlined model is the starting point of the hLRT, the best model according to all LRTs is indicated with an asterisk, and the model selected is enclosed within a square.

Bayesian Model Selection

Model selection is an integral part of Bayesian estimation (Gelfand, 1996; Raftery, 1996; Wasserman, 2000), and within this framework, different strategies exist to accomplish the same tasks.

Bayes Factors

Bayes factors (Kass and Raftery, 1995) are the Bayesian analogue of the LRT (Suchard et al., 2003a). They contrast the evidence provided by the data for two competing models, i and j, as:

$$B_{ij} = \frac{P(D \mid M_i)}{P(D \mid M_j)}$$

Evidence for Mi is considered very strong if Bij > 150, strong if 12 < Bij < 150, positive if 3 < Bij < 12, barely worth mentioning if 1 < Bij < 3, and negative (supports Mj) if Bij < 1 (Raftery, 1996). It is important to note that Bayes factors compare model likelihoods, or P(D | M), which are calculated by integrating, not maximizing, over all possible parameter values (except in empirical Bayesian approaches, where maximum likelihood estimates can be used instead). Therefore we should not confuse them with the log of the maximized likelihoods (ℓ) used in the LRTs and AIC. Bayes factors are already being used in the context of phylogenetics, for example to infer the occurrence of recombination events (Suchard et al., 2002), to compare different phylogenetic hypotheses (Huelsenbeck and Imennov, 2002; Huelsenbeck et al., 2000; Suchard et al., 2003b) and for model selection (Aris-Brosou and Yang, 2002; Huelsenbeck et al., 2004; Nylander et al., 2004; Suchard et al., 2001).
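The verbal scale above translates directly into a lookup. The following sketch encodes the thresholds as quoted from Raftery (1996); the text leaves the exact boundary values (1, 3, 12, 150) unassigned, so placing them in the weaker category, and treating a factor of exactly 1 as equal support, are arbitrary choices made here. The function name is illustrative.

```python
def interpret_bayes_factor(b_ij):
    """Verbal strength of evidence for model i over model j.

    Thresholds follow the scale quoted in the text; the handling of the
    exact boundary values is an assumption of this sketch.
    """
    if b_ij <= 0:
        raise ValueError("a Bayes factor must be positive")
    if b_ij > 150:
        return "very strong"
    if b_ij > 12:
        return "strong"
    if b_ij > 3:
        return "positive"
    if b_ij > 1:
        return "barely worth mentioning"
    if b_ij == 1:
        return "none (equal support)"
    return "negative (supports Mj)"

for value in (0.2, 2.0, 7.5, 40.0, 500.0):
    print(f"B_ij = {value:6.1f}: {interpret_bayes_factor(value)}")
```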

Posterior Probabilities

When multiple models are considered, the usual Bayesian solution is to choose the model with the highest posterior probability (Kass and Raftery, 1995; Raftery, 1996; Wasserman, 2000). For R models, the posterior probability of the ith model is:

$$P(M_i \mid D) = \frac{P(D \mid M_i)\, P(M_i)}{\sum_{r=1}^{R} P(D \mid M_r)\, P(M_r)}$$

A word is needed about model prior probabilities P(Mi). Although models are commonly assigned equal prior probabilities, in phylogenetics we may have prior beliefs stating that some models are more probable than others. For example, we have enough information about the process of mitochondrial sequence evolution to believe that the JC69 model is less probable in this case than the HKY85 model with a gamma distribution for rates among sites (see Yang, 1996a). Ideally, this information should be reflected in the model priors, and although considerable Bayesian research exists on eliciting prior information (Kadane and Wolfson, 1998; Madigan et al., 1995), it still seems to be very difficult to quantify. Fortunately, if the signal in the data, conveyed through the likelihood, is strong enough, then the prior distributions should not have a large influence on the posterior distribution. Indeed, posterior probabilities of trees are already being used to estimate phylogenies (Holder and Lewis, 2003; Huelsenbeck et al., 2001, 2002; Larget and Simon, 1999; Mau and Newton, 1997; Mau et al., 1999; Yang and Rannala, 1997).
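A short numerical sketch may help to show how model priors and model likelihoods combine, and why a strong signal in the data can swamp the priors. The log marginal likelihoods and the unequal priors below are hypothetical values chosen for illustration, not results from any data set; the computation is carried out on the log scale because model likelihoods for real alignments are far too small to exponentiate directly.

```python
import math

def model_posteriors(log_marginal_likelihoods, priors=None):
    """Posterior model probabilities from log P(D | M) and priors P(M).

    Priors default to equal probabilities and must all be positive.
    """
    names = list(log_marginal_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    log_joint = {
        name: log_marginal_likelihoods[name] + math.log(priors[name])
        for name in names
    }
    shift = max(log_joint.values())  # subtract the maximum to avoid underflow
    unnormalized = {name: math.exp(v - shift) for name, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {name: u / total for name, u in unnormalized.items()}

# Hypothetical log marginal likelihoods for three candidate models.
log_ml = {"JC69": -6415.0, "HKY85+G": -5455.0, "GTR+G": -5453.0}

equal = model_posteriors(log_ml)
# Hypothetical priors that strongly favor the simplest model.
skewed = model_posteriors(log_ml, {"JC69": 0.8, "HKY85+G": 0.1, "GTR+G": 0.1})

for name in log_ml:
    print(f"{name:8s} equal priors: {equal[name]:.4f}  skewed priors: {skewed[name]:.4f}")
```

In this made-up case the prior weight of 0.8 on JC69 has no visible effect, because the difference in log marginal likelihood between JC69 and the other two models is several hundred units.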

When the priors for the parameters in the complex model are very diffuse, Bayesian approaches tend to support the null model in contradiction to significance tests (e.g., LRTs) as sample size increases, the so-called Jeffreys-Lindley's paradox (Bartlett, 1957; Jeffreys, 1939; Lindley, 1957; Shafer, 1982). If the diffuseness of these priors arises because of mere ignorance of the values these parameters can take, this conflict highlights a disadvantage of Bayesian approaches, especially in the case of the Bayesian Information Criterion (BIC) (see below), which assume flat, improper priors. In any case, Jeffreys-Lindley's paradox illustrates the relevance, for good or for bad, of the priors we choose for the model parameters (Huelsenbeck et al., 2002). Moreover, in some situations Bayesian approaches and standard significance tests can also be irreconcilable when testing point (or sharp) null hypotheses, for example, H0: ti/tv = 0.5 versus H1: ti/tv ≠ 0.5 (Berger and Sellke, 1987) (ti/tv is the transition/transversion ratio).

Bayesian Information Criterion

In order to calculate model likelihoods, Bayesian methods often require computationally intensive techniques like Markov chain Monte Carlo (Gilks et al., 1996; Hastings, 1970; Metropolis et al., 1953). Easy-to-implement Bayes factor calculations do exist for some nested models via the Savage-Dickey ratio (Suchard et al., 2001; Verdinelli and Wasserman, 1995). However, there is a computationally more tractable approach, the Bayesian Information Criterion (BIC) (Schwarz, 1978):

$$\mathrm{BIC} = -2\ell + K \ln n$$

where K is the number of estimable parameters, and n is the sample size (for now we assume that n can be approximated by the total number of characters in the alignment). The BIC was developed as an approximation to the log marginal likelihood of a model, and therefore, the difference between two BIC estimates may be a good approximation to the natural log of the Bayes factor (Kass and Wasserman, 1995). Given equal priors for all competing models, choosing the model with the smallest BIC is equivalent to selecting the model with the maximum posterior probability. The BIC assumes that the prior on the parameters is the unit information prior (i.e., a multivariate normal prior with mean at the maximum likelihood estimate and variance equal to the expected information matrix for one observation) (Kass and Wasserman, 1995), which can be thought of as a prior distribution that contains the same amount of information as a single, typical observation. This prior is quite diffuse, so the BIC tends to select less complex models than Bayes factors do (for discussion see Raftery, 1999; Weakliem, 1999), and if n > 8, the BIC selects simpler models than the AIC (Forster and Sober, 2004). However, Burnham and Anderson (2003, pp. 302–305) suggest that the BIC can be used more generally with any prior.
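The threshold n > 8 follows from comparing penalties: the BIC charges ln n per parameter whereas the AIC charges 2, and ln n exceeds 2 once n is larger than e^2 ≈ 7.39. The sketch below checks this crossover and also shows one common way of turning a set of BIC values into approximate posterior model probabilities under equal model priors, by normalizing exp(−ΔBIC/2). The AIC formula used (−2ℓ + 2K) is the standard definition, and the BIC values in the example are hypothetical.

```python
import math

def bic(neg_lnl, k, n):
    """BIC = -2*l + K*ln(n), with the log-likelihood supplied as -l."""
    return 2.0 * neg_lnl + k * math.log(n)

def aic(neg_lnl, k):
    """AIC = -2*l + 2*K, with the log-likelihood supplied as -l."""
    return 2.0 * neg_lnl + 2.0 * k

def bic_weights(bic_values):
    """Approximate posterior model probabilities (equal model priors assumed)."""
    best = min(bic_values.values())
    raw = {name: math.exp(-0.5 * (value - best)) for name, value in bic_values.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}

# Per-parameter penalty crossover between the two criteria.
for n in (7, 8, 1000):
    heavier = "BIC" if math.log(n) > 2.0 else "AIC"
    print(f"n = {n:4d}: ln(n) = {math.log(n):.3f}, heavier penalty: {heavier}")

# Hypothetical BIC values for three models.
weights = bic_weights({"A": 210.0, "B": 212.0, "C": 220.0})
print({name: round(w, 4) for name, w in weights.items()})
```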

A collection of BIC statistics contains the same information as a collection of pairwise Bayes factors. However, when choosing among several models, the BIC statistics are easier to interpret by visual inspection, as they allow for the simultaneous comparison of multiple models, so the best-fit models can be immediately identified. On the other hand, selecting the best-fit model from a collection of multiple pairwise Bayes factors could be more burdensome, and such a procedure might suffer from some of the problems described above for the hLRTs. Nevertheless, the BIC approximation might not be appropriate when the posterior mode occurs at the boundary of the parameter space (Hsiao, 1997; Ota et al., 2000).

Decision Theoretic Approaches

Recently, Minin et al. (2003) applied decision theory (Bernardo and Smith, 1994) to develop a novel model selection strategy (the DT method) that extends the BIC. Minin et al. (2003) argue that there is no guarantee that the best-fit models will produce the best estimates of phylogeny, and therefore propose a model selection method that incorporates some measure of phylogenetic performance. They assess models through a penalty or loss function, related to how dissimilar the branch length estimates are across models, and pick the model with the minimum posterior loss. As expected, simulations suggested that models selected with this criterion result in slightly more accurate branch length estimates than those obtained under models selected by the hLRTs.

Model Selection Uncertainty

Once we have selected a model it is very important that we are able to assess how confident we are in that selection (see Chatfield, 1995). We would like to be able to rank the models and to know whether the model selected is much better than the other candidate models. At the same time, we should be interested to learn whether we would select the same model if several other independent samples were available. The assessment of model selection uncertainty has a long tradition within the Bayesian community and posterior probabilities can be naturally used to take account of model uncertainty (Kass and Raftery, 1995; Madigan and Raftery, 1994). For example, models can be ranked according to their posterior probabilities and 95% credible intervals (Occam's Window) can easily be constructed by summing these probabilities (Madigan and Raftery, 1994). Although computing posterior probabilities can be hard and time consuming, in theory we could approximate those probabilities with the BIC. Furthermore, we could also use the BIC values or posterior risks of the DT method (Minin et al., 2003) in the same way that we use the AIC below to assess model selection uncertainty, although this could be considered ad hoc (see Hoeting et al., 1999).
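Ranking models and accumulating their posterior probabilities until a chosen level is reached takes only a few lines. The sketch below is one straightforward reading of the summation described above; the model labels and probabilities are hypothetical, and the function name is illustrative.

```python
def credible_model_set(posteriors, level=0.95):
    """Smallest top-ranked set of models whose probabilities sum to >= level."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for name, probability in ranked:
        selected.append(name)
        cumulative += probability
        if cumulative >= level:
            break
    return selected, cumulative

# Hypothetical posterior probabilities for six candidate models.
posteriors = {"M1": 0.52, "M2": 0.19, "M3": 0.17, "M4": 0.06, "M5": 0.04, "M6": 0.02}

members, mass = credible_model_set(posteriors)
print(f"95% set: {members} (cumulative probability {mass:.2f})")
```

A set containing a single model indicates little selection uncertainty, whereas a set spanning many models suggests that inferences conditional on the best model alone may be overconfident.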

Model Averaging

Although in general model selection is concerned with the selection of just the best fit model, Bayesian approaches allow us to make inferences based on the entire set of candidate models, or model averaging (Hoeting et al., 1999; Madigan and Raftery, 1994; Raftery, 1996; Wasserman, 2000). Indeed, obtaining model averaged phylogenetic estimates is straightforward (Posada, 2003). If we consider, for example, G models that include the gamma distribution for rate variation among sites (Yang, 1996a), the overall posterior mean of the shape of the gamma distribution (α) would be:

$$E(\alpha \mid D) = \sum_{i=1}^{G} \hat{\alpha}_i\, P(M_i \mid D)$$

where $\hat{\alpha}_i$ is the estimate of α for model i.
Because not all parameters have the same interpretation across models, we should be careful when calculating and interpreting model-averaged parameter estimates. For example, the gamma shape parameter describing among-site rate variation has a different interpretation depending on whether the model also includes a proportion of invariable sites, because in such a case only the rates at variable sites, and not at all sites, are gamma-distributed. To facilitate a correct interpretation we could obtain two separate model-averaged estimates of the gamma shape parameter, one from models that include a proportion of invariable sites, and another from models that do not include a proportion of invariable sites. Moreover, from the above formulation we can see that it would be easy to estimate the relative importance of any parameter by summing the posterior probabilities across all models that include the parameter we are interested in. For example, the relative importance (w+) for the shape of the gamma distribution across all R candidate models is simply:

$$w_{+}(\alpha) = \sum_{i=1}^{R} P(M_i \mid D)\, I_{\alpha}(M_i)$$

where

$$I_{\alpha}(M_i) = \begin{cases} 1 & \text{if } \alpha \text{ is in model } M_i \\ 0 & \text{otherwise} \end{cases}$$
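The two quantities just described, a model-averaged estimate and a relative importance, can be sketched as follows. The posterior probabilities and the estimates of α below are hypothetical. One assumption is made explicit in the code: when some candidate models lack the parameter, the probabilities of the models that do contain it are rescaled to sum to one before averaging (this rescaling changes nothing when every model considered includes the parameter).

```python
def relative_importance(posteriors, models_with_parameter):
    """Sum of posterior probabilities over the models that include the parameter."""
    return sum(p for name, p in posteriors.items() if name in models_with_parameter)

def model_averaged_estimate(posteriors, estimates):
    """Weighted mean of per-model estimates.

    `estimates` maps each model that includes the parameter to its estimate.
    Assumption: weights are rescaled over those models only.
    """
    w_plus = sum(posteriors[name] for name in estimates)
    weighted = sum(posteriors[name] * value for name, value in estimates.items())
    return weighted / w_plus

# Hypothetical posterior probabilities and gamma shape estimates.
posteriors = {"HKY85": 0.05, "HKY85+G": 0.35, "GTR+G": 0.40, "GTR+I+G": 0.20}
alpha_no_pinv = {"HKY85+G": 0.50, "GTR+G": 0.45}   # models without invariable sites
alpha_with_pinv = {"GTR+I+G": 0.90}                # models with invariable sites

importance = relative_importance(posteriors, {"HKY85+G", "GTR+G", "GTR+I+G"})
avg_no_pinv = model_averaged_estimate(posteriors, alpha_no_pinv)
avg_with_pinv = model_averaged_estimate(posteriors, alpha_with_pinv)

print(f"relative importance of alpha: {importance:.2f}")
print(f"averaged alpha, models without pinv: {avg_no_pinv:.4f}")
print(f"averaged alpha, models with pinv:    {avg_with_pinv:.4f}")
```

Keeping the two averages separate follows the caution above: α describes rate variation at all sites in one group of models and only at the variable sites in the other.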

We also need to be careful when interpreting the relative importance of parameters. When the number of candidate models is less than the number of possible combinations of parameters, the presence-absence of some pairs of parameters can be correlated, and so can their relative importances. In other words, if parameter ɛ actually has a high relative importance, then a second parameter η might yield a high relative importance simply because the presence-absence of parameters ɛ and η among models is positively correlated. For the 56 models in Table 1, the presence of the different base frequency parameters (π) is completely correlated, whereas the presence of several substitution rates (ϕ) shows complete or high levels of correlation. The presence of parameter κ is inversely correlated with that of several substitution rate parameters (e.g., ϕAG). The presence of α, the shape of the gamma distribution for rate variation among sites, or pinv, the proportion of invariable sites, is not correlated with that of any other parameter.
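How such correlations arise can be checked by coding each parameter as present (1) or absent (0) in every candidate model and correlating the resulting vectors. The sketch below uses a deliberately reduced, hypothetical candidate set of eight models (JC69, K80, HKY85 and GTR, each with and without Γ) and a simplified coding in which κ is present only in K80 and HKY85, the general substitution rates (ϕ) only in GTR, and unequal base frequencies (π) in HKY85 and GTR. It is not a reanalysis of the 56 models in Table 1.

```python
import math

def presence_correlation(x, y):
    """Pearson correlation between two presence/absence (0/1) vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a parameter is always present or absent
    return cov / math.sqrt(var_x * var_y)

# Model order: JC69, K80, HKY85, GTR, JC69+G, K80+G, HKY85+G, GTR+G
kappa = [0, 1, 1, 0, 0, 1, 1, 0]
phi   = [0, 0, 0, 1, 0, 0, 0, 1]
pi    = [0, 0, 1, 1, 0, 0, 1, 1]
alpha = [0, 0, 0, 0, 1, 1, 1, 1]

print(f"kappa vs phi:   {presence_correlation(kappa, phi):+.3f}")
print(f"pi vs phi:      {presence_correlation(pi, phi):+.3f}")
print(f"alpha vs kappa: {presence_correlation(alpha, kappa):+.3f}")
```

In this toy set κ and ϕ are negatively correlated because they never occur together, whereas α is uncorrelated with the substitution parameters because every substitution scheme appears both with and without Γ.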

Table 1.

AICc values, AICc differences (Δ), and Akaike weights (w) for the carabid beetles Ohomopterus mitochondrial DNA data set from Sota and Vogler (2001). Because branch lengths were estimated for each candidate model, the number of branches was included in the penalty parameter K (= number of parameters). ℓ are the maximized log likelihoods and Cum(w) are the cumulative Akaike weights

| Model | −ℓ | K | AICc | Δ AICc | w | Cum(w) |
|---|---|---|---|---|---|---|
| TN93+I+Γ | 5441.4600 | 78 | 11045.5888 | 0.0000 | 0.5221 | 0.5221 |
| TIM+I+Γ | 5441.3765 | 79 | 11047.5965 | 2.0077 | 0.1913 | 0.7134 |
| HKY85+I+Γ | 5443.6729 | 77 | 11047.8422 | 2.2534 | 0.1692 | 0.8826 |
| K81uf+I+Γ | 5443.5566 | 78 | 11049.7821 | 4.1934 | 0.0641 | 0.9468 |
| GTR+I+Γ | 5440.9150 | 81 | 11051.0301 | 5.4413 | 0.0344 | 0.9811 |
| TVM+I+Γ | 5442.7393 | 80 | 11052.4991 | 6.9103 | 0.0165 | 0.9976 |
| TN93+Γ | 5448.6792 | 77 | 11057.8549 | 12.2661 | 0.0011 | 0.9988 |
| HKY85+Γ | 5450.5068 | 76 | 11059.3402 | 13.7514 | 0.0005 | 0.9993 |
| TIM+Γ | 5448.6577 | 78 | 11059.9843 | 14.3955 | 0.0004 | 0.9997 |
| K81uf+Γ | 5450.4883 | 77 | 11061.4730 | 15.8843 | 0.0002 | 0.9999 |
| GTR+Γ | 5448.0298 | 80 | 11063.0802 | 17.4914 | 0.0001 | 1.0000 |
| TVM+Γ | 5449.6685 | 79 | 11064.1804 | 18.5917 | 0.0000 | 1.0000 |
| TN93+I | 5470.7568 | 77 | 11102.0102 | 56.4214 | 0.0000 | 1.0000 |
| TIM+I | 5470.7417 | 78 | 11104.1522 | 58.5635 | 0.0000 | 1.0000 |
| GTR+I | 5470.3452 | 80 | 11107.7110 | 62.1223 | 0.0000 | 1.0000 |
| HKY85+I | 5476.8496 | 76 | 11112.0257 | 66.4370 | 0.0000 | 1.0000 |
| K81uf+I | 5476.8208 | 77 | 11114.1381 | 68.5493 | 0.0000 | 1.0000 |
| TVM+I | 5476.1650 | 79 | 11117.1736 | 71.5849 | 0.0000 | 1.0000 |
| F81+I+Γ | 5769.1118 | 76 | 11696.5501 | 650.9614 | 0.0000 | 1.0000 |
| F81+Γ | 5782.0566 | 75 | 11720.2721 | 674.6834 | 0.0000 | 1.0000 |
| F81+I | 5807.4927 | 75 | 11771.1442 | 725.5554 | 0.0000 | 1.0000 |
| GTR | 5805.0576 | 79 | 11774.9588 | 729.3700 | 0.0000 | 1.0000 |
| TVM | 5808.4727 | 78 | 11779.6141 | 734.0254 | 0.0000 | 1.0000 |
| TIM | 5810.4102 | 77 | 11781.3168 | 735.7280 | 0.0000 | 1.0000 |
| TN93 | 5813.4780 | 76 | 11785.2825 | 739.6938 | 0.0000 | 1.0000 |
| K81uf | 5813.5190 | 76 | 11785.3646 | 739.7758 | 0.0000 | 1.0000 |
| HKY85 | 5816.5894 | 75 | 11789.3375 | 743.7488 | 0.0000 | 1.0000 |
| SYM+I+Γ | 5861.0859 | 78 | 11884.8407 | 839.2520 | 0.0000 | 1.0000 |
| TVMef+I+Γ | 5867.6128 | 77 | 11895.7221 | 850.1333 | 0.0000 | 1.0000 |
| SYM+Γ | 5876.7803 | 77 | 11914.0570 | 868.4683 | 0.0000 | 1.0000 |
| TVMef+Γ | 5884.4272 | 76 | 11927.1810 | 881.5922 | 0.0000 | 1.0000 |
| TIMef+I+Γ | 5885.0684 | 76 | 11928.4632 | 882.8745 | 0.0000 | 1.0000 |
| K81+I+Γ | 5893.7642 | 75 | 11943.6872 | 898.0984 | 0.0000 | 1.0000 |
| TN93ef+I+Γ | 5897.7529 | 75 | 11951.6647 | 906.0759 | 0.0000 | 1.0000 |
| TIMef+Γ | 5899.2588 | 75 | 11954.6764 | 909.0877 | 0.0000 | 1.0000 |
| K80+I+Γ | 5906.2329 | 74 | 11966.4593 | 920.8706 | 0.0000 | 1.0000 |
| K81+Γ | 5908.7876 | 74 | 11971.5687 | 925.9800 | 0.0000 | 1.0000 |
| TN93ef+Γ | 5911.5659 | 74 | 11977.1254 | 931.5366 | 0.0000 | 1.0000 |
| SYM+I | 5908.7021 | 77 | 11977.9008 | 932.3120 | 0.0000 | 1.0000 |
| TVMef+I | 5917.6128 | 76 | 11993.5521 | 947.9633 | 0.0000 | 1.0000 |
| K80+Γ | 5920.9038 | 73 | 11993.6382 | 948.0494 | 0.0000 | 1.0000 |
| TIMef+I | 5928.9629 | 75 | 12014.0846 | 968.4959 | 0.0000 | 1.0000 |
| K81+I | 5938.0137 | 74 | 12030.0209 | 984.4321 | 0.0000 | 1.0000 |
| TN93ef+I | 5940.7383 | 74 | 12035.4701 | 989.8813 | 0.0000 | 1.0000 |
| K80+I | 5949.5186 | 73 | 12050.8677 | 1005.2789 | 0.0000 | 1.0000 |
| F81 | 6088.2227 | 74 | 12330.4388 | 1284.8501 | 0.0000 | 1.0000 |
| JC69+I+Γ | 6101.2656 | 73 | 12354.3618 | 1308.7730 | 0.0000 | 1.0000 |
| JC69+Γ | 6114.8408 | 72 | 12379.3515 | 1333.7628 | 0.0000 | 1.0000 |
| JC69+I | 6142.1719 | 72 | 12434.0137 | 1388.4249 | 0.0000 | 1.0000 |
| SYM | 6170.8916 | 76 | 12500.1097 | 1454.5209 | 0.0000 | 1.0000 |
| TVMef | 6190.3394 | 75 | 12536.8375 | 1491.2488 | 0.0000 | 1.0000 |
| TIMef | 6194.5806 | 74 | 12543.1547 | 1497.5659 | 0.0000 | 1.0000 |
| TN93ef | 6210.6353 | 73 | 12573.1011 | 1527.5123 | 0.0000 | 1.0000 |
| K81 | 6214.1152 | 73 | 12580.0610 | 1534.4723 | 0.0000 | 1.0000 |
| K80 | 6230.2100 | 72 | 12610.0898 | 1564.5011 | 0.0000 | 1.0000 |
| JC69 | 6411.5161 | 71 | 12970.5438 | 1924.9551 | 0.0000 | 1.0000 |
ModelKAICcΔ AICcwCum(w)
TN93+I+Γ5441.46007811045.58880.00000.52210.5221
TIM+I+Γ5441.37657911047.59652.00770.19130.7134
HKY85+I+Γ5443.67297711047.84222.25340.16920.8826
K81uf+I+Γ5443.55667811049.78214.19340.06410.9468
GTR+I+Γ5440.91508111051.03015.44130.03440.9811
TVM+I+Γ5442.73938011052.49916.91030.01650.9976
TN93+Γ5448.67927711057.854912.26610.00110.9988
HKY85+Γ5450.50687611059.340213.75140.00050.9993
TIM+Γ5448.65777811059.984314.39550.00040.9997
K81uf+Γ5450.48837711061.473015.88430.00020.9999
GTR+Γ5448.02988011063.080217.49140.00011.0000
TVM+Γ5449.66857911064.180418.59170.00001.0000
TN93+I5470.75687711102.010256.42140.00001.0000
TIM+I5470.74177811104.152258.56350.00001.0000
GTR+I5470.34528011107.711062.12230.00001.0000
HKY85+I5476.84967611112.025766.43700.00001.0000
K81uf+I5476.82087711114.138168.54930.00001.0000
TVM+I5476.16507911117.173671.58490.00001.0000
F81+I+Γ5769.11187611696.5501650.96140.00001.0000
F81+Γ5782.05667511720.2721674.68340.00001.0000
F81+I5807.49277511771.1442725.55540.00001.0000
GTR5805.05767911774.9588729.37000.00001.0000
TVM5808.47277811779.6141734.02540.00001.0000
TIM5810.41027711781.3168735.72800.00001.0000
TN935813.47807611785.2825739.69380.00001.0000
K81uf5813.51907611785.3646739.77580.00001.0000
HKY855816.58947511789.3375743.74880.00001.0000
SYM+I+Γ5861.08597811884.8407839.25200.00001.0000
TVMef+I+Γ5867.61287711895.7221850.13330.00001.0000
SYM+Γ5876.78037711914.0570868.46830.00001.0000
TVMef+Γ5884.42727611927.1810881.59220.00001.0000
TIMef+I+Γ5885.06847611928.4632882.87450.00001.0000
K81+I+Γ5893.76427511943.6872898.09840.00001.0000
TN93ef+I+Γ5897.75297511951.6647906.07590.00001.0000
TIMef+Γ5899.25887511954.6764909.08770.00001.0000
K80+I+Γ5906.23297411966.4593920.87060.00001.0000
K81+Γ5908.78767411971.5687925.98000.00001.0000
TN93ef+Γ5911.56597411977.1254931.53660.00001.0000
SYM+I5908.70217711977.9008932.31200.00001.0000
TVMef+I5917.61287611993.5521947.96330.00001.0000
K80+Γ5920.90387311993.6382948.04940.00001.0000
TIMef+I5928.96297512014.0846968.49590.00001.0000
K81+I5938.01377412030.0209984.43210.00001.0000
TN93ef+I5940.73837412035.4701989.88130.00001.0000
K80+I5949.51867312050.86771005.27890.00001.0000
F816088.22277412330.43881284.85010.00001.0000
JC69+I+Γ6101.26567312354.36181308.77300.00001.0000
JC69+Γ6114.84087212379.35151333.76280.00001.0000
JC69+I6142.17197212434.01371388.42490.00001.0000
SYM6170.89167612500.10971454.52090.00001.0000
TVMef6190.33947512536.83751491.24880.00001.0000
TIMef6194.58067412543.15471497.56590.00001.0000
TN93ef6210.63537312573.10111527.51230.00001.0000
K816214.11527312580.06101534.47230.00001.0000
K806230.21007212610.08981564.50110.00001.0000
JC696411.51617112970.54381924.95510.00001.0000
Table 1.

AICc values, AICc differences (Δ), and Akaike weights (w) for the carabid beetles Ohomopterus mitochondrial DNA data set from Sota and Vogler (2001). Because branch lengths were estimated for each candidate model, the number of branches was included in the penalty parameter K (= number of parameters). ℓ are the maximized log likelihoods and Cum(w) are the cumulative Akaike weights

| Model | −ℓ | K | AICc | Δ AICc | w | Cum(w) |
|---|---|---|---|---|---|---|
| TN93+I+Γ | 5441.4600 | 78 | 11045.5888 | 0.0000 | 0.5221 | 0.5221 |
| TIM+I+Γ | 5441.3765 | 79 | 11047.5965 | 2.0077 | 0.1913 | 0.7134 |
| HKY85+I+Γ | 5443.6729 | 77 | 11047.8422 | 2.2534 | 0.1692 | 0.8826 |
| K81uf+I+Γ | 5443.5566 | 78 | 11049.7821 | 4.1934 | 0.0641 | 0.9468 |
| GTR+I+Γ | 5440.9150 | 81 | 11051.0301 | 5.4413 | 0.0344 | 0.9811 |
| TVM+I+Γ | 5442.7393 | 80 | 11052.4991 | 6.9103 | 0.0165 | 0.9976 |
| TN93+Γ | 5448.6792 | 77 | 11057.8549 | 12.2661 | 0.0011 | 0.9988 |
| HKY85+Γ | 5450.5068 | 76 | 11059.3402 | 13.7514 | 0.0005 | 0.9993 |
| TIM+Γ | 5448.6577 | 78 | 11059.9843 | 14.3955 | 0.0004 | 0.9997 |
| K81uf+Γ | 5450.4883 | 77 | 11061.4730 | 15.8843 | 0.0002 | 0.9999 |
| GTR+Γ | 5448.0298 | 80 | 11063.0802 | 17.4914 | 0.0001 | 1.0000 |
| TVM+Γ | 5449.6685 | 79 | 11064.1804 | 18.5917 | 0.0000 | 1.0000 |
| TN93+I | 5470.7568 | 77 | 11102.0102 | 56.4214 | 0.0000 | 1.0000 |
| TIM+I | 5470.7417 | 78 | 11104.1522 | 58.5635 | 0.0000 | 1.0000 |
| GTR+I | 5470.3452 | 80 | 11107.7110 | 62.1223 | 0.0000 | 1.0000 |
| HKY85+I | 5476.8496 | 76 | 11112.0257 | 66.4370 | 0.0000 | 1.0000 |
| K81uf+I | 5476.8208 | 77 | 11114.1381 | 68.5493 | 0.0000 | 1.0000 |
| TVM+I | 5476.1650 | 79 | 11117.1736 | 71.5849 | 0.0000 | 1.0000 |
| F81+I+Γ | 5769.1118 | 76 | 11696.5501 | 650.9614 | 0.0000 | 1.0000 |
| F81+Γ | 5782.0566 | 75 | 11720.2721 | 674.6834 | 0.0000 | 1.0000 |
| F81+I | 5807.4927 | 75 | 11771.1442 | 725.5554 | 0.0000 | 1.0000 |
| GTR | 5805.0576 | 79 | 11774.9588 | 729.3700 | 0.0000 | 1.0000 |
| TVM | 5808.4727 | 78 | 11779.6141 | 734.0254 | 0.0000 | 1.0000 |
| TIM | 5810.4102 | 77 | 11781.3168 | 735.7280 | 0.0000 | 1.0000 |
| TN93 | 5813.4780 | 76 | 11785.2825 | 739.6938 | 0.0000 | 1.0000 |
| K81uf | 5813.5190 | 76 | 11785.3646 | 739.7758 | 0.0000 | 1.0000 |
| HKY85 | 5816.5894 | 75 | 11789.3375 | 743.7488 | 0.0000 | 1.0000 |
| SYM+I+Γ | 5861.0859 | 78 | 11884.8407 | 839.2520 | 0.0000 | 1.0000 |
| TVMef+I+Γ | 5867.6128 | 77 | 11895.7221 | 850.1333 | 0.0000 | 1.0000 |
| SYM+Γ | 5876.7803 | 77 | 11914.0570 | 868.4683 | 0.0000 | 1.0000 |
| TVMef+Γ | 5884.4272 | 76 | 11927.1810 | 881.5922 | 0.0000 | 1.0000 |
| TIMef+I+Γ | 5885.0684 | 76 | 11928.4632 | 882.8745 | 0.0000 | 1.0000 |
| K81+I+Γ | 5893.7642 | 75 | 11943.6872 | 898.0984 | 0.0000 | 1.0000 |
| TN93ef+I+Γ | 5897.7529 | 75 | 11951.6647 | 906.0759 | 0.0000 | 1.0000 |
| TIMef+Γ | 5899.2588 | 75 | 11954.6764 | 909.0877 | 0.0000 | 1.0000 |
| K80+I+Γ | 5906.2329 | 74 | 11966.4593 | 920.8706 | 0.0000 | 1.0000 |
| K81+Γ | 5908.7876 | 74 | 11971.5687 | 925.9800 | 0.0000 | 1.0000 |
| TN93ef+Γ | 5911.5659 | 74 | 11977.1254 | 931.5366 | 0.0000 | 1.0000 |
| SYM+I | 5908.7021 | 77 | 11977.9008 | 932.3120 | 0.0000 | 1.0000 |
| TVMef+I | 5917.6128 | 76 | 11993.5521 | 947.9633 | 0.0000 | 1.0000 |
| K80+Γ | 5920.9038 | 73 | 11993.6382 | 948.0494 | 0.0000 | 1.0000 |
| TIMef+I | 5928.9629 | 75 | 12014.0846 | 968.4959 | 0.0000 | 1.0000 |
| K81+I | 5938.0137 | 74 | 12030.0209 | 984.4321 | 0.0000 | 1.0000 |
| TN93ef+I | 5940.7383 | 74 | 12035.4701 | 989.8813 | 0.0000 | 1.0000 |
| K80+I | 5949.5186 | 73 | 12050.8677 | 1005.2789 | 0.0000 | 1.0000 |
| F81 | 6088.2227 | 74 | 12330.4388 | 1284.8501 | 0.0000 | 1.0000 |
| JC69+I+Γ | 6101.2656 | 73 | 12354.3618 | 1308.7730 | 0.0000 | 1.0000 |
| JC69+Γ | 6114.8408 | 72 | 12379.3515 | 1333.7628 | 0.0000 | 1.0000 |
| JC69+I | 6142.1719 | 72 | 12434.0137 | 1388.4249 | 0.0000 | 1.0000 |
| SYM | 6170.8916 | 76 | 12500.1097 | 1454.5209 | 0.0000 | 1.0000 |
| TVMef | 6190.3394 | 75 | 12536.8375 | 1491.2488 | 0.0000 | 1.0000 |
| TIMef | 6194.5806 | 74 | 12543.1547 | 1497.5659 | 0.0000 | 1.0000 |
| TN93ef | 6210.6353 | 73 | 12573.1011 | 1527.5123 | 0.0000 | 1.0000 |
| K81 | 6214.1152 | 73 | 12580.0610 | 1534.4723 | 0.0000 | 1.0000 |
| K80 | 6230.2100 | 72 | 12610.0898 | 1564.5011 | 0.0000 | 1.0000 |
| JC69 | 6411.5161 | 71 | 12970.5438 | 1924.9551 | 0.0000 | 1.0000 |

Indeed, the averaged parameter could be the topology itself, so we could construct a model-averaged estimate of phylogeny. We will come back to this later.

Akaike Information Criterion

A different approach to model selection is the Akaike Information Criterion (AIC) (Akaike, 1973, 1974; and see Sakamoto et al., 1986). The AIC is an asymptotically unbiased estimator of the expected relative Kullback-Leibler information quantity or distance (K-L) (Kullback and Leibler, 1951), which represents the amount of information lost when we use model g to approximate model f (Fig. 4).

Figure 4

The Kullback-Leibler distance. The K-L distance aims to represent how close a model is to the truth. Here, M2 is the candidate model that best approximates truth and therefore it is the model with the smallest K-L distance. The AIC chooses the candidate model with the smallest expected K-L distance.

The AIC for a given model is a function of its maximized log-likelihood (ℓ) and the number of estimable parameters (K):

$$\mathrm{AIC} = -2\ell + 2K$$

In the context of phylogenetics we can think of the AIC as the amount of information lost when we use, say, HKY85 to approximate the real process of nucleotide substitution. Hence, we prefer the model with the smallest AIC. The second term, K, includes the parameters from the substitution model, like base frequencies, substitution rates, proportion of invariable sites, or rate variation among sites. If branch lengths are estimated de novo for every model, K should also include the number of branches (for an unrooted bifurcating tree, twice the number of taxa minus three). Although the inclusion of the number of branches, constant for all models, does not change the order of the AIC values, it will change their relative magnitude.
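As a minimal sketch of this bookkeeping (Python and the function names are illustrative choices, not part of the original analysis), K can be assembled from the free substitution-model parameters plus the 2n − 3 branches of an unrooted bifurcating tree, and the AIC computed as −2ℓ + 2K. The log-likelihood for TN93+I+Γ comes from Table 1; the count of seven free substitution parameters for that model (three base frequencies, two relative rates, the proportion of invariable sites, and the gamma shape) is an assumption that happens to reproduce the K = 78 listed there.

```python
def n_branches(n_taxa: int) -> int:
    """Number of branches in an unrooted, fully bifurcating tree: 2n - 3."""
    return 2 * n_taxa - 3


def aic(log_lik: float, k: int) -> float:
    """AIC = -2*lnL + 2K; the model with the smallest value is preferred."""
    return -2.0 * log_lik + 2.0 * k


n_taxa = 37                 # sequences in the Ohomopterus data set
model_params = 7            # assumed: 3 frequencies + 2 rates + pinv + alpha
k = model_params + n_branches(n_taxa)   # 7 + 71 branches

tn93_aic = aic(-5441.4600, k)           # lnL for TN93+I+G taken from Table 1
print(k, round(tn93_aic, 4))
```

Because the 71 branches are added to every model, they shift all AIC values by the same amount and leave the ranking untouched.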

In the AIC, as more parameters are added to the model the first term becomes smaller, representing an increased fit, whereas the second component, or penalty term, becomes larger. Indeed, when the sample is large, the number of adjustable parameters makes a negligible difference, and more complex models will be favored (Forster and Sober, 1994). It is important to note that although the AIC formula appears to be superficially very simple, its derivation is well founded on information theory (de Leeuw, 1992), and the so-called “penalty term” 2K is not an arbitrary value (Burnham and Anderson, 2003, p. 64). When sample size (n) is small compared to the number of parameters (say, n/K < 40) the use of a second-order AIC, AICc (Hurvich and Tsai, 1989; Sugiura, 1978), is recommended:

$$\mathrm{AIC}_c = -2\ell + 2K + \frac{2K(K+1)}{n-K-1}$$

where sample size is approximated by the total number of characters in the alignment (see below for discussion). Note that in this case the inclusion of branch lengths as estimated parameters can change the order of the AICc values, and therefore, the selected model.

Because the AIC is on a relative scale, it is critical to compute and present the AIC differences (Δ AIC), rather than actual AIC values, over all candidate models (Buckley and Cunningham, 2002; Burnham and Anderson, 2003, pp. 70–72). For the ith model, the AIC difference is:

$$\Delta_i = \mathrm{AIC}_i - \min \mathrm{AIC}$$

where min AIC is the smallest AIC value among all candidate models.
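The second-order correction and the differences can be sketched as follows. This is an illustrative Python fragment rather than the code used for the analysis. It assumes the standard form AICc = −2ℓ + 2K + 2K(K+1)/(n − K − 1), with n set to the 1927 alignment sites, and uses four of the (ℓ, K) pairs from Table 1; `+G` is an ASCII shorthand for +Γ.

```python
def aicc(log_lik: float, k: int, n: int) -> float:
    """Second-order AIC: the usual AIC plus a small-sample correction term."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


N_SITES = 1927  # sample size approximated by the alignment length

# (maximized log-likelihood, K) for four of the models in Table 1
models = {
    "TN93+I+G": (-5441.4600, 78),
    "TIM+I+G": (-5441.3765, 79),
    "HKY85+I+G": (-5443.6729, 77),
    "GTR+I+G": (-5440.9150, 81),
}

scores = {m: aicc(lnl, k, N_SITES) for m, (lnl, k) in models.items()}
best = min(scores.values())
deltas = {m: s - best for m, s in scores.items()}  # AICc differences

for m in sorted(deltas, key=deltas.get):
    print(f"{m:10s} AICc={scores[m]:.4f}  delta={deltas[m]:.4f}")
```

Note that GTR+I+Γ has the highest likelihood of the four yet does not obtain the smallest AICc, because its three extra parameters cost more than the gain in fit.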

The AIC is designed to estimate the predictive accuracy of competing hypotheses (Forster, 2002; Sober, 2002b), which is the expected performance of a model when predicting new data. The prediction of new data is a common application in phylogenetics, for example in parametric bootstrapping or simulation studies. It seems that the AIC was first applied in the context of phylogenetics by Hasegawa and collaborators (1990a; 1990b; Kishino and Hasegawa, 1989), and although several phylogenetics programs implement the AIC, like Molphy (Adachi and Hasegawa, 1996) and Modeltest (Posada, 2003; Posada and Crandall, 1998), the use of the AIC is much less common than that of the hLRTs.

The AIC makes several assumptions. First, there is the assumption of “uniformity of nature” (Forster and Sober, 1994), that is, that all data sets (future and past) are drawn from the same underlying process. Second, the AIC assumes that the sample size is large enough to ensure that the likelihood function will approximate its asymptotic properties. Finally, the AIC assumes that the true distribution of parameter estimates, when the sample size n is sufficiently large, follows a multivariate normal distribution. In principle, these assumptions (which are, in any case, common in statistical phylogenetics) should not be unduly restrictive (Forster and Sober, 1994, 2004), but the implications of potential violations need to be studied. It has been argued that constraining parameters at their boundaries, for example setting the proportion of invariable sites to be zero, might violate the derivation of the AIC (and the BIC) (Ota et al., 2000).

Model Selection Uncertainty with the AIC

The AIC differences allow for an immediate ranking of the candidate models. The larger the AIC difference for a model, the less probable that it is the best K-L model. As a rough rule of thumb, Burnham and Anderson (2003, p. 70) propose that models for which Δi ≤ 2 receive substantial support and are considered when making inferences, models having 4 ≤ Δi ≤ 7 have considerably less support, and models having Δi > 10 receive no support. However, they also warn that these guidelines are not expected to hold when observations are not independent but are assumed so, as is usually the case in phylogenetics.

Akaike (1983) also suggested that exp(−Δi/2) approximates the relative likelihood of the models given the data, L(Mi|D). These relative likelihoods are then normalized to obtain a set of positive Akaike weights (w). The Akaike weight for the ith model in a set of R candidate models is:

$$w_i = \frac{\exp(-\Delta_i/2)}{\sum_{r=1}^{R} \exp(-\Delta_r/2)}$$

Akaike weights are very useful for assessing model-selection uncertainty without having to use computer-intensive methods like Monte Carlo simulation or bootstrapping (Buckland et al., 1997; see Buckley et al., 2002, for an example). We can establish a 95% confidence set of models for the best K-L model by summing the Akaike weights from largest to smallest until the sum just reaches or exceeds 0.95; the corresponding subset of models is a type of confidence set on the best K-L model (Burnham and Anderson, 1998, pp. 169–171; 2003). We can also assess the relative likelihood of model i versus model j simply as the ratio of the two Akaike weights; these ratios are called evidence ratios (Anderson et al., 2000; Burnham and Anderson, 2003, pp. 77–79). Techniques exist to test whether two AICs differ significantly (Linhart, 1988; Shimodaira, 1997; Vuong, 1989), and multiple comparison techniques can be used to construct a confidence set of models that minimize the sampling error of the AIC (Shimodaira, 1998). Such techniques have already been proposed to construct confidence sets of trees (Shimodaira, 2001; Shimodaira and Hasegawa, 1999).
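The weights, the 95% confidence set, and an evidence ratio can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original implementation: the weight is taken to be exp(−Δi/2) normalized over the candidate set, and only the twelve smallest AICc differences from Table 1 are used, on the assumption that the remaining 44 models (Δ > 56) contribute negligibly to the normalizing sum. `+G` stands for +Γ.

```python
import math


def akaike_weights(deltas):
    """Normalize the relative likelihoods exp(-delta/2) so they sum to one."""
    rel_lik = {m: math.exp(-0.5 * d) for m, d in deltas.items()}
    total = sum(rel_lik.values())
    return {m: v / total for m, v in rel_lik.items()}


def confidence_set(weights, level=0.95):
    """Add models from largest to smallest weight until the sum reaches level."""
    chosen, cumulative = [], 0.0
    for m in sorted(weights, key=weights.get, reverse=True):
        chosen.append(m)
        cumulative += weights[m]
        if cumulative >= level:
            break
    return chosen


# Twelve smallest AICc differences from Table 1
deltas = {
    "TN93+I+G": 0.0000, "TIM+I+G": 2.0077, "HKY85+I+G": 2.2534,
    "K81uf+I+G": 4.1934, "GTR+I+G": 5.4413, "TVM+I+G": 6.9103,
    "TN93+G": 12.2661, "HKY85+G": 13.7514, "TIM+G": 14.3955,
    "K81uf+G": 15.8843, "GTR+G": 17.4914, "TVM+G": 18.5917,
}

w = akaike_weights(deltas)
conf95 = confidence_set(w)
evidence_ratio = w["TN93+I+G"] / w["HKY85+I+G"]  # best AICc model vs. hLRT model

print({m: round(v, 4) for m, v in w.items()})
print(conf95, round(evidence_ratio, 2))
```

The evidence ratio depends only on the difference between the two Δ values, so it is unaffected by which other models are in the candidate set.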

There is a Bayesian basis for interpreting the Akaike weights as being the probability that a model is the expected best K-L model (Akaike, 1981). In fact, the Akaike weights can be generalized to also include prior information (ρi):

$$w_i = \frac{\exp(-\Delta_i/2)\,\rho_i}{\sum_{r=1}^{R} \exp(-\Delta_r/2)\,\rho_r}$$

(Burnham and Anderson, 2003, p. 76). However, the above is not a true Bayesian approach, because these priors only refer to the model, and not to the prior probability distribution of the parameters of the model. Neither do these priors refer to the belief that Mi is the true model, but rather to the belief that model Mi is the best K-L model for the data (Burnham and Anderson, 1998, 2003). Usually ρi is set to 1/R for every model.
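A short Python sketch shows how such model priors enter the weights. It assumes the prior-weighted form in which each relative likelihood exp(−Δi/2) is multiplied by ρi before normalization. The three Δ values are from Table 1, whereas the "skewed" priors are invented purely for illustration.

```python
import math


def weights_with_priors(deltas, priors):
    """Akaike weights with each relative likelihood scaled by a model prior."""
    unnorm = {m: math.exp(-0.5 * deltas[m]) * priors[m] for m in deltas}
    total = sum(unnorm.values())
    return {m: v / total for m, v in unnorm.items()}


deltas = {"TN93+I+G": 0.0000, "TIM+I+G": 2.0077, "HKY85+I+G": 2.2534}

# Equal priors (1/R for every model) reproduce the ordinary Akaike weights
flat = {m: 1.0 / len(deltas) for m in deltas}
# Hypothetical priors that favor HKY85+I+G
skewed = {"TN93+I+G": 0.2, "TIM+I+G": 0.2, "HKY85+I+G": 0.6}

w_flat = weights_with_priors(deltas, flat)
w_skewed = weights_with_priors(deltas, skewed)
print({m: round(v, 4) for m, v in w_flat.items()})
print({m: round(v, 4) for m, v in w_skewed.items()})
```

With ρi = 1/R the priors cancel in the ratio, which is why the usual choice leaves the weights unchanged.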

Model Averaging with the AIC

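Within the AIC framework, it is straightforward to obtain a model-averaged estimate of any parameter (Posada, 2003). For example, a model-averaged estimate of the substitution rate between adenine and cytosine (ϕAC) using the Akaike weights (w) for R candidate models would be:

$$\hat{\bar{\phi}}_{AC} = \frac{\sum_{i=1}^{R} w_i \, I_{\phi_{AC}}(M_i) \, \hat{\phi}_{AC,i}}{w_{+}(\phi_{AC})}$$

where

$$w_{+}(\phi_{AC}) = \sum_{i=1}^{R} w_i \, I_{\phi_{AC}}(M_i)$$

and

$$I_{\phi_{AC}}(M_i) = \begin{cases} 1 & \text{if } \phi_{AC} \text{ is in model } M_i \\ 0 & \text{otherwise} \end{cases}$$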

Again, the caveats described above about interpreting model-averaged parameter estimates apply. Likewise, it is again easy to estimate the relative importance of any parameter by summing the Akaike weights across all models that include the parameter we are interested in. For example, the relative importance of the substitution rate between adenine and cytosine across all candidate models is simply the denominator above, w+(ϕAC).
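Both quantities can be illustrated with the Akaike weights listed in Table 3. The Python sketch below is a hedged example, not the original computation: the helper names are hypothetical, `+G` stands for +Γ, and because per-model estimates of the proportion of invariable sites are reported only for TN93+I+Γ and HKY85+I+Γ (Table 2), the average is restricted to those two models and therefore does not reproduce the full model-averaged value in Table 2.

```python
# Akaike weights for the supported models, as listed in Table 3
weights = {
    "TN93+I+G": 0.5221, "TIM+I+G": 0.1913, "HKY85+I+G": 0.1692,
    "K81uf+I+G": 0.0642, "GTR+I+G": 0.0344, "TVM+I+G": 0.0165,
    "TN93+G": 0.0011, "HKY85+G": 0.0005, "TIM+G": 0.0004,
    "K81uf+G": 0.0002, "GTR+G": 0.0001,
}


def relative_importance(weights, has_param):
    """Sum of the weights of all models that contain the parameter (w+)."""
    return sum(w for m, w in weights.items() if has_param(m))


def model_average(weights, estimates):
    """Weighted mean over the models that contain the parameter.

    `estimates` maps model name -> estimate, and lists only those models
    in which the parameter is free.
    """
    w_plus = sum(weights[m] for m in estimates)
    return sum(weights[m] * est for m, est in estimates.items()) / w_plus


# Relative importance of pinv: every model whose name ends in +I+G
pinv_importance = relative_importance(weights, lambda m: m.endswith("+I+G"))

# Average of pinv over the only two models with estimates given in Table 2
pinv_two_models = model_average(
    weights, {"TN93+I+G": 0.7038, "HKY85+I+G": 0.6644}
)

print(round(pinv_importance, 4), round(pinv_two_models, 4))
```

Dividing by w+ rather than by the sum of all weights means that models lacking the parameter do not pull the average toward zero.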

Model-Averaged Estimation of Phylogenies

As discussed above, model averaging can also be applied to the estimation of phylogenetic trees (Posada, 2003). This can be easily accomplished in programs like PAUP* (Swofford, 1998), and perhaps the only limitation is the time we want to dedicate to the analysis. We start by estimating a tree for each candidate model and then build a consensus tree using model weights as tree weights (these model weights can be Akaike weights, BIC weights, or model likelihoods from a Bayesian analysis) (see Jermiin et al., 1997). In a Bayesian framework one could also directly obtain a model-averaged estimate of phylogeny by using reversible-jump MCMC, an algorithm that moves through both parameter and model space (Green, 1995), and very recently implemented by Huelsenbeck et al. (2004) for phylogenetic model selection. It is also interesting to note that the AIC and Bayesian approaches allow for the direct comparison of trees estimated under different models because likelihoods calculated on different trees and on different models are comparable (e.g., ML-JC69 versus ML-HKY). In this sense, the AIC has already been used as an extension of the likelihood optimality criterion for phylogenetic estimation (Kishino and Hasegawa, 1989; Ogishima et al., 2000; Sober, 2002b; Sober and Steel, 2002; Tanaka et al., 1999), and nothing prevents the BIC from also being considered as another phylogenetic criterion. Posterior probabilities for different trees inferred under different models are also directly comparable if they fall under the same posterior distribution.
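The weighted consensus step can be sketched independently of any phylogenetics package. The Python fragment below is an assumption-laden toy, not PAUP*'s procedure: each tree is reduced to its set of internal bipartitions, every bipartition accumulates the weight of the models whose tree contains it, and bipartitions with more than half of the total weight are kept. The five taxa, three models, and their weights are all hypothetical.

```python
from collections import defaultdict

TAXA = frozenset("ABCDE")  # hypothetical taxon labels


def bipartition(side):
    """Canonical form of an unrooted split: the side that excludes taxon 'A'."""
    side = frozenset(side)
    return TAXA - side if "A" in side else side


def split_support(trees, weights):
    """Sum the model weights of every tree that contains each bipartition."""
    support = defaultdict(float)
    for model, splits in trees.items():
        for s in splits:
            support[s] += weights[model]
    return dict(support)


def weighted_majority_rule(support, threshold=0.5):
    """Keep the bipartitions whose accumulated weight exceeds the threshold."""
    return {s for s, w in support.items() if w > threshold}


# One tree per model, written as its two internal bipartitions
trees = {
    "M1": {bipartition("AB"), bipartition("DE")},
    "M2": {bipartition("AB"), bipartition("CE")},
    "M3": {bipartition("AC"), bipartition("DE")},
}
weights = {"M1": 0.6, "M2": 0.3, "M3": 0.1}  # hypothetical model weights

support = split_support(trees, weights)
consensus = weighted_majority_rule(support)

for s in sorted(support, key=support.get, reverse=True):
    print("".join(sorted(s)), round(support[s], 2))
```

The accumulated weights play the same role as the branch values in a weighted consensus tree: a bipartition found under every model receives a weight of 1.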

We have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001). This alignment contains 1927 sites, 301 of which are variable. We took three approaches to selecting the best-fit model. First, we optimized the likelihood and model parameters for the 56 substitution models currently implemented in the program Modeltest (Posada and Crandall, 1998) on a neighbor-joining tree estimated from Jukes and Cantor (1969) distances. We then used the AIC and AICc to select the best-fit model from these likelihoods. Second, we took these model parameters and performed a tree search under each of the 56 models so as to find the tree with the highest likelihood under each of these optimized models. Again, the AIC and AICc were used to choose the best-fit model. The second approach is superior to the first approach because it involves a more thorough search for the maximum likelihood under each model; however, the computational burden is much greater. Third, we also used the specific hLRT strategy implemented in Modeltest (Posada and Crandall, 1998). From the likelihood values we calculated AICc values, Akaike weights, the relative importance of different parameters, and model-averaged estimates of parameters and topology. In addition, we performed a bootstrap analysis on the data using the best AICc model with 500 replicates. All tree searches used five random addition replicates followed by TBR branch swapping. All likelihood calculations and tree searches were performed using PAUP*4.0b10 (Swofford, 2000).

Examining the AICc values and Akaike weights for the models optimized on the NJ tree, we immediately observe that only 11 out of the 56 models received noticeable support from the data (Table 1). Importantly, this confidence set of models, and the ranking of models within this set, is almost identical to that obtained from optimizing the topology (data not shown) (see also Nylander, 2004). All of the supported models incorporated the gamma distribution for among-site rate variation, and the best-supported models also included a proportion of invariable sites. Models that assumed equal base frequencies fitted the data poorly and received essentially no support (i.e., their Akaike weights are close to zero). The TN93+I+Γ model had the smallest AICc value, but there was considerable uncertainty in identifying the most appropriate number of different substitution rates between nucleotides. The Akaike weights calculated from the AICc values were very similar to those calculated from the AIC. This is because the n/K ratio, 37.14, is close to the value of 40, which Burnham and Anderson (2003, p. 66) recommend as the cut-off for preferring AICc. Indeed, when n/K is relatively large the AICc converges back to the AIC, and so it is still appropriate to use the AICc instead of the AIC. The hLRT approach led to selection of the HKY85+I+Γ model, which only received an AICc weight of 0.1692 (Table 1), but was contained within the 95% AIC confidence set of models. The ML tree under the HKY85+I+Γ model differs by a symmetrical distance (Foulds et al., 1979) of 4 and 5 from the two trees estimated under the TN93+I+Γ model.

In total 23 unique tree topologies were estimated from all of the models; however, only 8 unique topologies were contained in the set of trees that were estimated from models that received greater than or equal to 0.00001 support from the AICc weights. Some tree searches under the among-site rate variation models recovered two topologies, where one of these topologies had an internal branch collapsed to zero length. The weighted AICc consensus topology (Fig. 5A) was almost identical to the topology estimated under the best AICc model (TN93+I+Γ) (Fig. 5B), but due to the model selection uncertainty there is considerable ambiguity in selecting the best point estimate of topology for these data. The bootstrap analysis under the best AICc model indicates that the nodes that are not supported under all of the models also have low bootstrap support (Fig. 5). This observation is important because it suggests that in this case if we had ignored model selection uncertainty our conclusion as to what hypotheses were well supported by the data would be the same. It is worth mentioning that the numbers above branches in Figure 5A describe the uncertainty of branches due to uncertainty on the models of molecular evolution. This is in contrast with the bootstrap values in Figure 5B, which describe uncertainty due to the stochasticity of molecular evolution. The former numbers can be regarded as “bootstrap proportions” obtained by resampling models with probabilities proportional to the Akaike weights. The phylogenetic relationships among the Ohomopterus carabid beetles are very similar to those estimated by Sota and Vogler (2001) using maximum parsimony.

Figure 5

Multimodel phylogeny of Ohomopterus carabid beetles. (A) Consensus of trees estimated under 56 candidate models, and constructed using Akaike weights (with the AICc) as tree weights. The values above branches represent the weights for each branch. All branches without a number received a weight of 100%. (B) Consensus of the two maximum likelihood trees under the best AICc model (TN93+I+Γ), one of which had a branch of zero length. Numbers above nodes are nonparametric bootstrap proportions. Nodes that received less than 50% are not indicated. The five species groups are indicated by shaded boxes.

We examined the association between pairwise AICc differences and pairwise tree distances (Foulds et al., 1979) for the 11 models included in the 99% confidence set (Fig. 6). This relationship shows a weak but significant correlation (r² = 0.2394; P = 0.00015) between the improvement of fit of a model to the data and differences in topology. This graph supports, to a limited extent, the intuition that models with similar fits to the data tend to support similar trees.
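The kind of comparison described here can be sketched with a toy example. In the Python fragment below, which rests on hypothetical AICc scores and hypothetical trees for four models, the symmetric tree distance is computed as the number of bipartitions present in one tree but not the other, and r² is the squared Pearson correlation between the pairwise AICc differences and the pairwise tree distances. The numbers are invented and are not expected to resemble those in Figure 6.

```python
from itertools import combinations


def symmetric_distance(splits_a, splits_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical models: (AICc score, set of internal bipartitions of its tree)
models = {
    "M1": (100.0, {frozenset("DE"), frozenset("CDE")}),
    "M2": (102.0, {frozenset("DE"), frozenset("CDE")}),
    "M3": (106.0, {frozenset("CE"), frozenset("CDE")}),
    "M4": (109.0, {frozenset("DE"), frozenset("BDE")}),
}

pairs = list(combinations(models, 2))
aicc_diff = [abs(models[a][0] - models[b][0]) for a, b in pairs]
tree_dist = [symmetric_distance(models[a][1], models[b][1]) for a, b in pairs]

r2 = r_squared(aicc_diff, tree_dist)
print(list(zip(pairs, aicc_diff, tree_dist)), round(r2, 4))
```

Because every model enters several pairs, the pairwise values are not independent, which is worth keeping in mind when attaching a P value to such a correlation.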

Figure 6

AIC differences and phylogeny estimation. For each pair of models out of the 11 models with noticeable AICc support, we calculated the differences in AIC scores (Pairwise AICc distances) and the Robinson and Foulds (1981) tree distances (Pairwise tree distances) using AICc scores calculated on a NJ-JC tree.

The model-averaged parameter estimates are very similar to the maximum likelihood estimates under the best-fit models (Table 2) because models with similar likelihoods, and thus small AIC differences, tend to result in similar parameter estimates. The variability between the model-averaged and best-fit model parameter estimates is unlikely to have a large effect on estimation of topology. The greatest variability between the model-averaged and best-fit model parameter estimates is observed for the transversion rate parameters. This is not surprising given that relatively few transversions have occurred in these data and therefore there is not much information from which to gain stable estimates.

Table 2.

Model-averaged estimates of nucleotide substitution parameters. These estimates were obtained from the carabid beetles Ohomopterus mitochondrial DNA data set using the Akaike weights (wi) derived from the AICc for models with wi > 0.0001. The models that contributed to each estimate are indicated in Table 3. Included also are the estimates corresponding to the best AICc model (TN93+I+Γ) and to the model selected by the hLRT procedure (HKY85+I+Γ). πA–πT: base frequencies; κ: transition/transversion parameter; ϕAC–ϕAT: substitution rates; α: shape of the gamma distribution for rate variation among sites; α (I+Γ): shape of the gamma distribution for rate variation among sites under an I+Γ model; pinv (I+Γ): proportion of invariable sites under an I+Γ model.

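| Parameter | Model-averaged estimate | AICc model estimate | hLRT model estimate |
|---|---|---|---|
| πA | 0.3330 | 0.3342 | 0.3303 |
| πC | 0.0683 | 0.0667 | 0.0725 |
| πG | 0.1362 | 0.1369 | 0.1335 |
| πT | 0.4625 | 0.4622 | 0.4637 |
| κ | 14.8483 | 14.8476 | 14.8476 |
| ϕAC | 0.6290 | 1.0 | |
| ϕAG | 13.4111 | 13.1823 | |
| ϕAT | 1.0536 | 1.0 | |
| ϕCG | 0.4189 | 1.0 | |
| ϕCT | 20.0553 | 19.7583 | |
| α | 0.1011 | | |
| α(I+Γ) | 0.7149 | 0.7658 | 0.5849 |
| pinv(I+Γ) | 0.6874 | 0.7038 | 0.6644 |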

Not all model parameters have the same importance for this data set (Table 3). The alpha shape parameter from the gamma distribution of among-site rate variation and the base frequency parameters have a relative importance of 1.0 because they appear in all of the supported models. The proportion of invariable sites is also a very important parameter, although a few models with low weight that lack this parameter are supported. This observation suggests that these properties of the evolutionary process are very important for obtaining a good model fit. The ϕAG and ϕCT substitution rate parameters have higher relative importance values than the transversion parameters. This indicates that for these data it is important to allow the two transition types to have different rates, more so than the transversion types. The results shown in Table 2 make sense in light of our current knowledge of the dynamics of animal mitochondrial DNA evolution (e.g., Brown et al. 1982; Tamura and Nei 1993; Buckley et al. 2001a).

Table 3.

Relative parameter importance. Included here are Akaike weights (wi) and relative parameter importance values for the Ohomopterus carabid beetles mitochondrial DNA data set, for models with wi > 0.0001. Where a model contains a free parameter it is indicated with a black dot (note that ϕGT is often set to equal 1)

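| Model | wi | πA | πC | πG | πT | κ | ϕAC | ϕAG | ϕAT | ϕCG | ϕCT | ϕGT | α | pinv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TN93+I+Γ | 0.5221 | | | | | | | | | | | | | |
| TIM+I+Γ | 0.1913 | | | | | | | | | | | | | |
| HKY85+I+Γ | 0.1692 | | | | | | | | | | | | | |
| K81uf+I+Γ | 0.0642 | | | | | | | | | | | | | |
| GTR+I+Γ | 0.0344 | | | | | | | | | | | | | |
| TVM+I+Γ | 0.0165 | | | | | | | | | | | | | |
| TN93+Γ | 0.0011 | | | | | | | | | | | | | |
| HKY85+Γ | 0.0005 | | | | | | | | | | | | | |
| TIM+Γ | 0.0004 | | | | | | | | | | | | | |
| K81uf+Γ | 0.0002 | | | | | | | | | | | | | |
| GTR+Γ | 0.0001 | | | | | | | | | | | | | |
| Relative parameter importance | | 1.0 | 1.0 | 1.0 | 1.0 | 0.170 | 0.051 | 0.749 | 0.051 | 0.051 | 0.749 | 0.051 | 1.0 | 0.997 |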

Lastly, model averaging could also be applied to other problems in evolutionary biology in which inferences can be drawn from several models, for example as in the detection of positive selection from sequence alignments (Yang et al., 2000), and the estimation of divergence times using relaxed molecular clocks (Aris-Brosou and Yang, 2002), where different models can frequently yield different results.

Philosophical Considerations on Model Selection

There is still an important philosophical debate about model selection in general (Burnham and Anderson, 1998, 2003; Forster and Sober, 1994, 2004; Forster, 2000, 2001; Kass and Raftery, 1995; Kieseppä, 2002; Myrvold and Harper, 2002; Popper, 1959; Sober, 2002a; Wasserman, 2000), and here we do not attempt to address all the issues, but just those we think are most relevant. The information-theoretic and the Bayesian approaches represent different philosophical approaches to the problem of model selection (Forster and Sober, 1994; Kuha, 2003; Sober, 2002a). The AIC is designed to choose the model that best approximates reality. The conclusions of AIC are never about the truth or falsity of a hypothesis, but about its closeness to the truth (Forster and Sober, 2004). On the other hand, Bayesian approaches are designed to identify the true model, given the data. Both the AIC and Bayesian approaches have been criticized on different grounds.

That Bayesian approaches are designed to identify the true model can be surprising when surely we know that all models of evolution are false (i.e., their probability is zero). The standard interpretation of P(Mi|D) is that it is the probability that Mi is the true model given the data, even though we know that this statement is false a priori (Gelfand, 1996). A common response to this criticism is that we can hope that at least one of the models is approximately true, and that the posterior distribution allows us to compare the relative merits of the models (Wasserman, 2000). On the other hand, it has been argued that the derivation of the BIC does not require that the true model is contained within the set of candidate models (Burnham and Anderson, 2003, pp. 293–295; Cavanaugh and Neath, 1999). Interestingly, it is possible to obtain the AIC as a Bayesian result if a particular prior (the so-called K-L prior) is used with the BIC (Burnham and Anderson, 2003, pp. 302–305).

It has been alleged in the statistical literature that, under certain conditions, the BIC is statistically consistent (it converges to the truth as more data are added), whereas the AIC is not (but see Bozdogan, 1987; Findley, 1991; Keuzenkamp and McAleer, 1995; Nishii, 1984, 1988; Shibata, 1986; Woodroofe, 1982). However, the relevance of statistical consistency in this context is not clear (Forster, 2002).

We can think of a model as a set or family of sharp hypotheses. For example, the K80 model contains all hypotheses representing different values of the transition/transversion parameter, κ. The JC69 model, however, contains only one hypothesis, as all its parameters are fixed (equal base frequencies and equal rates for transitions and transversions). The AIC and the BIC work with maximized likelihoods, and therefore they compare the best point hypothesis within each model. However, it might be unwise to compare models based only on the merits of a single point, even if this point is optimal, and that is why Bayesians prefer models for which the sum of the likelihoods of all contained point hypotheses is largest (Holder and Lewis, 2003).

Which Model Selection Method is Best for Phylogenetics?

The use of different model selection strategies may lead to the selection of different models of evolution (Posada and Crandall, 2001a), and we know that model choice affects all aspects of phylogenetic analysis. Here we have attempted to compare different model selection strategies from a theoretical and practical point of view, in the context of phylogenetics. Previous Monte Carlo simulations on the performance of model selection in phylogenetics (Posada, 2001; Posada and Crandall, 2001b) showed that these methods work well when the aim is to identify the generating model. However, these simulations missed the point that the true model of evolution will never be one of the candidate models. It would be more useful to generate data from a model much more complex than any of the candidate models, and then study how well the selected models approximate this complex generating model (e.g., Minin et al., 2003). Clearly, we should seek models that are good approximations to the truth and from which therefore we can make valid inferences concerning the real process of molecular evolution. Too often we read expressions like “The best-fit model was selected with the program Modeltest” without any reference to which model selection strategy was used (in this case, hLRT or AIC). When a method of model selection is used, this should be explicitly reported.

From the discussion above it should be clear that the Bayesian and AIC approaches present several important advantages over the hLRTs for model selection (see also Table 4). Namely, they are able to simultaneously compare multiple nested or nonnested models (see Chamberlain, 1890), account for model selection uncertainty, and allow for model-averaged inference. Although model selection uncertainty tools do not exist within the standard hLRT framework, there are extensions of the LRT framework that allow for the specification of confidence sets of models. Evidence for a model can also be estimated by the “expected likelihood weights” (Strimmer, 2001; Strimmer and Rambaut, 2001). Criteria like the AIC or BIC are very simple to calculate from the maximum likelihood estimate, although they do rely on point estimates and do not take into account topological uncertainty (Bollback, 2002). The importance of the latter effect has yet to be examined (but see Posada and Crandall, 2001b), as has the potential impact of comparing models with parameters fixed at the boundary of their ranges (e.g., α = ∞) in the AIC and BIC.
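To make concrete how simple these criteria are to calculate once maximized log-likelihoods are available, the following minimal sketch computes AIC and BIC scores and the corresponding Akaike weights. The log-likelihoods and parameter counts are made-up values for illustration only, and the function names are hypothetical. Only substitution-model parameters are counted here; adding the same number of branch-length parameters to every model would leave AIC and BIC differences unchanged.

```python
import math

def aic(log_lik, k):
    """AIC = -2 ln L + 2K."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """BIC = -2 ln L + K ln(n)."""
    return -2.0 * log_lik + k * math.log(n)

def criterion_weights(scores):
    """Rescale scores to differences from the minimum, then normalise exp(-delta/2)."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical values: (maximized log-likelihood, free substitution parameters).
fits = {
    "JC69": (-5200.0, 0),
    "HKY": (-5100.0, 4),
    "HKY+G": (-5052.0, 5),
    "GTR+G": (-5050.0, 9),
}
n_sites = 1927  # alignment length used as the sample size, as in the text

aic_scores = {m: aic(lnl, k) for m, (lnl, k) in fits.items()}
bic_scores = {m: bic(lnl, k, n_sites) for m, (lnl, k) in fits.items()}
akaike_w = criterion_weights(aic_scores)

# Print models from best to worst AIC.
for m in sorted(fits, key=aic_scores.get):
    print(f"{m:6s} AIC={aic_scores[m]:9.2f} BIC={bic_scores[m]:9.2f} w={akaike_w[m]:.3f}")
```

Because every candidate receives a score on the same scale, nested and nonnested models are compared in a single pass, with no sequence of pairwise tests.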

Table 4. Comparison of model selection strategies for phylogenetics. Indicated are what the authors think are good properties for a model selection procedure. Exceptions to these may exist, and the comments below are generalizations.

| Good properties for model selection methods | hLRT | Bayesian | AIC |
|---|---|---|---|
| Applies easily to nonnested models | No | Yes | Yes |
| Allows for the simultaneous comparison of multiple models | No | Yes | Yes |
| Does not depend on a subjective significance level | No | Yes§ | Yes |
| Incorporates topological uncertainty | No | Yes* | No |
| Easy to compute | Yes | No* | Yes |
| Assesses model selection uncertainty | No | Yes | Yes |
| Allows model averaging | No | Yes | Yes |
| Provides the possibility of specifying prior information for models | No | Yes* | Yes |
| Provides the possibility of specifying prior information for model parameters | No | Yes* | No |
| Designed to approximate, rather than to identify, truth | No | No | Yes |

* Not the BIC.
§ In a sense, the interpretation of Bayes factors could be considered as subjective.

The possibility of inferring model-averaging phylogenies will eliminate some of the criticisms that model-based methods are contingent on the single best-fit model selected. Obviously, the methods described above can facilitate model-averaged hypothesis testing, as one could test for the monophyly of a group by considering all models available. Sanderson and Kim (2000) already hinted at the possibility of model-averaging phylogenies, but claimed that such a composite solution would be computationally prohibitive. However, this computational burden will depend on the size of the data set (especially on the number of taxa) and the number of models considered (but one could work with the 95% confidence or credible set of models), and in some cases it will certainly be feasible.
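The idea of restricting model averaging to a 95% confidence set of models can be sketched as follows. This is one common construction, assumed here for illustration: models are ranked by weight and added until the cumulative weight reaches the chosen level. The weights are hypothetical, and `confidence_set` is an illustrative name, not a function of any existing program.

```python
def confidence_set(weights, level=0.95):
    """Smallest set of top-ranked models whose cumulative weight reaches `level`."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for model, w in ranked:
        chosen.append(model)
        cumulative += w
        if cumulative >= level:
            break
    return chosen

# Hypothetical Akaike weights (they sum to 1).
example_weights = {
    "GTR+I+G": 0.55,
    "GTR+G": 0.30,
    "HKY+I+G": 0.11,
    "HKY+G": 0.03,
    "K80+G": 0.01,
}

print(confidence_set(example_weights))
```

With these weights only three of the five candidates would need a tree search, which is how working with a confidence set can reduce the computational burden of model-averaged phylogenies.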

Selecting a set of candidate models is not easy; there are 203 “standard” time-reversible models of nucleotide substitution, but model selection in phylogenetics is commonly limited to a subset of these (Huelsenbeck et al., 2004). Indeed, evaluating a large number of models is more problematic for the hLRT than for the AIC and Bayesian approaches for the reasons explained above. The implications of conditioning model selection on a subset of the possible set of models are currently unknown.
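The figure of 203 can be checked with a short calculation. Each time-reversible model corresponds to one way of grouping the six relative substitution rates into classes that share a common rate, so the count equals the number of set partitions of six items (the Bell number B6). The sketch below computes it with the Bell triangle; the function name is an assumption made for illustration.

```python
def count_set_partitions(n):
    """Number of ways to split n labelled items into non-empty groups (Bell number)."""
    row = [1]
    for _ in range(n - 1):
        # Each new row of the Bell triangle starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

# Six relative rates: A<->C, A<->G, A<->T, C<->G, C<->T, G<->T.
print(count_set_partitions(6))
```

The count covers only the rate-matrix structure; allowing equal or unequal base frequencies and among-site rate variation on top of each partition enlarges the candidate set further.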

Selection bias (Zucchini, 2000) may occur when the number of candidate models is large. In such cases, random fluctuations in the data will increase the score of some models more than others, and therefore the chance that the best model wins for spurious reasons increases. Indeed, the set of candidate models influences model choice, and a careful a priori selection of candidate models is very important.

In both the AICc and the BIC descriptions above, the total number of characters was used as an estimate of sample size. However, effective sample sizes in phylogenetic studies are poorly understood, and depend on the quantity of interest (Churchill et al., 1992; Goldman, 1998; Morozov et al., 2000). Characters in an alignment will often not be independent, so using the total number of characters as a surrogate for sample size (Minin et al., 2003; Posada and Crandall, 2001b) could be an overestimate. Using only the number of variable sites as an estimate of sample size is a more conservative approach, but could be an underestimate (note that all sites are used when estimating base frequencies or the proportion of invariable sites). Indeed, sample size also depends on the number of taxa. Importantly, sample size can have an effect on the outcome of model selection with the AICc. In our example above, if we were to use the number of variable characters (301 sites) as the sample size, instead of the total number of characters (1927 sites), the best AICc model would not change, but the second and third AICc models would exchange their rankings. Furthermore, because the LRT, the AIC, and the BIC strategies rely on large-sample asymptotics, it is also important to decide when a sample should be considered small. Although the AICc was derived under Gaussian assumptions, Burnham et al. (1994) found that this second-order expression performed well in product multinomial models for open population capture-recapture. Burnham and Anderson (2003, p. 66) suggest using this correction when the sample size is small compared to the number of adjustable parameters, n/K < 40. Alternatively, and because AICc converges to the AIC with increasing n/K ratios, one could always use the AICc (D. Anderson, personal communication). Phylogenetic characters are mostly discrete, and the unconstrained model in phylogenetics is multinomial (Goldman, 1993). One may think of an alignment of nucleotide characters as a large and sparse contingency table with 4^T bins, where T is the number of taxa. For large-sample asymptotics to hold in a contingency table, every cell should contain, in general, more than 5 observations (see Agresti, 1990, p. 49, 244–250), which gives a rule of thumb of n/4^T > 5. Clearly, more research is needed on sample size in phylogenetics.
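The dependence of the AICc on the assumed sample size, and the two rules of thumb mentioned above, can be illustrated with a minimal sketch. It assumes the usual second-order form AICc = AIC + 2K(K+1)/(n − K − 1). The sample sizes (1927 and 301 sites) and the number of taxa (37) come from the example in the text; the two candidate models, their log-likelihoods, and their parameter counts are hypothetical, chosen so that the preferred model changes with n.

```python
def aicc(log_lik, k, n):
    """Second-order AIC: AIC + 2K(K+1)/(n-K-1); requires n > K + 1."""
    if n <= k + 1:
        raise ValueError("AICc is undefined unless n > K + 1")
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def small_sample(n, k, threshold=40):
    """Rule of thumb from the text: apply the correction when n/K < 40."""
    return n / k < threshold

def cells_well_filled(n, n_taxa, min_per_cell=5):
    """Contingency-table rule of thumb: n / 4**T should exceed 5."""
    return n / 4 ** n_taxa > min_per_cell

# Hypothetical candidates: (maximized log-likelihood, K including branch lengths).
candidates = {"A": (-5056.0, 76), "B": (-5050.0, 80)}

def best_model(n):
    """Name of the candidate with the smallest AICc for sample size n."""
    return min(candidates, key=lambda m: aicc(*candidates[m], n))

for n in (1927, 301):
    print(n, best_model(n), small_sample(n, 80), cells_well_filled(n, 37))
```

In this made-up case the richer model is preferred when all sites are counted and the simpler one when only variable sites are counted. With 37 taxa the ratio n/4^T is far below 5 for any realistic alignment length, which underlines how sparse the contingency table is.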

Other model selection methods exist, like cross-validation and the bootstrap (see Browne, 2000; Efron and Tibshirani, 1993; Linhart and Zucchini, 1986), but they seem too time-consuming for the selection of substitution models (note that cross-validation is asymptotically equivalent to the AIC; Stone, 1977). There is an important role for more general tests of model fit and accuracy within the process of model selection. For example, tests of base frequency stationarity (Rzhetsky and Nei, 1995; Van Den Bussche et al., 1998) should be standard before a phylogenetic analysis. In addition, the global tests of Goldman (1993) and Bollback (2002) are useful for detecting model misspecification. When tests such as these indicate that the final model selected still does not fit the data well, our results must be interpreted with caution, as the possibility remains that some vital evolutionary process has not been accounted for and that the resulting inferences are misleading.

Model selection is a useful tool for research, but it is not a substitute for careful thinking and common sense reasoning (Browne, 2000). There are examples in the phylogenetic literature where the best-fit models have led to phylogenetic estimates that are clearly incorrect (Buckley and Cunningham, 2002; Posada and Crandall, 2001c). Consideration of model selection uncertainty and multimodel inference should lead to equal or better estimates of phylogenies and substitution parameters, and we should see more applications of these ideas in the future (see also Nylander, 2004). Computation of AIC differences, Akaike weights, model-averaged estimates, and relative parameter importance is currently implemented in the program Modeltest (Posada and Crandall, 1998). Further developments will allow for the simultaneous use of different models for different partitions of the data (Nylander et al., 2004; Pupko et al., 2002; Suchard et al., 2003a; Yang, 1996b). It is now time to start thinking about how we will select those. Model selection in phylogenetics is indeed still an open area for research (Huelsenbeck et al., 2002).
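As an illustration of two of the quantities mentioned above, the sketch below computes the relative importance of a parameter (the sum of the weights of the models that include it) and a model-averaged estimate (the weighted mean of the estimates over those models, renormalised by their total weight). The weights and estimates of the gamma shape parameter are hypothetical, the function names are illustrative, and this is not a description of how Modeltest is implemented.

```python
def relative_importance(weights, models_with_param):
    """Sum of the weights of the models that contain the parameter."""
    return sum(weights[m] for m in models_with_param)

def model_averaged_estimate(weights, estimates):
    """Weighted mean over the models that contain the parameter."""
    total = sum(weights[m] for m in estimates)
    return sum(weights[m] * est for m, est in estimates.items()) / total

# Hypothetical Akaike weights for three candidate models.
weights = {"GTR+I+G": 0.5, "GTR+G": 0.3, "HKY+I": 0.2}

# Hypothetical estimates of the gamma shape parameter; HKY+I has no such parameter.
alpha_hat = {"GTR+I+G": 0.8, "GTR+G": 0.4}

importance_alpha = relative_importance(weights, alpha_hat)
alpha_avg = model_averaged_estimate(weights, alpha_hat)
print(importance_alpha, alpha_avg)
```

Averaging only over the models that contain the parameter is one of the conventions discussed by Burnham and Anderson; the alternative of treating the parameter as fixed at its default value in the remaining models would give a different average.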

1 Occam's (ca. 1280–1349) parsimony principle or Occam's razor was stated as “Pluralitas non est ponenda sine necessitate,” which translates literally into English as “plurality should not be posited without necessity.”

2 For continuous functions.

Acknowledgements

We are undoubtedly indebted to Kenneth Burnham and David Anderson for their enlightening book. David Anderson, Elliot Sober, and Carsten Wiuf provided very insightful comments on the manuscript. Robert Weiss, Janet Sinsheimer, Paul Lewis, Paul Joyce, Hidetoshi Shimodaira, and Rissa Ota helped clarify some ideas on Bayesian model selection. Nick Goldman and two anonymous referees provided useful comments on a first version. Jeff Thorne, Hirohisa Kishino, and two anonymous referees provided very valuable comments that considerably improved the manuscript. Thanks to David Swofford and Jack Sullivan for many valuable conversations on model selection throughout the years. DP was funded by the Spanish Ministry of Science and Technology, while funding for TRB was provided by the New Zealand Foundation for Research, Science, and Technology.

References

Adachi J., Hasegawa M. 1996. MOLPHY version 2.3: Programs for molecular phylogenetics based on maximum likelihood. Comput. Sci. Monogr. 28:1–150.
Agresti A. 1990. Categorical data analysis, 2nd edition. New York: Wiley.
Akaike H. 1973. Information theory and an extension of the maximum likelihood principle. Pages 267–281 in Second International Symposium on Information Theory. Budapest: Akademiai Kiado.
Akaike H. 1974. A new look at the statistical model identification. IEEE Trans. Aut. Control 19:716–723.
Akaike H. 1981. Likelihood of a model and information criteria. J. Econometrics 16:3–14.
Akaike H. 1983. Information measures and model selection. Int. Stat. Inst. 22:277–291.
Anderson D. R., Burnham K. P., Thompson W. L. 2000. Null hypothesis testing: Problems, prevalence, and an alternative. J. Wildl. Manage. 64:912–923.
Aris-Brosou S., Yang Z. 2002. Effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan 18S ribosomal RNA phylogeny. Syst. Biol. 51:703–714.
Bartlett M. S. 1957. A comment on D. V. Lindley's statistical paradox. Biometrika 44:533–534.
Berger J. O., Sellke T. 1987. Testing a point null hypothesis: The irreconcilability of P values and evidence. J. Am. Stat. Assoc. 82:112–122.
Bernardo J. M., Smith A. F. M. 1994. Bayesian theory. New York: Wiley and Sons.
Bollback J. P. 2002. Bayesian model adequacy and choice in phylogenetics. Mol. Biol. Evol. 19:1171–1180.
Box G. E. P. 1976. Science and statistics. J. Am. Stat. Assoc. 71:791–799.
Bozdogan H. 1987. Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika 52:345–370.
Browne M. 2000. Cross-validation methods. J. Math. Psychol. 44:108–132.
Bruno W. J., Halpern A. L. 1999. Topological bias and inconsistency of maximum likelihood using wrong models. Mol. Biol. Evol. 16:564–566.
Buckland S. T., Burnham K. P., Augustin N. H. 1997. Model selection uncertainty: An integral part of inference. Biometrics 53:603–618.
Buckley T. R. 2002. Model misspecification and probabilistic tests of topology: Evidence from empirical data sets. Syst. Biol. 51:509–523.
Buckley T. R., Arensburger P., Simon C., Chambers G. K. 2002. Combined data, Bayesian phylogenetics, and the origin of the New Zealand cicada genera. Syst. Biol. 51:4–18.
Buckley T. R., Cunningham C. W. 2002. The effects of nucleotide substitution model assumptions on estimates of nonparametric bootstrap support. Mol. Biol. Evol. 19:394–405.
Buckley T. R., Simon C., Chambers G. K. 2001. Exploring among-site rate variation models in a maximum likelihood framework using empirical data: The effects of model assumptions on estimates of topology, branch lengths, and bootstrap support. Syst. Biol. 50:67–86.
Burnham K. P., Anderson D. R. 1998. Model selection and inference: A practical information-theoretic approach, 1st ed. New York: Springer-Verlag.
Burnham K. P., Anderson D. R. 2003. Model selection and multimodel inference: A practical information-theoretic approach, 2nd ed. New York: Springer-Verlag.
Burnham K. P., Anderson D. R., White G. C. 1994. Evaluation of the Kullback-Leibler discrepancy for model selection in open population capture-recapture models. Biometrical J. 36:299–315.
Cavanaugh J. E., Neath A. A. 1999. Generalizing the derivation of the Schwarz information criterion. Commun. Stat. Theory Methods 28:49–66.
Chamberlain T. C. 1890. The method of multiple working hypotheses. Science 15:93.
Chatfield C. 1995. Model uncertainty, data mining and statistical inference. J. R. Stat. Soc. A 158:419–466.
Churchill G. A., Von Haeseler A., Navidi W. C. 1992. Sample size for a phylogenetic inference. Mol. Biol. Evol. 9:753–769.
Deleeuw J. 1992. Introduction to Akaike 1973 information theory and an extension of the maximum likelihood principle. Pages 599–609 in Breakthroughs in statistics (Kotz S., Johnson N. L., eds.). London: Springer-Verlag.
Edwards A. W. F. 1972. Likelihood. Cambridge, UK: Cambridge University Press.
Efron B., Tibshirani R. J. 1993. An Introduction to the Bootstrap. New York: Chapman and Hall.
Felsenstein J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401–410.
Felsenstein J. 1981a. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17:368–376.
Felsenstein J. 1981b. A likelihood approach to character weighting and what it tells us about parsimony and compatibility. Biol. J. Linnaean Soc. 16:183–196.
Findley D. F. 1991. Counterexamples to parsimony and BIC. Ann. Inst. Stat. Math. 43:505–514.
Fisher R. A. 1921. On the ‘probable error’ of a coefficient of correlation deduced from a small sample. Metron I, part 4:3–32.
Forster M. R. 2000. Key concepts in model selection: Performance and generalizability. J. Math. Psychol. 44:205–231.
Forster M. R. 2001. The new science of simplicity. Pages 83–119 in Simplicity, inference and modeling (Zellner A., Keuzenkamp H. A., McAleer M., eds.). Cambridge, UK: Cambridge University Press.
Forster M. R. 2002. Predictive accuracy as an achievable goal of science. Phil. Sci. 69:S124–S134.
Forster M., Sober E. 1994. How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions. Br. J. Phil. Sci. 45:1–35.
Forster M. R., Sober E. 2004. Why likelihood? In Likelihood and Evidence (Taper M., Lele S., eds.). Chicago: University of Chicago Press.
Foulds L. R., Hendy M. D., Penny D. 1979. A graph theoretic approach to the development of minimal phylogenetic trees. J. Mol. Evol. 13:127–149.
Foutz R. V., Srivastava R. C. 1977. The performance of the likelihood ratio test when the model is incorrect. Ann. Stat. 5:1183–1194.
Frati F., Simon C., Sullivan J., Swofford D. L. 1997. Gene evolution and phylogeny of the mitochondrial cytochrome oxidase gene in Collembola. J. Mol. Evol. 44:145–158.
Gelfand A. E. 1996. Model determination using sampling-based methods. Pages 145–161 in Markov chain Monte Carlo in practice (Gilks W. R., Richardson S., Spiegelhalter D. J., eds.). London, New York: Chapman & Hall.
Gilks W. R., Richardson S., Spiegelhalter D. J. 1996. Markov chain Monte Carlo in practice. London, New York: Chapman & Hall.
Golden R. M. 1995. Making correct statistical inferences using a wrong probability model. J. Math. Psychol. 38:3–20.
Goldman N. 1990. Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst. Zool. 39:345–361.
Goldman N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182–198.
Goldman N. 1998. Phylogenetic information and experimental design in molecular systematics. Proc. R. Soc. Lond. B Biol. Sci. 265:1779–1786.
Goldman N., Whelan S. 2000. Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol. Biol. Evol. 17:975–978.
Green P. J. 1995. Reversible jump MCMC computation and Bayesian model determination. Biometrika 92:711–732.
Hasegawa M. 1990a. Mitochondrial DNA evolution in primates: Transition rate has been extremely low in the lemur. J. Mol. Evol. 31:113–121.
Hasegawa M. 1990b. Phylogeny and molecular evolution in primates. Jpn. J. Genet. 65:243–266.
Hasegawa M., Kishino H., Yano T. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
Hastings W. K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109.
Hochberg Y. 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75:800–802.
Hoeting J. A., Madigan D., Raftery A. E. 1999. Bayesian model averaging: A tutorial. Stat. Sci. 14:382–417.
Holder M., Lewis P. O. 2003. Phylogeny estimation: Traditional and Bayesian approaches. Nat. Rev. Genet. 4:275–284.
Hsiao C. K. 1997. Approximate Bayes factors when a mode occurs on the boundary. J. Am. Stat. Assoc. 92:656–663.
Huelsenbeck J. P., Crandall K. A. 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Annu. Rev. Ecol. Syst. 28:437–466.
Huelsenbeck J. P., Hillis D. M. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42:247–264.
Huelsenbeck J. P., Imennov N. S. 2002. Geographic origin of human mitochondrial DNA: Accommodating phylogenetic uncertainty and model comparison. Syst. Biol. 51:155–165.
Huelsenbeck J. P., Larget B., Alfaro M. E. 2004. Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo. Mol. Biol. Evol. 21:1123–1133.
Huelsenbeck J. P., Larget B., Miller R. E., Ronquist F. 2002. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 51:673–688.
Huelsenbeck J. P., Rannala B., Larget B. 2000. A Bayesian framework for the analysis of cospeciation. Evol. Int. J. Org. Evol. 54:352–364.
Huelsenbeck J. P., Ronquist F., Nielsen R., Bollback J. P. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294:2310–2314.
Hurvich C. M., Tsai C.-L. 1989. Regression and time series model selection in small samples. Biometrika 76:297–307.
Jeffreys H. 1939. Theory of probability. Oxford: Oxford University Press.
Jermiin L. S., Olsen G. J., Mengersen K. L., Easteal S. 1997. Majority-rule consensus of phylogenetic trees obtained by maximum-likelihood analysis. Mol. Biol. Evol. 14:1296–1302.
Johnson J. B., Omland K. S. 2003. Model selection in ecology and evolution. Trends Ecol. Evol. 19:101–108.
Jukes T. H., Cantor C. R. 1969. Evolution of protein molecules. Pages 21–132 in Mammalian protein metabolism (Munro H. M., ed.). New York: Academic Press.
Kadane J. B., Wolfson L. J. 1998. Experiences in elicitation. J. R. Stat. Soc. D 47 (Part 1):3–19.
Kass R. E., Raftery A. E. 1995. Bayes factors. J. Am. Stat. Assoc. 90:773–795.
Kass R. E., Wasserman L. 1995. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. J. Am. Stat. Assoc. 90:928–934.
Kelsey C. R., Crandall K. A., Voevodin A. F. 1999. Different models, different trees: The geographic origin of PTLV-I. Mol. Phylogenet. Evol. 13:336–347.
Kendall M., Stuart A. 1979. The advanced theory of statistics, 4th edition. London: Charles Griffin.
Kent J. T. 1982. Robust properties of likelihood ratio tests. Biometrika 69:19–27.
Keuzenkamp H., McAleer M. 1995. Simplicity, scientific inference and economic modeling. Econ. J. 105:1–21.
Kieseppä I. A. 2002. Statistical model selection and Bayesianism. Phil. Sci. 68:S141–S152.
Kimura M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
Kimura M. 1981. Estimation of evolutionary distances between homologous nucleotide sequences. Proc. Nat. Acad. Sci. USA 78:454–458.
Kishino H., Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.
Kuha J. 2003. AIC and BIC: Comparisons of assumptions and performance. Sociol. Methods Res. Submitted.
Kullback S., Leibler R. A. 1951. On information and sufficiency. Ann. Math. Stat. 22:79–86.
Larget B., Simon D. 1999. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16:750–759.
Lindley D. V. 1957. A statistical paradox. Biometrika 44:187–192.
Linhart H. 1988. A test whether two AIC's differ significantly. S. Afr. Stat. J. 22:153–161.
Linhart H., Zucchini W. 1986. Model selection. New York: Wiley.
Madigan D., Gavrin J., Raftery A. E. 1995. Eliciting prior information to enhance the predictive performance of Bayesian graphical models. Commun. Stat. Theory Methods 24:2271–2292.
Madigan D. M., Raftery A. E. 1994. Model selection and accounting for model uncertainty in graphical models using Occam's Window. J. Am. Stat. Assoc. 89:1335–1346.
Mau B., Newton M. A. 1997. Phylogenetic inference for binary data on dendrograms using Markov chain Monte Carlo. J. Comp. Graph. Stat.
Mau B., Newton M. A., Larget B. 1999. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics 55:1–12.
Metropolis N., Rosenbluth A., Rosenbluth M., Teller A., Teller E. 1953. Equations of state calculations by fast computing machines. J. Chem. Phys. 21:1087–1092.
Miller A. J. 2002. Subset Selection in Regression, 2nd edition. New York: Chapman & Hall/CRC.
Minin V., Abdo Z., Joyce P., Sullivan J. 2003. Performance-based selection of likelihood models for phylogeny estimation. Syst. Biol. 52:674–683.
Morozov P., Sitnikova T., Churchill G., Ayala F. J., Rzhetsky A. 2000. A new method for characterizing replacement rate variation in molecular sequences: Application of the Fourier and Wavelet models to Drosophila and mammalian proteins. Genetics 154:381–395.
Myrvold W. C., Harper W. L. 2002. Model Selection, Simplicity, and Scientific Inference. Philos. Sci. 69:S135–S149.
Nishii R. 1984. Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Stat. 12:758–765.
Nishii R. 1988. Maximum likelihood principle and model selection when the true model is unspecified. J. Multivar. Ana. 27.
Nylander J. A. 2004. Bayesian Phylogenetics and the Evolution of Gall Wasps. Acta Universitatis Upsaliensis, Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 937. Uppsala, Sweden: Uppsala University. pg. 43.
Nylander J. A., Ronquist F., Huelsenbeck J. P., Nieves-Aldrey J. L. 2004. Bayesian phylogenetic analysis of combined data. Syst. Biol. 53:47–67.
Occam W. ca. 1320. Scriptum in Librum Primum Sententiarum, Opera Theologica, I.
Ogishima S., Ren F., Tanaka H. 2000. Efficiencies of information criteria for topology selection in reconstructing molecular phylogenetic tree. Proceedings of International Symposium on Artificial Life and Robotics 2000:745–748.
Ota R., Waddell P. J., Hasegawa M., Shimodaira H., Kishino H. 2000. Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters. Mol. Biol. Evol. 17:798–803.
Penny D., Lockhart P. J., Steel M. A., Hendy M. D. 1994. The role of models in reconstructing evolutionary trees. Pages 211–230 in Models in Phylogenetic Reconstruction (Scotland R. W., Siebert D. J., Williams D. M., eds.). Oxford: Clarendon Press.
Pol D. In press. Empirical problems of the hierarchical likelihood ratio test for model selection. Syst. Biol.
Popper K. R. 1959. Logic of scientific discovery. London: Hutchinson.
Posada D. 2001. The effect of branch length variation on the selection of models of molecular evolution. J. Mol. Evol. 52:434–444.
Posada D. 2003. Using Modeltest and PAUP* to select a model of nucleotide substitution. Pages 6.5.1–6.5.14 in Current Protocols in Bioinformatics (Baxevanis A. D., Davison D. B., Page R. D. M., Petsko G. A., Stein L. D., Stormo G. D., eds.). John Wiley & Sons, Inc.
Posada D., Crandall K. A. 1998. Modeltest: Testing the model of DNA substitution. Bioinformatics 14:817–818.
Posada D., Crandall K. A. 2001a. Selecting models of nucleotide substitution: An application to human immunodeficiency virus 1 (HIV-1). Mol. Biol. Evol. 18:897–906.
Posada D., Crandall K. A. 2001b. Selecting the best-fit model of nucleotide substitution. Syst. Biol. 50:580–601.
Posada D., Crandall K. A. 2001c. Simple (wrong) models for complex trees: Empirical bias. Mol. Biol. Evol. 18:271–275.
Pupko T., Huchon D., Cao Y., Okada N., Hasegawa M. 2002. Combining multiple data sets in a likelihood analysis: Which models are the best? Mol. Biol. Evol. 19:2294–2307.
Raftery A. E. 1996. Hypothesis testing and model selection. Pages 163–187 in Markov chain Monte Carlo in practice (Gilks W. R., Richardson S., Spiegelhalter D. J., eds.). London, New York: Chapman & Hall.
Raftery A. E. 1999. Bayes factors and BIC: Comment on “A critique of the Bayesian information criterion for model selection”. Sociol. Methods Res. 27:411–427.
Robinson D. F., Foulds L. R. 1981. Comparison of phylogenetic trees. Math. Biosci. 53:131–147.
Rzhetsky A., Nei M. 1995. Tests of applicability of several substitution models for DNA sequence data. Mol. Biol. Evol. 12:131–151.
Sakamoto
Y.
Ishiguro
M.
Kitagawa
G.
Akaike information criterion statistics
1986
New York
Springer
Sanderson
M. J.
Kim
J.
Parametric phylogenetics? Syst
Biol.
2000
, vol. 
49
 (pg. 
817
-
829
)
Schwarz
G.
Estimating the dimension of a model
Ann. Stat.
1978
, vol. 
6
 (pg. 
461
-
464
)
Shafer
G.
Lindley's paradox (with discussion)
J. Am. Stat. Assoc.
1982
, vol. 
77
 (pg. 
325
-
351
)
Shibata
R.
Consistency of model selection and parameter estimation
J. Appl. Prob.
1986
, vol. 
23A
 (pg. 
127
-
141
)
Shimodaira
H.
Assessing the error probability of the model selection test
Ann. Inst. Stat. Math.
1997
, vol. 
49
 (pg. 
395
-
410
)
Shimodaira
H.
An application of multiple comparison techniques to model selection
Ann. Inst. Stat. Math.
1998
, vol. 
50
 (pg. 
1
-
13
)
Shimodaira
H.
Multiple comparisons of log-likelihoods and combining nonnested models with applications to phylogenetic tree selection
Commun. Stat. Theory Methods
2001
, vol. 
30
 (pg. 
1751
-
1772
)
Shimodaira
H.
Hasegawa
M.
Multiple comparisons of log-likelihoods with applications to phylogenetic inference
Mol. Biol. Evol.
1999
, vol. 
16
 (pg. 
1114
-
1116
)
Sober
E.
Swinburne
R.
Bayesianism—its scope and limits
Bayes's Theorem
2002a
Oxford
Oxford University Press
(pg. 
21
-
38
)
Sober
E.
Instrumentalism, parsimony, and the Akaike framework
Phil. Sci.
2002b
, vol. 
69
 (pg. 
S112
-
S123
)
Sober
E.
Steel
M.
Testing the hypothesis of common ancestry
J. Theoret. Biol.
2002
, vol. 
218
 (pg. 
395
-
408
)
Sota
T.
Vogler
A. P.
Incongruence of mitochondrial and nuclear gene trees in the Carabid beetles Ohomopterus
Syst. Biol.
2001
, vol. 
50
 (pg. 
39
-
59
)
Steel
M.
Penny
D.
Parsimony, likelihood, and the role of models in molecular phylogenetics
Mol. Biol. Evol.
2000
, vol. 
17
 (pg. 
839
-
850
)
Stone
M.
An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion
J. R. Stat. Soc.
1977
, vol. 
39
 (pg. 
44
-
47
)
Strimmer
K.
Model selection using expected likelihood weights: A Bayes-frequentist compromise
2001
 
Strimmer
K.
Rambaut
A.
Inferring confidence sets of possibly misspecified gene trees
Proc. R. Soc. Lond. B Biol. Sci.
2001
, vol. 
269
 (pg. 
137
-
142
)
Suchard
M. A.
Kitchen
C. M.
Sinsheimer
J. S.
Weiss
R. E.
Hierarchical phylogenetic models for analyzing multipartite sequence data
Syst. Biol.
2003a.
, vol. 
52
 (pg. 
649
-
664
)
Suchard
M. A.
Weiss
R. E.
Dorman
K. S.
Sinsheimer
J. S.
Oh brother, where art thou? A Bayes factor test for recombination with uncertain heritage
Syst. Biol.
2002
, vol. 
51
 (pg. 
715
-
728
)
Suchard
M. A.
Weiss
R. E.
Sinsheimer
J. S.
Bayesian selection of continuous-time Markov chain evolutionary models
Mol. Biol. Evol.
2001
, vol. 
18
 (pg. 
1001
-
1013
)
Suchard
M. A.
Weiss
R. E.
Sinsheimer
J. S.
Testing a molecular clock without an outgroup: Derivations of induced priors on branch-length restrictions in a Bayesian framework
Syst. Biol.
2003b.
, vol. 
52
 (pg. 
48
-
54
)
Sugiura
N.
Further analysis of the data by Akaike's information criterion and the finite corrections
Commun. Stat. Theory Methods A
1978
, vol. 
7
 (pg. 
13
-
26
)
Sullivan
J.
Swofford
D. L.
Are guinea pigs rodents? The importance of adequate models in molecular phylogenies
J. Mamm. Evol.
1997
, vol. 
4
 (pg. 
77
-
86
)
Sullivan
J.
Swofford
D. L.
Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated?
Syst. Biol.
2001
, vol. 
50
 (pg. 
723
-
729
)
Suzuki
Y.
Glazko
G. V.
Nei
M.
Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics
Proc. Natl. Acad. Sci. USA
2002
, vol. 
99
 (pg. 
16138
-
16143
)
Swofford
D. L.
PAUP* Phylogenetic analysis using parsimony and other methods, version 4.0. beta
1998
Sunderland, Massachusetts
Sinauer Associates
Swofford
D. L.
PAUP* Phylogenetic analysis using parsimony (*and other methods). version 4
2000
Sunderland, Massachusetts
Sinauer Associates
Tamura
K.
Model selection in the estimation of the number of nucleotide substitutions
Mol. Biol. Evol.
1994
, vol. 
11
 (pg. 
154
-
157
)
Tamura
K.
Nei
M.
Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees
Mol. Biol. Evol.
1993
, vol. 
10
 (pg. 
512
-
526
)
Tanaka
H.
Ren
F.
Okayama
T.
Gojobori
T.
Topology selection in unrooted molecular phylogenetic tree by minimum model-based complexity method
Pac. Symp. Biocomput.
1999
, vol. 
4
 (pg. 
326
-
337
)
Tavaré
S.
Miura
R. M.
Some probabilistic and statistical problems in the analysis of DNA sequences
Some mathematical questions in biology—DNA sequence analysis
1986
Providence, Rhode Island
American Mathematical Society
(pg. 
57
-
86
)
Van Den Bussche
R. A.
Baker
R. J.
Huelsenbeck
J. P.
Hillis
D. M.
Base compositional bias and phylogenetic analyses: A test of the “flying DNA” hypothesis
Mol. Phylogenet. Evol.
1998
, vol. 
10
 (pg. 
408
-
416
)
Verdinelli
I.
Wasserman
L.
Computing Bayes factors using a generalization of the Savage-Dickey density ratio
J. Am. Stat. Assoc.
1995
, vol. 
90
 (pg. 
614
-
618
)
Vuong
Q. H.
Likelihood ratio tests for model selection and non-nested hypotheses
Econometrica
1989
, vol. 
57
 (pg. 
307
-
333
)
Wasserman
L.
Bayesian model selection and model averaging
J. Math. Psychol.
2000
, vol. 
44
 (pg. 
92
-
107
)
Weakliem
D. L.
A critique of the Bayesian information criterion for model selection
Sociol. Methods Res.
1999
, vol. 
27
 (pg. 
359
-
397
)
Whelan
S.
Goldman
N.
Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics
Mol. Biol. Evol.
1999
, vol. 
16
 (pg. 
1292
-
1299
)
Woodroofe
M.
On the model selection and the arc sine laws
Ann. Stat.
1982
, vol. 
10
 (pg. 
1182
-
1194
)
Yang
Z.
Among-site rate variation and its impact on phylogenetic analysis
Trends Ecol. Evol.
1996a
, vol. 
11
 (pg. 
367
-
372
)
Yang
Z.
Maximum-likelihood models for combined analyses of multiple sequence data
J. Mol. Evol.
1996b
, vol. 
42
 (pg. 
587
-
596
)
Yang
Z.
Goldman
N.
Friday
A.
Maximum likelihood trees from DNA sequences: A peculiar statistical estimation problem
Syst. Biol.
1995
, vol. 
44
 (pg. 
384
-
399
)
Yang
Z.
Nielsen
R.
Goldman
N.
Pedersen
A.-M. K.
Codon-substitution models for heterogeneous selection pressure at amino acid sites
Genetics
2000
, vol. 
155
 (pg. 
431
-
449
)
Yang
Z.
Rannala
B.
Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method
Mol. Biol. Evol.
1997
, vol. 
14
 (pg. 
717
-
724
)
Zhang
J.
Performance of likelihood ratio tests of evolutionary hypotheses under inadequate substitution models
Mol. Biol. Evol.
1999
, vol. 
16
 (pg. 
868
-
875
)
Zharkikh
A.
Estimation of evolutionary distances between nucleotide sequences
J. Mol. Evol.
1994
, vol. 
39
 (pg. 
315
-
329
)
Zucchini
W.
An introduction to model selection
J. Math. Psychol.
2000
, vol. 
44
 (pg. 
41
-
61
)
Associate Editor: Jeffrey Thorne