Model Selection and Model Averaging in Phylogenetics: Advantages of Akaike Information Criterion and Bayesian Approaches Over Likelihood Ratio Tests

Posada, David; Buckley, Thomas R.

doi:10.1080/10635150490522304

Abstract

Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).

AIC, Bayes factors, BIC, likelihood ratio tests, model averaging, model uncertainty, model selection, multimodel inference

It is clear that models of nucleotide substitution (henceforth models of evolution) play a significant role in molecular phylogenetics, particularly in the context of distance, maximum likelihood (ML), and Bayesian estimation. We know that the use of one or other model affects many, if not all, stages of phylogenetic inference. For example, estimates of phylogeny, substitution rates, bootstrap values, posterior probabilities, or tests of the molecular clock are clearly influenced by the model of evolution used in the analysis (Buckley, 2002; Buckley and Cunningham, 2002; Buckley et al., 2001; Kelsey et al., 1999; Pupko et al., 2002; Sullivan and Swofford, 1997, 2001; Suzuki et al., 2002; Tamura, 1994; Yang et al., 1995; Zhang, 1999). We can argue, in general, that phylogenetic methods are less accurate (that is, they recover an incorrect phylogeny more often), or become inconsistent (converging to an incorrect tree with increasing number of characters) when the model of evolution assumed is wrong (Bruno and Halpern, 1999; Felsenstein, 1978; Huelsenbeck and Hillis, 1993; Penny et al., 1994). It is evident that the use of appropriate models is essential if we are to be confident in the results of a phylogenetic analysis, and indeed, several strategies for model choice have been proposed in the context of phylogenetics. We refer the reader to Johnson and Omland (2003), Posada and Crandall (2001b) and Posada (2001) for a detailed introduction, and for an evaluation of the performance of these methods to recover the model generating the data. Computer programs exist that implement these methods (Adachi and Hasegawa, 1996; Posada and Crandall, 1998). Among the available methods for model selection in phylogenetics, hierarchical likelihood ratio tests (hLRTs) are the most popular. 
However, here we argue that the hLRTs approach is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two allow for assessment of model selection uncertainty and model averaging.

Model Selection

Before proceeding further, it is worth reiterating the fact that any model of evolution we can construct is never going to be the “true model” that generated the data we observe. In other words, the set of models is misspecified. All models are wrong but some are useful (Box, 1976), and model selection is best seen as a way of approximating, rather than identifying, full reality (Burnham and Anderson, 2003, pp. 20–23). Statistical model selection is commonly based on William of Occam's (ca. 1320) parsimony principle,¹ by which hypotheses should be kept as simple as possible. In statistical terms, this is a trade-off between bias (distance between the average estimate and truth) and variance (spread of the estimates around the truth) (Fig. 1). The idea is that by adding parameters to a model we obtain improvement in fit (see below) to some degree, but at the same time parameter estimates are “worse” because we have less data (i.e., information) per parameter. In addition, the computations typically require more time. So the question is how complex the model should be for a given problem.

Figure 1

The principle of parsimony. Model selection is more or less based on the trade-off between bias and variance versus the number of estimable parameters in the model. The principle of parsimony tells us that as we increase the number of parameters in a model the bias decreases but the variance increases. This principle underlies all model selection approaches.

The Likelihood Function

We referred above to the fit of a model to the data, but we have not yet explained how we measure this fit. In most cases, the fit of a model is measured by the likelihood function (see Edwards, 1972; Fisher, 1921), and in phylogenetics (see Felsenstein, 1981a; Goldman, 1990) we define the likelihood (L) as (proportional to) the probability of the data (D) given a model of evolution (M), a vector of K model parameters θ = (θ₁, θ₂, …,θ_K), a tree topology (τ), and a vector of S branch lengths, ν = (ν₁, ν₂, …,ν_S):

$$L = P(D \mid M, \theta, \tau, \nu)$$

If the goal is to compute the likelihood of a given model, then θ, τ, and ν are nuisance parameters (they affect the likelihood calculation but they are not really what we want to infer), and they should somehow be eliminated from the inference. A common strategy to remove nuisance parameters is to assume that they take those values that maximize the overall likelihood, thus reducing the likelihood to a function of the parameters of interest. What is usually done in practice is to estimate a tree (topology and branch lengths) from the data and then, implicitly assuming that this tree is the maximum likelihood tree for every candidate model, calculate maximum likelihood estimates of all model parameters, including the branch lengths, for every model given this tree. In this way we obtain the maximized (log) likelihood under model M:

$$\ell = \ln P(D \mid M, \hat{\theta}, \hat{\tau}, \hat{\nu})$$

where ^ means “estimate of” (θ̂ is an estimate of θ). The strategy just described is sometimes called joint estimation. A different strategy to remove nuisance parameters is to assign them prior probabilities and integrate them out to obtain the marginal probability of the data given only the model, that is, the model likelihood (also called integrative, marginal, or predictive likelihood):

$$P(D \mid M) = \int\!\!\int\!\!\int P(D \mid M, \theta, \tau, \nu)\, P(\theta, \tau, \nu \mid M)\, d\theta\, d\tau\, d\nu$$

However, this multidimensional integral can be very difficult to compute, and it is typically approximated using computationally intensive techniques like Markov chain Monte Carlo (MCMC) (Gilks et al., 1996; Hastings, 1970; Metropolis et al., 1953). Steel and Penny (2000) and Holder and Lewis (2003) provide an instructive discussion on joint and marginal estimation in the context of phylogenetics.
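The contrast between maximizing and integrating over a nuisance parameter can be sketched on a deliberately tiny case. The example below is an illustration only, not the procedure used in the paper: it assumes two aligned sequences under JC69, so the only nuisance parameter is a single branch length ν (there is no topology and no free θ). The site counts, the exponential prior on ν, its mean, and the integration grid are all hypothetical choices.

```python
import math

def jc69_log_likelihood(n_sites, n_diff, nu):
    """Log-likelihood of two aligned sequences under JC69 at branch length nu."""
    if nu <= 0:
        raise ValueError("nu must be positive")
    decay = math.exp(-4.0 * nu / 3.0)
    p_same = 0.25 + 0.75 * decay   # P(site shows the same base)
    p_diff = 0.25 - 0.25 * decay   # P(site shows one specific other base)
    # Each site pattern also carries the stationary frequency 1/4.
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))

def jc69_mle_branch_length(n_sites, n_diff):
    """Closed-form maximum likelihood estimate of nu under JC69."""
    p = n_diff / n_sites
    if not 0.0 < p < 0.75:
        raise ValueError("proportion of differences must lie in (0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc69_log_marginal_likelihood(n_sites, n_diff, prior_mean=0.1,
                                 upper=5.0, steps=20000):
    """Integrate nu out under an exponential prior (midpoint rule, log space)."""
    width = upper / steps
    log_terms = []
    for k in range(steps):
        nu = (k + 0.5) * width
        log_prior = -math.log(prior_mean) - nu / prior_mean
        log_terms.append(jc69_log_likelihood(n_sites, n_diff, nu)
                         + log_prior + math.log(width))
    peak = max(log_terms)  # log-sum-exp to avoid underflow
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

# Hypothetical data: 100 sites, 10 of which differ.
nu_hat = jc69_mle_branch_length(100, 10)
max_loglik = jc69_log_likelihood(100, 10, nu_hat)
log_marginal = jc69_log_marginal_likelihood(100, 10)
print(f"nu_hat = {nu_hat:.4f}")
print(f"maximized log-likelihood = {max_loglik:.3f}")
print(f"log marginal likelihood  = {log_marginal:.3f}")
```

The marginal value is an average of the likelihood over the prior, so it cannot exceed the maximized value; how far below it falls depends on the prior chosen.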

Hierarchical Likelihood Ratio Tests

The most popular strategy for model selection in phylogenetics is the use of hierarchical likelihood ratio tests (hLRTs) (Frati et al., 1997; Huelsenbeck and Crandall, 1997; Posada and Crandall, 1998) (Fig. 2). This method usually consists of performing pairwise likelihood ratio tests in a specific sequence until the procedure converges on a final model that cannot be rejected. In each LRT, we compare the maximized log-likelihoods of the null (ℓ₀) and the alternative (ℓ₁) models; if the associated P-value is smaller than the predefined threshold (the significance level, usually 0.05), we say that the alternative model fits the data significantly better than the null model (i.e., we reject the null model), and vice versa.

Figure 2

Hierarchical likelihood ratio tests (hLRTs). This figure illustrates an arbitrary hierarchy of LRTs for six different models. Within each LRT, the null model is depicted above the alternative model. When the LRT is not significant, the null model (above) is accepted (A), and it becomes the null model of the next LRT. When the LRT is significant, the null model is rejected (R) and the alternative model (below) becomes the null model of the next LRT. There are six possible paths depending on the outcome of the individual LRTs, and each path results in the selection of a different model. JC69: Jukes-Cantor model (Jukes and Cantor, 1969); K80: Kimura 1980 model (Kimura, 1980), also known as K2P; F81: Felsenstein 81 model (Felsenstein, 1981b); HKY85: Hasegawa-Kishino-Yano model (Hasegawa et al., 1985); SYM, symmetrical model (Zharkikh, 1994); GTR: general-time reversible model (Tavaré, 1986), also known as REV.

The approximation of this P-value is straightforward for nested models, using a standard or mixed χ² distribution (Goldman, 1993; Goldman and Whelan, 2000; Kendall and Stuart, 1979; Ota et al., 2000). Two models are nested when one of them, the null model, is a special case of the other, the alternative model. For example, the Jukes-Cantor model (Jukes and Cantor, 1969) (JC69) is nested within the Kimura two-parameter model (Kimura, 1980) (K80), because if we assume that transitions and transversions occur at the same rate (i.e., κ = 1), K80 collapses to JC69. However, obtaining correct P-values for the LRT statistics can be difficult. LRTs implicitly assume that at least one of the models compared is correct, and when the models are misspecified these tests can often be incorrect (Foutz and Srivastava, 1977; Golden, 1995; Kent, 1982). Although proper LRTs can be constructed when models are wrong (Vuong, 1989), standard LRTs in phylogenetics are not robust to model misspecification (Zhang, 1999). When the models are non-nested, the χ² approximation is no longer valid, and more computationally intensive Monte Carlo methods are needed (Goldman, 1993; Whelan and Goldman, 1999). In addition, when sample size is small the usual asymptotic approximation on which P-values are based no longer applies.
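A single nested LRT can be sketched as follows. This is a minimal illustration, assuming the usual statistic 2(ℓ₁ − ℓ₀) and a standard χ² reference distribution with degrees of freedom equal to the difference in K; it does not cover the mixed χ² case needed when a parameter sits on a boundary. The function names are hypothetical, and the input values are the HKY85+I+Γ and TN93+I+Γ rows of Table 1, read as negative log-likelihoods.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5   # df = 1
    else:
        q, a = math.exp(-half), 1.0              # df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(loglik_null, loglik_alt, k_null, k_alt):
    """Return (statistic, degrees of freedom, P-value) for two nested models."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    df = k_alt - k_null
    return statistic, df, chi2_sf(statistic, df)

# Null: HKY85+I+Γ (K = 77); alternative: TN93+I+Γ (K = 78).
statistic, df, p_value = likelihood_ratio_test(-5443.6729, -5441.4600, 77, 78)
print(f"2(l1 - l0) = {statistic:.4f}, df = {df}, P = {p_value:.4f}")
print("reject null" if p_value < 0.05 else "do not reject null")
```

Under these assumptions the P-value falls just below 0.05, so the outcome of this particular test would flip at a slightly stricter significance level.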

Furthermore, LRTs were designed for hypothesis testing, and although classical hypothesis testing is commonly used as a model selection strategy, it has been argued that hypothesis testing and model selection are distinct issues (Burnham and Anderson, 2003, pp. 132–134). A stepwise procedure like the hLRTs, in which we sequentially decide whether to add (or remove) certain parameters, is analogous to forward and backward selections in best-subset linear regression (Miller, 2002, pp. 39–46), which do not guarantee finding the optimal model. As pointed out by Sanderson and Kim (2000), we can identify several potential problems with the use of hLRTs for model selection in phylogenetics. There exist situations in which an optimal model may not exist for the hLRTs procedure. This kind of situation occurs, for example, if the general time-reversible model (Tavaré, 1986) (GTR) is not significantly better than the Hasegawa et al. model (1985) (HKY85), HKY85 is not significantly better than JC69, but GTR is significantly better than JC69. Even if an optimal model exists, it will be always a function of the significance level, and the outcome of the model choice procedure may vary accordingly. In addition, the hLRTs approach performs multiple tests with the same data, and this will increase the rate of false positives (that is, to reject the null hypothesis when it is true): the probability of falsely rejecting the null hypothesis at least once in n tests is 1−(1−α)ⁿ. Although there are statistical procedures to correct for this effect—like the Bonferroni correction (see Hochberg, 1988)—here the tests are nonindependent, and the appropriate adjustment can be very complex (see also Shimodaira, 1998, 2001; Shimodaira and Hasegawa, 1999). The outcome of the hLRTs might also be affected by the starting model (for the hLRTs procedure we need to select a starting point, usually represented by the simplest or the most complex model in the set of candidate models). 
In addition, there are cases in which the hLRTs will not select the best model, according to its own criteria, among the candidate models.
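The inflation of false positives under repeated testing can be made concrete with the expression 1−(1−α)ⁿ given above. The short sketch below simply evaluates it; note that the formula assumes independent tests, which, as the text points out, the hLRTs are not, so the numbers are indicative only. The function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false rejection in n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n_tests in (1, 2, 5, 10):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:2d} tests at alpha = 0.05 -> {rate:.4f}")
```

With five tests at the 0.05 level the nominal rate already exceeds 0.22.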

Indeed, these problems can have an impact on the analysis of real data sets, and we have analyzed a set of HIV sequences (Posada and Crandall, 2001a) for illustrative purposes (Fig. 3) (Pol, in press). In Figure 3a we can see a case in which an optimal model does not exist, as all of the three models are rejected when compared with one of the other two. However, we will select HKY85 as the best fit (because we did not compare HKY85 and GTR). Also, note that increasing the significance level (Fig. 3b) changes the outcome, as GTR now becomes the best fit model. With a different set of candidate models, and if we start with HKY85, the model selected will be HKY85 (Fig. 3c), which is a suboptimal choice, whereas if we start with GTR the model selected will be GTR (Fig. 3d), which is actually the optimal model. We cannot devise a hierarchy of hLRTs that overcomes all these problems at once, but better approaches exist than simply forward and backward selection (Miller, 2002).
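The situation in which no optimal model exists can be reproduced numerically. The sketch below uses hypothetical LRT statistics (they are not the values from the HIV data set in Figure 3) and takes the degrees of freedom from the K column of Table 1: HKY85 has four more parameters than JC69, and GTR four more than HKY85. For nested models fitted to the same data the statistics add up, so two nonsignificant steps can combine into a significant overall comparison.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5
    else:
        q, a = math.exp(-half), 1.0
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Hypothetical statistics 2(l1 - l0) for the two intermediate steps.
stat_jc_vs_hky = 9.0     # JC69 (null) vs HKY85, df = 4
stat_hky_vs_gtr = 9.0    # HKY85 (null) vs GTR, df = 4
stat_jc_vs_gtr = stat_jc_vs_hky + stat_hky_vs_gtr   # JC69 vs GTR, df = 8

p_jc_vs_hky = chi2_sf(stat_jc_vs_hky, 4)
p_hky_vs_gtr = chi2_sf(stat_hky_vs_gtr, 4)
p_jc_vs_gtr = chi2_sf(stat_jc_vs_gtr, 8)

print(f"JC69 vs HKY85: P = {p_jc_vs_hky:.4f}")
print(f"HKY85 vs GTR:  P = {p_hky_vs_gtr:.4f}")
print(f"JC69 vs GTR:   P = {p_jc_vs_gtr:.4f}")
```

At the 0.05 level neither single step is significant, yet GTR is significantly better than JC69, so the result depends on which comparisons the hierarchy happens to perform.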

Figure 3

Problems of hLRTs with a real data set. See text for further details. The data set analyzed is an alignment of 12 HIV-1 subtype D sequences of a fragment of 1462 nucleotides from the gag region (Posada and Crandall, 2001a). K81uf is the Kimura 1981 model (Kimura, 1981) with unequal base frequencies. TN93 is the Tamura-Nei model (Tamura and Nei, 1993). Solid arrows indicate the outcome of the LRT performed, whereas discontinuous arrows indicate the outcome of a potential LRT not performed. P is the associated P-value of the LRTs. The underlined model is the starting point of the hLRT, the best model according to all LRTs is indicated with an asterisk, and the model selected is enclosed within a square.

Bayesian Model Selection

Model selection is an integral part of Bayesian estimation (Gelfand, 1996; Raftery, 1996; Wasserman, 2000), and within this framework, different strategies exist to accomplish the same tasks.

Bayes Factors

Bayes factors (Kass and Raftery, 1995) are the Bayesian analogue of the LRT (Suchard et al., 2003a). They contrast the evidence provided by the data for two competing models, i and j, as:

$$B_{ij} = \frac{P(D \mid M_i)}{P(D \mid M_j)}$$

Evidence for M_i is considered very strong if B_ij > 150, strong if 12 < B_ij < 150, positive if 3 < B_ij < 12, barely worth mentioning if 1 < B_ij < 3, and negative (supports M_j) if B_ij < 1 (Raftery, 1996). It is important to note that Bayes factors compare model likelihoods, or P(D|M), which are calculated by integrating, not maximizing, over all possible parameter values (except in empirical Bayesian approaches, where maximum likelihood estimates can be used instead). Therefore we should not confuse them with the log of the maximized likelihoods (ℓ) used in the LRTs and AIC. Bayes factors are already being used in the context of phylogenetics, for example to infer the occurrence of recombination events (Suchard et al., 2002), to compare different phylogenetic hypotheses (Huelsenbeck and Imennov, 2002; Huelsenbeck et al., 2000; Suchard et al., 2003b) and for model selection (Aris-Brosou and Yang, 2002; Huelsenbeck et al., 2004; Nylander et al., 2004; Suchard et al., 2001).
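The interpretation scale above can be written as a small lookup. This is a sketch under stated assumptions: the text gives open intervals, so the handling of the exact boundary values (150, 12, 3, and 1) is a choice made here, and the log marginal likelihoods in the example are hypothetical numbers, not estimates from any data set.

```python
import math

def bayes_factor(log_marginal_i, log_marginal_j):
    """B_ij computed from the log model likelihoods ln P(D|M_i), ln P(D|M_j)."""
    return math.exp(log_marginal_i - log_marginal_j)

def interpret_bayes_factor(b_ij):
    """Map B_ij to the verbal categories attributed to Raftery (1996)."""
    if b_ij <= 0:
        raise ValueError("a Bayes factor must be positive")
    if b_ij > 150:
        return "very strong"
    if b_ij > 12:
        return "strong"
    if b_ij > 3:
        return "positive"
    if b_ij > 1:
        return "barely worth mentioning"
    if b_ij == 1:
        return "none"  # assumption: equal support is reported separately
    return "negative (supports M_j)"

# Hypothetical log marginal likelihoods for two competing models.
b = bayes_factor(-5450.2, -5453.9)
print(f"B_ij = {b:.2f}: evidence for M_i is {interpret_bayes_factor(b)}")
```

Working from log marginal likelihoods keeps the ratio numerically stable, since the model likelihoods themselves are far too small to represent directly.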

Posterior Probabilities

When multiple models are considered, the usual Bayesian solution is to choose the model with the highest posterior probability (Kass and Raftery, 1995; Raftery, 1996; Wasserman, 2000). For R models, the posterior probability of the ith model is:

$$P(M_i \mid D) = \frac{P(D \mid M_i)\, P(M_i)}{\sum_{r=1}^{R} P(D \mid M_r)\, P(M_r)}$$

A word is needed about model prior probabilities P(M_i). Although models are commonly assigned equal prior probabilities, in phylogenetics we may have prior beliefs stating that some models are more probable than others. For example, we have enough information about the process of mitochondrial sequence evolution to believe that the JC69 model is less probable in this case than the HKY85 model with a gamma distribution for rates among sites (see Yang, 1996a). Ideally, this information should be reflected in the model priors, and although considerable Bayesian research exists on eliciting prior information (Kadane and Wolfson, 1998; Madigan et al., 1995), it still seems to be very difficult to quantify. Fortunately, if the signal in the data, conveyed through the likelihood, is strong enough, then the prior distributions should not have a large influence on the posterior distribution. Indeed, posterior probabilities of trees are already being used to estimate phylogenies (Holder and Lewis, 2003; Huelsenbeck et al., 2001, 2002; Larget and Simon, 1999; Mau and Newton, 1997; Mau et al., 1999; Yang and Rannala, 1997).
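The effect of the model priors on the posterior probabilities can be explored with a few lines of code. The sketch below normalizes P(D|M_i)P(M_i) over R candidate models, working in log space; the three log marginal likelihoods and the unequal prior vector are hypothetical values chosen only for illustration.

```python
import math

def model_posteriors(log_marginals, priors=None):
    """Posterior model probabilities from log P(D|M_i) and priors P(M_i)."""
    n_models = len(log_marginals)
    if priors is None:
        priors = [1.0 / n_models] * n_models   # equal prior probabilities
    log_terms = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    peak = max(log_terms)                       # subtract the max for stability
    weights = [math.exp(t - peak) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log marginal likelihoods for three candidate models.
log_marginals = [-5450.2, -5453.9, -5461.0]

equal = model_posteriors(log_marginals)
skewed = model_posteriors(log_marginals, priors=[0.2, 0.4, 0.4])
print("equal priors: ", [round(p, 4) for p in equal])
print("skewed priors:", [round(p, 4) for p in skewed])
```

In this made-up case the first model keeps most of the posterior mass even when its prior is halved, which is the behaviour expected when the signal in the data is strong.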

When the priors for the parameters in the complex model are very diffuse, Bayesian approaches tend to support the null model in contradiction to significance tests (e.g., LRTs) as sample size increases, the so-called Jeffreys-Lindley's paradox (Bartlett, 1957; Jeffreys, 1939; Lindley, 1957; Shafer, 1982). If the diffuseness of these priors arises because of mere ignorance of the values these parameters can take, this conflict highlights a disadvantage of Bayesian approaches, especially in the case of the Bayesian Information Criterion (BIC) (see below), which assumes flat, improper priors. In any case, Jeffreys-Lindley's paradox illustrates the relevance, for good or for bad, of the priors we choose for the model parameters (Huelsenbeck et al., 2002). Moreover, in some situations Bayesian approaches and standard significance tests can also be irreconcilable when testing point (or sharp) null hypotheses, for example, H₀: ti/tv = 0.5 versus H₁: ti/tv ≠ 0.5 (Berger and Sellke, 1987) (ti/tv is the transition/transversion ratio).

Bayesian Information Criterion

In order to calculate model likelihoods, Bayesian methods often require computationally intensive techniques like Markov chain Monte Carlo (Gilks et al., 1996; Hastings, 1970; Metropolis et al., 1953). Easy-to-implement Bayes factor calculations do exist for some nested models via the Savage-Dickey ratio (Suchard et al., 2001; Verdinelli and Wasserman, 1995). However, there is a computationally more tractable approach, the Bayesian Information Criterion (BIC) (Schwarz, 1978):

$$\mathrm{BIC} = -2\ell + K \ln n$$

where K is the number of estimable parameters, and n is the sample size (for now we assume that n can be approximated by the total number of characters in the alignment). The BIC was developed as an approximation to the log marginal likelihood of a model, and therefore, the difference between two BIC estimates may be a good approximation to the natural log of the Bayes factor (Kass and Wasserman, 1995). Given equal priors for all competing models, choosing the model with the smallest BIC is equivalent to selecting the model with the maximum posterior probability. The BIC assumes that the (parameters) prior is the unit information prior (i.e., a multivariate normal prior with mean at the maximum likelihood estimate and variance equal to the expected information matrix for one observation) (Kass and Wasserman, 1995), which can be thought of as a prior distribution that contains the same amount of information as a single, typical observation. This prior is quite diffuse, so the BIC tends to select models that are less complex than Bayes factors (for discussion see Raftery, 1999; Weakliem, 1999), and if n > 8, the BIC selects simpler models than the AIC (Forster and Sober, 2004). However, Burnham and Anderson (2003, pp. 302–305) suggest that the BIC can be used more generally with any prior.
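The comparison between the BIC and AIC penalties can be checked directly. The sketch below assumes the standard definitions BIC = −2ℓ + K ln n and AIC = −2ℓ + 2K, so the BIC penalizes each parameter more heavily once ln n exceeds 2, which for integer n first happens at n = 8. The ℓ and K values are the TN93+I+Γ and HKY85+I+Γ rows of Table 1 (read as negative log-likelihoods); the sample size n = 1000 is a placeholder, not the length of the actual alignment.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion (uncorrected)."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian Information Criterion for sample size n."""
    return -2.0 * loglik + k * math.log(n)

# Per-parameter penalty: 2 for AIC, ln(n) for BIC.
for n in (7, 8, 100, 1000):
    print(f"n = {n:4d}: ln(n) = {math.log(n):.3f}")

n_assumed = 1000  # placeholder sample size (assumption)
models = {
    "TN93+I+Γ": (-5441.4600, 78),
    "HKY85+I+Γ": (-5443.6729, 77),
}
for name, (loglik, k) in models.items():
    print(f"{name}: AIC = {aic(loglik, k):.2f}, "
          f"BIC = {bic(loglik, k, n_assumed):.2f}")

best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n_assumed))
print("AIC prefers", best_aic, "and BIC prefers", best_bic)
```

For these two rows the BIC favours the model with one fewer parameter whenever ln n exceeds the LRT statistic of about 4.43, that is, for any n above roughly 84, so the qualitative outcome does not hinge on the placeholder value.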

A collection of BIC statistics contains the same information as a collection of pairwise Bayes factors. However, when choosing among several models, the BIC statistics are easier to interpret by visual inspection, as they allow for the simultaneous comparison of multiple models, so the best-fit models can be immediately identified. On the other hand, selecting the best-fit model from a collection of multiple pairwise Bayes factors could be more burdensome, and such procedure might suffer from some of the problems described above for the hLRTs. Nevertheless, the BIC approximation might not be appropriate when the posterior mode occurs at the boundary of the parameter space (Hsiao, 1997; Ota et al., 2000).

Decision Theoretic Approaches

Recently, Minin et al. (2003) applied decision theory (Bernardo and Smith, 1994) to develop a novel model selection strategy (the DT method) that extends the BIC. Minin et al. (2003) argue that there is no guarantee that the best-fit models will produce the best estimates of phylogeny, and therefore propose a model selection method that incorporates some measure of phylogenetic performance. They assess models through a penalty or loss function, related to how dissimilar the branch length estimates are across models, and pick the model with the minimum posterior loss. As expected, simulations suggested that models selected with this criterion result in slightly more accurate branch length estimates than those obtained under models selected by the hLRTs.

Model Selection Uncertainty

Once we have selected a model it is very important that we are able to assess how confident we are in that selection (see Chatfield, 1995). We would like to be able to rank the models and to know whether the model selected is much better than the other candidate models. At the same time, we should be interested to learn whether we would select the same model if several other independent samples were available. The assessment of model selection uncertainty has a long tradition within the Bayesian community and posterior probabilities can be naturally used to take account of model uncertainty (Kass and Raftery, 1995; Madigan and Raftery, 1994). For example, models can be ranked according to their posterior probabilities and 95% credible intervals (Occam's Window) can easily be constructed by summing these probabilities (Madigan and Raftery, 1994). Although computing posterior probabilities can be hard and time consuming, in theory we could approximate those probabilities with the BIC. Furthermore, we could also use the BIC values or posterior risks of the DT method (Minin et al., 2003) in the same way that we use the AIC to assess model selection uncertainty, although this could be considered ad hoc (see Hoeting et al., 1999).
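Ranking models and accumulating their probabilities until a target level is reached can be sketched as below. The text describes this for posterior model probabilities; since none are reported here, the example substitutes the Akaike weights of the twelve best models in Table 1 as stand-ins, which is an assumption of this illustration. The function name and the renormalization of the weights are also choices made for the sketch.

```python
def credible_set(model_probs, level=0.95):
    """Smallest set of top-ranked models whose cumulative probability >= level."""
    ranked = sorted(model_probs.items(), key=lambda item: item[1], reverse=True)
    total = sum(model_probs.values())   # renormalize in case of rounding
    selected, cumulative = [], 0.0
    for name, prob in ranked:
        selected.append(name)
        cumulative += prob / total
        if cumulative >= level:
            break
    return selected, cumulative

# Akaike weights (w) from Table 1, used here in place of posterior probabilities.
weights = {
    "TN93+I+Γ": 0.5221, "TIM+I+Γ": 0.1913, "HKY85+I+Γ": 0.1692,
    "K81uf+I+Γ": 0.0641, "GTR+I+Γ": 0.0344, "TVM+I+Γ": 0.0165,
    "TN93+Γ": 0.0011, "HKY85+Γ": 0.0005, "TIM+Γ": 0.0004,
    "K81uf+Γ": 0.0002, "GTR+Γ": 0.0001, "TVM+Γ": 0.0000,
}

selected, cumulative = credible_set(weights, level=0.95)
print(f"{len(selected)} models reach {cumulative:.4f}: {selected}")
```

With these weights five models are needed to pass the 95% level, all of them including both a proportion of invariable sites and gamma-distributed rates.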

Model Averaging

Although in general model selection is concerned with the selection of just the best fit model, Bayesian approaches allow us to make inferences based on the entire set of candidate models, or model averaging (Hoeting et al., 1999; Madigan and Raftery, 1994; Raftery, 1996; Wasserman, 2000). Indeed, obtaining model averaged phylogenetic estimates is straightforward (Posada, 2003). If we consider, for example, G models that include the gamma distribution for rate variation among sites (Yang, 1996a), the overall posterior mean of the shape of the gamma distribution (α) would be:

$$E(\alpha \mid D) = \sum_{i=1}^{G} \hat{\alpha}_i\, P(M_i \mid D)$$

where α̂_i is the estimate of α for model i.
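A model-averaged estimate of this kind is a weighted sum of the per-model estimates, with the posterior model probabilities as weights. The sketch below uses three hypothetical models with made-up estimates of α and made-up posterior probabilities. Whether the probabilities should be rescaled to sum to one over the subset of models that contain α is exposed as an option; that option is an assumption of this sketch rather than something taken from the text.

```python
def model_averaged_estimate(estimates, probabilities, renormalize=False):
    """Weighted sum of per-model estimates using model probabilities."""
    if len(estimates) != len(probabilities):
        raise ValueError("one probability is needed per estimate")
    weighted = sum(e * p for e, p in zip(estimates, probabilities))
    if renormalize:
        # Rescale so the weights of the contributing models sum to one.
        return weighted / sum(probabilities)
    return weighted

# Hypothetical estimates of the gamma shape parameter under three models
# that include it, and hypothetical posterior probabilities of those models.
alpha_hats = [0.42, 0.45, 0.40]
posteriors = [0.6, 0.3, 0.1]

alpha_bar = model_averaged_estimate(alpha_hats, posteriors)
print(f"model-averaged alpha = {alpha_bar:.4f}")
```

When the contributing probabilities already sum to one, as in this example, the two variants coincide.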

Because not all parameters have the same interpretation across models, we should be careful when calculating and interpreting model-averaged parameter estimates. For example, the gamma shape parameter describing among-site rate variation has a different interpretation depending on whether the model also includes a proportion of invariable sites, because in such a case only the rates at variable sites, and not at all sites, are gamma-distributed. To facilitate a correct interpretation we could obtain two separate model-averaged estimates of the gamma shape parameter, one from models that include a proportion of invariable sites, and another from models that do not include a proportion of invariable sites. Moreover, from the above formulation we can see that it would be easy to estimate the relative importance of any parameter by summing the posterior probabilities across all models that include the parameter we are interested in. For example, the relative importance (w₊) for the shape of the gamma distribution across all candidate models is simply:

$$w_{+}(\alpha) = \sum_{i=1}^{R} P(M_i \mid D)\, I_{\alpha}(M_i)$$

where I_α(M_i) equals 1 if α is included in model M_i and 0 otherwise.

We also need to be careful when interpreting the relative importance of parameters. When the number of candidate models is less than the number of possible combinations of parameters, the presence-absence of some pairs of parameters can be correlated, and so can their relative importances. In other words, if parameter ɛ actually has a high relative importance, then a second parameter η might yield a high relative importance simply because the presence-absence of parameters ɛ and η among models is positively correlated. For the 56 models in Table 1, the presence of the different base frequency parameters (π) is completely correlated, whereas the presence of several substitution rates (ϕ) shows complete or high levels of correlation. The presence of parameter κ is inversely correlated with that of several substitution rate parameters (e.g., ϕ_{A − G}). The presence of α, the shape of the gamma distribution for rate variation among sites, or p_inv, the proportion of invariable sites, is not correlated with that of any other parameter.
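The relative importance of a parameter is the summed probability of the models that contain it. The sketch below applies this to α (models labelled +Γ) and p_inv (models labelled +I), using the Akaike weights of the twelve best models in Table 1 in place of posterior probabilities; that substitution, and parsing the model names to detect a parameter, are assumptions of this illustration. The remaining models in Table 1 have weights of 0.0000 at four decimals and are omitted.

```python
# Akaike weights (w) from Table 1, used here in place of posterior probabilities.
weights = {
    "TN93+I+Γ": 0.5221, "TIM+I+Γ": 0.1913, "HKY85+I+Γ": 0.1692,
    "K81uf+I+Γ": 0.0641, "GTR+I+Γ": 0.0344, "TVM+I+Γ": 0.0165,
    "TN93+Γ": 0.0011, "HKY85+Γ": 0.0005, "TIM+Γ": 0.0004,
    "K81uf+Γ": 0.0002, "GTR+Γ": 0.0001, "TVM+Γ": 0.0000,
}

def has_component(model_name, component):
    """Indicator: True if the model name carries the given '+' suffix."""
    return component in model_name.split("+")[1:]

def relative_importance(model_weights, component):
    """Sum of weights over all models that include the component."""
    return sum(w for name, w in model_weights.items()
               if has_component(name, component))

importance_gamma = relative_importance(weights, "Γ")
importance_pinv = relative_importance(weights, "I")
print(f"relative importance of alpha (Γ): {importance_gamma:.4f}")
print(f"relative importance of p_inv (I): {importance_pinv:.4f}")
```

Both values are close to one for this data set, and because the presence of α and p_inv is not correlated with that of other parameters across the candidate models, these two sums are comparatively easy to interpret.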

Table 1.

AIC_c values, AIC_c differences (Δ), and Akaike weights (w) for the carabid beetles Ohomopterus mitochondrial DNA data set from Sota and Vogler (2001). Because branch lengths were estimated for each candidate model, the number of branches was included in the penalty parameter K (= number of parameters). ℓ are the maximized log likelihoods, tabulated here as −ℓ, and Cum(w) are the cumulative Akaike weights.

Model	−ℓ	K	AIC_c	Δ AIC_c	w	Cum(w)
TN93+I+Γ	5441.4600	78	11045.5888	0.0000	0.5221	0.5221
TIM+I+Γ	5441.3765	79	11047.5965	2.0077	0.1913	0.7134
HKY85+I+Γ	5443.6729	77	11047.8422	2.2534	0.1692	0.8826
K81uf+I+Γ	5443.5566	78	11049.7821	4.1934	0.0641	0.9468
GTR+I+Γ	5440.9150	81	11051.0301	5.4413	0.0344	0.9811
TVM+I+Γ	5442.7393	80	11052.4991	6.9103	0.0165	0.9976
TN93+Γ	5448.6792	77	11057.8549	12.2661	0.0011	0.9988
HKY85+Γ	5450.5068	76	11059.3402	13.7514	0.0005	0.9993
TIM+Γ	5448.6577	78	11059.9843	14.3955	0.0004	0.9997
K81uf+Γ	5450.4883	77	11061.4730	15.8843	0.0002	0.9999
GTR+Γ	5448.0298	80	11063.0802	17.4914	0.0001	1.0000
TVM+Γ	5449.6685	79	11064.1804	18.5917	0.0000	1.0000
TN93+I	5470.7568	77	11102.0102	56.4214	0.0000	1.0000
TIM+I	5470.7417	78	11104.1522	58.5635	0.0000	1.0000
GTR+I	5470.3452	80	11107.7110	62.1223	0.0000	1.0000
HKY85+I	5476.8496	76	11112.0257	66.4370	0.0000	1.0000
K81uf+I	5476.8208	77	11114.1381	68.5493	0.0000	1.0000
TVM+I	5476.1650	79	11117.1736	71.5849	0.0000	1.0000
F81+I+Γ	5769.1118	76	11696.5501	650.9614	0.0000	1.0000
F81+Γ	5782.0566	75	11720.2721	674.6834	0.0000	1.0000
F81+I	5807.4927	75	11771.1442	725.5554	0.0000	1.0000
GTR	5805.0576	79	11774.9588	729.3700	0.0000	1.0000
TVM	5808.4727	78	11779.6141	734.0254	0.0000	1.0000
TIM	5810.4102	77	11781.3168	735.7280	0.0000	1.0000
TN93	5813.4780	76	11785.2825	739.6938	0.0000	1.0000
K81uf	5813.5190	76	11785.3646	739.7758	0.0000	1.0000
HKY85	5816.5894	75	11789.3375	743.7488	0.0000	1.0000
SYM+I+Γ	5861.0859	78	11884.8407	839.2520	0.0000	1.0000
TVMef+I+Γ	5867.6128	77	11895.7221	850.1333	0.0000	1.0000
SYM+Γ	5876.7803	77	11914.0570	868.4683	0.0000	1.0000
TVMef+Γ	5884.4272	76	11927.1810	881.5922	0.0000	1.0000
TIMef+I+Γ	5885.0684	76	11928.4632	882.8745	0.0000	1.0000
K81+I+Γ	5893.7642	75	11943.6872	898.0984	0.0000	1.0000
TN93ef+I+Γ	5897.7529	75	11951.6647	906.0759	0.0000	1.0000
TIMef+Γ	5899.2588	75	11954.6764	909.0877	0.0000	1.0000
K80+I+Γ	5906.2329	74	11966.4593	920.8706	0.0000	1.0000
K81+Γ	5908.7876	74	11971.5687	925.9800	0.0000	1.0000
TN93ef+Γ	5911.5659	74	11977.1254	931.5366	0.0000	1.0000
SYM+I	5908.7021	77	11977.9008	932.3120	0.0000	1.0000
TVMef+I	5917.6128	76	11993.5521	947.9633	0.0000	1.0000
K80+Γ	5920.9038	73	11993.6382	948.0494	0.0000	1.0000
TIMef+I	5928.9629	75	12014.0846	968.4959	0.0000	1.0000
K81+I	5938.0137	74	12030.0209	984.4321	0.0000	1.0000
TN93ef+I	5940.7383	74	12035.4701	989.8813	0.0000	1.0000
K80+I	5949.5186	73	12050.8677	1005.2789	0.0000	1.0000
F81	6088.2227	74	12330.4388	1284.8501	0.0000	1.0000
JC69+I+Γ	6101.2656	73	12354.3618	1308.7730	0.0000	1.0000
JC69+Γ	6114.8408	72	12379.3515	1333.7628	0.0000	1.0000
JC69+I	6142.1719	72	12434.0137	1388.4249	0.0000	1.0000
SYM	6170.8916	76	12500.1097	1454.5209	0.0000	1.0000
TVMef	6190.3394	75	12536.8375	1491.2488	0.0000	1.0000
TIMef	6194.5806	74	12543.1547	1497.5659	0.0000	1.0000
TN93ef	6210.6353	73	12573.1011	1527.5123	0.0000	1.0000
K81	6214.1152	73	12580.0610	1534.4723	0.0000	1.0000
K80	6230.2100	72	12610.0898	1564.5011	0.0000	1.0000
JC69	6411.5161	71	12970.5438	1924.9551	0.0000	1.0000


Table 1.

AIC_c values, AIC_c differences (Δ), and Akaike weights (w) for the Ohomopterus carabid beetle mitochondrial DNA data set from Sota and Vogler (2001). Because branch lengths were estimated for each candidate model, the number of branches was included in the penalty parameter K (= number of parameters). ℓ are the maximized log likelihoods, tabulated here as positive values (that is, as −ln L, which is what reproduces the AIC_c column), and Cum(w) are the cumulative Akaike weights.

Model	ℓ	K	AIC_c	Δ AIC_c	w	Cum(w)
TN93+I+Γ	5441.4600	78	11045.5888	0.0000	0.5221	0.5221
TIM+I+Γ	5441.3765	79	11047.5965	2.0077	0.1913	0.7134
HKY85+I+Γ	5443.6729	77	11047.8422	2.2534	0.1692	0.8826
K81uf+I+Γ	5443.5566	78	11049.7821	4.1934	0.0641	0.9468
GTR+I+Γ	5440.9150	81	11051.0301	5.4413	0.0344	0.9811
TVM+I+Γ	5442.7393	80	11052.4991	6.9103	0.0165	0.9976
TN93+Γ	5448.6792	77	11057.8549	12.2661	0.0011	0.9988
HKY85+Γ	5450.5068	76	11059.3402	13.7514	0.0005	0.9993
TIM+Γ	5448.6577	78	11059.9843	14.3955	0.0004	0.9997
K81uf+Γ	5450.4883	77	11061.4730	15.8843	0.0002	0.9999
GTR+Γ	5448.0298	80	11063.0802	17.4914	0.0001	1.0000
TVM+Γ	5449.6685	79	11064.1804	18.5917	0.0000	1.0000
TN93+I	5470.7568	77	11102.0102	56.4214	0.0000	1.0000
TIM+I	5470.7417	78	11104.1522	58.5635	0.0000	1.0000
GTR+I	5470.3452	80	11107.7110	62.1223	0.0000	1.0000
HKY85+I	5476.8496	76	11112.0257	66.4370	0.0000	1.0000
K81uf+I	5476.8208	77	11114.1381	68.5493	0.0000	1.0000
TVM+I	5476.1650	79	11117.1736	71.5849	0.0000	1.0000
F81+I+Γ	5769.1118	76	11696.5501	650.9614	0.0000	1.0000
F81+Γ	5782.0566	75	11720.2721	674.6834	0.0000	1.0000
F81+I	5807.4927	75	11771.1442	725.5554	0.0000	1.0000
GTR	5805.0576	79	11774.9588	729.3700	0.0000	1.0000
TVM	5808.4727	78	11779.6141	734.0254	0.0000	1.0000
TIM	5810.4102	77	11781.3168	735.7280	0.0000	1.0000
TN93	5813.4780	76	11785.2825	739.6938	0.0000	1.0000
K81uf	5813.5190	76	11785.3646	739.7758	0.0000	1.0000
HKY85	5816.5894	75	11789.3375	743.7488	0.0000	1.0000
SYM+I+Γ	5861.0859	78	11884.8407	839.2520	0.0000	1.0000
TVMef+I+Γ	5867.6128	77	11895.7221	850.1333	0.0000	1.0000
SYM+Γ	5876.7803	77	11914.0570	868.4683	0.0000	1.0000
TVMef+Γ	5884.4272	76	11927.1810	881.5922	0.0000	1.0000
TIMef+I+Γ	5885.0684	76	11928.4632	882.8745	0.0000	1.0000
K81+I+Γ	5893.7642	75	11943.6872	898.0984	0.0000	1.0000
TN93ef+I+Γ	5897.7529	75	11951.6647	906.0759	0.0000	1.0000
TIMef+Γ	5899.2588	75	11954.6764	909.0877	0.0000	1.0000
K80+I+Γ	5906.2329	74	11966.4593	920.8706	0.0000	1.0000
K81+Γ	5908.7876	74	11971.5687	925.9800	0.0000	1.0000
TN93ef+Γ	5911.5659	74	11977.1254	931.5366	0.0000	1.0000
SYM+I	5908.7021	77	11977.9008	932.3120	0.0000	1.0000
TVMef+I	5917.6128	76	11993.5521	947.9633	0.0000	1.0000
K80+Γ	5920.9038	73	11993.6382	948.0494	0.0000	1.0000
TIMef+I	5928.9629	75	12014.0846	968.4959	0.0000	1.0000
K81+I	5938.0137	74	12030.0209	984.4321	0.0000	1.0000
TN93ef+I	5940.7383	74	12035.4701	989.8813	0.0000	1.0000
K80+I	5949.5186	73	12050.8677	1005.2789	0.0000	1.0000
F81	6088.2227	74	12330.4388	1284.8501	0.0000	1.0000
JC69+I+Γ	6101.2656	73	12354.3618	1308.7730	0.0000	1.0000
JC69+Γ	6114.8408	72	12379.3515	1333.7628	0.0000	1.0000
JC69+I	6142.1719	72	12434.0137	1388.4249	0.0000	1.0000
SYM	6170.8916	76	12500.1097	1454.5209	0.0000	1.0000
TVMef	6190.3394	75	12536.8375	1491.2488	0.0000	1.0000
TIMef	6194.5806	74	12543.1547	1497.5659	0.0000	1.0000
TN93ef	6210.6353	73	12573.1011	1527.5123	0.0000	1.0000
K81	6214.1152	73	12580.0610	1534.4723	0.0000	1.0000
K80	6230.2100	72	12610.0898	1564.5011	0.0000	1.0000
JC69	6411.5161	71	12970.5438	1924.9551	0.0000	1.0000


Indeed, the averaged parameter could be the topology itself, so we could construct a model-averaged estimate of phylogeny. We will come back to this later.

Akaike Information Criterion

A different approach to model selection is the Akaike Information Criterion (AIC) (Akaike, 1973, 1974; see also Sakamoto et al., 1986). The AIC is an asymptotically unbiased estimator of the expected relative Kullback-Leibler information quantity or distance (K-L) (Kullback and Leibler, 1951), which represents the amount of information lost when we use model g to approximate model f (Fig. 4):

I(f, g) = ∫ f(x) ln[f(x) / g(x)] dx

Figure 4

The Kullback-Leibler distance. The K-L distance aims to represent how close a model is to the truth. Here, M₂ is the candidate model that best approximates truth and therefore it is the model with the smallest K-L distance. The AIC chooses the candidate model with the smallest expected K-L distance.
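
As a toy illustration of the K-L distance, the sketch below evaluates its discrete analogue, Σ f(x) ln[f(x)/g(x)], for base frequencies. It assumes, purely for illustration, that the model-averaged base frequencies reported in Table 2 play the role of "truth" f, and it compares two hypothetical approximations g. The function name `kl_distance` and both candidate vectors are our own choices, not part of the original analysis.

```python
import math

def kl_distance(f, g):
    """Discrete K-L distance I(f, g) = sum_x f(x) * ln(f(x) / g(x))."""
    return sum(fx * math.log(fx / gx) for fx, gx in zip(f, g) if fx > 0)

# Stand-in for "truth": model-averaged base frequencies (A, C, G, T) from Table 2.
f = [0.3330, 0.0683, 0.1362, 0.4625]

# Two hypothetical candidate approximations of f.
g_equal = [0.25, 0.25, 0.25, 0.25]  # equal base frequencies, as assumed by JC69 or K80
g_close = [0.33, 0.07, 0.14, 0.46]  # frequencies close to f

print(kl_distance(f, g_equal))  # larger value: more information lost
print(kl_distance(f, g_close))  # value near zero: little information lost
```

In practice f is unknown, which is why the AIC estimates the expected relative distance rather than the distance itself.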

The AIC for a given model is a function of its maximized log-likelihood (ℓ) and the number of estimable parameters (K):

AIC = −2ℓ + 2K

In the context of phylogenetics we can think of the AIC as the amount of information lost when we use, say HKY85, to approximate the real process of nucleotide substitution. Hence, we prefer the model with the smallest AIC. The second term K includes the parameters from the substitution model, like base frequencies, substitution rates, proportion of invariable sites, or rate variation among sites. If branch lengths are estimated de novo for every model, K should also include the number of branches (for an unrooted bifurcated tree, twice the number of taxa minus three). Although the inclusion of the number of branches, constant for all models, does not change the order of the AIC values, it will change their relative magnitude.
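
The parameter count described above can be sketched as follows. The free-parameter counts per substitution model follow the usual parameterization (three free base frequencies when frequencies are unequal, plus the free relative rates); the dictionary `MODEL_PARAMS` and the function names are hypothetical helpers introduced only for this illustration.

```python
def n_branches(n_taxa):
    """Number of branches in an unrooted, fully bifurcating tree: 2 * taxa - 3."""
    return 2 * n_taxa - 3

# (free base frequencies, free relative rate parameters) for a few common models.
MODEL_PARAMS = {
    "JC69": (0, 0),
    "K80": (0, 1),
    "HKY85": (3, 1),
    "TN93": (3, 2),
    "GTR": (3, 5),
}

def count_k(model, n_taxa, invariable=False, gamma=False, branches=True):
    """Total number of estimated parameters K for a model on n_taxa sequences."""
    freqs, rates = MODEL_PARAMS[model]
    k = freqs + rates + int(invariable) + int(gamma)
    if branches:
        k += n_branches(n_taxa)  # branch lengths are re-estimated for every model
    return k

# With 37 taxa there are 71 branches, which matches the K column of Table 1.
print(count_k("JC69", 37))                               # 71
print(count_k("TN93", 37, invariable=True, gamma=True))  # 78
```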

In the AIC, as more parameters are added to the model the first term becomes smaller, representing an increased fit, whereas the second component, or penalty term, becomes larger. Indeed, when the sample is large, the number of adjustable parameters makes a negligible difference, and more complex models will be favored (Forster and Sober, 1994). It is important to note that although the AIC formula appears to be superficially very simple, its derivation is well founded on information theory (de Leeuw, 1992), and the so-called “penalty term” 2K is not an arbitrary value (Burnham and Anderson, 2003, p. 64). When sample size (n) is small compared to the number of parameters (say, n/K < 40) the use of a second-order AIC, AIC_c (Hurvich and Tsai, 1989; Sugiura, 1978), is recommended:

AIC_c = AIC + 2K(K + 1) / (n − K − 1)

where sample size is approximated by the total number of characters in the alignment (see below for discussion). Note that in this case the inclusion of branch lengths as estimated parameters can change the order of the AIC_c values, and therefore, the selected model.
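
A minimal sketch of both criteria, assuming the standard forms AIC = −2ℓ + 2K and AIC_c = AIC + 2K(K + 1)/(n − K − 1). The inputs are three rows of Table 1, whose ℓ column we read as −ln L, with n = 1927 sites; the function and variable names are our own.

```python
def aic(neg_lnl, k):
    """AIC from the negative maximized log likelihood (-ln L) and K parameters."""
    return 2.0 * neg_lnl + 2.0 * k

def aicc(neg_lnl, k, n):
    """Second-order AIC: the AIC plus a small-sample correction term."""
    return aic(neg_lnl, k) + 2.0 * k * (k + 1) / (n - k - 1)

n_sites = 1927  # alignment length, used as the sample size n

# model: (-ln L, K), copied from Table 1
table1_rows = {
    "TN93+I+Γ": (5441.4600, 78),
    "HKY85+I+Γ": (5443.6729, 77),
    "GTR+I+Γ": (5440.9150, 81),
}

for model, (neg_lnl, k) in table1_rows.items():
    # Compare the last value with the AIC_c column of Table 1.
    print(f"{model}\t{aic(neg_lnl, k):.4f}\t{aicc(neg_lnl, k, n_sites):.4f}")
```

Because the correction term grows with K, it can reorder models that the uncorrected AIC ranks closely.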

Because the AIC is on a relative scale, it is critical to compute and present the AIC differences (Δ AIC), rather than actual AIC values, over all candidate models (Buckley and Cunningham, 2002; Burnham and Anderson, 2003, pp. 70–72). For the ith model, the AIC difference is:

Δ_i = AIC_i − min AIC

where min AIC is the smallest AIC value among all candidate models.

The AIC is designed to estimate the predictive accuracy of competing hypotheses (Forster, 2002; Sober, 2002b), which is the expected performance of a model when predicting new data. The prediction of new data is a common application in phylogenetics, for example in parametric bootstrapping or simulation studies. It seems that the AIC was first applied in the context of phylogenetics by Hasegawa and collaborators (1990a; 1990b; Kishino and Hasegawa, 1989), and although several phylogenetics programs implement the AIC, like Molphy (Adachi and Hasegawa, 1996) and Modeltest (Posada, 2003; Posada and Crandall, 1998), the use of the AIC is much less common than that of the hLRTs.

The AIC makes several assumptions. First, there is the assumption of “uniformity of nature” (Forster and Sober, 1994), that is, that all data sets (future and past) are drawn from the same underlying process. Second, the AIC assumes that the sample size is large enough to ensure that the likelihood function will approximate its asymptotic properties. Finally, the AIC assumes that the true distribution of parameter estimates, when the number of data n is sufficiently large, follows a multivariate normal distribution. In principle, these assumptions (which are, in any case, common in statistical phylogenetics) should not be unduly restrictive (Forster and Sober, 1994, 2004), but the implications of potential violations need to be studied. It has been argued that constraining parameters at their boundaries, for example setting the proportion of invariable sites to zero, might violate the derivation of the AIC (and the BIC) (Ota et al., 2000).

Model Selection Uncertainty with the AIC

The AIC differences allow for an immediate ranking of the candidate models. The larger the AIC difference for a model, the less probable that it is the best K-L model. As a rough rule of thumb, Burnham and Anderson (2003, p. 70) propose that models for which Δ_i ≤ 2 receive substantial support and are considered when making inferences, models having 4 ≤ Δ_i ≤ 7 have considerably less support, and models having Δ_i > 10 receive no support. However, they also warn that these guidelines are not expected to hold when observations are not independent but are assumed so, as is usually the case in phylogenetics.

Akaike (1983) also suggested that exp(−Δ_i/2) approximates the relative likelihood of model M_i given the data, L(M_i | D). These relative likelihoods are then normalized to obtain a set of positive Akaike weights (w). The Akaike weight for the ith model in a set of R candidate models is:

w_i = exp(−Δ_i/2) / Σ_{r=1}^{R} exp(−Δ_r/2)
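
The differences and weights can be sketched as below, assuming the normalization w_i = exp(−Δ_i/2) / Σ_r exp(−Δ_r/2). The AIC_c scores are the twelve best rows of Table 1; the remaining 44 models have Δ > 50 and contribute negligibly to the sum, so they are left out here for brevity. The function name `akaike_weights` is our own.

```python
import math

def akaike_weights(scores):
    """Map {model: AIC or AIC_c score} to {model: Akaike weight}."""
    best = min(scores.values())
    # Relative likelihood of each model: exp(-delta_i / 2)
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# AIC_c scores of the twelve best models in Table 1.
aicc_scores = {
    "TN93+I+Γ": 11045.5888, "TIM+I+Γ": 11047.5965, "HKY85+I+Γ": 11047.8422,
    "K81uf+I+Γ": 11049.7821, "GTR+I+Γ": 11051.0301, "TVM+I+Γ": 11052.4991,
    "TN93+Γ": 11057.8549, "HKY85+Γ": 11059.3402, "TIM+Γ": 11059.9843,
    "K81uf+Γ": 11061.4730, "GTR+Γ": 11063.0802, "TVM+Γ": 11064.1804,
}

weights = akaike_weights(aicc_scores)
for model, w in weights.items():
    # Compare with the w column of Table 1.
    print(f"{model}\t{w:.4f}")
```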

Akaike weights are very useful for assessing model-selection uncertainty without having to use computer-intensive methods like Monte Carlo simulation or bootstrapping (Buckland et al., 1997; see Buckley et al., 2002, for an example). We can establish a 95% confidence set of models for the best K-L model by summing the Akaike weights from largest to smallest until the sum first reaches or exceeds 0.95; the corresponding subset of models is a type of confidence set on the best K-L model (Burnham and Anderson, 1998, pp. 169–171; 2003). We can also assess the relative likelihood of model i versus model j as simply the ratio of the two Akaike weights; these ratios are called evidence ratios (Anderson et al., 2000; Burnham and Anderson, 2003, pp. 77–79). Techniques exist to test whether two AICs differ significantly (Linhart, 1988; Shimodaira, 1997; Vuong, 1989), and multiple comparison techniques can be used to construct a confidence set of models that minimize the sampling error of the AIC (Shimodaira, 1998). Such techniques have already been proposed to construct confidence sets of trees (Shimodaira, 2001; Shimodaira and Hasegawa, 1999).
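
A small sketch of the confidence set and the evidence ratio, using the seven largest Akaike weights from Table 1 as input. The function names are hypothetical, and the stopping rule assumes that the set is closed as soon as the cumulative weight reaches the chosen level.

```python
def confidence_set(weights, level=0.95):
    """Models taken in order of decreasing weight until their sum reaches `level`."""
    chosen, total = [], 0.0
    for model, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(model)
        total += w
        if total >= level:
            break
    return chosen

def evidence_ratio(weights, model_i, model_j):
    """Relative likelihood of model_i versus model_j."""
    return weights[model_i] / weights[model_j]

# Akaike weights (w) of the seven best models in Table 1.
table1_weights = {
    "TN93+I+Γ": 0.5221, "TIM+I+Γ": 0.1913, "HKY85+I+Γ": 0.1692,
    "K81uf+I+Γ": 0.0641, "GTR+I+Γ": 0.0344, "TVM+I+Γ": 0.0165,
    "TN93+Γ": 0.0011,
}

print(confidence_set(table1_weights))          # the five best models
print(evidence_ratio(table1_weights, "TN93+I+Γ", "HKY85+I+Γ"))
```

With these weights the 95% set contains five models, and the best model is only about three times as likely as HKY85+I+Γ.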

There is a Bayesian basis for interpreting the Akaike weights as being the probability that a model is the expected best K-L model (Akaike, 1981). In fact, the Akaike weights can be generalized to also include prior information (ρ_i):

w_i = exp(−Δ_i/2) ρ_i / Σ_{r=1}^{R} exp(−Δ_r/2) ρ_r

(Burnham and Anderson, 2003, p. 76). However, the above is not a true Bayesian approach, because these priors only refer to the model, and not to the prior probability distribution of the parameters of the model. Neither do these priors refer to the belief that M_i is the true model, but rather to the belief that model M_i is the best K-L model for the data (Burnham and Anderson, 1998, 2003). Usually ρ_i is set to 1/R for every model.

Model Averaging with the AIC

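Within the AIC framework, it is straightforward to obtain a model-averaged estimate of any parameter (Posada, 2003). For example, a model-averaged estimate of the substitution rate between adenine and cytosine (ϕ_{A − C}) using the Akaike weights (w) for R candidate models would be:

\hat{\bar{ϕ}}_{A − C} = Σ_{i=1}^{R} w_i I_{ϕ_{A − C}}(M_i) ϕ_{A − C, i} / w₊(ϕ_{A − C})

where

w₊(ϕ_{A − C}) = Σ_{i=1}^{R} w_i I_{ϕ_{A − C}}(M_i)

and

I_{ϕ_{A − C}}(M_i) = 1 if ϕ_{A − C} is in model M_i, and 0 otherwise.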

Again, the caveats described above about interpreting model-averaged parameter estimates apply. Likewise, it is again easy to estimate the relative importance of any parameter by summing the Akaike weights across all models that include the parameters we are interested in. For example, the relative importance of the substitution rate between adenine and cytosine across all candidate models is simply the denominator above, w₊ (ϕ_{A − C}).
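
The two quantities can be sketched as follows. The weights and the model membership of each parameter are taken from Table 3. For the averaging step only two per-model estimates are available in this article (the α values under I+Γ for TN93+I+Γ and HKY85+I+Γ in Table 2), so that part illustrates the arithmetic on two models and is not meant to reproduce the model-averaged column of Table 2. Function names are hypothetical.

```python
def relative_importance(weights, models_with_param):
    """w_plus: summed Akaike weights of the models that contain the parameter."""
    return sum(w for m, w in weights.items() if m in models_with_param)

def model_average(weights, estimates):
    """Weighted mean of a parameter over the models listed in `estimates`."""
    w_plus = sum(weights[m] for m in estimates)
    return sum(weights[m] * est for m, est in estimates.items()) / w_plus

# Akaike weights from Table 3.
w = {
    "TN93+I+Γ": 0.5221, "TIM+I+Γ": 0.1913, "HKY85+I+Γ": 0.1692,
    "K81uf+I+Γ": 0.0642, "GTR+I+Γ": 0.0344, "TVM+I+Γ": 0.0165,
    "TN93+Γ": 0.0011, "HKY85+Γ": 0.0005, "TIM+Γ": 0.0004,
    "K81uf+Γ": 0.0002, "GTR+Γ": 0.0001,
}

# Models with a free parameter, read from the dots in Table 3.
has_kappa = {"HKY85+I+Γ", "HKY85+Γ"}
has_phi_ag = {"TN93+I+Γ", "TIM+I+Γ", "GTR+I+Γ", "TN93+Γ", "TIM+Γ", "GTR+Γ"}

print(round(relative_importance(w, has_kappa), 3))   # compare with Table 3
print(round(relative_importance(w, has_phi_ag), 3))  # compare with Table 3

# Illustration only: average alpha (I+Γ) over the two models reported in Table 2.
alpha_estimates = {"TN93+I+Γ": 0.7658, "HKY85+I+Γ": 0.5849}
print(model_average(w, alpha_estimates))
```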

Model-Averaged Estimation of Phylogenies

As discussed above, model averaging can also be applied to the estimation of phylogenetic trees (Posada, 2003). This can be easily accomplished in programs like PAUP* (Swofford, 1998), and perhaps the only limitation is the time we want to dedicate to the analysis. We start by estimating a tree for each candidate model and then build a consensus tree using model weights as tree weights (these model weights can be Akaike weights, BIC weights, or model likelihoods from a Bayesian analysis) (see Jermiin et al., 1997). In a Bayesian framework one could also directly obtain a model-averaged estimate of phylogeny by using reversible-jump MCMC, an algorithm that moves through both parameter and model space (Green, 1995) and that was very recently implemented by Huelsenbeck et al. (2004) for phylogenetic model selection. It is also interesting to note that the AIC and Bayesian approaches allow for the direct comparison of trees estimated under different models because likelihoods calculated on different trees and on different models are comparable (e.g., ML-JC69 versus ML-HKY). In this sense, the AIC has already been used as an extension of the likelihood optimality criterion for phylogenetic estimation (Kishino and Hasegawa, 1989; Ogishima et al., 2000; Sober, 2002b; Sober and Steel, 2002; Tanaka et al., 1999), and nothing prevents the BIC from also being considered as another phylogenetic criterion. Posterior probabilities for different trees inferred under different models are also directly comparable if they fall under the same posterior distribution.
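
The weighted consensus step can be sketched with trees represented as sets of splits (bipartitions). This is a toy example and not the PAUP* procedure: the three five-taxon trees, the model labels, and the weights are all hypothetical, and each split is stored in a canonical form as its smaller side so that identical splits compare equal.

```python
from collections import defaultdict

def split_support(trees, weights):
    """Summed (normalized) model weight of the trees that contain each split."""
    total = sum(weights.values())
    support = defaultdict(float)
    for model, splits in trees.items():
        for split in splits:
            support[split] += weights[model] / total
    return dict(support)

def weighted_consensus(trees, weights, threshold=0.5):
    """Majority-rule consensus: keep splits whose weighted support exceeds threshold."""
    return {s: v for s, v in split_support(trees, weights).items() if v > threshold}

# Hypothetical ML trees on taxa A-E under three candidate models.
trees = {
    "M1": {frozenset("AB"), frozenset("DE")},  # ((A,B),C,(D,E))
    "M2": {frozenset("AB"), frozenset("DE")},  # same topology as M1
    "M3": {frozenset("AB"), frozenset("CD")},  # ((A,B),E,(C,D))
}
model_weights = {"M1": 0.55, "M2": 0.25, "M3": 0.20}  # hypothetical Akaike weights

for split, value in weighted_consensus(trees, model_weights).items():
    print(sorted(split), round(value, 2))
```

The support attached to each retained split is the summed weight of the models whose tree contains it, which is how branch weights in a model-averaged consensus can be read.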

We have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001). This alignment contains 1927 sites, 301 of which are variable. We took three approaches to selecting the best-fit model. First, we optimized the likelihood and model parameters for the 56 substitution models currently implemented in the program Modeltest (Posada and Crandall, 1998) on a neighbor-joining tree estimated from Jukes and Cantor (1969) distances. We then used the AIC and AIC_c to select the best-fit model from these likelihoods. Second, we took these model parameters and performed a tree search under each of the 56 models so as to find the tree with the highest likelihood under each of these optimized models. Again, the AIC and AIC_c were used to choose the best-fit model. The second approach is superior to the first approach because it involves a more thorough search for the maximum likelihood under each model; however, the computational burden is much greater. Third, we also used the specific hLRT strategy implemented in Modeltest (Posada and Crandall, 1998). From the likelihood values we calculated AIC_c values, Akaike weights, the relative importance of different parameters, and model-averaged estimates of parameters and topology. In addition, we performed a bootstrap analysis on the data using the best AIC_c model with 500 replicates. All tree searches used five random addition replicates followed by TBR branch swapping. All likelihood calculations and tree searches were performed using PAUP*4.0b10 (Swofford, 2000).

Examining the AIC_c values and Akaike weights for the models optimized on the NJ tree, we immediately observe that only 11 out of the 56 models received noticeable support from the data (Table 1). Importantly, this confidence set of models, and the ranking of models within this set, is almost identical to that obtained from optimizing the topology (data not shown) (see also Nylander, 2004). All of the supported models incorporated the gamma distribution for among-site rate variation and the best-supported models also included a proportion of invariable sites. Models that assumed equal base frequencies fitted the data poorly and received essentially no support (i.e., their Akaike weights are close to zero). The TN93+I+Γ model had the smallest AIC_c value, but there was considerable uncertainty in identifying the most appropriate number of different substitution rates between nucleotides. The Akaike weights calculated from the AIC_c values were very similar to those calculated from the AIC. This is because the n/K ratio, 37.14, is close to the value of 40, which Burnham and Anderson (2003, p. 66) recommend as the cut-off for preferring AIC_c. Indeed, when n/K is relatively large the AIC_c converges back to the AIC, and so it is still appropriate to use the AIC_c instead of the AIC. The hLRT approach led to selection of the HKY85+I+Γ model, which only received an AIC_c weight of 0.1692 (Table 1), but was contained within the 95% AIC confidence set of models. The ML tree under the HKY85+I+Γ model differs by a symmetrical distance (Foulds et al., 1979) of 4 and 5 from the two trees estimated under the TN93+I+Γ model.

In total 23 unique tree topologies were estimated from all of the models; however, only 8 unique topologies were contained in the set of trees that were estimated from models that received greater than or equal to 0.00001 support from the AIC_c weights. Some tree searches under the among-site rate variation models recovered two topologies, where one of these topologies had an internal branch collapsed to zero length. The weighted AIC_c consensus topology (Fig. 5A) was almost identical to the topology estimated under the best AIC_c model (TN93+I+Γ) (Fig. 5B), but due to the model selection uncertainty there is considerable ambiguity in selecting the best point estimate of topology for these data. The bootstrap analysis under the best AIC_c model indicates that the nodes that are not supported under all of the models also have low bootstrap support (Fig. 5). This observation is important because it suggests that in this case if we had ignored model selection uncertainty our conclusion as to what hypotheses were well supported by the data would be the same. It is worth mentioning that the numbers above branches in Figure 5A describe the uncertainty of branches due to uncertainty on the models of molecular evolution. This is in contrast with the bootstrap values in Figure 5B, which describe uncertainty due to the stochasticity of molecular evolution. The former numbers can be regarded as “bootstrap proportions” obtained by resampling models with probabilities proportional to the Akaike weights. The phylogenetic relationships among the Ohomopterus carabid beetles are very similar to those estimated by Sota and Vogler (2001) using maximum parsimony.

Figure 5

Multimodel phylogeny of Ohomopterus carabid beetles. (A) Consensus of trees estimated under 56 candidate models, and constructed using Akaike weights (with the AIC_c) as tree weights. The values above branches represent the weights for each branch. All branches without a number received a weight of 100%. (B) Consensus of the two maximum likelihood trees under the best AIC_c model (TN93+I+Γ), one of which had a branch of zero length. Numbers above nodes are nonparametric bootstrap proportions. Nodes that received less than 50% are not indicated. The five species groups are indicated by shaded boxes.

We examined the association between pairwise AIC_c differences and pairwise tree distances (Foulds et al., 1979) for the 11 models included in the 99% confidence set (Fig. 6). This relationship shows a weak but significant correlation (r² = 0.2394; P = 0.00015) between the improvement of fit of a model to the data and differences in topology. This graph supports, to a limited extent, the intuition that models with similar fits to the data tend to support similar trees.

Figure 6

AIC differences and phylogeny estimation. For each pair of models out of the 11 models with noticeable AIC_c support, we calculated the differences in AIC scores (Pairwise AIC_c distances) and the Robinson and Foulds (1981) tree distances (Pairwise tree distances) using AIC_c scores calculated on a NJ-JC tree.
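
The comparison behind this kind of plot can be sketched as follows: for every pair of models, take the absolute difference in AIC_c and the symmetric-difference tree distance between their trees, then correlate the two sets of pairwise values. The four models, their scores, and their five-taxon trees (stored as sets of splits, each written as its smaller side) are hypothetical, and the sketch does not include a significance test.

```python
from itertools import combinations

def rf_distance(splits_a, splits_b):
    """Symmetric-difference distance: number of splits found in only one tree."""
    return len(splits_a ^ splits_b)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical AIC_c scores and ML trees for four candidate models.
scores = {"M1": 100.0, "M2": 102.0, "M3": 110.0, "M4": 115.0}
trees = {
    "M1": {frozenset("AB"), frozenset("DE")},
    "M2": {frozenset("AB"), frozenset("DE")},
    "M3": {frozenset("AB"), frozenset("CD")},
    "M4": {frozenset("BC"), frozenset("DE")},
}

pairs = list(combinations(scores, 2))
aicc_diffs = [abs(scores[a] - scores[b]) for a, b in pairs]
tree_dists = [rf_distance(trees[a], trees[b]) for a, b in pairs]

r = pearson_r(aicc_diffs, tree_dists)
print(f"r^2 = {r * r:.4f}")
```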

The model-averaged parameter estimates are very similar to the maximum likelihood estimates under the best-fit models (Table 2) because models with similar likelihoods, and thus low AIC differences, tend to result in similar parameter estimates. The variability between the model-averaged and best-fit model parameter estimates is unlikely to have a large effect on estimation of topology. The greatest variability between the model-averaged and best-fit model parameter estimates is observed for the transversion rate parameters. This is not surprising given that relatively few transversions have occurred in these data and therefore there is not much information from which to gain stable estimates.

Table 2.

Model-averaged estimates of nucleotide substitution parameters. These estimates were obtained from the Ohomopterus carabid beetle mitochondrial DNA data set using the Akaike weights (w_i) derived from the AIC_c for models with w_i > 0.0001. Table 3 indicates which models contributed to each estimate. Included also are the estimates corresponding to the best AIC_c model (TN93+I+Γ) and to the model selected by the hLRT procedure (HKY85+I+Γ). π_A to π_T: base frequencies; κ: transition/transversion parameter; ϕ_{A − C} to ϕ_{A − T}: substitution rates; α: shape of the gamma distribution for rate variation among sites; α (I+Γ): shape of the gamma distribution for rate variation among sites under an I+Γ model; p_inv (I+Γ): proportion of invariable sites under an I+Γ model.

Parameter	Model-averaged estimate	AIC_c model estimate	hLRT model estimate
π_A	0.3330	0.3342	0.3303
π_C	0.0683	0.0667	0.0725
π_G	0.1362	0.1369	0.1335
π_T	0.4625	0.4622	0.4637
κ	14.8483	14.8476	14.8476
ϕ_{A − C}	0.6290	1.0	—
ϕ_{A − G}	13.4111	13.1823	—
ϕ_{A − T}	1.0536	1.0	—
ϕ_{C − G}	0.4189	1.0	—
ϕ_{C − T}	20.0553	19.7583	—
α	0.1011	—	—
α(I+Γ)	0.7149	0.7658	0.5849
p_inv(I+Γ)	0.6874	0.7038	0.6644


Not all model parameters have the same importance for this data set (Table 3). The alpha shape parameter from the gamma distribution of among-site rate variation and the base frequency parameters have a relative importance of 1.0 because they appear in all of the supported models. The proportion of invariable sites is also a very important parameter, although a few models with low weight that lack this parameter are supported. This observation suggests that these properties of the evolutionary process are very important for obtaining a good model fit. The ϕ_{A − G} and ϕ_{C − T} substitution rate parameters have higher relative importance values than the transversion parameters. This indicates that for these data it is important to allow the two transition types to have different rates, more so than the transversion types. The results shown in Table 2 make sense in light of our current knowledge of the dynamics of animal mitochondrial DNA evolution (e.g., Brown et al., 1982; Tamura and Nei, 1993; Buckley et al., 2001a).

Table 3.

Relative parameter importance. Included here are Akaike weights (w_i) and relative parameter importance values for the Ohomopterus carabid beetles mitochondrial DNA data set, for models with w_i > 0.0001. Where a model contains a free parameter it is indicated with a black dot (note that ϕ_{G − T} is often set to equal 1)

	w_i	π_A	π_C	π_G	π_T	κ	ϕ_{A − C}	ϕ_{A − G}	ϕ_{A − T}	ϕ_{C − G}	ϕ_{C − T}	ϕ_{G − T}	α	p_inv
TN93+I+Γ	0.5221	•	•	•	•			•			•		•	•
TIM+I+Γ	0.1913	•	•	•	•			•			•		•	•
HKY85+I+Γ	0.1692	•	•	•	•	•							•	•
K81uf+I+Γ	0.0642	•	•	•	•								•	•
GTR+I+Γ	0.0344	•	•	•	•		•	•	•	•	•	•	•	•
TVM+I+Γ	0.0165	•	•	•	•		•		•	•		•	•	•
TN93+Γ	0.0011	•	•	•	•			•			•		•
HKY85+Γ	0.0005	•	•	•	•	•							•
TIM+Γ	0.0004	•	•	•	•			•			•		•
K81uf+Γ	0.0002	•	•	•	•								•
GTR+Γ	0.0001	•	•	•	•		•	•	•	•	•	•	•
Relative parameter importance		1.0	1.0	1.0	1.0	0.170	0.051	0.749	0.051	0.051	0.749	0.051	1.0	0.997


Lastly, model averaging could also be applied to other problems in evolutionary biology in which inferences can be drawn from several models, for example as in the detection of positive selection from sequence alignments (Yang et al., 2000), and the estimation of divergence times using relaxed molecular clocks (Aris-Brosou and Yang, 2002), where different models can frequently yield different results.

Philosophical Considerations on Model Selection

There is still an important philosophical debate about model selection in general (Burnham and Anderson, 1998, 2003; Forster and Sober, 1994, 2004; Forster, 2000, 2001; Kass and Raftery, 1995; Kieseppä, 2002; Myrvold and Harper, 2002; Popper, 1959; Sober, 2002a; Wasserman, 2000), and here we do not attempt to address all the issues, but just those we think are most relevant. The information-theoretic and the Bayesian approaches represent different philosophical approaches to the problem of model selection (Forster and Sober, 1994; Kuha, 2003; Sober, 2002a). The AIC is designed to choose the model that best approximates reality. The conclusions of AIC are never about the truth or falsity of a hypothesis, but about its closeness to the truth (Forster and Sober, 2004). On the other hand, Bayesian approaches are designed to identify the true model, given the data. Both the AIC and Bayesian approaches have been criticized on different grounds.

That Bayesian approaches are designed to identify the true model can be surprising when surely we know that all models of evolution are false (i.e., their probability is zero). The standard interpretation of P(M_i|D) is that it is the probability that M_i is the true model given the data, even though we know that this statement is false a priori (Gelfand, 1996). A common response to this criticism is that we can hope that at least one of the models is approximately true, and that the posterior distribution allows us to compare the relative merits of the models (Wasserman, 2000). On the other hand, it has been argued that the derivation of the BIC does not require that the true model be contained within the set of candidate models (Burnham and Anderson, 2003, pp. 293–295; Cavanaugh and Neath, 1999). Interestingly, it is possible to obtain the AIC as a Bayesian result if a particular prior (the so-called K-L prior) is used with the BIC (Burnham and Anderson, 2003, pp. 302–305).

It has been alleged in the statistical literature that, under certain conditions, the BIC is statistically consistent (it converges to the truth as more data are added), whereas the AIC is not (but see Bozdogan, 1987; Findley, 1991; Keuzenkamp and McAleer, 1995; Nishii, 1984, 1988; Shibata, 1986; Woodroofe, 1982). However, the relevance of statistical consistency in this context is not clear (Forster, 2002).

We can think of a model as a set or family of sharp hypotheses. For example, the K80 model contains all hypotheses representing different values of the transition/transversion parameter, κ. The JC69 model, however, contains only one hypothesis, as all its parameters are fixed (equal base frequencies and equal rates for transitions and transversions). The AIC and the BIC work with maximized likelihoods, and therefore they compare the best point hypothesis within each model. However, it might be unwise to compare models based only on the merits of a single point, even if this point is optimal, and that is why Bayesians prefer models for which the sum of the likelihoods of all contained point hypotheses is largest (Holder and Lewis, 2003).
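
The contrast between a maximized and an averaged likelihood can be made concrete with a toy two-sequence example. The sketch below is an illustration under stated assumptions and is not the authors' analysis: the site counts and the distance `D` are hypothetical, the distance is treated as known so that JC69 is a single point hypothesis (K80 with κ = 1), and the prior on κ is taken to be uniform over an equally spaced grid.

```python
import math


def k80_pair_probs(d, kappa):
    """Probabilities that a site of a two-sequence alignment is identical,
    a transition, or a transversion under K80 (d = expected substitutions
    per site, kappa = transition/transversion rate ratio)."""
    e1 = math.exp(-4.0 * d / (kappa + 2.0))
    e2 = math.exp(-2.0 * d * (kappa + 1.0) / (kappa + 2.0))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.5 - 0.5 * e1  # the two transversion types pooled
    return p_same, p_ts, p_tv


def log_lik(counts, d, kappa):
    """Log-likelihood of (identical, transition, transversion) counts,
    up to a constant that is the same for every value of kappa."""
    return sum(n * math.log(p) for n, p in zip(counts, k80_pair_probs(d, kappa)))


COUNTS = (900, 70, 30)  # hypothetical site counts
D = 0.11                # hypothetical, assumed-known distance

# JC69: a single point hypothesis (kappa fixed at 1).
jc69_loglik = log_lik(COUNTS, D, 1.0)

# K80: a family of point hypotheses indexed by kappa (grid includes 1.0).
kappas = [i / 20.0 for i in range(2, 401)]  # 0.10, 0.15, ..., 20.0
logliks = [log_lik(COUNTS, D, k) for k in kappas]

# What AIC/BIC use: the best point hypothesis in the family.
k80_max_loglik = max(logliks)

# What a Bayesian comparison uses: the likelihood averaged over the prior
# (log-mean-exp for numerical stability; uniform grid prior is an assumption).
k80_avg_loglik = k80_max_loglik + math.log(
    sum(math.exp(l - k80_max_loglik) for l in logliks) / len(logliks)
)

print("JC69 lnL           :", round(jc69_loglik, 2))
print("K80 maximized lnL  :", round(k80_max_loglik, 2))
print("K80 averaged lnL   :", round(k80_avg_loglik, 2))
```

Because the grid contains κ = 1, the maximized K80 log-likelihood can never be lower than that of JC69, whereas the averaged value is pulled down by every poorly fitting value of κ that the prior allows. This is how the averaged likelihood penalizes the extra parameter.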

Which Model Selection Method is Best for Phylogenetics?

The use of different model selection strategies may lead to the selection of different models of evolution (Posada and Crandall, 2001a), and we know that model choice affects all aspects of phylogenetic analysis. Here we have attempted to compare different model selection strategies from a theoretical and practical point of view, in the context of phylogenetics. Previous Monte Carlo simulations on the performance of model selection in phylogenetics (Posada, 2001; Posada and Crandall, 2001b) showed that these methods work well when the aim is to identify the generating model. However, these simulations missed the point that the true model of evolution will never be one of the candidate models. It would be more useful to generate data from a model much more complex than any of the candidate models, and then study how well the selected models approximate this complex generating model (e.g., Minin et al., 2003). Clearly, we should seek models that are good approximations to the truth and from which therefore we can make valid inferences concerning the real process of molecular evolution. Too often we read expressions like “The best-fit model was selected with the program Modeltest” without any reference to which model selection strategy was used (in this case, hLRT or AIC). When a method of model selection is used, this should be explicitly reported.

From the discussion above it should be clear that the Bayesian and AIC approaches present several important advantages over the hLRTs for model selection (see also Table 4). Namely, they are able to simultaneously compare multiple nested or nonnested models (see Chamberlain, 1890), account for model selection uncertainty, and allow for model-averaged inference. Although model selection uncertainty tools do not exist within the standard hLRT framework, there are extensions of the LRT framework that allow for the specification of confidence sets of models. Evidence for a model can also be estimated by the “expected likelihood weights” (Strimmer, 2001; Strimmer and Rambaut, 2001). Criteria like the AIC or BIC are very simple to calculate from the maximum likelihood estimate, although they do rely on point estimates and do not take into account topological uncertainty (Bollback, 2002). The importance of the latter effect has yet to be examined (but see Posada and Crandall, 2001b), as has the potential impact of comparing models with parameters fixed at the boundary of their ranges (e.g., α = ∞) in the AIC and BIC.

Table 4. Comparison of model selection strategies for phylogenetics. Indicated are what the authors think are good properties for a model selection procedure. Exceptions to these may exist and the comments below are generalizations.

Good properties for model selection methods	hLRT	Bayesian	AIC
Applies easily to nonnested models	No	Yes	Yes
Allows for the simultaneous comparison of multiple models	No	Yes	Yes
Does not depend on a subjective significance level	No	Yes§	Yes
Incorporates topological uncertainty	No	Yes*	No
Easy to compute	Yes	No*	Yes
Assesses model selection uncertainty	No	Yes	Yes
Allows model averaging	No	Yes	Yes
Provides the possibility of specifying prior information for models	No	Yes*	Yes
Provides the possibility of specifying prior information for model parameters	No	Yes*	No
Designed to approximate, rather than to identify, truth	No	No	Yes

* Not the BIC.

§ In a sense, the interpretation of Bayes factors could be considered subjective.

The possibility of inferring model-averaged phylogenies will eliminate some of the criticisms that model-based methods are contingent on the single best-fit model selected. Obviously, the methods described above can facilitate model-averaged hypothesis testing, as one could test for the monophyly of a group by considering all models available. Sanderson and Kim (2000) already hinted at the possibility of model-averaging phylogenies, but claimed that such a composite solution would be computationally prohibitive. However, this computational burden will depend on the size of the data set (especially on the number of taxa) and the number of models considered (one could work with only the 95% confidence or credible set of models), and in some cases it will certainly be feasible.
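
Restricting model averaging to a 95% confidence set can be sketched as follows: rank the models by Akaike weight and keep adding models until the cumulative weight reaches the chosen level. The weights below are those listed for the Ohomopterus example in the table of Akaike weights above; the function name and the exact stopping rule are our own assumptions.

```python
# Akaike weights transcribed from the table of candidate models above
# ("G" stands for the gamma (+Γ) suffix).
AKAIKE_WEIGHTS = {
    "TN93+I+G": 0.5221, "TIM+I+G": 0.1913, "HKY85+I+G": 0.1692,
    "K81uf+I+G": 0.0642, "GTR+I+G": 0.0344, "TVM+I+G": 0.0165,
    "TN93+G": 0.0011, "HKY85+G": 0.0005, "TIM+G": 0.0004,
    "K81uf+G": 0.0002, "GTR+G": 0.0001,
}


def confidence_set(weights, level=0.95):
    """Return the smallest set of top-ranked models whose cumulative
    weight is at least `level`, together with that cumulative weight."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for name, weight in ranked:
        chosen.append(name)
        cumulative += weight
        if cumulative >= level:
            break
    return chosen, cumulative


models, total = confidence_set(AKAIKE_WEIGHTS)
print(models, round(total, 4))
```

With these weights the set contains five of the eleven candidate models, so a model-averaged analysis would need fewer than half of the tree searches.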

Selecting a set of candidate models is not easy; there are 203 “standard” time-reversible models of nucleotide substitution, but model selection in phylogenetics is commonly limited to a subset of these (Huelsenbeck et al., 2004). Indeed, evaluating a large number of models is more problematic for the hLRT than for the AIC and Bayesian approaches for the reasons explained above. The implications of conditioning model selection on a subset of the possible models are currently unknown.

Selection bias (Zucchini, 2000) may occur when the number of candidate models is large. In such cases random fluctuations in the data will increase the score of some models more than others and therefore the chance that the best model won for spurious reasons increases. Indeed, the set of candidate models influences model choice, and a careful a priori selection of candidate models is very important.

Both in the AIC_c and the BIC descriptions above, the total number of characters was used as an estimate of sample size. However, effective sample sizes in phylogenetic studies are poorly understood, and depend on the quantity of interest (Churchill et al., 1992; Goldman, 1998; Morozov et al., 2000). Characters in an alignment will often not be independent, so using the total number of characters as a surrogate for sample size (Minin et al., 2003; Posada and Crandall, 2001b) could be an overestimate. Using only the number of variable sites as an estimate of sample size is a more conservative approach, but could be an underestimate (note that all sites are used when estimating base frequencies or the proportion of invariable sites). Indeed, sample size also depends on the number of taxa. Importantly, sample size can have an effect on the outcome of model selection with the AIC_c. In our example above, if we were to use the number of variable characters (301 sites) as the sample size, instead of the total number of characters (1927 sites), the best AIC_c model would not change, but the second and third AIC_c models would exchange their rankings. Furthermore, because the LRT, the AIC, and the BIC strategies rely on large-sample asymptotics, it is also important to decide when a sample should be considered small. Although the AIC_c was derived under Gaussian assumptions, Burnham et al. (1994) found that this second-order expression performed well in product multinomial models for open population capture-recapture. Burnham and Anderson (2003, p. 66) suggest using this correction when the sample size is small compared to the number of adjustable parameters, n/K < 40. Alternatively, and because AIC_c converges to the AIC with increasing n/K ratios, one could always use the AIC_c (D. Anderson, personal communication). Phylogenetic characters are mostly discrete, and the unconstrained model in phylogenetics is multinomial (Goldman, 1993). One may think of an alignment of nucleotide characters as a large and sparse contingency table with 4^T bins, where T is the number of taxa. For large-sample asymptotics to hold in a contingency table every cell should contain, in general, more than 5 observations (see Agresti, 1990, pp. 49, 244–250), which gives a rule of thumb of n/4^T > 5. Clearly, more research is needed on sample size in phylogenetics.
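
The following sketch shows how the choice of sample size can change an AIC_c ranking, and how far a typical alignment is from the contingency-table rule of thumb. It assumes the usual definitions AIC = −2 lnL + 2K, AIC_c = AIC + 2K(K + 1)/(n − K − 1), and BIC = −2 lnL + K ln n. The two sample sizes (1927 and 301 sites) and the 37 taxa come from the example data set, but the log-likelihoods and parameter counts of models `A` and `B` are hypothetical values chosen only to illustrate the effect.

```python
import math


def aic(log_lik, k):
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik, k, n):
    """Second-order AIC; converges to the AIC as n/k grows."""
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(log_lik, k, n):
    return -2.0 * log_lik + k * math.log(n)


def small_sample(n, k, threshold=40.0):
    """Burnham and Anderson's suggestion: use AIC_c when n/K < 40."""
    return n / k < threshold


def sites_per_cell(n, taxa):
    """Average number of sites per cell of the 4^T site-pattern table."""
    return n / 4 ** taxa


# Hypothetical models: (maximized log-likelihood, number of free parameters).
A = (-5000.0, 80)
B = (-4997.5, 82)

for n in (1927, 301):  # all sites vs. variable sites only
    best = "A" if aicc(*A, n) < aicc(*B, n) else "B"
    print(f"n={n}: AICc(A)={aicc(*A, n):.2f}  AICc(B)={aicc(*B, n):.2f}  best={best}")

print("n/K < 40 with all sites:", small_sample(1927, A[1]))
print("sites per cell, 37 taxa:", sites_per_cell(1927, 37))
```

In this hypothetical case model B is preferred when n = 1927 and model A when n = 301, because the second-order penalty grows faster for the larger model as n shrinks. The last line shows that for 37 taxa the ratio n/4^T is many orders of magnitude below 5.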

Other model selection methods exist, like cross-validation and the bootstrap (see Browne, 2000; Efron and Tibshirani, 1993; Linhart and Zucchini, 1986), but they seem too time-consuming for the selection of substitution models (note that cross-validation is asymptotically equivalent to the AIC; Stone, 1977). There is an important role for more general tests of model fit and accuracy within the process of model selection. For example, tests of base frequency stationarity (Rzhetsky and Nei, 1995; Van Den Bussche et al., 1998) should be standard before a phylogenetic analysis. In addition, the global tests of Goldman (1993) and Bollback (2002) are useful for detecting model misspecification. When tests such as these indicate that the final model selected still does not fit the data well, our results must be interpreted with caution, because some vital evolutionary process may not have been accounted for and the resulting inferences could be misleading.

Model selection is a useful tool for research, but it is not a substitute for careful thinking and common sense reasoning (Browne, 2000). There are examples in the phylogenetic literature where the best-fit models have led to phylogenetic estimates that are clearly incorrect (Buckley and Cunningham, 2002; Posada and Crandall, 2001c). Consideration of model selection uncertainty and multimodel inference should lead to equal or better estimates of phylogenies and substitution parameters, and we should see more applications of these ideas in the future (see also Nylander, 2004). Computation of AIC differences, Akaike weights, model-averaged estimates, and relative parameter importance is currently implemented in the program Modeltest (Posada and Crandall, 1998). Further developments will allow for the simultaneous use of different models for different partitions of the data (Nylander et al., 2004; Pupko et al., 2002; Suchard et al., 2003a; Yang, 1996b). It is now time to start thinking about how we will select those. Model selection in phylogenetics is indeed still an open area for research (Huelsenbeck et al., 2002).

1 Occam's (ca. 1280–1349) parsimony principle or Occam's razor was stated as “Pluralitas non est ponenda sine necessitate,” which translates literally into English as “plurality should not be posited without necessity.”

2 For continuous functions.

Acknowledgements

We are undoubtedly indebted to Kenneth Burnham and David Anderson for their enlightening book. David Anderson, Elliot Sober, and Carsten Wiuf provided very insightful comments on the manuscript. Robert Weiss, Janet Sinsheimer, Paul Lewis, Paul Joyce, Hidetoshi Shimodaira, and Rissa Ota helped clarify some ideas on Bayesian model selection. Nick Goldman and two anonymous referees provided useful comments on a first version. Jeff Thorne, Hirohisa Kishino, and two anonymous referees provided very valuable comments that considerably improved the manuscript. Thanks to David Swofford and Jack Sullivan for many valuable conversations on model selection throughout the years. DP was funded by the Spanish Ministry of Science and Technology, while funding for TRB was provided by the New Zealand Foundation for Research, Science, and Technology.

References

Adachi J., Hasegawa M. 1996. MOLPHY version 2.3: Programs for molecular phylogenetics based on maximum likelihood. Comput. Sci. Monogr. 28:1–150.

Agresti A. 1990. Categorical data analysis, 2nd edition. New York: Wiley.

Akaike H. 1973. Information theory and an extension of the maximum likelihood principle. In: Second International Symposium on Information Theory. Budapest: Akademiai Kiado. p. 267–281.

Akaike H. 1974. A new look at the statistical model identification. IEEE Trans. Aut. Control 19:716–723.

Akaike H. 1981. Likelihood of a model and information criteria. J. Econometrics 16:3–14.

Akaike H. 1983. Information measures and model selection. Int. Stat. Inst. 22:277–291.

Anderson D. R., Burnham K. P., Thompson W. L. 2000. Null hypothesis testing: Problems, prevalence, and an alternative. J. Wildl. Manage. 64:912–923.

Aris-Brosou S., Yang Z. 2002. Effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan 18S ribosomal RNA phylogeny. Syst. Biol. 51:703–714.

Bartlett M. S. 1957. A comment on D. V. Lindley's statistical paradox. Biometrika 44:533–534.

Berger J. O., Sellke T. 1987. Testing a point null hypothesis: The irreconcilability of P values and evidence. J. Am. Stat. Assoc. 82:112–122.

Bernardo J. M., Smith A. F. M. 1994. Bayesian theory. New York: Wiley and Sons.

Bollback J. P. 2002. Bayesian model adequacy and choice in phylogenetics. Mol. Biol. Evol. 19:1171–1180.

Box G. E. P. 1976. Science and statistics. J. Am. Stat. Assoc. 71:791–799.

Bozdogan H. 1987. Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika 52:345–370.

Browne M. 2000. Cross-validation methods. J. Math. Psychol. 44:108–132.

Bruno W. J., Halpern A. L. 1999. Topological bias and inconsistency of maximum likelihood using wrong models. Mol. Biol. Evol. 16:564–566.

Buckley T. R. 2002. Model misspecification and probabilistic tests of topology: Evidence from empirical data sets. Syst. Biol. 51:509–523.

Buckley T. R., Arensburger P., Simon C., Chambers G. K. 2002. Combined data, Bayesian phylogenetics, and the origin of the New Zealand cicada genera. Syst. Biol. 51:4–18.

Buckland S. T., Burnham K. P., Augustin N. H. 1997. Model selection uncertainty: An integral part of inference. Biometrics 53:603–618.

Buckley T. R., Cunningham C. W. 2002. The effects of nucleotide substitution model assumptions on estimates of nonparametric bootstrap support. Mol. Biol. Evol. 19:394–405.

Buckley T. R., Simon C., Chambers G. K. 2001. Exploring among-site rate variation models in a maximum likelihood framework using empirical data: The effects of model assumptions on estimates of topology, branch lengths, and bootstrap support. Syst. Biol. 50:67–86.

Burnham K. P., Anderson D. R. 1998. Model selection and inference: A practical information-theoretic approach, 1st ed. New York: Springer-Verlag.

Burnham K. P., Anderson D. R. 2003. Model selection and multimodel inference: A practical information-theoretic approach, 2nd ed. New York: Springer-Verlag.

Burnham K. P., Anderson D. R., White G. C. 1994. Evaluation of the Kullback-Leibler discrepancy for model selection in open population capture-recapture models. Biometrical J. 36:299–315.

Cavanaugh J. E., Neath A. A. 1999. Generalizing the derivation of the Schwarz information criterion. Commun. Stat. Theory Methods 28:49–66.

Chamberlain T. C. 1890. The method of multiple working hypotheses. Science 15:93.

Chatfield C. 1995. Model uncertainty, data mining and statistical inference. J. R. Stat. Soc. A 158:419–466.

Churchill G. A., Von Haeseler A., Navidi W. C. 1992. Sample size for a phylogenetic inference. Mol. Biol. Evol. 9:753–769.

Deleeuw J. 1992. Introduction to Akaike 1973 information theory and an extension of the maximum likelihood principle. In: Kotz S., Johnson N. L., editors. Breakthroughs in statistics. London: Springer-Verlag. p. 599–609.

Edwards A. W. F. 1972. Likelihood. Cambridge, UK: Cambridge University Press.

Efron B., Tibshirani R. J. 1993. An introduction to the bootstrap. New York: Chapman and Hall.

Felsenstein J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401–410.

Felsenstein J. 1981a. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17:368–376.

Felsenstein J. 1981b. A likelihood approach to character weighting and what it tells us about parsimony and compatibility. Biol. J. Linnean Soc. 16:183–196.

Findley D. F. 1991. Counterexamples to parsimony and BIC. Ann. Inst. Stat. Math. 43:505–514.

Fisher R. A. 1921. On the ‘probable error’ of a coefficient of correlation deduced from a small sample. Metron 1(4):3–32.

Forster M. R. 2000. Key concepts in model selection: Performance and generalizability. J. Math. Psychol. 44:205–231.

Forster M. R. 2001. The new science of simplicity. In: Zellner A., Keuzenkamp H. A., McAleer M., editors. Simplicity, inference and modeling. Cambridge, UK: Cambridge University Press. p. 83–119.

Forster M. R. 2002. Predictive accuracy as an achievable goal of science. Phil. Sci. 69:S124–S134.

Forster M., Sober E. 1994. How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions. Br. J. Phil. Sci. 45:1–35.

Forster M. R., Sober E. 2004. Why likelihood? In: Taper M., Lele S., editors. Likelihood and evidence. Chicago: University of Chicago Press.

Foulds L. R., Hendy M. D., Penny D. 1979. A graph theoretic approach to the development of minimal phylogenetic trees. J. Mol. Evol. 13:127–149.

Foutz R. V., Srivastava R. C. 1977. The performance of the likelihood ratio test when the model is incorrect. Ann. Stat. 5:1183–1194.

Frati F., Simon C., Sullivan J., Swofford D. L. 1997. Gene evolution and phylogeny of the mitochondrial cytochrome oxidase gene in Collembola. J. Mol. Evol. 44:145–158.

Gelfand A. E. 1996. Model determination using sampling-based methods. In: Gilks W. R., Richardson S., Spiegelhalter D. J., editors. Markov chain Monte Carlo in practice. London, New York: Chapman & Hall. p. 145–161.

Gilks W. R., Richardson S., Spiegelhalter D. J. 1996. Markov chain Monte Carlo in practice. London, New York: Chapman & Hall.

Golden R. M. 1995. Making correct statistical inferences using a wrong probability model. J. Math. Psychol. 38:3–20.

Goldman N. 1990. Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst. Zool. 39:345–361.

Goldman N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182–198.

Goldman N. 1998. Phylogenetic information and experimental design in molecular systematics. Proc. R. Soc. Lond. B Biol. Sci. 265:1779–1786.

Goldman N., Whelan S. 2000. Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol. Biol. Evol. 17:975–978.

Green P. J. 1995. Reversible jump MCMC computation and Bayesian model determination. Biometrika 82:711–732.

Hasegawa M. 1990a. Mitochondrial DNA evolution in primates: Transition rate has been extremely low in the lemur. J. Mol. Evol. 31:113–121.

Hasegawa M. 1990b. Phylogeny and molecular evolution in primates. Jpn. J. Genet. 65:243–266.

Hasegawa M., Kishino H., Yano T. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.

Hastings W. K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109.

Hochberg Y. 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75:800–802.

Hoeting J. A., Madigan D., Raftery A. E. 1999. Bayesian model averaging: A tutorial. Stat. Sci. 14:382–417.

Holder M., Lewis P. O. 2003. Phylogeny estimation: Traditional and Bayesian approaches. Nat. Rev. Genet. 4:275–284.

Hsiao C. K. 1997. Approximate Bayes factors when a mode occurs on the boundary. J. Am. Stat. Assoc. 92:656–663.

Huelsenbeck J. P., Crandall K. A. 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Annu. Rev. Ecol. Syst. 28:437–466.

Huelsenbeck J. P., Hillis D. M. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42:247–264.

Huelsenbeck J. P., Imennov N. S. 2002. Geographic origin of human mitochondrial DNA: Accommodating phylogenetic uncertainty and model comparison. Syst. Biol. 51:155–165.

Huelsenbeck J. P., Larget B., Alfaro M. E. 2004. Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo. Mol. Biol. Evol. 21:1123–1133.

Huelsenbeck J. P., Larget B., Miller R. E., Ronquist F. 2002. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 51:673–688.

Huelsenbeck J. P., Rannala B., Larget B. 2000. A Bayesian framework for the analysis of cospeciation. Evol. Int. J. Org. Evol. 54:352–364.

Huelsenbeck J. P., Ronquist F., Nielsen R., Bollback J. P. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294:2310–2314.

Hurvich C. M., Tsai C.-L. 1989. Regression and time series model selection in small samples. Biometrika 76:297–307.

Jeffreys H. 1939. Theory of probability. Oxford: Oxford University Press.

Jermiin L. S., Olsen G. J., Mengersen K. L., Easteal S. 1997. Majority-rule consensus of phylogenetic trees obtained by maximum-likelihood analysis. Mol. Biol. Evol. 14:1296–1302.

Johnson J. B., Omland K. S. 2003. Model selection in ecology and evolution. Trends Ecol. Evol. 19:101–108.

Jukes T. H., Cantor C. R. 1969. Evolution of protein molecules. In: Munro H. M., editor. Mammalian protein metabolism. New York: Academic Press. p. 21–132.

Kadane J. B., Wolfson L. J. 1998. Experiences in elicitation. J. R. Stat. Soc. D 47(1):3–19.

Kass R. E., Raftery A. E. 1995. Bayes factors. J. Am. Stat. Assoc. 90:773–795.

Kass R. E., Wasserman L. 1995. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. J. Am. Stat. Assoc. 90:928–934.

Kelsey C. R., Crandall K. A., Voevodin A. F. 1999. Different models, different trees: The geographic origin of PTLV-I. Mol. Phylogenet. Evol. 13:336–347.

Kendall M., Stuart A. 1979. The advanced theory of statistics, 4th edition. London: Charles Griffin.

Kent J. T. 1982. Robust properties of likelihood ratio tests. Biometrika 69:19–27.

Keuzenkamp H., McAleer M. 1995. Simplicity, scientific inference and economic modeling. Econ. J. 105:1–21.

Kieseppä I. A. 2002. Statistical model selection and Bayesianism. Phil. Sci. 68:S141–S152.

Kimura M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.

Kimura M. 1981. Estimation of evolutionary distances between homologous nucleotide sequences. Proc. Nat. Acad. Sci. USA 78:454–458.

Kishino H., Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.

Kuha J. 2003. AIC and BIC: Comparisons of assumptions and performance. Sociol. Methods Res. Submitted.

Kullback S., Leibler R. A. 1951. On information and sufficiency. Ann. Math. Stat. 22:79–86.

Larget B., Simon D. 1999. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16:750–759.

Lindley D. V. 1957. A statistical paradox. Biometrika 44:187–192.

Linhart H. 1988. A test whether two AIC's differ significantly. S. Afr. Stat. J. 22:153–161.

Linhart H., Zucchini W. 1986. Model selection. New York: Wiley.

Madigan D., Gavrin J., Raftery A. E. Eliciting prior information to enhance the predictive performance of Bayesian graphical models, Commun. Stat. Theory Methods, 1995, vol. 24 (pg. 2271-2292)

Madigan D. M., Raftery A. E. Model selection and accounting for model uncertainty in graphical models using Occam's Window, J. Am. Stat. Assoc., 1994, vol. 89 (pg. 1335-1346)

Mau B., Newton M. A. Phylogenetic inference for binary data on dendrograms using Markov chain Monte Carlo, J. Comp. Grap. Stat., 1997

Mau B., Newton M. A., Larget B. Bayesian phylogenetic inference via Markov chain Monte Carlo methods, Biometrics, 1999, vol. 55 (pg. 1-12)

Metropolis N., Rosenbluth A., Rosenbluth M., Teller A., Teller E. Equations of state calculations by fast computing machines, J. Chem. Phys., 1953, vol. 21 (pg. 1087-1092)

Miller A. J. Subset Selection in Regression, 2nd edition, 2002, New York, Chapman & Hall/CRC

Minin V., Abdo Z., Joyce P., Sullivan J. Performance-based selection of likelihood models for phylogeny estimation, Syst. Biol., 2003, vol. 52 (pg. 674-683)

Morozov P., Sitnikova T., Churchill G., Ayala F. J., Rzhetsky A. A new method for characterizing replacement rate variation in molecular sequences: Application of the Fourier and Wavelet models to Drosophila and mammalian proteins, Genetics, 2000, vol. 154 (pg. 381-395)

Myrvold W. C., Harper W. L. Model Selection, Simplicity, and Scientific Inference, Philos. Sci., 2002, vol. 69 (pg. S135-S149)

Nishii R. Asymptotic properties of criteria for selection of variables in multiple regression, Ann. Stat., 1984, vol. 12 (pg. 758-765)

Nishii R. Maximum likelihood principle and model selection when the true model is unspecified, J. Multivar. Ana., 1988, vol. 27

Nylander J. A. Bayesian Phylogenetics and the Evolution of Gall Wasps, Acta Universitatis Upsaliensis, Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 937, 2004, Uppsala, Sweden, Uppsala University (pg. 43)

Nylander J. A., Ronquist F., Huelsenbeck J. P., Nieves-Aldrey J. L. Bayesian phylogenetic analysis of combined data, Syst. Biol., 2004, vol. 53 (pg. 47-67)

Occam W. Scriptum in Librum Primum Sententiarum, Opera Theologica, I, ca. 1320

Ogishima S., Ren F., Tanaka H. Efficiencies of information criteria for topology selection in reconstructing molecular phylogenetic tree, in Proceedings of International Symposium on Artificial Life and Robotics, 2000, vol. 2000 (pg. 745-748)

Ota R., Waddell P. J., Hasegawa M., Shimodaira H., Kishino H. Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters, Mol. Biol. Evol., 2000, vol. 17 (pg. 798-803)

Penny D., Lockhart P. J., Steel M. A., Hendy M. D. The role of models in reconstructing evolutionary trees, in Scotland R. W., Siebert D. J., Williams D. M. (eds.), Models in Phylogenetic Reconstruction, 1994, Oxford, Clarendon Press (pg. 211-230)

Pol D. Empirical problems of the hierarchical likelihood ratio test for model selection, Syst. Biol., in press

Popper K. R. Logic of scientific discovery, 1959, London, Hutchinson

Posada D. The effect of branch length variation on the selection of models of molecular evolution, J. Mol. Evol., 2001, vol. 52 (pg. 434-444)

Posada D. Using Modeltest and PAUP* to select a model of nucleotide substitution, in Baxevanis A. D., Davison D. B., Page R. D. M., Petsko G. A., Stein L. D., Stormo G. D. (eds.), Current Protocols in Bioinformatics, 2003, John Wiley & Sons, Inc. (pg. 6.5.1-6.5.14)

Posada D., Crandall K. A. Modeltest: Testing the model of DNA substitution, Bioinformatics, 1998, vol. 14 (pg. 817-818)

Posada D., Crandall K. A. Selecting models of nucleotide substitution: An application to human immunodeficiency virus 1 (HIV-1), Mol. Biol. Evol., 2001a, vol. 18 (pg. 897-906)

Posada D., Crandall K. A. Selecting the best-fit model of nucleotide substitution, Syst. Biol., 2001b, vol. 50 (pg. 580-601)

Posada D., Crandall K. A. Simple (wrong) models for complex trees: Empirical Bias, Mol. Biol. Evol., 2001c, vol. 18 (pg. 271-275)

Pupko T., Huchon D., Cao Y., Okada N., Hasegawa M. Combining multiple data sets in a likelihood analysis: Which models are the best?, Mol. Biol. Evol., 2002, vol. 19 (pg. 2294-2307)

Raftery A. E. Hypothesis testing and model selection, in Gilks W. R., Richardson S., Spiegelhalter D. J. (eds.), Markov chain Monte Carlo in practice, 1996, London, New York, Chapman & Hall (pg. 163-187)

Raftery A. E. Bayes factors and BIC: Comment on “A critique of the Bayesian information criterion for model selection”, Sociol. Methods Res., 1999, vol. 27 (pg. 411-427)

Robinson D. F., Foulds L. R. Comparison of phylogenetic trees, Math. Biosci., 1981, vol. 53 (pg. 131-147)

Rzhetsky A., Nei M. Tests of applicability of several substitution models for DNA sequence data, Mol. Biol. Evol., 1995, vol. 12 (pg. 131-151)

Sakamoto Y., Ishiguro M., Kitagawa G. Akaike information criterion statistics, 1986, New York, Springer

Sanderson M. J., Kim J. Parametric phylogenetics?, Syst. Biol., 2000, vol. 49 (pg. 817-829)

Schwarz G. Estimating the dimension of a model, Ann. Stat., 1978, vol. 6 (pg. 461-464)

Shafer G. Lindley's paradox (with discussion), J. Am. Stat. Assoc., 1982, vol. 77 (pg. 325-351)

Shibata R. Consistency of model selection and parameter estimation, J. Appl. Prob., 1986, vol. 23A (pg. 127-141)

Shimodaira H. Assessing the error probability of the model selection test, Ann. Inst. Stat. Math., 1997, vol. 49 (pg. 395-410)

Shimodaira H. An application of multiple comparison techniques to model selection, Ann. Inst. Stat. Math., 1998, vol. 1 (pg. 1-13)

Shimodaira H. Multiple comparisons of log-likelihoods and combining nonnested models with applications to phylogenetic tree selection, Commun. Stat. Theory Methods, 2001, vol. 30 (pg. 1751-1772)

Shimodaira H., Hasegawa M. Multiple comparisons of log-likelihoods with applications to phylogenetic inference, Mol. Biol. Evol., 1999, vol. 16 (pg. 1114-1234)

Sober E. Bayesianism - its scope and limits, in Swinburne R. (ed.), Bayes's Theorem, 2002a, Oxford, Oxford University Press (pg. 21-38)

Sober E. Instrumentalism, parsimony, and the Akaike framework, Phil. Sci., 2002b, vol. 69 (pg. S112-S123)

Sober E., Steel M. Testing the hypothesis of common ancestry, J. Theoret. Biol., 2002, vol. 218 (pg. 395-408)

Sota T., Vogler A. P. Incongruence of mitochondrial and nuclear gene trees in the Carabid beetles Ohomopterus, Syst. Biol., 2001, vol. 50 (pg. 39-59)

Steel M., Penny D. Parsimony, likelihood, and the role of models in molecular phylogenetics, Mol. Biol. Evol., 2000, vol. 17 (pg. 839-850)

Stone M. An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion, J. R. Stat. Soc., 1977, vol. 39 (pg. 44-47)

Strimmer K. Model selection using expected likelihood weights: A Bayes-frequentist compromise, 2001, Technical Report available at http://www.stat.uni-muenchen.de/~strimmer/cv.html

Strimmer K., Rambaut A. Inferring confidence sets of possibly misspecified gene trees, Proc. R. Soc. Lond. B Biol. Sci., 2001, vol. 269 (pg. 137-142)

Suchard M. A., Kitchen C. M., Sinsheimer J. S., Weiss R. E. Hierarchical phylogenetic models for analyzing multipartite sequence data, Syst. Biol., 2003a, vol. 52 (pg. 649-664)

Suchard M. A., Weiss R. E., Dorman K. S., Sinsheimer J. S. Oh brother, where art thou? A Bayes factor test for recombination with uncertain heritage, Syst. Biol., 2002, vol. 51 (pg. 715-728)

Suchard M. A., Weiss R. E., Sinsheimer J. S. Bayesian selection of continuous-time Markov chain evolutionary models, Mol. Biol. Evol., 2001, vol. 18 (pg. 1001-1013)

Suchard M. A., Weiss R. E., Sinsheimer J. S. Testing a molecular clock without an outgroup: Derivations of induced priors on branch-length restrictions in a Bayesian framework, Syst. Biol., 2003b, vol. 52 (pg. 48-54)

Sugiura N. Further analysis of the data by Akaike's information criterion and the finite corrections, Commun. Stat. Theory Methods A, 1978, vol. 7 (pg. 13-26)

Sullivan J., Swofford D. L. Are guinea pigs rodents? The importance of adequate models in molecular phylogenies, J. Mamm. Evol., 1997, vol. 4 (pg. 77-86)

Sullivan J., Swofford D. L. Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated?, Syst. Biol., 2001, vol. 50 (pg. 723-729)

Suzuki Y., Glazko G. V., Nei M. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics, Proc. Natl. Acad. Sci. USA, 2002, vol. 99 (pg. 16138-16143)

Swofford D. L. PAUP* Phylogenetic analysis using parsimony and other methods, version 4.0. beta, 1998, Sunderland, Massachusetts, Sinauer Associates

Swofford D. L. PAUP* Phylogenetic analysis using parsimony (*and other methods), version 4, 2000, Sunderland, Massachusetts, Sinauer Associates

Tamura K. Model selection in the estimation of the number of nucleotide substitutions, Mol. Biol. Evol., 1994, vol. 11 (pg. 154-157)

Tamura K., Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees, Mol. Biol. Evol., 1993, vol. 10 (pg. 512-526)

Tanaka H., Ren F., Okayama T., Gojobori T. Topology selection in unrooted molecular phylogenetic tree by minimum model-based complexity method, Pac. Symp. Biocomput., 1999, vol. 4 (pg. 326-337)

Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences, in Miura R. M. (ed.), Some mathematical questions in biology - DNA sequence analysis, 1986, Providence, Rhode Island, American Mathematical Society (pg. 57-86)

Van Den Bussche R. A., Baker R. J., Huelsenbeck J. P., Hillis D. M. Base compositional bias and phylogenetic analyses: A test of the “flying DNA” hypothesis, Mol. Phylogenet. Evol., 1998, vol. 10 (pg. 408-416)

Verdinelli I., Wasserman L. Computing Bayes factors using a generalization of the Savage-Dickey density ratio, J. Am. Stat. Assoc., 1995, vol. 90 (pg. 614-618)

Vuong Q. H. Likelihood ratio tests for model selection and non-nested hypotheses, Econometrica, 1989, vol. 57 (pg. 307-333)

Wasserman L. Bayesian model selection and model averaging, J. Math. Psychol., 2000, vol. 44 (pg. 92-107)

Weakliem D. L. A critique of the Bayesian information criterion for model selection, Sociol. Methods Res., 1999, vol. 27 (pg. 359-397)

Whelan S., Goldman N. Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics, Mol. Biol. Evol., 1999, vol. 16 (pg. 1292-1299)

Woodroofe M. On the model selection and the arc sine laws, Ann. Stat., 1982, vol. 10 (pg. 1182-1194)

Yang Z. Among-site rate variation and its impact on phylogenetic analysis, Trends Ecol. Evol., 1996a, vol. 11 (pg. 367-372)

Yang Z. Maximum-likelihood models for combined analyses of multiple sequence data, J. Mol. Evol., 1996b, vol. 42 (pg. 587-596)

Yang Z., Goldman N., Friday A. Maximum likelihood trees from DNA sequences: A peculiar statistical estimation problem, Syst. Biol., 1995, vol. 44 (pg. 384-399)

Yang Z., Nielsen R., Goldman N., Pedersen A.-M. K. Codon-substitution models for heterogeneous selection pressure at amino acid sites, Genetics, 2000, vol. 155 (pg. 431-449)

Yang Z., Rannala B. Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method, Mol. Biol. Evol., 1997, vol. 14 (pg. 717-724)

Zhang J. Performance of likelihood ratio tests of evolutionary hypotheses under inadequate substitution models, Mol. Biol. Evol., 1999, vol. 16 (pg. 868-875)

Zharkikh A. Estimation of evolutionary distances between nucleotide sequences, J. Mol. Evol., 1994, vol. 39 (pg. 315-329)

Zucchini W. An introduction to model selection, J. Math. Psychol., 2000, vol. 44 (pg. 41-46)

Associate Editor:
