David Posada, Thomas R. Buckley, Model Selection and Model Averaging in Phylogenetics: Advantages of Akaike Information Criterion and Bayesian Approaches Over Likelihood Ratio Tests, Systematic Biology, Volume 53, Issue 5, October 2004, Pages 793–808, https://doi.org/10.1080/10635150490522304
Abstract
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
It is clear that models of nucleotide substitution (henceforth models of evolution) play a significant role in molecular phylogenetics, particularly in the context of distance, maximum likelihood (ML), and Bayesian estimation. We know that the use of one or other model affects many, if not all, stages of phylogenetic inference. For example, estimates of phylogeny, substitution rates, bootstrap values, posterior probabilities, or tests of the molecular clock are clearly influenced by the model of evolution used in the analysis (Buckley, 2002; Buckley and Cunningham, 2002; Buckley et al., 2001; Kelsey et al., 1999; Pupko et al., 2002; Sullivan and Swofford, 1997, 2001; Suzuki et al., 2002; Tamura, 1994; Yang et al., 1995; Zhang, 1999). We can argue, in general, that phylogenetic methods are less accurate (that is, they recover an incorrect phylogeny more often), or become inconsistent (converging to an incorrect tree with increasing number of characters) when the model of evolution assumed is wrong (Bruno and Halpern, 1999; Felsenstein, 1978; Huelsenbeck and Hillis, 1993; Penny et al., 1994). It is evident that the use of appropriate models is essential if we are to be confident in the results of a phylogenetic analysis, and indeed, several strategies for model choice have been proposed in the context of phylogenetics. We refer the reader to Johnson and Omland (2003), Posada and Crandall (2001b) and Posada (2001) for a detailed introduction, and for an evaluation of the performance of these methods to recover the model generating the data. Computer programs exist that implement these methods (Adachi and Hasegawa, 1996; Posada and Crandall, 1998). Among the available methods for model selection in phylogenetics, hierarchical likelihood ratio tests (hLRTs) are the most popular. 
However, here we argue that the hLRTs approach is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two allow for assessment of model selection uncertainty and model averaging.
Model Selection
Before proceeding further, it is worth reiterating that any model of evolution we can construct is never going to be the “true model” that generated the data we observe. In other words, the set of models is misspecified. All models are wrong but some are useful (Box, 1976), and model selection is best seen as a way of approximating, rather than identifying, full reality (Burnham and Anderson, 2003, pp. 20–23). Statistical model selection is commonly based on William of Occam's (ca. 1320) parsimony principle, by which hypotheses should be kept as simple as possible. In statistical terms, this is a trade-off between bias (distance between the average estimate and truth) and variance (spread of the estimates around the truth) (Fig. 1). The idea is that by adding parameters to a model we obtain some improvement in fit (see below), but at the same time parameter estimates are “worse” because we have less data (i.e., information) per parameter. In addition, the computations typically require more time. So the question is how complex the model should be for a given problem.
The Likelihood Function
However, this multidimensional integral can be very difficult to compute, and it is typically approximated using computationally intensive techniques like Markov chain Monte Carlo (MCMC) (Gilks et al., 1996; Hastings, 1970; Metropolis et al., 1953). Steel and Penny (2000) and Holder and Lewis (2003) provide an instructive discussion on joint and marginal estimation in the context of phylogenetics.
Hierarchical Likelihood Ratio Tests
The approximation of this P-value is straightforward for nested models, using a standard or mixed χ² distribution (Goldman, 1993; Goldman and Whelan, 2000; Kendall and Stuart, 1979; Ota et al., 2000). Two models are nested when one of them, the null model, is a special case of the other, the alternative model. For example, the Jukes-Cantor model (Jukes and Cantor, 1969) (JC69) is nested within the Kimura two-parameter model (Kimura, 1980) (K80), because if we assume that transitions and transversions occur at the same rate (i.e., κ = 1), K80 collapses to JC69. However, obtaining correct P-values for the LRT statistics can be difficult. LRTs implicitly assume that at least one of the models compared is correct, and when the models are misspecified these tests can often be incorrect (Foutz and Srivastava, 1977; Golden, 1995; Kent, 1982). Although proper LRTs can be constructed when models are wrong (Vuong, 1989), standard LRTs in phylogenetics are not robust to model misspecification (Zhang, 1999). When the models are non-nested, the χ² approximation is no longer valid, and more computationally intensive Monte Carlo methods are needed (Goldman, 1993; Whelan and Goldman, 1999). In addition, when sample size is small the usual asymptotic approximation on which P-values are based no longer applies.
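As a rough illustration of the nested comparison described above, the following sketch computes the LRT statistic for JC69 against K80 and its P-value under a standard χ² distribution with one degree of freedom. The log-likelihood magnitudes are those listed for JC69 and K80 in the table of 56 models in this article; treating them as negative log-likelihoods is an assumption made here, and the function names are hypothetical. The one-degree-of-freedom tail probability is obtained from the complementary error function, so no external library is needed.

```python
import math

def lrt_statistic(loglik_null, loglik_alt):
    """Likelihood ratio statistic 2 * (l_alt - l_null) for nested models."""
    return 2.0 * (loglik_alt - loglik_null)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For Z ~ N(0, 1), P(Z**2 > x) = erfc(sqrt(x / 2)).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

# Log-likelihoods taken from the table (sign assumed negative).
loglik_jc69 = -6411.5161   # null model, kappa fixed to 1
loglik_k80 = -6230.2100    # alternative model, kappa free

delta = lrt_statistic(loglik_jc69, loglik_k80)
p_value = chi2_sf_df1(delta)
print(f"LRT statistic = {delta:.4f}, approximate P-value = {p_value:.3g}")
```

This covers only the simplest case; comparisons in which the null value lies on the boundary of the parameter space call for the mixed χ² distribution mentioned in the text, which this sketch does not implement.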
Furthermore, LRTs were designed for hypothesis testing, and although classical hypothesis testing is commonly used as a model selection strategy, it has been argued that hypothesis testing and model selection are distinct issues (Burnham and Anderson, 2003, pp. 132–134). A stepwise procedure like the hLRTs, in which we sequentially decide whether to add (or remove) certain parameters, is analogous to forward and backward selections in best-subset linear regression (Miller, 2002, pp. 39–46), which do not guarantee finding the optimal model. As pointed out by Sanderson and Kim (2000), we can identify several potential problems with the use of hLRTs for model selection in phylogenetics. There exist situations in which an optimal model may not exist for the hLRTs procedure. This kind of situation occurs, for example, if the general time-reversible model (Tavaré, 1986) (GTR) is not significantly better than the Hasegawa et al. model (1985) (HKY85), HKY85 is not significantly better than JC69, but GTR is significantly better than JC69. Even if an optimal model exists, it will always be a function of the significance level, and the outcome of the model choice procedure may vary accordingly. In addition, the hLRTs approach performs multiple tests with the same data, and this will increase the rate of false positives (that is, rejecting the null hypothesis when it is true): the probability of falsely rejecting the null hypothesis at least once in n tests is 1 − (1 − α)^n. Although there are statistical procedures to correct for this effect, like the Bonferroni correction (see Hochberg, 1988), here the tests are nonindependent, and the appropriate adjustment can be very complex (see also Shimodaira, 1998, 2001; Shimodaira and Hasegawa, 1999). The outcome of the hLRTs might also be affected by the starting model (for the hLRTs procedure we need to select a starting point, usually represented by the simplest or the most complex model in the set of candidate models).
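To give a sense of how quickly the false positive rate grows with the number of tests, the sketch below evaluates the expression 1 − (1 − α)^n for a few values of n, together with the per-test level a Bonferroni correction would use. The function names are hypothetical, and the formula treats the tests as independent, which, as noted above, is not the case for the hLRTs; the numbers are therefore only indicative.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false rejection in n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests

alpha = 0.05
for n in (1, 3, 5, 10):
    fwer = familywise_error_rate(alpha, n)
    print(f"n = {n:2d}: familywise rate = {fwer:.4f}, "
          f"Bonferroni per-test level = {bonferroni_alpha(alpha, n):.4f}")
```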
In addition, there are cases in which the hLRTs will not select the best model, according to its own criteria, among the candidate models.
Indeed, these problems can have an impact on the analysis of real data sets, and we have analyzed a set of HIV sequences (Posada and Crandall, 2001a) for illustrative purposes (Fig. 3) (Pol, in press). In Figure 3a we can see a case in which an optimal model does not exist, as all of the three models are rejected when compared with one of the other two. However, we will select HKY85 as the best fit (because we did not compare HKY85 and GTR). Also, note that increasing the significance level (Fig. 3b) changes the outcome, as GTR now becomes the best fit model. With a different set of candidate models, and if we start with HKY85, the model selected will be HKY85 (Fig. 3c), which is a suboptimal choice, whereas if we start with GTR the model selected will be GTR (Fig. 3d), which is actually the optimal model. We cannot devise a hierarchy of hLRTs that overcomes all these problems at once, but better approaches exist than simply forward and backward selection (Miller, 2002).
Bayesian Model Selection
Model selection is an integral part of Bayesian estimation (Gelfand, 1996; Raftery, 1996; Wasserman, 2000), and within this framework, different strategies exist to accomplish the same tasks.
Bayes Factors
Evidence for Mi is considered very strong if Bij > 150, strong if 12 < Bij < 150, positive if 3 < Bij < 12, barely worth mentioning if 1 < Bij < 3, and negative (supports Mj) if Bij < 1 (Raftery, 1996). It is important to note that Bayes factors compare model likelihoods or P(D | M), which are calculated by integrating, not maximizing, over all possible parameter values (except in empirical Bayesian approaches, where maximum likelihood estimates can be used instead). Therefore we should not confound them with the log of the maximized likelihoods (ℓ) used in the LRTs and AIC. Bayes factors are already being used in the context of phylogenetics, for example to infer the occurrence of recombination events (Suchard et al., 2002), to compare different phylogenetic hypotheses (Huelsenbeck and Imennov, 2002; Huelsenbeck et al., 2000; Suchard et al., 2003b) and for model selection (Aris-Brosou and Yang, 2002; Huelsenbeck et al., 2004; Nylander et al., 2004; Suchard et al., 2001).
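The interpretation scale quoted above from Raftery (1996) can be written as a small lookup. This is only a sketch: the function name is hypothetical, and because the text gives strict inequalities, the handling of the exact boundary values (1, 3, 12, 150) is an assumption made here, with each boundary assigned to the weaker category and a Bayes factor of exactly 1 reported as no preference.

```python
def interpret_bayes_factor(b_ij):
    """Verbal category for the Bayes factor B_ij of model Mi against Mj."""
    if b_ij <= 0:
        raise ValueError("a Bayes factor must be positive")
    if b_ij > 150:
        return "very strong"
    if b_ij > 12:
        return "strong"
    if b_ij > 3:
        return "positive"
    if b_ij > 1:
        return "barely worth mentioning"
    if b_ij == 1:
        return "no preference"  # boundary case, assumption of this sketch
    return "negative (supports Mj)"

for value in (200.0, 40.0, 5.0, 1.5, 0.2):
    print(f"B_ij = {value:6.1f}: {interpret_bayes_factor(value)}")
```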
Posterior Probabilities
A word is needed about model prior probabilities P(Mi). Although models are commonly assigned equal prior probabilities, in phylogenetics we may have prior beliefs stating that some models are more probable than others. For example, we have enough information about the process of mitochondrial sequence evolution to believe that the JC69 model is less probable in this case than the HKY85 model with a gamma distribution for rates among sites (see Yang, 1996a). Ideally, this information should be reflected in the model priors, and although considerable Bayesian research exists on eliciting prior information (Kadane and Wolfson, 1998; Madigan et al., 1995), it still seems to be very difficult to quantify. Fortunately, if the signal in the data, conveyed through the likelihood, is strong enough, then the prior distributions should not have a large influence on the posterior distribution. Indeed, posterior probabilities of trees are already being used to estimate phylogenies (Holder and Lewis, 2003; Huelsenbeck et al., 2001, 2002; Larget and Simon, 1999; Mau and Newton, 1997; Mau et al., 1999; Yang and Rannala, 1997).
When the priors for the parameters in the complex model are very diffuse, Bayesian approaches tend to support the null model in contradiction to significance tests (e.g., LRTs) as sample size increases, the so-called Jeffreys-Lindley's paradox (Bartlett, 1957; Jeffreys, 1939; Lindley, 1957; Shafer, 1982). If the diffuseness of these priors arises because of mere ignorance of the values these parameters can take, this conflict highlights a disadvantage of Bayesian approaches, especially in the case of the Bayesian Information Criterion (BIC) (see below), which assumes flat, improper priors. In any case, Jeffreys-Lindley's paradox illustrates the relevance, for good or for bad, of the priors we choose for the model parameters (Huelsenbeck et al., 2002). Moreover, in some situations Bayesian approaches and standard significance tests can also be irreconcilable when testing point (or sharp) null hypotheses, for example, H0: ti/tv = 0.5 versus H1: ti/tv ≠ 0.5 (Berger and Sellke, 1987) (ti/tv is the transition/transversion ratio).
Bayesian Information Criterion
A collection of BIC statistics contains the same information as a collection of pairwise Bayes factors. However, when choosing among several models, the BIC statistics are easier to interpret by visual inspection, as they allow for the simultaneous comparison of multiple models, so the best-fit models can be immediately identified. On the other hand, selecting the best-fit model from a collection of multiple pairwise Bayes factors could be more burdensome, and such a procedure might suffer from some of the problems described above for the hLRTs. Nevertheless, the BIC approximation might not be appropriate when the posterior mode occurs at the boundary of the parameter space (Hsiao, 1997; Ota et al., 2000).
Decision Theoretic Approaches
Recently, Minin et al. (2003) applied decision theory (Bernardo and Smith, 1994) to develop a novel model selection strategy (the DT method) that extends the BIC. Minin et al. (2003) argue that there is no guarantee that the best-fit models will produce the best estimates of phylogeny, and therefore propose a model selection method that incorporates some measure of phylogenetic performance. They assess models through a penalty or loss function, related to how dissimilar the branch length estimates are across models, and pick the model with the minimum posterior loss. As expected, simulations suggested that models selected with this criterion result in slightly more accurate branch length estimates than those obtained under models selected by the hLRTs.
Model Selection Uncertainty
Once we have selected a model it is very important that we are able to assess how confident we are in that selection (see Chatfield, 1995). We would like to be able to rank the models and to know whether the model selected is much better than the other candidate models. At the same time, we should be interested to learn whether we would select the same model if several other independent samples were available. The assessment of model selection uncertainty has a long tradition within the Bayesian community and posterior probabilities can be naturally used to take account of model uncertainty (Kass and Raftery, 1995; Madigan and Raftery, 1994). For example, models can be ranked according to their posterior probabilities and 95% credible intervals (Occam's Window) can easily be constructed by summing these probabilities (Madigan and Raftery, 1994). Although computing posterior probabilities can be hard and time consuming, in theory we could approximate those probabilities with the BIC. Furthermore, we could also use the BIC values or posterior risks of the DT method (Minin et al., 2003) in the same way that we use the AIC to assess model selection uncertainty, although this could be considered ad hoc (see Hoeting et al., 1999).
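The ranking-and-summing step described above might be sketched as follows. The posterior probabilities used here are hypothetical values invented purely for illustration (they are not results from this article), and the function name is likewise an assumption. Models are sorted by decreasing posterior probability and accumulated until the requested level is reached.

```python
def credible_set(posteriors, level=0.95):
    """Smallest set of top-ranked models whose probabilities sum to >= level."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for name, prob in ranked:
        chosen.append(name)
        total += prob
        if total >= level:
            break
    return chosen, total

# Hypothetical posterior model probabilities, for illustration only.
example_posteriors = {"M1": 0.55, "M2": 0.25, "M3": 0.12, "M4": 0.05, "M5": 0.03}

models, mass = credible_set(example_posteriors, level=0.95)
print(f"95% set: {models} (cumulative probability {mass:.2f})")
```

A small set indicates that the data discriminate well among the candidates; a set containing most of the candidate models signals substantial model selection uncertainty.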
Model Averaging
We also need to be careful when interpreting the relative importance of parameters. When the number of candidate models is less than the number of possible combinations of parameters, the presence-absence of some pairs of parameters can be correlated, and so can their relative importances. In other words, if parameter ɛ actually has a high relative importance, then a second parameter η might yield a high relative importance simply because the presence-absence of parameters ɛ and η among models is positively correlated. For the 56 models in Table 1, the presence of the different base frequency parameters (π) is completely correlated, whereas the presence of several substitution rates (ϕ) shows complete or high levels of correlation. The presence of parameter κ is inversely correlated with that of several substitution rate parameters (e.g., ϕA−G). The presence of α, the shape of the gamma distribution for rate variation among sites, or pinv, the proportion of invariable sites, is not correlated with that of any other parameter.
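As a minimal sketch of how a relative importance might be computed, the code below sums the Akaike weights of all models that contain a given component. This summation is the usual definition in Burnham and Anderson's framework and is assumed here to match the one used in this article. The weights are the nonzero values (to four decimals) from the table of AICc results in this section; "G" stands for Γ, and the function name is hypothetical.

```python
# Akaike weights of the models with w >= 0.0001 (all others are listed as 0.0000).
weights = {
    "TN93+I+G": 0.5221, "TIM+I+G": 0.1913, "HKY85+I+G": 0.1692,
    "K81uf+I+G": 0.0641, "GTR+I+G": 0.0344, "TVM+I+G": 0.0165,
    "TN93+G": 0.0011, "HKY85+G": 0.0005, "TIM+G": 0.0004,
    "K81uf+G": 0.0002, "GTR+G": 0.0001,
}

def relative_importance(model_weights, component):
    """Sum the weights of every model whose name includes '+component'."""
    return sum(
        w for name, w in model_weights.items()
        if component in name.split("+")[1:]
    )

importance_pinv = relative_importance(weights, "I")   # proportion of invariable sites
importance_alpha = relative_importance(weights, "G")  # gamma shape parameter
print(f"pinv: {importance_pinv:.4f}, alpha: {importance_alpha:.4f}")
```

Because the presence of α and pinv is not correlated with that of any other parameter in the candidate set, these two importances are among the easier ones to interpret; the caveat in the text applies with more force to the correlated substitution rate and base frequency parameters.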
Model | ℓ | K | AICc | ΔAICc | w | Cum(w) |
---|---|---|---|---|---|---|
TN93+I+Γ | 5441.4600 | 78 | 11045.5888 | 0.0000 | 0.5221 | 0.5221 |
TIM+I+Γ | 5441.3765 | 79 | 11047.5965 | 2.0077 | 0.1913 | 0.7134 |
HKY85+I+Γ | 5443.6729 | 77 | 11047.8422 | 2.2534 | 0.1692 | 0.8826 |
K81uf+I+Γ | 5443.5566 | 78 | 11049.7821 | 4.1934 | 0.0641 | 0.9468 |
GTR+I+Γ | 5440.9150 | 81 | 11051.0301 | 5.4413 | 0.0344 | 0.9811 |
TVM+I+Γ | 5442.7393 | 80 | 11052.4991 | 6.9103 | 0.0165 | 0.9976 |
TN93+Γ | 5448.6792 | 77 | 11057.8549 | 12.2661 | 0.0011 | 0.9988 |
HKY85+Γ | 5450.5068 | 76 | 11059.3402 | 13.7514 | 0.0005 | 0.9993 |
TIM+Γ | 5448.6577 | 78 | 11059.9843 | 14.3955 | 0.0004 | 0.9997 |
K81uf+Γ | 5450.4883 | 77 | 11061.4730 | 15.8843 | 0.0002 | 0.9999 |
GTR+Γ | 5448.0298 | 80 | 11063.0802 | 17.4914 | 0.0001 | 1.0000 |
TVM+Γ | 5449.6685 | 79 | 11064.1804 | 18.5917 | 0.0000 | 1.0000 |
TN93+I | 5470.7568 | 77 | 11102.0102 | 56.4214 | 0.0000 | 1.0000 |
TIM+I | 5470.7417 | 78 | 11104.1522 | 58.5635 | 0.0000 | 1.0000 |
GTR+I | 5470.3452 | 80 | 11107.7110 | 62.1223 | 0.0000 | 1.0000 |
HKY85+I | 5476.8496 | 76 | 11112.0257 | 66.4370 | 0.0000 | 1.0000 |
K81uf+I | 5476.8208 | 77 | 11114.1381 | 68.5493 | 0.0000 | 1.0000 |
TVM+I | 5476.1650 | 79 | 11117.1736 | 71.5849 | 0.0000 | 1.0000 |
F81+I+Γ | 5769.1118 | 76 | 11696.5501 | 650.9614 | 0.0000 | 1.0000 |
F81+Γ | 5782.0566 | 75 | 11720.2721 | 674.6834 | 0.0000 | 1.0000 |
F81+I | 5807.4927 | 75 | 11771.1442 | 725.5554 | 0.0000 | 1.0000 |
GTR | 5805.0576 | 79 | 11774.9588 | 729.3700 | 0.0000 | 1.0000 |
TVM | 5808.4727 | 78 | 11779.6141 | 734.0254 | 0.0000 | 1.0000 |
TIM | 5810.4102 | 77 | 11781.3168 | 735.7280 | 0.0000 | 1.0000 |
TN93 | 5813.4780 | 76 | 11785.2825 | 739.6938 | 0.0000 | 1.0000 |
K81uf | 5813.5190 | 76 | 11785.3646 | 739.7758 | 0.0000 | 1.0000 |
HKY85 | 5816.5894 | 75 | 11789.3375 | 743.7488 | 0.0000 | 1.0000 |
SYM+I+Γ | 5861.0859 | 78 | 11884.8407 | 839.2520 | 0.0000 | 1.0000 |
TVMef+I+Γ | 5867.6128 | 77 | 11895.7221 | 850.1333 | 0.0000 | 1.0000 |
SYM+Γ | 5876.7803 | 77 | 11914.0570 | 868.4683 | 0.0000 | 1.0000 |
TVMef+Γ | 5884.4272 | 76 | 11927.1810 | 881.5922 | 0.0000 | 1.0000 |
TIMef+I+Γ | 5885.0684 | 76 | 11928.4632 | 882.8745 | 0.0000 | 1.0000 |
K81+I+Γ | 5893.7642 | 75 | 11943.6872 | 898.0984 | 0.0000 | 1.0000 |
TN93ef+I+Γ | 5897.7529 | 75 | 11951.6647 | 906.0759 | 0.0000 | 1.0000 |
TIMef+Γ | 5899.2588 | 75 | 11954.6764 | 909.0877 | 0.0000 | 1.0000 |
K80+I+Γ | 5906.2329 | 74 | 11966.4593 | 920.8706 | 0.0000 | 1.0000 |
K81+Γ | 5908.7876 | 74 | 11971.5687 | 925.9800 | 0.0000 | 1.0000 |
TN93ef+Γ | 5911.5659 | 74 | 11977.1254 | 931.5366 | 0.0000 | 1.0000 |
SYM+I | 5908.7021 | 77 | 11977.9008 | 932.3120 | 0.0000 | 1.0000 |
TVMef+I | 5917.6128 | 76 | 11993.5521 | 947.9633 | 0.0000 | 1.0000 |
K80+Γ | 5920.9038 | 73 | 11993.6382 | 948.0494 | 0.0000 | 1.0000 |
TIMef+I | 5928.9629 | 75 | 12014.0846 | 968.4959 | 0.0000 | 1.0000 |
K81+I | 5938.0137 | 74 | 12030.0209 | 984.4321 | 0.0000 | 1.0000 |
TN93ef+I | 5940.7383 | 74 | 12035.4701 | 989.8813 | 0.0000 | 1.0000 |
K80+I | 5949.5186 | 73 | 12050.8677 | 1005.2789 | 0.0000 | 1.0000 |
F81 | 6088.2227 | 74 | 12330.4388 | 1284.8501 | 0.0000 | 1.0000 |
JC69+I+Γ | 6101.2656 | 73 | 12354.3618 | 1308.7730 | 0.0000 | 1.0000 |
JC69+Γ | 6114.8408 | 72 | 12379.3515 | 1333.7628 | 0.0000 | 1.0000 |
JC69+I | 6142.1719 | 72 | 12434.0137 | 1388.4249 | 0.0000 | 1.0000 |
SYM | 6170.8916 | 76 | 12500.1097 | 1454.5209 | 0.0000 | 1.0000 |
TVMef | 6190.3394 | 75 | 12536.8375 | 1491.2488 | 0.0000 | 1.0000 |
TIMef | 6194.5806 | 74 | 12543.1547 | 1497.5659 | 0.0000 | 1.0000 |
TN93ef | 6210.6353 | 73 | 12573.1011 | 1527.5123 | 0.0000 | 1.0000 |
K81 | 6214.1152 | 73 | 12580.0610 | 1534.4723 | 0.0000 | 1.0000 |
K80 | 6230.2100 | 72 | 12610.0898 | 1564.5011 | 0.0000 | 1.0000 |
JC69 | 6411.5161 | 71 | 12970.5438 | 1924.9551 | 0.0000 | 1.0000 |
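The Δ, w, and Cum(w) columns above can be approximately recomputed from the AICc column. The sketch below uses the standard Akaike weight formula, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), which is assumed here to be the one behind the table. Only the 12 best models are included; the remaining models have Δ > 56 and contribute negligibly to the normalizing sum. "G" stands for Γ, and the function name is hypothetical.

```python
import math

# AICc values of the 12 best-ranked models, copied from the table.
aicc_scores = {
    "TN93+I+G": 11045.5888, "TIM+I+G": 11047.5965, "HKY85+I+G": 11047.8422,
    "K81uf+I+G": 11049.7821, "GTR+I+G": 11051.0301, "TVM+I+G": 11052.4991,
    "TN93+G": 11057.8549, "HKY85+G": 11059.3402, "TIM+G": 11059.9843,
    "K81uf+G": 11061.4730, "GTR+G": 11063.0802, "TVM+G": 11064.1804,
}

def akaike_weights(scores):
    """Return {model: weight} computed from AIC-type scores."""
    best = min(scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}

weights = akaike_weights(aicc_scores)
cumulative = 0.0
for model, w in sorted(weights.items(), key=lambda item: item[1], reverse=True):
    cumulative += w
    delta = aicc_scores[model] - min(aicc_scores.values())
    print(f"{model:10s} delta = {delta:8.4f}  w = {w:.4f}  cum = {cumulative:.4f}")
```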
Indeed, the averaged parameter could be the topology itself, so we could construct a model-averaged estimate of phylogeny. We will come back to this later.
Akaike Information Criterion
In the context of phylogenetics we can think of the AIC as the amount of information lost when we use, say, HKY85 to approximate the real process of nucleotide substitution. Hence, we prefer the model with the smallest AIC. The second term, K, includes the parameters from the substitution model, like base frequencies, substitution rates, proportion of invariable sites, or rate variation among sites. If branch lengths are estimated de novo for every model, K should also include the number of branches (for an unrooted bifurcating tree, twice the number of taxa minus three). Although the inclusion of the number of branches, constant for all models, does not change the order of the AIC values, it will change their relative magnitude.
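As a minimal sketch of how K and the criterion fit together, the following assumes the standard definitions AIC = −2 ln L + 2K and AICc = AIC + 2K(K + 1)/(n − K − 1), and assumes that the second column of Table 1 holds −ln L. The function names are illustrative only and do not come from Modeltest or PAUP*.

```python
def n_branches(n_taxa: int) -> int:
    """Branches in an unrooted, fully bifurcating tree: 2T - 3."""
    return 2 * n_taxa - 3


def aic(neg_lnl: float, k: int) -> float:
    """AIC from the negative maximized log-likelihood and K free parameters."""
    return 2.0 * neg_lnl + 2.0 * k


def aicc(neg_lnl: float, k: int, n: int) -> float:
    """Second-order AIC; n is the sample size (here, alignment length)."""
    return aic(neg_lnl, k) + (2.0 * k * (k + 1)) / (n - k - 1)


# JC69 has no free substitution parameters, so K is just the branch count.
k_jc69 = n_branches(37)                      # 71 for the 37-taxon data set
aicc_jc69 = aicc(6411.5161, k_jc69, 1927)    # about 12970.54, as in Table 1
```

With 37 taxa and 1927 sites these functions reproduce the tabulated values for JC69 and TVM+I to within rounding, which supports (but does not prove) the column interpretation assumed here.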
The AIC is designed to estimate the predictive accuracy of competing hypotheses (Forster, 2002; Sober, 2002b), which is the expected performance of a model when predicting new data. The prediction of new data is a common application in phylogenetics, for example in parametric bootstrapping or simulation studies. It seems that the AIC was first applied in the context of phylogenetics by Hasegawa and collaborators (1990a; 1990b; Kishino and Hasegawa, 1989), and although several phylogenetics programs implement the AIC, like Molphy (Adachi and Hasegawa, 1996) and Modeltest (Posada, 2003; Posada and Crandall, 1998), the use of the AIC is much less common than that of the hLRTs.
The AIC makes several assumptions. First, there is the assumption of “uniformity of nature” (Forster and Sober, 1994), that is, that all data sets (future and past) are drawn from the same underlying process. Second, the AIC assumes that the sample size is large enough to ensure that the likelihood function will approximate its asymptotic properties. Finally, the AIC assumes that the true distribution of parameter estimates, when the number of data n is sufficiently large, follows a multivariate normal distribution. In principle, these assumptions (which are, moreover, common in statistical phylogenetics) should not be unduly restrictive (Forster and Sober, 1994, 2004), but the implications of potential violations need to be studied. It has been argued that constraining parameters at their boundaries, for example setting the proportion of invariable sites to be zero, might violate the derivation of the AIC (and the BIC) (Ota et al., 2000).
Model Selection Uncertainty with the AIC
The AIC differences allow for an immediate ranking of the candidate models. The larger the AIC difference for a model, the less probable that it is the best K-L model. As a rough rule of thumb, Burnham and Anderson (2003, p. 70) propose that models for which Δi ≤ 2 receive substantial support and are considered when making inferences, models having 4 ≤ Δi ≤ 7 have considerably less support, and models having Δi > 10 receive no support. However, they also warn that these guidelines are not expected to hold when observations are not independent but are assumed so, as is usually the case in phylogenetics.
Akaike weights are very useful for assessing model-selection uncertainty without having to use computer-intensive methods like Monte Carlo simulation or bootstrapping (Buckland et al., 1997; see Buckley et al., 2002, for an example). We can establish a 95% confidence set of models for the best K-L model by summing the Akaike weights from largest to smallest until the sum just reaches or exceeds 0.95; the corresponding subset of models is a type of confidence set on the best K-L model (Burnham and Anderson, 1998, pp. 169–171; 2003). We can also assess the relative likelihood of model i versus model j as simply the ratio of the two Akaike weights; such ratios are called evidence ratios (Anderson et al., 2000; Burnham and Anderson, 2003, pp. 77–79). Techniques exist to test whether two AICs differ significantly (Linhart, 1988; Shimodaira, 1997; Vuong, 1989), and multiple comparison techniques can be used to construct a confidence set of models that minimize the sampling error of the AIC (Shimodaira, 1998). Such techniques have already been proposed to construct confidence sets of trees (Shimodaira, 2001; Shimodaira and Hasegawa, 1999).
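The ranking, confidence set, and evidence ratio described above can be sketched in a few lines. The sketch assumes the standard definition of Akaike weights, w_i = exp(−Δ_i/2) / Σ_r exp(−Δ_r/2); the function names are hypothetical, and the weights in the usage example are those reported for the six best models in this study.

```python
import math


def akaike_weights(deltas):
    """Convert AIC differences into Akaike weights that sum to one."""
    rel = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel)
    return [r / total for r in rel]


def confidence_set(weights, level=0.95):
    """Models ranked by weight, kept until the cumulative weight reaches `level`."""
    chosen, cumulative = [], 0.0
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for name, w in ranked:
        chosen.append(name)
        cumulative += w
        if cumulative >= level:
            break
    return chosen


def evidence_ratio(w_i, w_j):
    """Relative likelihood of model i versus model j."""
    return w_i / w_j


reported = {
    "TN93+I+Γ": 0.5221, "TIM+I+Γ": 0.1913, "HKY85+I+Γ": 0.1692,
    "K81uf+I+Γ": 0.0642, "GTR+I+Γ": 0.0344, "TVM+I+Γ": 0.0165,
}
best_95 = confidence_set(reported)  # the first five models listed above
ratio = evidence_ratio(reported["TN93+I+Γ"], reported["HKY85+I+Γ"])  # about 3.1
```

Note that the loop stops at the first model that pushes the cumulative weight to or past the chosen level, so the set can cover slightly more than 95%.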
(Burnham and Anderson, 2003, p. 76). However, the above is not a true Bayesian approach, because these priors only refer to the model, and not to the prior probability distribution of the parameters of the model. Neither do these priors refer to the belief that Mi is the true model, but rather to the belief that model Mi is the best K-L model for the data (Burnham and Anderson, 1998, 2003). Usually ρi is set to 1/R for every model.
Model Averaging with the AIC
Again, the caveats described above about interpreting model-averaged parameter estimates apply. Likewise, it is again easy to estimate the relative importance of any parameter by summing the Akaike weights across all models that include the parameters we are interested in. For example, the relative importance of the substitution rate between adenine and cytosine across all candidate models is simply the denominator above, w+ (ϕA − C).
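A model-averaged estimate of this kind can be sketched as a weighted mean over the models that contain the parameter, normalized by the summed weight w+. The function name is hypothetical, and the κ estimate for HKY85+Γ in the usage example is an invented placeholder rather than a result from this study.

```python
def model_averaged_estimate(weights, estimates):
    """Weighted mean of a parameter over the models that include it.

    weights:   model name -> Akaike weight (all candidate models)
    estimates: model name -> estimate (only models containing the parameter)
    Returns (averaged estimate, w_plus), where w_plus is the summed weight of
    the models containing the parameter, i.e. its relative importance.
    """
    w_plus = sum(weights[m] for m in estimates)
    if w_plus == 0.0:
        raise ValueError("no weight on any model containing this parameter")
    averaged = sum(weights[m] * est for m, est in estimates.items()) / w_plus
    return averaged, w_plus


weights = {"HKY85+I+Γ": 0.1692, "HKY85+Γ": 0.0005, "TN93+I+Γ": 0.5221}
kappa = {
    "HKY85+I+Γ": 14.8476,  # estimate reported for the hLRT model
    "HKY85+Γ": 15.0,       # hypothetical value, for illustration only
}
kappa_avg, kappa_importance = model_averaged_estimate(weights, kappa)
```

Because the low-weight model contributes almost nothing, the averaged value stays very close to the estimate from the better-supported model.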
Model-Averaged Estimation of Phylogenies
As discussed above, model averaging can also be applied to the estimation of phylogenetic trees (Posada, 2003). This can be easily accomplished in programs like PAUP* (Swofford, 1998), and perhaps the only limitation is the time we want to dedicate to the analysis. We start by estimating a tree for each candidate model and then build a consensus tree using model weights as tree weights (these model weights can be Akaike weights, BIC weights, or model likelihoods from a Bayesian analysis) (see Jermiin et al., 1997). In a Bayesian framework one could also directly obtain a model-averaged estimate of phylogeny by using reversible-jump MCMC, an algorithm that moves through both parameter and model space (Green, 1995), and that was very recently implemented by Huelsenbeck et al. (2004) for phylogenetic model selection. It is also interesting to note that the AIC and Bayesian approaches allow for the direct comparison of trees estimated under different models because likelihoods calculated on different trees and on different models are comparable (e.g., ML-JC69 versus ML-HKY). In this sense, the AIC has already been used as an extension of the likelihood optimality criterion for phylogenetic estimation (Kishino and Hasegawa, 1989; Ogishima et al., 2000; Sober, 2002b; Sober and Steel, 2002; Tanaka et al., 1999), and nothing prevents the BIC from also being considered as another phylogenetic criterion. Posterior probabilities for different trees inferred under different models are also directly comparable if they fall under the same posterior distribution.
We have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001). This alignment contains 1927 sites, 301 of which are variable. We took three approaches to selecting the best-fit model. First, we optimized the likelihood and model parameters for the 56 substitution models currently implemented in the program Modeltest (Posada and Crandall, 1998) on a neighbor-joining tree estimated from Jukes and Cantor (1969) distances. We then used the AIC and AICc to select the best-fit model from these likelihoods. Second, we took these model parameters and performed a tree search under each of the 56 models so as to find the tree with the highest likelihood under each of these optimized models. Again, the AIC and AICc were used to choose the best-fit model. The second approach is superior to the first approach because it involves a more thorough search for the maximum likelihood under each model; however, the computational burden is much greater. Third, we also used the specific hLRT strategy implemented in Modeltest (Posada and Crandall, 1998). From the likelihood values we calculated AICc values, Akaike weights, the relative importance of different parameters, and model-averaged estimates of parameters and topology. In addition, we performed a bootstrap analysis on the data using the best AICc model with 500 replicates. All tree searches used five random addition replicates followed by TBR branch swapping. All likelihood calculations and tree searches were performed using PAUP*4.0b10 (Swofford, 2000).
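One way to picture the weighted consensus is to score every group of taxa by the summed weight of the models whose tree contains it. The sketch below assumes a weighted majority-rule consensus (the text does not specify the consensus type) and represents each tree simply as a set of taxon groups; the trees, weights, and function names are all hypothetical, and no PAUP* functionality is being reproduced.

```python
from collections import defaultdict


def weighted_group_support(trees, weights):
    """Sum, for every group, the weights of the models whose tree contains it.

    trees:   model name -> set of groups, each group a frozenset of taxa
    weights: model name -> model weight (e.g., Akaike weight)
    The result is normalized by the total weight of the models supplied.
    """
    total = sum(weights[m] for m in trees)
    support = defaultdict(float)
    for model, groups in trees.items():
        for group in groups:
            support[group] += weights[model] / total
    return dict(support)


def weighted_majority_consensus(trees, weights, threshold=0.5):
    """Keep only groups whose weighted support exceeds the threshold."""
    support = weighted_group_support(trees, weights)
    return {g: s for g, s in support.items() if s > threshold}


# Toy example with five taxa (A-E) and three candidate models.
toy_trees = {
    "M1": {frozenset("AB"), frozenset("ABC")},
    "M2": {frozenset("AB"), frozenset("ABC")},
    "M3": {frozenset("AC"), frozenset("DE")},
}
toy_weights = {"M1": 0.6, "M2": 0.3, "M3": 0.1}
consensus = weighted_majority_consensus(toy_trees, toy_weights)
# Groups AB and ABC are retained with support of about 0.9; AC and DE are dropped.
```

The support values attached to the retained groups play the role of the numbers above branches in a weighted consensus tree: they measure how much model weight agrees on each group.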
Examining the AICc values and Akaike weights for the models optimized on the NJ tree, we immediately observe that only 11 out of the 56 models received noticeable support from the data (Table 1). Importantly, this confidence set of models, and the ranking of models within this set, are almost identical to those obtained from optimizing the topology (data not shown) (see also Nylander, 2004). All of the supported models incorporated the gamma distribution for among-site rate variation and the best-supported models also included a proportion of invariable sites. Models that assumed equal base frequencies fitted the data poorly and received essentially no support (i.e., their Akaike weights are close to zero). The TN93+I+Γ model had the smallest AICc value, but there was considerable uncertainty in identifying the most appropriate number of different substitution rates between nucleotides. The Akaike weights calculated from the AICc values were very similar to those calculated from the AIC. This is because the n/K ratio, 37.14, is close to the value of 40, which Burnham and Anderson (2003, p. 66) recommend as the cut-off for preferring AICc. Indeed, when n/K is relatively large the AICc converges back to the AIC, and so it is still appropriate to use the AICc instead of the AIC. The hLRT approach led to selection of the HKY85+I+Γ model, which only received an AICc weight of 0.1692 (Table 1), but was contained within the 95% AIC confidence set of models. The ML tree under the HKY85+I+Γ model differs by a symmetrical distance (Foulds et al., 1979) of 4 and 5 from the two trees estimated under the TN93+I+Γ model.
In total 23 unique tree topologies were estimated from all of the models; however, only 8 unique topologies were contained in the set of trees that were estimated from models that received greater than or equal to 0.00001 support from the AICc weights. Some tree searches under the among-site rate variation models recovered two topologies, where one of these topologies had an internal branch collapsed to zero length. The weighted AICc consensus topology (Fig. 5A) was almost identical to the topology estimated under the best AICc model (TN93+I+Γ) (Fig. 5B), but due to the model selection uncertainty there is considerable ambiguity in selecting the best point estimate of topology for these data. The bootstrap analysis under the best AICc model indicates that the nodes that are not supported under all of the models also have low bootstrap support (Fig. 5). This observation is important because it suggests that in this case if we had ignored model selection uncertainty our conclusion as to what hypotheses were well supported by the data would be the same. It is worth mentioning that the numbers above branches in Figure 5A describe the uncertainty of branches due to uncertainty on the models of molecular evolution. This is in contrast with the bootstrap values in Figure 5B, which describe uncertainty due to the stochasticity of molecular evolution. The former numbers can be regarded as “bootstrap proportions” obtained by resampling models with probabilities proportional to the Akaike weights. The phylogenetic relationships among the Ohomopterus carabid beetles are very similar to those estimated by Sota and Vogler (2001) using maximum parsimony.
We examined the association between pairwise AICc differences and pairwise tree distances (Foulds et al., 1979) for the 11 models included in the 99% confidence set (Fig. 6). This relationship shows a weak but significant correlation (r² = 0.2394; P = 0.00015) between the improvement of fit of a model to the data and differences in topology. This graph supports, to a limited extent, the intuition that models with similar fits to the data tend to support similar trees.
The model-averaged parameter estimates are very similar to the maximum likelihood estimates under the best-fit models (Table 2) because models with similar likelihoods, and thus low AIC differences, tend to result in similar parameter estimates. The variability between the model-averaged and best-fit model parameter estimates is unlikely to have a large effect on estimation of topology. The greatest variability between the model-averaged parameter and best-fit model parameter estimates is observed for the transversion rate parameters. This is not surprising given that relatively few transversions have occurred in these data and therefore there is not much information from which to gain stable estimates.
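The two quantities being compared can be sketched as follows. The symmetric-difference distance is computed here on trees represented as sets of taxon groups (a simplification of proper bipartitions on unrooted trees), and r² is the squared Pearson correlation. All names and numbers are hypothetical and do not reproduce the values in Figure 6.

```python
def symmetric_distance(groups_a, groups_b):
    """Number of groups found in exactly one of the two trees."""
    return len(set(groups_a) ^ set(groups_b))


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


tree_1 = {frozenset("AB"), frozenset("ABC")}
tree_2 = {frozenset("AB"), frozenset("DE")}
distance = symmetric_distance(tree_1, tree_2)  # ABC and DE are unshared

# Hypothetical pairwise AICc differences and matching pairwise tree distances.
aicc_differences = [0.5, 2.0, 4.2, 6.9, 11.3]
tree_distances = [0, 2, 2, 4, 6]
fit = r_squared(aicc_differences, tree_distances)
```

Pairwise comparisons among the same set of models are not independent of one another, so a P value attached to such a correlation should be read with that in mind.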
Parameter | Model-averaged estimate | AICc model estimate | hLRT model estimate |
---|---|---|---|
πA | 0.3330 | 0.3342 | 0.3303 |
πC | 0.0683 | 0.0667 | 0.0725 |
πG | 0.1362 | 0.1369 | 0.1335 |
πT | 0.4625 | 0.4622 | 0.4637 |
κ | 14.8483 | 14.8476 | 14.8476 |
ϕA − C | 0.6290 | 1.0 | — |
ϕA − G | 13.4111 | 13.1823 | — |
ϕA − T | 1.0536 | 1.0 | — |
ϕC − G | 0.4189 | 1.0 | — |
ϕC − T | 20.0553 | 19.7583 | — |
α | 0.1011 | — | — |
α(I+Γ) | 0.7149 | 0.7658 | 0.5849 |
pinv(I+Γ) | 0.6874 | 0.7038 | 0.6644 |
Not all model parameters have the same importance for this data set (Table 3). The alpha shape parameter from the gamma distribution of among-site rate variation and the base frequency parameters have a relative importance of 1.0 because they appear in all of the supported models. The proportion of invariable sites is also a very important parameter although a few models with low weight without this parameter are supported. This observation suggests that these properties of the evolutionary process are very important for obtaining a good model fit. The ϕA − G and ϕC − T substitution rate parameters have higher relative importance values than the transversion parameters. This indicates that for these data it is important to allow the two transition types to have different rates, more so than the transversion types. The results shown in Table 2 make sense in light of our current knowledge of the dynamics of animal mitochondrial DNA evolution (e.g., Brown et al. 1982; Tamura and Nei 1993; Buckley et al. 2001a).
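The relative importance values in Table 3 can be approximately recomputed by summing the reported Akaike weights over the models that contain each parameter. In the sketch below, which rate parameters each model family is counted as containing is an assumption inferred from the usual model definitions and from the tabulated sums, and the names are hypothetical.

```python
WEIGHTS = {
    "TN93+I+Γ": 0.5221, "TIM+I+Γ": 0.1913, "HKY85+I+Γ": 0.1692,
    "K81uf+I+Γ": 0.0642, "GTR+I+Γ": 0.0344, "TVM+I+Γ": 0.0165,
    "TN93+Γ": 0.0011, "HKY85+Γ": 0.0005, "TIM+Γ": 0.0004,
    "K81uf+Γ": 0.0002, "GTR+Γ": 0.0001,
}

# Assumed rate parameters counted for each model family.
RATES = {
    "TN93": {"A-G", "C-T"},
    "TIM": {"A-G", "C-T"},
    "HKY85": {"kappa"},
    "K81uf": set(),
    "GTR": {"A-C", "A-G", "A-T", "C-G", "C-T", "G-T"},
    "TVM": {"A-C", "A-T", "C-G", "G-T"},
}


def parameters_of(model):
    """Parameters counted for a model name such as 'TN93+I+Γ'."""
    family, *extras = model.split("+")
    params = set(RATES[family])
    if "Γ" in extras:
        params.add("alpha")
    if "I" in extras:
        params.add("pinv")
    return params


def relative_importance(param, weights=WEIGHTS):
    """Summed Akaike weight of every model that contains `param`."""
    return sum(w for m, w in weights.items() if param in parameters_of(m))


importance = {p: relative_importance(p) for p in ("kappa", "A-C", "A-G", "alpha", "pinv")}
```

Because the published weights are rounded to four decimals, the recomputed sums agree with the tabulated values only to about the third decimal place.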
Model | wi | πA | πC | πG | πT | κ | ϕA − C | ϕA − G | ϕA − T | ϕC − G | ϕC − T | ϕG − T | α | pinv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TN93+I+Γ | 0.5221 | • | • | • | • |  |  | • |  |  | • |  | • | • |
TIM+I+Γ | 0.1913 | • | • | • | • |  |  | • |  |  | • |  | • | • |
HKY85+I+Γ | 0.1692 | • | • | • | • | • |  |  |  |  |  |  | • | • |
K81uf+I+Γ | 0.0642 | • | • | • | • |  |  |  |  |  |  |  | • | • |
GTR+I+Γ | 0.0344 | • | • | • | • |  | • | • | • | • | • | • | • | • |
TVM+I+Γ | 0.0165 | • | • | • | • |  | • |  | • | • |  | • | • | • |
TN93+Γ | 0.0011 | • | • | • | • |  |  | • |  |  | • |  | • |  |
HKY85+Γ | 0.0005 | • | • | • | • | • |  |  |  |  |  |  | • |  |
TIM+Γ | 0.0004 | • | • | • | • |  |  | • |  |  | • |  | • |  |
K81uf+Γ | 0.0002 | • | • | • | • |  |  |  |  |  |  |  | • |  |
GTR+Γ | 0.0001 | • | • | • | • |  | • | • | • | • | • | • | • |  |
Relative parameter importance |  | 1.0 | 1.0 | 1.0 | 1.0 | 0.170 | 0.051 | 0.749 | 0.051 | 0.051 | 0.749 | 0.051 | 1.0 | 0.997 |
Lastly, model averaging could also be applied to other problems in evolutionary biology in which inferences can be drawn from several models, for example as in the detection of positive selection from sequence alignments (Yang et al., 2000), and the estimation of divergence times using relaxed molecular clocks (Aris-Brosou and Yang, 2002), where different models can frequently yield different results.
Philosophical Considerations on Model Selection
There is still an important philosophical debate about model selection in general (Burnham and Anderson, 1998, 2003; Forster and Sober, 1994, 2004; Forster, 2000, 2001; Kass and Raftery, 1995; Kieseppä, 2002; Myrvold and Harper, 2002; Popper, 1959; Sober, 2002a; Wasserman, 2000), and here we do not attempt to address all the issues, but just those we think are most relevant. The information-theoretic and the Bayesian approaches represent different philosophical approaches to the problem of model selection (Forster and Sober, 1994; Kuha, 2003; Sober, 2002a). The AIC is designed to choose the model that best approximates reality. The conclusions of AIC are never about the truth or falsity of a hypothesis, but about its closeness to the truth (Forster and Sober, 2004). On the other hand, Bayesian approaches are designed to identify the true model, given the data. Both the AIC and Bayesian approaches have been criticized on different grounds.
That Bayesian approaches are designed to identify the true model can be surprising when surely we know that all models of evolution are false (i.e., their probability is zero). The standard interpretation of P(Mi|D) is that it is the probability that Mi is the true model given the data, even though we know that this statement is false a priori (Gelfand, 1996). A common response to this criticism is that we can hope that at least one of the models is approximately true, and that the posterior distribution allows us to compare the relative merits of the models (Wasserman, 2000). On the other hand, it has been argued that the derivation of the BIC does not require that the true model is contained within the set of candidate models (Burnham and Anderson, 2003, pp. 293–295; Cavanaugh and Neath, 1999). Interestingly, it is possible to obtain the AIC as a Bayesian result if a particular prior (the so-called K-L prior) is used with the BIC (Burnham and Anderson, 2003, pp. 302–305).
It has been alleged in the statistical literature that, under certain conditions, the BIC is statistically consistent (it does converge to the truth as more data are added), whereas the AIC is not (but see Bozdogan, 1987; Findley, 1991; Keuzenkamp and McAleer, 1995; Nishii, 1984, 1988; Shibata, 1986; Woodroofe, 1982), but the relevance of statistical consistency in this context is not clear (Forster, 2002).
We can think of a model as a set or family of sharp hypotheses. For example, the K80 model contains all hypotheses representing different values of the transition/transversion parameter, κ. The JC69 model, however, contains only one hypothesis, as all its parameters are fixed (equal base frequencies and equal rates for transitions and transversions). The AIC and the BIC work with maximized likelihoods, and therefore they are comparing the best point hypothesis within each model. However, it might be unwise to compare models based only on the merits of a single point, even if this point is optimal, and that is why Bayesians prefer models for which the sum of the likelihoods of all contained point hypotheses is largest (Holder and Lewis, 2003).
Which Model Selection Method is Best for Phylogenetics?
The use of different model selection strategies may lead to the selection of different models of evolution (Posada and Crandall, 2001a), and we know that model choice affects all aspects of phylogenetic analysis. Here we have attempted to compare different model selection strategies from a theoretical and practical point of view, in the context of phylogenetics. Previous Monte Carlo simulations on the performance of model selection in phylogenetics (Posada, 2001; Posada and Crandall, 2001b) showed that these methods work well when the aim is to identify the generating model. However, these simulations missed the point that the true model of evolution will never be one of the candidate models. It would be more useful to generate data from a model much more complex than any of the candidate models, and then study how well the selected models approximate this complex generating model (e.g., Minin et al., 2003). Clearly, we should seek models that are good approximations to the truth and from which therefore we can make valid inferences concerning the real process of molecular evolution. Too often we read expressions like “The best-fit model was selected with the program Modeltest” without any reference to which model selection strategy was used (in this case, hLRT or AIC). When a method of model selection is used, this should be explicitly reported.
From the discussion above it should be clear that the Bayesian and AIC approaches present several important advantages over the hLRTs for model selection (see also Table 4). Namely, they are able to simultaneously compare multiple nested or nonnested models (see Chamberlain, 1890), account for model selection uncertainty, and allow for model-averaged inference. Although model selection uncertainty tools do not exist within the standard hLRTs framework, there are extensions of the LRT framework that allow for the specification of confidence sets of models. Evidence for a model can also be estimated by the “expected likelihood weights” (Strimmer, 2001; Strimmer and Rambaut, 2001). Criteria like the AIC or BIC are very simple to calculate from the maximum likelihood estimate, although they do rely on point estimates and do not take into account topological uncertainty (Bollback, 2002). The importance of the latter effect has yet to be examined (but see Posada and Crandall, 2001b), as well as the potential impact of comparing models with parameters fixed at the boundary of their ranges (e.g., α = ∞) in the AIC and BIC.
Good properties for model selection methods | hLRT | Bayesian | AIC |
---|---|---|---|
Applies easily to nonnested models | No | Yes | Yes |
Allows for the simultaneous comparison of multiple models | No | Yes | Yes |
Does not depend on a subjective significance level | No | Yes§ | Yes |
Incorporates topological uncertainty | No | Yes* | No |
Easy to compute | Yes | No* | Yes |
Assesses model selection uncertainty | No | Yes | Yes |
Allows model averaging | No | Yes | Yes |
Provides the possibility of specifying prior information for models | No | Yes* | Yes |
Provides the possibility of specifying prior information for model parameters | No | Yes* | No |
Designed to approximate, rather than to identify, truth | No | No | Yes |
* Not the BIC.
§ In a sense, the interpretation of Bayes factors could be considered as subjective.
The possibility of inferring model-averaging phylogenies will eliminate some of the criticisms that model-based methods are contingent on the single best-fit model selected. Obviously, the methods described above can facilitate model-averaged hypothesis testing, as one could test for the monophyly of a group by considering all models available. Sanderson and Kim (2000) already hinted at the possibility of model-averaging phylogenies, but claimed that such a composite solution would be computationally prohibitive. However, this computational burden will depend on the size of the data set (especially on the number of taxa) and the number of models considered (but one could work with the 95% confidence or credible set of models), and in some cases it will certainly be feasible.
Selecting a set of candidate models is not easy; there are 203 “standard” time-reversible models of nucleotide substitution, but model selection in phylogenetics is commonly limited to a subset of these (Huelsenbeck et al., 2004). Indeed, evaluating a large number of models is more problematic for the hLRT than for the AIC and Bayesian approaches for the reasons explained above. The implications of conditioning model selection on a subset of the possible set of models are currently unknown.
Selection bias (Zucchini, 2000) may occur when the number of candidate models is large. In such cases random fluctuations in the data will increase the score of some models more than others and therefore the chance that the best model won for spurious reasons increases. Indeed, the set of candidate models influences model choice, and a careful a priori selection of candidate models is very important.
Both in the AICc and the BIC descriptions above, the total number of characters was used as an estimate of sample size. However, effective sample sizes in phylogenetic studies are poorly understood, and depend on the quantity of interest (Churchill et al., 1992; Goldman, 1998; Morozov et al., 2000). Characters in an alignment will often not be independent, so using the total number of characters as a surrogate for sample size (Minin et al., 2003; Posada and Crandall, 2001b) could be an overestimate. Using only the number of variable sites as an estimate of sample size is a more conservative approach, but could be an underestimate (note that all sites are used when estimating base frequencies or the proportion of invariable sites). Indeed, sample size also depends on the number of taxa. Importantly, sample size can have an effect on the outcome of model selection with the AICc. In our example above, if we were to use the number of variable characters (301 sites) as the sample size, instead of the total number of characters (1927 sites), the best AICc model would not change, but the second and third AICc models would exchange their rankings. Furthermore, because the LRT, the AIC, and the BIC strategies rely on large sample asymptotics, it is also important to decide when a sample should be considered small. Although the AICc was derived under Gaussian assumptions, Burnham et al. (1994) found that this second order expression performed well in product multinomial models for open population capture-recapture. Burnham and Anderson (2003, p. 66) suggest using this correction when the sample size is small compared to the number of adjustable parameters, n/K < 40. Alternatively, and because AICc converges to the AIC with increasing n/K ratios, one could always use the AICc (D. Anderson, personal communications). Phylogenetic characters are mostly discrete, and the unconstrained model in phylogenetics is multinomial (Goldman, 1993). 
One may think of an alignment of nucleotide characters as a large and sparse contingency table with 4^T bins, where T is the number of taxa. For large-sample asymptotics to hold in a contingency table every cell should contain, in general, more than 5 observations (see Agresti, 1990, p. 49, 244–250), which gives a rule of thumb of n/4^T > 5. Clearly, more research is needed on sample size in phylogenetics.
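The dependence of the AICc on the assumed sample size, and the two rules of thumb mentioned above, can be sketched as follows. The log-likelihoods and parameter counts for the models labelled M1, M2, and M3 are hypothetical values chosen for illustration; they are not the scores from our example. Only the sample sizes (1927 total sites, 301 variable sites) and the number of taxa (37) are taken from the text. The function names are also assumptions.

```python
def aicc(lnL, K, n):
    """Second-order AIC: -2 lnL + 2K + 2K(K + 1) / (n - K - 1)."""
    if n - K - 1 <= 0:
        raise ValueError("AICc requires n > K + 1")
    return -2.0 * lnL + 2.0 * K + (2.0 * K * (K + 1)) / (n - K - 1)


def rank_models(models, n):
    """Return model names ordered from best (lowest AICc) to worst."""
    return sorted(models, key=lambda name: aicc(*models[name], n))


def is_small_sample(n, K, threshold=40):
    """Rule of thumb: apply the correction when n/K < 40."""
    return n / K < threshold


def asymptotics_plausible(n, T, min_per_cell=5):
    """Contingency-table rule of thumb: n / 4^T > 5."""
    return n / 4 ** T > min_per_cell


# Hypothetical (lnL, K) pairs, for illustration only.
MODELS = {"M1": (-4990.0, 11), "M2": (-5000.0, 10), "M3": (-4997.9, 12)}

rank_all_sites = rank_models(MODELS, n=1927)      # total number of characters
rank_variable_sites = rank_models(MODELS, n=301)  # variable characters only

print(rank_all_sites, rank_variable_sites)
print(is_small_sample(301, 12), asymptotics_plausible(1927, 37))
```

With these made-up scores the best model is the same under both sample sizes, while the second and third models swap places, which mirrors the qualitative behavior described above. The last check shows how far a typical alignment is from satisfying n/4^T > 5.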
Other model selection methods exist, such as cross-validation and the bootstrap (see Browne, 2000; Efron and Tibshirani, 1993; Linhart and Zucchini, 1986), but they seem too time-consuming for the selection of substitution models (note that cross-validation is asymptotically equivalent to the AIC; Stone, 1977). There is an important role for more general tests of model fit and accuracy within the process of model selection. For example, tests of base frequency stationarity (Rzhetsky and Nei, 1995; Van Den Bussche et al., 1998) should be standard before a phylogenetic analysis. In addition, the global tests of Goldman (1993) and Bollback (2001) are useful for detecting model misspecification. When tests such as these indicate that the final model selected still does not fit the data well, our results must be interpreted with caution, because the possibility remains that some vital evolutionary process has not been accounted for and that the resulting inferences are misleading.
Model selection is a useful tool for research, but it is not a substitute for careful thinking and common-sense reasoning (Browne, 2000). There are examples in the phylogenetic literature where the best-fit models have led to phylogenetic estimates that are clearly incorrect (Buckley and Cunningham, 2002; Posada and Crandall, 2001c). Consideration of model selection uncertainty and multimodel inference should lead to equal or better estimates of phylogenies and substitution parameters, and we should see more applications of these ideas in the future (see also Nylander, 2004). Computation of AIC differences, Akaike weights, model-averaged estimates, and relative parameter importance is currently implemented in the program Modeltest (Posada and Crandall, 1998). Further developments will allow for the simultaneous use of different models for different partitions of the data (Nylander et al., 2004; Pupko et al., 2002; Suchard et al., 2003a; Yang, 1996b). It is now time to start thinking about how we will select among such partitioned models. Model selection in phylogenetics is indeed still an open area for research (Huelsenbeck et al., 2002).
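The quantities listed above (AIC differences, Akaike weights, relative parameter importance, and model-averaged estimates) can be sketched in a few lines. This is a minimal illustration of the standard formulas and not the Modeltest implementation; the model labels, AIC scores, and parameter estimates are hypothetical.

```python
import math


def akaike_weights(aic_scores):
    """Convert AIC scores into Akaike weights via the AIC differences."""
    best = min(aic_scores.values())
    rel_lik = {m: math.exp(-0.5 * (s - best)) for m, s in aic_scores.items()}
    total = sum(rel_lik.values())
    return {m: r / total for m, r in rel_lik.items()}


def relative_importance(weights, models_with_param):
    """Sum of the Akaike weights of the models that include the parameter."""
    return sum(w for m, w in weights.items() if m in models_with_param)


def model_averaged_estimate(weights, estimates):
    """Weighted mean of a parameter over the models that include it."""
    w_plus = sum(weights[m] for m in estimates)
    return sum(weights[m] * est for m, est in estimates.items()) / w_plus


# Hypothetical AIC scores for three candidate models.
AIC_SCORES = {"A": 100.0, "B": 102.0, "C": 110.0}
# Hypothetical estimates of one parameter, present only in models A and B.
PARAM_ESTIMATES = {"A": 0.5, "B": 0.7}

weights = akaike_weights(AIC_SCORES)
importance = relative_importance(weights, PARAM_ESTIMATES.keys())
averaged = model_averaged_estimate(weights, PARAM_ESTIMATES)
print(weights, importance, averaged)
```

Averaging only over the models that contain the parameter, and rescaling by the sum of their weights, is one possible convention; other choices, such as treating the parameter as zero in models that lack it, lead to different averaged values.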
Occam's (ca. 1280–1349) parsimony principle or Occam's razor was stated as “Pluralitas non est ponenda sine necessitate,” which translates literally into English as “plurality should not be posited without necessity.”
For continuous functions.
Acknowledgements
We are undoubtedly indebted to Kenneth Burnham and David Anderson for their enlightening book. David Anderson, Elliot Sober, and Carsten Wiuf provided very insightful comments on the manuscript. Robert Weiss, Janet Sinsheimer, Paul Lewis, Paul Joyce, Hidetoshi Shimodaira, and Rissa Ota helped clarify some ideas on Bayesian model selection. Nick Goldman and two anonymous referees provided useful comments on a first version. Jeff Thorne, Hirohisa Kishino, and two anonymous referees provided very valuable comments that considerably improved the manuscript. Thanks to David Swofford and Jack Sullivan for many valuable conversations on model selection throughout the years. DP was funded by the Spanish Ministry of Science and Technology, while funding for TRB was provided by the New Zealand Foundation for Research, Science, and Technology.