『Computational Molecular Evolution』

Ziheng Yang

(Published December 2006, Oxford University Press [Oxford Series in Ecology and Evolution], ISBN: 0198566999 [hbk] / ISBN: 0198567022 [pbk])



Contents
Preface

Part I: Modeling Molecular Evolution 1

CHAPTER 1 Models of Nucleotide Substitution 1

1.1 Introduction 3
1.2 Markov Models of Nucleotide Substitution and Distance Estimation 4
 1.2.1 The JC69 Model 4
 1.2.2 The K80 Model 10
 1.2.3 HKY85, F84, TN93, etc. 11
 1.2.4 The Transition/Transversion Rate Ratio 17
1.3 Variable Substitution Rates Across Sites 18
1.4 Maximum Likelihood Estimation 22
 1.4.1 The JC69 Model 22
 1.4.2 The K80 Model 25
 *1.4.3 Profile and Integrated Likelihood Methods 27
1.5 Markov Chains and Distance Estimation under General Models 30
 1.5.1 General Theory 30
 1.5.2 The General Time-Reversible (GTR) Model 33
1.6 Discussions 37
 1.6.1 Distance Estimation under Different Substitution Models 37
 1.6.2 Limitations of Pairwise Comparison 37
1.7 Exercises 38

CHAPTER 2 Models of Amino Acid and Codon Substitution 40

2.1 Introduction 40
2.2 Models of Amino Acid Replacement 40
 2.2.1 Empirical Models 40
 2.2.2 Mechanistic Models 43
 2.2.3 Among-Site Heterogeneity 44
2.3 Estimation of Distance Between Two Protein Sequences 45
 2.3.1 The Poisson Model 45
 2.3.2 Empirical Models 46
 2.3.3 Gamma Distances 46
 2.3.4 Example: Distance between Cat and Rabbit p53 Genes 47
2.4 Models of Codon Substitution 48
2.5 Estimation of Synonymous and Nonsynonymous Substitution Rates 49
 2.5.1 Counting Methods 50
 2.5.2 Maximum Likelihood Method 58
 2.5.3 Comparison of Methods 61
 *2.5.4 Interpretation and a Plethora of Distances 62
*2.6 Numerical Calculation of the Transition-Probability Matrix 68
2.7 Exercises 70

Part II: Phylogeny Reconstruction 71

CHAPTER 3 Phylogeny Reconstruction: Overview 73

3.1 Tree Concepts 73
 3.1.1 Terminology 73
 3.1.2 Topological Distance Between Trees 77
 3.1.3 Consensus Trees 79
 3.1.4 Gene Trees and Species Trees 80
 3.1.5 Classification of Tree-Reconstruction Methods 81
3.2 Exhaustive and Heuristic Tree Search 82
 3.2.1 Exhaustive Tree Search 82
 3.2.2 Heuristic Tree Search 83
 3.2.3 Branch Swapping 84
 3.2.4 Local Peaks in the Tree Space 87
 3.2.5 Stochastic Tree Search 89
3.3 Distance Methods 89
 3.3.1 Least Squares Method 90
 3.3.2 Neighbor-Joining Method 92
3.4 Maximum Parsimony 93
 3.4.1 Brief History 93
 3.4.2 Counting the Minimum Number of Changes Given the Tree 94
 3.4.3 Weighted Parsimony and Transversion Parsimony 95
 3.4.4 Long-Branch Attraction 98
 3.4.5 Assumptions of Parsimony 99

CHAPTER 4 Maximum Likelihood Methods 100

4.1 Introduction 100
4.2 Likelihood Calculation on Tree 100
 4.2.1 Data, Model, Tree, and Likelihood 100
 4.2.2 The Pruning Algorithm 102
 4.2.3 Time Reversibility, the Root of the Tree, and the Molecular Clock 106
 4.2.4 Missing Data and Alignment Gaps 107
 4.2.5 A Numerical Example: Phylogeny of Apes 108
4.3 Likelihood Calculation under More-Complex Models 109
 4.3.1 Models of Variable Rates Among Sites 110
 4.3.2 Models for Combined Analysis of Multiple Data Sets 116
 4.3.3 Nonhomogeneous and Nonstationary Models 118
 4.3.4 Amino Acid and Codon Models 119
4.4 Reconstruction of Ancestral States 119
 4.4.1 Overview 119
 4.4.2 Empirical and Hierarchical Bayes Reconstruction 121
 4.4.3 Discrete Morphological Characters 124
 4.4.4 Systematic Biases in Ancestral Reconstruction 126
*4.5 Numerical Algorithms for Maximum Likelihood Estimation 128
 4.5.1 Univariate Optimization 129
 4.5.2 Multivariate Optimization 131
 4.5.3 Optimization on a Fixed Tree 134
 4.5.4 Multiple Local Peaks on the Likelihood Surface for a Fixed Tree 135
 4.5.5 Search for the Maximum Likelihood Tree 136
4.6 Approximations to Likelihood 137
4.7 Model Selection and Robustness 137
 4.7.1 LRT, AIC, and BIC 137
 4.7.2 Model Adequacy and Robustness 142
4.8 Exercises 144

CHAPTER 5 Bayesian Methods 145

5.1 The Bayesian Paradigm 145
 5.1.1 Overview 145
 5.1.2 Bayes Theorem 146
 5.1.3 Classical versus Bayesian Statistics 151
5.2 Prior 158
5.3 Markov Chain Monte Carlo 159
 5.3.1 Monte Carlo Integration 160
 5.3.2 Metropolis-Hastings Algorithm 161
 5.3.3 Single-Component Metropolis-Hastings Algorithm 164
 5.3.4 Gibbs Sampler 166
 5.3.5 Metropolis-Coupled MCMC (MCMCMC) 166
5.4 Simple Moves and Their Proposal Ratios 167
 5.4.1 Sliding Window Using Uniform Proposal 168
 5.4.2 Sliding Window Using Normal Proposal 168
 5.4.3 Sliding Window Using Multivariate Normal Proposal 169
 5.4.4 Proportional Shrinking and Expanding 170
5.5 Monitoring Markov Chains and Processing Output 171
 5.5.1 Validating and Diagnosing MCMC Algorithms 171
 5.5.2 Potential Scale Reduction Statistic 173
 5.5.3 Processing Output 174
5.6 Bayesian Phylogenetics 174
 5.6.1 Brief History 174
 5.6.2 General Framework 175
 5.6.3 Summarizing MCMC Output 175
 5.6.4 Bayesian versus Likelihood 177
 5.6.5 A Numerical Example: Phylogeny of Apes 180
5.7 MCMC Algorithms under the Coalescent Model 181
 5.7.1 Overview 181
 5.7.2 Estimation of θ 181
5.8 Exercises 184

CHAPTER 6 Comparison of Methods and Tests on Trees 185

6.1 Statistical Performance of Tree-Reconstruction Methods 186
 6.1.1 Criteria 186
 6.1.2 Performance 186
6.2 Likelihood 188
 6.2.1 Contrast with Conventional Parameter Estimation 190
 6.2.2 Consistency 191
 6.2.3 Efficiency 192
 6.2.4 Robustness 196
6.3 Parsimony 198
 6.3.1 Equivalence with Misbehaved Likelihood Models 198
 6.3.2 Equivalence with Well-Behaved Likelihood Models 201
 6.3.3 Assumptions and Justifications 204
6.4 Testing Hypotheses Concerning Trees 206
 6.4.1 Bootstrap 207
 6.4.2 Interior Branch Test 210
 6.4.3 Kishino-Hasegawa Test and Modifications 211
 6.4.4 Indexes Used in Parsimony Analysis 213
 6.4.5 Example: Phylogeny of Apes 214
*6.5 Appendix: Tuffley and Steel's Likelihood Analysis of One Character 215

Part III: Advanced Topics 221

CHAPTER 7 Molecular Clock and Estimation of Species Divergence Times 223

7.1 Overview 223
7.2 Tests of the Molecular Clock 225
 7.2.1 Relative Rate Tests 225
 7.2.2 Likelihood Ratio Test 226
 7.2.3 Limitations of the Clock Test 227
 7.2.4 Index of Dispersion 228
7.3 Likelihood Estimation of Divergence Times 228
 7.3.1 Global Clock Model 228
 7.3.2 Local-Clock Models 230
 7.3.3 Heuristic Rate-Smoothing Methods 231
 7.3.4 Dating Primate Divergences 233
 *7.3.5 Uncertainties in Fossils 235
7.4 Bayesian Estimation of Divergence Times 245
 7.4.1 General Framework 245
 7.4.2 Calculation of the Likelihood 246
 7.4.3 Prior on Rates 247
 7.4.4 Uncertainties in Fossils and Prior on Divergence Times 248
 7.4.5 Application to Primate and Mammalian Divergences 252
7.5 Perspectives 257

CHAPTER 8 Neutral and Adaptive Protein Evolution 259

8.1 Introduction 259
8.2 The Neutral Theory and Tests of Neutrality 260
 8.2.1 The Neutral and Nearly Neutral Theory 260
 8.2.2 Tajima's D Statistic 262
 8.2.3 Fu and Li's D and Fay and Wu's H Statistics 264
 8.2.4 McDonald-Kreitman Test and Estimation of Selective Strength 265
 8.2.5 Hudson-Kreitman-Aguadé Test 267
8.3 Lineages Undergoing Adaptive Evolution 268
 8.3.1 Heuristic Methods 268
 8.3.2 Likelihood Method 269
8.4 Amino Acid Sites Undergoing Adaptive Evolution 271
 8.4.1 Three Strategies 271
 8.4.2 Likelihood Ratio Test of Positive Selection under Random-Sites Models 273
 8.4.3 Identification of Sites under Positive Selection 276
 8.4.4 Positive Selection in the Human Major Histocompatibility (MHC) Locus 276
8.5 Adaptive Evolution Affecting Particular Sites and Lineages 279
 8.5.1 Branch-Site Test of Positive Selection 279
 8.5.2 Other Similar Models 281
 8.5.3 Adaptive Evolution in Angiosperm Phytochromes 282
8.6 Assumptions, Limitations and Comparisons 284
 8.6.1 Limitations of Current Methods 284
 8.6.2 Comparison Between Tests of Neutrality and Tests Based on dN and dS 286
8.7 Adaptively Evolving Genes 286

CHAPTER 9 Simulating Molecular Evolution 293

9.1 Introduction 293
9.2 Random Number Generator 294
9.3 Generation of Continuous Random Variables 295
9.4 Generation of Discrete Random Variables 296
 9.4.1 Discrete Uniform Distribution 296
 9.4.2 Binomial Distribution 297
 9.4.3 General Discrete Distribution 297
 9.4.4 Multinomial Distribution 298
 9.4.5 The Composition Method for Mixture Distributions 298
 *9.4.6 The Alias Method for Sampling from a Discrete Distribution 299
9.5 Simulating Molecular Evolution 302
 9.5.1 Simulating Sequences on a Fixed Tree 302
 9.5.2 Generating Random Trees 305
9.6 Exercises 306

CHAPTER 10 Perspectives 308

10.1 Theoretical Issues in Phylogeny Reconstruction 308
10.2 Computational Issues in Analysis of Large and Heterogeneous Data Sets 309
10.3 Genome Rearrangement Data 309
10.4 Comparative Genomics 310

Appendixes 311

 A. Functions of Random Variables 311
 B. The Delta Technique 313
 C. Phylogenetics Software 316


References 319
Index 353