Analysis of protein-coding genetic variation in 60,706 humans

doi:10.1038/nature19057

Nature. 2016 Aug 18;536(7616):285-91.

Monkol Lek^{1,2,3,4}, Konrad J Karczewski^{1,2}, Eric V Minikel^{1,2,5}, Kaitlin E Samocha^{1,2,5,6}, Eric Banks², Timothy Fennell², Anne H O'Donnell-Luria^{1,2,7}, James S Ware^{2,8,9,10,11}, Andrew J Hill^{1,2,12}, Beryl B Cummings^{1,2,5}, Taru Tukiainen^{1,2}, Daniel P Birnbaum², Jack A Kosmicki^{1,2,6,13}, Laramie E Duncan^{1,2,6}, Karol Estrada^{1,2}, Fengmei Zhao^{1,2}, James Zou², Emma Pierce-Hoffman^{1,2}, Joanne Berghout^{14,15}, David N Cooper¹⁶, Nicole Deflaux¹⁷, Mark DePristo¹⁸, Ron Do^{19,20,21,22}, Jason Flannick^{2,23}, Menachem Fromer^{1,6,19,20,24}, Laura Gauthier¹⁸, Jackie Goldstein^{1,2,6}, Namrata Gupta², Daniel Howrigan^{1,2,6}, Adam Kiezun¹⁸, Mitja I Kurki^{2,25}, Ami Levy Moonshine¹⁸, Pradeep Natarajan^{2,26,27,28}, Lorena Orozco²⁹, Gina M Peloso^{2,27,28}, Ryan Poplin¹⁸, Manuel A Rivas², Valentin Ruano-Rubio¹⁸, Samuel A Rose⁶, Douglas M Ruderfer^{19,20,24}, Khalid Shakir¹⁸, Peter D Stenson¹⁶, Christine Stevens², Brett P Thomas^{1,2}, Grace Tiao¹⁸, Maria T Tusie-Luna³⁰, Ben Weisburd², Hong-Hee Won³¹, Dongmei Yu^{6,25,27,32}, David M Altshuler^{2,33}, Diego Ardissino³⁴, Michael Boehnke³⁵, John Danesh³⁶, Stacey Donnelly², Roberto Elosua³⁷, Jose C Florez^{2,26,27}, Stacey B Gabriel², Gad Getz^{18,26,38}, Stephen J Glatt^{39,40,41}, Christina M Hultman⁴², Sekar Kathiresan^{2,26,27,28}, Markku Laakso⁴³, Steven McCarroll^{6,8}, Mark I McCarthy^{44,45,46}, Dermot McGovern⁴⁷, Ruth McPherson⁴⁸, Benjamin M Neale^{1,2,6}, Aarno Palotie^{1,2,5,49}, Shaun M Purcell^{19,20,24}, Danish Saleheen^{50,51,52}, Jeremiah M Scharf^{2,6,25,27,32}, Pamela Sklar^{19,20,24,53,54}, Patrick F Sullivan^{55,56}, Jaakko Tuomilehto⁵⁷, Ming T Tsuang⁵⁸, Hugh C Watkins^{44,59}, James G Wilson⁶⁰, Mark J Daly^{1,2,6}, Daniel G MacArthur^{1,2}; Exome Aggregation Consortium

Collaborators

Exome Aggregation Consortium:
Monkol Lek, Konrad J Karczewski, Eric V Minikel, Kaitlin E Samocha, Eric Banks, Timothy Fennell, Anne H O'Donnell-Luria, James S Ware, Andrew J Hill, Beryl B Cummings, Taru Tukiainen, Daniel P Birnbaum, Jack A Kosmicki, Laramie E Duncan, Karol Estrada, Fengmei Zhao, James Zou, Emma Pierce-Hoffman, Joanne Berghout, David N Cooper, Nicole Deflaux, Mark DePristo, Ron Do, Jason Flannick, Menachem Fromer, Laura Gauthier, Jackie Goldstein, Namrata Gupta, Daniel Howrigan, Adam Kiezun, Mitja I Kurki, Ami Levy Moonshine, Pradeep Natarajan, Lorena Orozco, Gina M Peloso, Ryan Poplin, Manuel A Rivas, Valentin Ruano-Rubio, Samuel A Rose, Douglas M Ruderfer, Khalid Shakir, Peter D Stenson, Christine Stevens, Brett P Thomas, Grace Tiao, Maria T Tusie-Luna, Ben Weisburd, Hong-Hee Won, Dongmei Yu, David M Altshuler, Diego Ardissino, Michael Boehnke, John Danesh, Stacey Donnelly, Roberto Elosua, Jose C Florez, Stacey B Gabriel, Gad Getz, Stephen J Glatt, Christina M Hultman, Sekar Kathiresan, Markku Laakso, Steven McCarroll, Mark I McCarthy, Dermot McGovern, Ruth McPherson, Benjamin M Neale, Aarno Palotie, Shaun M Purcell, Danish Saleheen, Jeremiah M Scharf, Pamela Sklar, Patrick F Sullivan, Jaakko Tuomilehto, Ming T Tsuang, Hugh C Watkins, James G Wilson, Mark J Daly, Daniel G MacArthur, Hanna E Abboud, Goncalo Abecasis, Carlos A Aguilar-Salinas, Olimpia Arellano-Campos, Gil Atzmon, Ingvild Aukrust, Cathy L Barr, Graeme I Bell, Sarah Bergen, Lise Bjørkhaug, John Blangero, Donald W Bowden, Cathy L Budman, Noël P Burtt, Federico Centeno-Cruz, John C Chambers, Kimberly Chambert, Robert Clarke, Rory Collins, Giovanni Coppola, Emilio J Córdova, Maria L Cortes, Nancy J Cox, Ravindranath Duggirala, Martin Farrall, Juan C Fernandez-Lopez, Pierre Fontanillas, Timothy M Frayling, Nelson B Freimer, Christian Fuchsberger, Humberto García-Ortiz, Anuj Goel, María J Gómez-Vázquez, María E González-Villalpando, Clicerio González-Villalpando, Marco A Grados, Leif Groop, Christopher A Haiman, Craig L Hanis, Andrew T Hattersley, Brian E Henderson, Jemma C Hopewell, Alicia Huerta-Chagoya, Sergio Islas-Andrade, Suzanne B R Jacobs, Shapour Jalilzadeh, Christopher P Jenkinson, Jennifer Moran, Silvia Jiménez-Morale, Anna Kähler, Robert A King, George Kirov, Jaspal S Kooner, Theodosios Kyriakou, Jong-Young Lee, Donna M Lehman, Gholson Lyon, William MacMahon, Patrik K E Magnusson, Anubha Mahajan, Jaume Marrugat, Angélica Martínez-Hernández, Carol A Mathews, Gilean McVean, James B Meigs, Thomas Meitinger, Elvia Mendoza-Caamal, Josep M Mercader, Karen L Mohlke, Hortensia Moreno-Macías, Andrew P Morris, Laeya A Najmi, Pål R Njølstad, Michael C O'Donovan, Maria L Ordóñez-Sánchez, Michael J Owen, Taesung Park, David L Pauls, Danielle Posthuma, Cristina Revilla-Monsalve, Laura Riba, Stephan Ripke, Rosario Rodríguez-Guillén, Maribel Rodríguez-Torres, Paul Sandor, Mark Seielstad, Rob Sladek, Xavier Soberón, Timothy D Spector, Shyong E Tai, Tanya M Teslovich, Geoffrey Walford, Lynne R Wilkens, Amy L Williams

Affiliations

¹ Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
² Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
³ School of Paediatrics and Child Health, University of Sydney, Sydney, New South Wales 2145, Australia.
⁴ Institute for Neuroscience and Muscle Research, Children's Hospital at Westmead, Sydney, New South Wales 2145, Australia.
⁵ Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts 02115, USA.
⁶ Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
⁷ Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts 02115, USA.
⁸ Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
⁹ National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK.
¹⁰ NIHR Royal Brompton Cardiovascular Biomedical Research Unit, Royal Brompton Hospital, London SW3 6NP, UK.
¹¹ MRC Clinical Sciences Centre, Imperial College London, London SW7 2AZ, UK.
¹² Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
¹³ Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, Massachusetts 02115, USA.
¹⁴ Mouse Genome Informatics, Jackson Laboratory, Bar Harbor, Maine 04609, USA.
¹⁵ Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, Arizona 85721, USA.
¹⁶ Institute of Medical Genetics, Cardiff University, Cardiff CF10 3XQ, UK.
¹⁷ Google, Mountain View, California 94043, USA.
¹⁸ Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
¹⁹ Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.
²⁰ Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.
²¹ The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.
²² The Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.
²³ Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
²⁴ Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.
²⁵ Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
²⁶ Harvard Medical School, Boston, Massachusetts 02115, USA.
²⁷ Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
²⁸ Cardiovascular Research Center, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
²⁹ Immunogenomics and Metabolic Disease Laboratory, Instituto Nacional de Medicina Genómica, Mexico City 14610, Mexico.
³⁰ Molecular Biology and Genomic Medicine Unit, Instituto Nacional de Ciencias Médicas y Nutrición, Mexico City 14080, Mexico.
³¹ Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea.
³² Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
³³ Vertex Pharmaceuticals, Boston, Massachusetts 02210, USA.
³⁴ Department of Cardiology, University Hospital, 43100 Parma, Italy.
³⁵ Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA.
³⁶ Department of Public Health and Primary Care, Strangeways Research Laboratory, Cambridge CB1 8RN, UK.
³⁷ Cardiovascular Epidemiology and Genetics, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain.
³⁸ Department of Pathology and Cancer Center, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
³⁹ Psychiatric Genetic Epidemiology & Neurobiology Laboratory, State University of New York, Upstate Medical University, Syracuse, New York 13210, USA.
⁴⁰ Department of Psychiatry and Behavioral Sciences, State University of New York, Upstate Medical University, Syracuse, New York 13210, USA.
⁴¹ Department of Neuroscience and Physiology, State University of New York, Upstate Medical University, Syracuse, New York 13210, USA.
⁴² Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, SE-171 77 Stockholm, Sweden.
⁴³ Department of Medicine, University of Eastern Finland and Kuopio University Hospital, 70211 Kuopio, Finland.
⁴⁴ Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX1 2JD, UK.
⁴⁵ Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford OX1 2JD, UK.
⁴⁶ Oxford NIHR Biomedical Research Centre, Oxford University Hospitals Foundation Trust, Oxford OX1 2JD, UK.
⁴⁷ Inflammatory Bowel Disease and Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, USA.
⁴⁸ Atherogenomics Laboratory, University of Ottawa Heart Institute, Ottawa, Ontario K1Y 4W7, Canada.
⁴⁹ Institute for Molecular Medicine Finland (FIMM), University of Helsinki, 00100 Helsinki, Finland.
⁵⁰ Department of Biostatistics and Epidemiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
⁵¹ Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
⁵² Center for Non-Communicable Diseases, Karachi, Pakistan.
⁵³ Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.
⁵⁴ Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.
⁵⁵ Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599, USA.
⁵⁶ Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, SE-171 77 Stockholm, Sweden.
⁵⁷ Department of Public Health, University of Helsinki, 00100 Helsinki, Finland.
⁵⁸ Department of Psychiatry, University of California, San Diego, California 92093, USA.
⁵⁹ Radcliffe Department of Medicine, University of Oxford, Oxford OX1 2JD, UK.
⁶⁰ Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, Mississippi 39216, USA.

PMID: 27535533
PMCID: PMC5018207
DOI: 10.1038/nature19057

Monkol Lek et al. Nature. 2016.

Abstract

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

Figures

**Extended Data Figure 1. The impact of recurrence across different mutation and functional classes**
a) TiTv (transition to transversion) ratio of synonymous variants at downsampled intervals of ExAC. The TiTv is relatively stable at previous sample sizes (<5000) but changes drastically at larger sample sizes. b) For synonymous doubleton variants, mutability of each trinucleotide context is correlated with mean Euclidean distance of individuals that share the doubleton. Transversion (red) and non-CpG transition (green) doubletons are more likely to be found in closer PCA space (i.e. more similar ethnicities) than CpG transitions (blue). c) The proportion singleton among various functional categories. The functional category stop lost has a higher singleton rate than nonsense. Error bars represent standard error of the mean. d) Among synonymous variants, mutability of each trinucleotide context is correlated with proportion singleton, suggesting CpG transitions (blue) are more likely to have multiple independent origins driving their allele frequency up. e) The proportion singleton metric from c) broken down by transversions, non-CpG transitions, and CpG variants. Notably, there is a wide variation in singleton rates among mutational contexts in functional classes, and there are no stop-lost CpG transitions. Error bars represent standard error of the mean.
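
The TiTv ratio in panel a can be illustrated with a minimal sketch. This is not the ExAC pipeline: the function names and the example variant list are hypothetical, and the only assumption is the standard definition of a transition as a purine-to-purine or pyrimidine-to-pyrimidine substitution.

```python
# Minimal sketch of a transition/transversion (TiTv) ratio.
# Function names and example data are illustrative assumptions.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True when ref and alt are both purines or both pyrimidines."""
    if ref == alt:
        raise ValueError("ref and alt must differ for a substitution")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snvs) -> float:
    """Number of transitions divided by number of transversions."""
    snvs = list(snvs)
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    if transversions == 0:
        raise ValueError("no transversions: ratio is undefined")
    return transitions / transversions


# Hypothetical (ref, alt) pairs: four transitions, two transversions.
example_snvs = [("A", "G"), ("C", "T"), ("G", "A"),
                ("A", "C"), ("G", "T"), ("T", "C")]
print(titv_ratio(example_snvs))
```

Recomputing such a ratio on progressively larger subsamples would give a curve of the kind the panel describes.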

**Extended Data Figure 2. Multi-nucleotide variants discovered in the ExAC data set**
a) Number of MNPs per impact on the variant interpretation. b) Distribution of the number of MNPs per sample where phasing changes interpretation, separated by allele frequency. Common > 1%, Rare < 1%. MNPs comprised of a rare and common allele are considered rare as this defines the frequency of the MNP.

**Extended Data Figure 3. Relationships between depth and observed vs expected variants as well as correlations between observed and expected variant counts for synonymous, missense, and protein-truncating**
a) The relationship between the median depth of exons (bins of 2) and the sum of all observed synonymous variants in those exons divided by the sum of all expected synonymous variants. The curve was used to determine the appropriate depth adjustment for expected variant counts. For the rest of the panels, the correlation between the depth-adjusted expected variants counts and observed are depicted for synonymous (b), missense (c), and protein-truncating (d). The black line indicates a perfect correlation (slope = 1). Axes have been trimmed to remove *TTN*.

**Extended Data Figure 4. Number of protein-truncating variants in constrained genes per individual by allele frequency bin**
Equivalent to Figure 5b limited to constrained (pLI ≥ 0.9) genes.

**Extended Data Figure 5. Principal component analysis (PCA) and key metrics used to filter samples**
a) Principal component analysis using a set of 5,400 common exome SNPs. Individuals are colored by their distance from each of the population cluster centers using the first 4 principal components. b) The metrics number of variants, TiTv, alternate heterozygous/homozygous (HetHom) ratio and Insertion/Deletion (InsDel) ratio. Populations are shown in their respective colors: Latino (red), African (purple), European (blue), South Asian (yellow) and East Asian (green).

**Figure 1. Patterns of genetic variation in 60,706 humans**
a) The size and diversity of public reference exome datasets. ExAC exceeds previous datasets in size for all studied populations. b) Principal component analysis (PCA) dividing ExAC individuals into five continental populations. PC2 and PC3 are shown; additional PCs are in Extended Data Figure 5a. c) The allele frequency spectrum of ExAC highlights that the majority of genetic variants are rare and novel. d) The proportion of possible variation observed by mutational context and functional class. Over half of all possible CpG transitions are observed. Error bars represent standard error of the mean. e-f) The number (e) and frequency distribution (proportion singleton; f) of indels, by size. Compared to in-frame indels, frameshift variants are less common (have a higher proportion of singletons, a proxy for predicted deleteriousness on gene product). Error bars indicate 95% confidence intervals.

**Figure 2. Mutational recurrence at large sample sizes**
a) Proportion of validated *de novo* variants from two external datasets that are independently found in ExAC, separated by functional class and mutational context. Error bars represent standard error of the mean. Colors are consistent in a-d. b) Number of unique variants observed, by mutational context, as a function of number of individuals (down-sampled from ExAC). CpG transitions, the most likely mutational event, begin reaching saturation at ~20,000 individuals. c) The site frequency spectrum is shown for each mutational context. d) For doubletons (variants with an allele count of 2), mutation rate is positively correlated with the likelihood of being found in two individuals of different continental populations. e) The mutability-adjusted proportion of singletons (MAPS) is shown across functional classes. Error bars represent standard error of the mean of the proportion of singletons.
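
As a rough illustration of the singleton-based metric behind panel e, the sketch below computes the raw proportion of singletons (allele count of 1) per functional class. It deliberately omits the mutability adjustment that distinguishes MAPS, and the function name, class labels and allele counts are made-up assumptions.

```python
# Minimal sketch: raw proportion of singletons per functional class.
# This is NOT MAPS; the mutability adjustment is intentionally left out.
from collections import defaultdict


def proportion_singleton(variants):
    """Map each class to the fraction of its variants with allele count 1.

    `variants` is an iterable of (functional_class, allele_count) pairs.
    """
    totals = defaultdict(int)
    singletons = defaultdict(int)
    for functional_class, allele_count in variants:
        totals[functional_class] += 1
        if allele_count == 1:
            singletons[functional_class] += 1
    return {cls: singletons[cls] / totals[cls] for cls in totals}


# Hypothetical variants, not taken from ExAC.
example_variants = [
    ("synonymous", 1), ("synonymous", 5), ("synonymous", 1), ("synonymous", 12),
    ("nonsense", 1), ("nonsense", 1), ("nonsense", 1), ("nonsense", 3),
]
print(proportion_singleton(example_variants))
```

A higher proportion in one class than another is read as a sign of stronger selection against that class, which is why the caption treats the singleton proportion as a proxy for deleteriousness.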

**Figure 3. Quantifying intolerance to functional variation in genes and gene sets**
a) Histograms of constraint Z scores for 18,225 genes. This measure of departure of number of variants from expectation is normally distributed for synonymous variants, but right-shifted (higher constraint) for missense and protein-truncating variants (PTVs), indicating that more genes are intolerant to these classes of variation. b) The proportion of genes that are very likely intolerant of loss-of-function variation (pLI ≥ 0.9) is highest for ClinGen haploinsufficient genes, and stratifies by the severity and age of onset of the haploinsufficient phenotype. Genes essential in cell culture and dominant disease genes are likewise enriched for intolerant genes, while recessive disease genes and olfactory receptors have fewer intolerant genes. Black error bars indicate 95% confidence intervals (CI). c) Synonymous Z scores show no correlation with the number of tissues in which a gene is expressed, but the most missense- and PTV-constrained genes tend to be expressed in more tissues. Thick black bars indicate the first to third quartiles, with the white circle marking the median. d) Highly missense- and PTV-constrained genes are less likely to have eQTLs discovered in GTEx than the average gene. Shaded regions around the lines indicate 95% CI. e) Highly missense- and PTV-constrained genes are more likely to be adjacent to GWAS signals than the average gene. Shaded regions around the lines indicate 95% CI. f) MAPS (Figure 2e) is shown for each functional category, broken down by constraint score bins as shown. Missense and PTV constraint score bins provide information about natural selection at least partially orthogonal to MAPS, PolyPhen, and CADD scores, indicating that this metric should be useful in identifying variants associated with deleterious phenotypes. Shaded regions around the lines indicate 95% CI. For panels a,c-f: synonymous shown in gray, missense in orange, and protein-truncating in maroon.

**Figure 4. Filtering for Mendelian variant discovery**
a) Predicted missense and protein-truncating variants in 500 randomly chosen ExAC individuals were filtered based on allele frequency information from ESP, or from the remaining ExAC individuals. At a 0.1% allele frequency (AF) filter, ExAC provides greater power to remove candidate variants, leaving an average of 154 variants for analysis, compared to 1090 after filtering against ESP. Popmax AF also provides greater power than global AF, particularly when populations are unequally sampled. b) Estimates of allele frequency in Europeans based on ESP are more precise at higher allele frequencies. Sampling variance and ascertainment bias make AF estimates unreliable, posing problems for Mendelian variant filtration. 69% of ESP European singletons are not seen a second time in ExAC (tall bar at left), illustrating the dangers of filtering on very low allele counts. c) Allele frequency spectrum of disease-causing variants in the Human Gene Mutation Database (HGMD) and/or pathogenic or likely pathogenic variants in ClinVar for well characterized autosomal dominant and autosomal recessive disease genes. Most are not found in ExAC; however, many of the reportedly pathogenic variants found in ExAC are at too high a frequency to be consistent with disease prevalence and penetrance. d) Literature review of variants with >1% global allele frequency or >1% Latin American or South Asian population allele frequency confirmed there is insufficient evidence for pathogenicity for the majority of these variants. Variants were reclassified by ACMG guidelines.
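
The contrast between global AF and popmax AF in panel a can be sketched as follows. This is a simplified illustration under stated assumptions, not the filtering code used in the paper: the function names, the strict "less than" comparison and the per-population allele counts are all hypothetical, and popmax AF is taken to mean the highest allele frequency seen in any single population.

```python
# Minimal sketch: filtering a candidate variant at a 0.1% allele
# frequency threshold using global AF versus popmax AF.
# All names and counts below are illustrative assumptions.


def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Allele count divided by the number of alleles genotyped."""
    return allele_count / allele_number if allele_number else 0.0


def global_af(pop_counts) -> float:
    """AF pooled over all populations."""
    total_ac = sum(ac for ac, _ in pop_counts.values())
    total_an = sum(an for _, an in pop_counts.values())
    return allele_frequency(total_ac, total_an)


def popmax_af(pop_counts) -> float:
    """Highest AF observed in any single population."""
    return max(
        (allele_frequency(ac, an) for ac, an in pop_counts.values()),
        default=0.0,
    )


def is_rare(af: float, threshold: float = 0.001) -> bool:
    """Keep a variant as a candidate only if it is below the threshold."""
    return af < threshold


# Hypothetical variant: (allele count, allele number) per population.
variant = {"European": (2, 60000), "South Asian": (30, 16000)}

print(is_rare(global_af(variant)))  # pooled AF is about 0.04%
print(is_rare(popmax_af(variant)))  # South Asian AF is about 0.19%
```

In this made-up case the pooled frequency is diluted by the larger population, so only the popmax filter removes the variant, which matches the caption's point about unequally sampled populations.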

**Figure 5. Protein-truncating variation in ExAC**
a) The average ExAC individual has 85 heterozygous and 35 homozygous protein-truncating variants (PTVs), of which 18 and 0.19 are rare (<0.1% popmax AF), respectively. Error bars represent standard deviation. b) Breakdown of PTVs per individual (a) by popmax AF bin. Across all populations, most PTVs found in a given individual are common (>5% popmax AF). c-d) Number of genes with at least one PTV (c) or homozygous PTV (d) as a function of number of individuals, downsampled from ExAC. South Asian population is broken down by consanguinity (inbreeding coefficient, F). At 60,000 individuals for ExAC, the plots in c) and d) extend to 15,750 genes with at least one PTV and 1,550 genes with at least one homozygous PTV.

Comment in

Human genomics: A deep dive into genetic variation.
Shendure J. Nature. 2016 Aug 18;536(7616):277-8. doi: 10.1038/536277a. PMID: 27535530. No abstract available.
Rethink the links between genes and disease.
[No authors listed] Nature. 2016 Oct 13;538(7624):140. doi: 10.1038/538140a. PMID: 27734882. No abstract available.
How scientists use Slack.
Perkel JM. Nature. 2016 Dec 29;541(7635):123-124. doi: 10.1038/541123a. PMID: 28054618. No abstract available.

Cited by

Exome functional risk score and brain connectivity can predict social adaptability outcome of children with autism spectrum disorder in 4 years' follow up.
Luo T, Zhang M, Li S, Situ M, Liu P, Wang M, Tao Y, Zhao S, Wang Z, Yang Y, Huang Y. Front Psychiatry. 2024 May 16;15:1384134. doi: 10.3389/fpsyt.2024.1384134. PMID: 38818019
Variability in SOD1-associated amyotrophic lateral sclerosis: geographic patterns, clinical heterogeneity, molecular alterations, and therapeutic implications.
Huang M, Liu YU, Yao X, Qin D, Su H. Transl Neurodegener. 2024 May 29;13(1):28. doi: 10.1186/s40035-024-00416-x. PMID: 38811997. Review.
Genetic background of primary and familial HLH in Qatar: registry data and population study.
Elgaali E, Mezzavilla M, Ahmed I, Elanbari M, Ali A, Abdelaziz G, Fakhro KA, Saleh A, Ben-Omran T, Almulla N, Cugno C. Front Pediatr. 2024 May 9;12:1326489. doi: 10.3389/fped.2024.1326489. PMID: 38808104
APF2: an improved ensemble method for pharmacogenomic variant effect prediction.
Zhou Y, Pirmann S, Lauschke VM. Pharmacogenomics J. 2024 May 27;24(3):17. doi: 10.1038/s41397-024-00338-x. PMID: 38802404
The copy number variant architecture of psychopathology and cognitive development in the ABCD® study.
Sha Z, Sun KY, Jung B, Barzilay R, Moore TM, Almasy L, Forsyth JK, Prem S, Gandal MJ, Seidlitz J, Glessner JT, Alexander-Bloch AF. medRxiv [Preprint]. 2024 May 15:2024.05.14.24307376. doi: 10.1101/2024.05.14.24307376. PMID: 38798629


References

1. Fu W, et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013;493:216–220.
2. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature. 2015;526:68–74.
3. Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475:493–496.
4. Stoneking M, Krause J. Learning about human population history from ancient and modern genomes. Nat. Rev. Genet. 2011;12:603–614.
5. MacArthur DG, et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science. 2012;335:823–828.


