

Our Current Focus



We are interested both in developing methods that use innovative approaches and statistics to read genetic information, and in learning about the evolution and phenotypic variation of diverse humans. We are interested in collaborating with investigators working on other natural populations to understand how the evolutionary process in humans resembles and differs from that in other species. Also trained in history, we have a keen awareness of, and a continuing interest in studying, how societal norms affect the scientific process and enterprise, and vice versa. We are committed to working towards making genomics and personalized medicine universal by helping to create genomic resources for underserved populations and by participating in the training of diverse scientists. We are also committed to helping make genomic science an ethical and inclusive pursuit. We enjoy working on science communication and outreach projects using visual, audiovisual and written means. The Sohail lab considers all of the above important parts of a 21st century geneticist’s toolkit, and training and work in the lab aim to be multifaceted in this respect.



Population genetic history

We are interested in how genomics can inform historical inference, and vice versa, with a particular interest in Mexico and Pakistan (within the larger regions of Central and South Asia and the Americas), which present numerous parallels in their rich indigenous diversity, recent European colonization, and global positionality as developing countries. We are interested in using identity-by-descent segments and other means of demographic inference to learn about the genetic histories of different regions and groups, and how these correlate with cultural histories and present-day cultural structure. We place an emphasis on developing relationships with participants and creating equitable and reciprocal frameworks for data generation and use in our studies. We are developing projects describing nation-wide genetic diversity and history, as well as projects focused on specific regional histories. Many of our current projects use the Mexican Biobank, which we helped generate, as well as studies such as the Human Genome Diversity Project, the 1000 Genomes Project, the UK Biobank and Genes and Health.


Complex trait variation

We are interested in investigating the role that genetic ancestry and population history play in generating trait-relevant genetic variation. This involves combined analyses of identity-by-descent segments, runs of homozygosity, ancestry estimates, environmental variables and complex trait variation. We are also interested in developing methods for evaluating the role of gene-by-environment interactions in complex trait variation. Our general framework employs the correlations among fine-scale genetic, trait and cultural structure.


Prediction of traits in present-day and ancient humans

We are investigating the accuracy of trait prediction in ancient samples using present-day trait-causing alleles by employing simulations under neutrality and under stabilizing selection. We are interested in learning which factors are most relevant for lowering or improving the accuracy of trait prediction. We are also developing a project to evaluate deep learning approaches from artificial intelligence, compared to polygenic scores, for trait prediction that incorporates both genetic and environmental factors.
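In its simplest additive form, a polygenic score is a weighted sum of allele dosages, with per-variant effect sizes as the weights. The sketch below is a toy illustration only: the function name, the dosage coding (0, 1 or 2 copies of the effect allele) and the effect sizes are assumptions made for the example, and it does not represent the lab's prediction pipeline.

```python
def polygenic_score(dosages, effect_sizes):
    """Additive polygenic score for one individual.

    dosages: copies of the effect allele at each variant (0, 1 or 2).
    effect_sizes: per-variant effect estimates, in the same order.
    Both inputs are hypothetical toy values in this sketch.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    # Sum of (effect size x number of effect alleles) across variants.
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


# Toy example: three variants with made-up effect sizes.
score = polygenic_score([2, 1, 0], [0.5, -0.25, 1.0])
print(score)
```

Prediction accuracy is then usually assessed by comparing such scores against measured trait values. This is the step that becomes uncertain for ancient samples, where the trait itself is often not observed.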


The evolutionary history of traits

We are interested in understanding the history of complex traits and disease by interrogating them in the context of our evolutionary past. Within this general agenda, we are developing a project investigating the relative contributions to complex trait architecture from sequence elements originating across multiple evolutionary time-scales.


Role of rare genetic variation

Rare genetic variants are relevant for understanding recent population structure, and have also been implicated in population-specific, trait-relevant variation. We are interested in developing methods that employ rare variants to interrogate recent history and variation in traits and disease incidence.



Kun, E., Javan, E.M., Smith, O., Gulamali, F., de la Fuente, J., Flynn, B.I., Vajrala, K., Trutner, Z., Jayakumar, P., Tucker-Drob, E.M., Sohail, M., Singh, T., Narasimhan, V.M. The genetic architecture and evolution of the human skeletal form. Science (2023)

Research Topic Collection. Genetic Architecture and Evolution of Complex Traits and Diseases in Diverse Human Populations. Frontiers in Genetics (2022) 


Mashaal Sohail. Investigating relative contributions to psychiatric disease architecture from sequence elements originating across multiple evolutionary time-scales. BioRxiv (2022)

Mashaal Sohail, Alan Izarras-Gomez and Diego Ortega-Del Vecchyo. Populations, traits, and their spatial structure in humans. Genome Biology and Evolution (2021)

Andrés Jiménez-Kaufmann, Amanda Y. Chong, Adrián Cortés, Selene Fernandez-Valverde, Leticia Ferreyra-Reyes, Luis Pablo Cruz-Hervert, Mashaal Sohail, Consuelo D. Quinto-Cortés, Alexander J. Mentzer, Adrian V.S. Hill, Alan M. Torres, Hie Lim Kim, Stephan C. Schuster, Diego Ortega Del-Vecchyo, Lourdes García-García, Andrés Moreno-Estrada. Imputation performance in Latin Americans: improving accuracy in rare variants with the inclusion of Native American genomes. Frontiers in Genetics (2021)

Mashaal Sohail, Amanda Y. Chong, Consuelo D. Quinto-Cortes, María J. Palma-Martínez, Aaron Ragsdale, Santiago G. Medina-Muñoz, Carmina Barberena-Jonas, Guadalupe Delgado-Sánchez, Luis Pablo Cruz-Hervert, Leticia Ferreyra-Reyes, Elizabeth Ferreira-Guerrero, Norma Mongua-Rodríguez, Andrés Jimenez-Kaufmann, Hortensia Moreno-Macías, Carlos A. Aguilar-Salinas, Kathryn Auckland, Adrián Cortés, Víctor Acuña-Alonzo, Alexander G. Ioannidis, Christopher R. Gignoux, Genevieve L. Wojcik, Selene L. Fernández-Valverde, Adrian V.S. Hill, María Teresa Tusié-Luna, Alexander J. Mentzer, John Novembre, Lourdes García-García, Andrés Moreno-Estrada. Nation-wide biobank in Mexico unravels demographic history and complex trait architecture from 6,000 genomes. bioRxiv (2022). See The MX Biobank project


The Genome of the Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nature Genetics (2014)


Mashaal Sohail*, Robert M. Maier*, Andrea Ganna, Alex Bloemendal, Alicia R. Martin, Michael C. Turchin, Charleston W. K. Chiang, Joel N. Hirschhorn, Mark J. Daly, Nick Patterson, Benjamin M. Neale, Iain Mathieson, David Reich, Shamil R. Sunyaev. Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies. eLife (2018) *equal contribution



Using linkage disequilibrium (two-locus dynamics) to learn about evolution

We have been involved in developing methods that use linkage disequilibrium to learn about non-additive natural selection (epistasis) and population structure (assortative mating). We are interested in continuing to develop approaches using linkage disequilibrium to learn about evolutionary processes such as epistatic natural selection, non-random mating and adaptive introgression in humans and in other species.
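For readers unfamiliar with the statistic, a minimal sketch of two-locus linkage disequilibrium follows. It computes the standard coefficient D = p_AB - p_A * p_B and its normalized form r^2 from phased haplotypes coded 0/1. The function name and the toy haplotypes are illustrative assumptions, not the methods used in the publications listed on this page.

```python
import math


def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (a, b) pairs, where a and b are the alleles
    (coded 0 or 1) carried at locus A and locus B on one phased haplotype.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # excess of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r2 is undefined when either locus is monomorphic.
    r2 = d * d / denom if denom > 0 else math.nan
    return d, r2


# Toy data: alleles at the two loci always travel together.
d, r2 = linkage_disequilibrium([(1, 1)] * 5 + [(0, 0)] * 5)
print(d, r2)  # 0.25 1.0
```

D is zero when alleles at the two loci combine independently. Its sign shows whether the 1-1 haplotype is more or less common than independence would predict.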

Genome-wide patterns of selection

We are interested in global patterns of selection genome-wide. We have previously found evidence for synergistic epistasis acting against deleterious variants genome-wide in humans and in fruit flies. We have also shown how balancing selection genome-wide correlates with genes that are mono-allelically expressed. Lastly, we have studied patterns of polygenic adaptation and how these are confounded by population stratification in genome-wide association studies. We are interested in continuing to develop approaches to interrogate different types of selection genome-wide and their relationship with modules of gene expression. We have long been interested in the historiography of the concept of natural selection as a mechanism for evolution, and our major interest is in understanding how prevalent natural selection is in living organisms, and in humans, using the plethora of genetic and phenotypic datasets now available.


Mashaal Sohail, Olga A Vakhrusheva, Jae Hoon Sul, Sara Pulit, Laurent Francioli, GoNL Consortium, Alzheimers Disease Neuroimaging Initiative, Leonard H van den Berg, Jan H Veldink, Paul de Bakker, Georgii A Bazykin, Alexey S Kondrashov, Shamil Sunyaev. Negative selection in humans and fruit flies involves synergistic epistasis. Science (2017)

Brian Arnold, Mashaal Sohail, Crista Wadsworth, Bill Hanage, Jukka Corander, Shamil Sunyaev, Yonatan Grad. Linkage reveals localized population structure and selection in Neisseria gonorrhoeae. bioRxiv (2019).

Virginia Savova*, Sung Chun*, Mashaal Sohail*,  Ruth B. McCole, Robert Witwicki, Lisa Gai, Tobias L. Lenz, C.-ting Wu, Shamil R. Sunyaev, Alexander A. Gimelbrant. Genes with monoallelic expression contribute disproportionately to genetic diversity in humans. Nature Genetics  (2016) *equal contribution

Mashaal Sohail*, Robert M. Maier*, Andrea Ganna, Alex Bloemendal, Alicia R. Martin, Michael C. Turchin,  Charleston W. K. Chiang, Joel N. Hirschhorn, Mark J. Daly, Nick Patterson, Benjamin M. Neale, Iain Mathieson, David Reich, Shamil R. Sunyaev. Polygenic adaptation on height has been overestimated due to uncorrected stratification in genome-wide association studies. eLife (2019) *equal contribution

AM history of science thesis paper (reach out for a copy!)



We believe that the 21st century geneticist’s toolkit should include a keen awareness of the history of race and genetics (to help understand how this legacy affects our thinking and practices today), and part of the work in our lab is continued participation in and organization of reading and study groups, invited seminars, courses, workshops and transdisciplinary teams towards this goal. Further, we read and consult broadly on our research themes to learn from our colleagues in history, science and technology studies, anthropology, archaeology and sociology, to help contextualize and historicize our research questions, methods and the implications of our research results. We make an effort to apply what we are learning from these sources in every step of our research work, from project motivation and design, sampling approach and analysis methods to interpretation, writing and communication of results. This is part of a general agenda of increasing our understanding of, and our ability to think and communicate about, issues at the intersection of genetics and society. A specific research area is investigating the use of groupings, especially continental groupings, in the study of human genetic and phenotypic variation.


Conacyt funded project “Desafiando las agrupaciones continentales de humanos en el Proyecto 1000 Genomas”

History of race and genetics reading group at U Chicago. Website here:


Member of Harvard University’s Edmond J. Safra Center for Ethics working group on ‘Understanding Variation: normative and analytical implications of using “population” and “ancestry” for generalization and comparison’

Anna C. F. Lewis, Santiago J. Molina, Paul S Appelbaum, Bege Dauda, Anna Di Rienzo, Agustin Fuentes, Stephanie M. Fullerton, Nanibaa' A. Garrison, Nayanika Ghosh, Evelynn M. Hammonds, David S. Jones, Eimear E. Kenny, Peter Kraft, Sandra S.-J. Lee, Madelyn Mauro, John Novembre, Aaron Panofsky, Mashaal Sohail, Benjamin M. Neale, Danielle S. Allen. Getting Genetic Ancestry Right for Science and Society. Science (2022) 

Bege Dauda, Santiago J. Molina, Danielle S. Allen, Agustin Fuentes, Nayanika Ghosh, Madelyn Mauro, Benjamin M. Neale, Aaron Panofsky, Mashaal Sohail, Sarah R. Zhang and Anna C. F. Lewis. Ancestry: How researchers use it and what they mean by it. Frontiers in Genetics (2023).

Anna C. F. Lewis, Santiago J. Molina, Paul S. Appelbaum, Bege Dauda, Agustin Fuentes, Stephanie M. Fullerton, Nanibaa’ A. Garrison, Nayanika Ghosh, Robert C. Green, Evelynn M. Hammonds, Janina M. Jeff, David S. Jones, Eimear E. Kenny, Peter Kraft, Madelyn Mauro, Anil P. S. Ori, Aaron Panofsky, Mashaal Sohail, Benjamin M. Neale, and Danielle S. Allen. An ethical framework for research using genetic ancestry. Perspectives in Biology and Medicine (2023).
