The Faculty


Bayesian and computational statistics, particularly when applied to problems in population genetics

Research Description

My general interests include Bayesian and computational statistics, particularly as applied to problems in population genetics. Specific interests include:

  • estimating haplotypes from population genotype data (for which I distribute the software package PHASE).
  • developing statistical models for patterns of linkage disequilibrium across multiple loci, and using these patterns to identify recombination hotspots.
  • spatial modelling of allele frequency variation.

Selected Publications


H Shim, Z Xing, E Pantaleo, F Luca, R Pique-Regi and M Stephens. Multi-scale Poisson process approaches for differential expression analysis of high-throughput sequencing data. arXiv:2106.13634. R package | source code implementing the analyses

P Carbonetto, A Sarkar, Z Wang and M Stephens. Non-negative matrix factorization algorithms greatly improve topic model fits. arXiv:2105.13440. R package | companion source code repository

Barbeira et al. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biology 22: 49.

de Goede et al. Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease. Cell 184(10): 2633-2648.e19.

A Sarkar and M Stephens. Separating measurement and expression models clarifies confusion in single cell RNA-seq analysis. Nature Genetics 53: 770-777. bioRxiv preprint | Python package | companion website | companion source code repository

W Wang and M Stephens. Empirical Bayes matrix factorization. Journal of Machine Learning Research 22(120): 1-40. arXiv preprint | R package | code used to produce results in paper

Z Xing, P Carbonetto and M Stephens. Flexible signal denoising via flexible empirical Bayes shrinkage. Journal of Machine Learning Research 22(93): 1-28. arXiv preprint | R package | accompanying code and data

A E White, K K Dey, M Stephens and T D Price. Dispersal syndromes drive the formation of biogeographical regions, illustrated by the case of Wallace’s Line. Global Ecology and Biogeography 30: 685–696. accompanying code and data

M C Ward, N E Banovich, A Sarkar, M Stephens and Y Gilad. Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes. eLife 10: e57345.


Z Zhang, K Luo, Z Zou, M Qiu, J Tian, L Sieh, H Shi, Y Zou, G Wang, J Morrison, A C Zhu, M Qiao, Z Li, M Stephens, X He and C He. Genetic analyses support the contribution of mRNA N6-methyladenosine (m6A) modification to human disease heritability. Nature Genetics 52: 939–949.

S Kim-Hellmuth et al. Cell type–specific genetic regulation of gene expression across human tissues. Science 369(6509): eaaz8528.

L Jiang et al. A quantitative proteome map of the human body. Cell 183(1): 269-283.e19.

J Morrison, N Knoblauch, J Marcus, M Stephens and X He. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nature Genetics 52: 740–747. bioRxiv preprint | code

C J Hsiao, P Tung, J D Blischak, J Burnett, K Barr, K K Dey, M Stephens and Y Gilad. Characterizing and inferring quantitative cell cycle phase in single-cell RNA-seq data analysis. Genome Research 30: 611-621. bioRxiv preprint | R package

G Wang, A Sarkar, P Carbonetto and M Stephens. A simple new approach to variable selection in regression, with application to genetic fine mapping. Journal of the Royal Statistical Society, Series B 82: 1273-1300. R package | accompanying code and data resources

Y Kim, P Carbonetto, M Stephens and M Anitescu. A fast algorithm for maximum likelihood estimation of mixture proportions using sequential quadratic programming. Journal of Computational and Graphical Statistics 29: 261-273. arXiv preprint | accompanying code resources | R package


J D Blischak, P Carbonetto and M Stephens. Creating and sharing reproducible research code the workflowr way. F1000Research 8: 1749 [version 1; peer review: 3 approved]. R package on CRAN | R package on GitHub

M Lu and M Stephens. Empirical Bayes estimation of normal means, accounting for uncertainty in estimated standard errors. arXiv:1901.10679. accompanying code and data

D Gerard and M Stephens. Unifying and generalizing methods for removing unwanted variation based on negative controls. Statistica Sinica forthcoming. arXiv preprint | R package | code used to produce results in paper

M C Turchin and M Stephens. Bayesian multivariate reanalysis of large genetic studies identifies many new associations. PLoS Genetics 15(10): e1008431. bioRxiv preprint | R package

S Zhao, J Liu, P Nanga, Y Liu, A E Cicek, N Knoblauch, C He, M Stephens and X He. Detailed modeling of positive selection improves detection of cancer driver genes. Nature Communications 10: 3399. bioRxiv preprint | accompanying code and data resources

A Sarkar, P-Y Tung, J D Blischak, J E Burnett, Y I Li, M Stephens and Y Gilad. Discovery and characterization of variance QTLs in human induced pluripotent stem cells. PLoS Genetics 15(4): e1008045. accompanying code and data

H Al-Asadi, K K Dey, J Novembre and M Stephens. Inference and visualization of DNA damage patterns using a grade of membership model. Bioinformatics 35(8): 1292-1298. R package

A E White, K K Dey, D Mohan, M Stephens and T D Price. Regional influences on community structure across the tropical-temperate divide. Nature Communications 10: 2646. R package

H Al-Asadi, D Petkova, M Stephens and J Novembre. Estimating recent migration and population size surfaces. PLoS Genetics 15(1): e1007908. software

S M Urbut, G Wang, P Carbonetto and M Stephens. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nature Genetics 51(1): 187-195. bioRxiv preprint | R package | accompanying code and data resources