Statistics
AWNCut
Because regulators act on gene expressions (GEs) through regulation relationships, they carry important information on the properties of GEs. We develop a novel assisted clustering method that uses regulator information to improve clustering analysis of GE data. Because not all GEs are informative, we propose a weighting strategy in which the weights are determined from the data and can discriminate informative GEs from noise. The proposed method is built on the NCut technique and is implemented using a simulated annealing algorithm.
Faculty: Shuangge Steven Ma, PhD
Download: GitHub / AWNCut package
Platform: R
Reference: doi.org (AWNCut)
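The NCut objective that the method builds on can be illustrated with a minimal Python sketch. This is not the AWNCut package's code or API: the function name `normalized_cut` and the toy similarity matrix are hypothetical, and the feature weighting, regulator information, and simulated annealing search described above are not shown. It only evaluates the standard normalized cut value of a given partition, which a search procedure would try to minimize.

```python
def normalized_cut(weights, labels):
    """Return the normalized cut value of a partition of a weighted graph.

    weights: symmetric matrix (list of lists) of non-negative similarities.
    labels:  one cluster label per node.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    total = 0.0
    for cluster in set(labels):
        members = [i for i in range(n) if labels[i] == cluster]
        # Volume: total degree of the nodes inside the cluster.
        volume = sum(degrees[i] for i in members)
        # Cut: total weight of edges leaving the cluster.
        cut = sum(weights[i][j]
                  for i in members
                  for j in range(n) if labels[j] != cluster)
        if volume > 0:
            total += cut / volume
    return total


# Hypothetical toy data: nodes 0-1 and nodes 2-3 form two tight groups.
toy_weights = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
good_partition = normalized_cut(toy_weights, [0, 0, 1, 1])
bad_partition = normalized_cut(toy_weights, [0, 1, 0, 1])
```

On this toy graph, the partition that keeps the tight groups together yields the smaller value, which is the behavior a clustering search would exploit.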
bama
Perform mediation analysis in the presence of high-dimensional mediators based on the potential outcome framework. Bayesian Mediation Analysis (BAMA), developed by Song et al (2019) <doi:10.1111/biom.13189> and Song et al (2020) <doi:10.48550/arXiv.2009.11409>, relies on two Bayesian sparse linear mixed models to simultaneously analyze a relatively large number of mediators for a continuous exposure and outcome assuming a small number of mediators are truly active. This sparsity assumption also allows the extension of univariate mediator analysis by casting the identification of active mediators as a variable selection problem and applying Bayesian methods with continuous shrinkage priors on the effects.
Faculty: Bhramar Mukherjee, PhD
Download: Cran R / bama package
Platform: R
Reference: doi.org (bama)
bayesgm
bayesgm is a unified framework for Bayesian generative modeling that combines deep neural networks with principled probabilistic inference. It provides tools for learning latent representations, causal inference, and conditional inference with uncertainty quantification in complex settings. It is available as both a Python and an R package.
Faculty: Qiao Liu, PhD
Download: Liu Lab / bayesgm package
Platform: Python
Reference: arxiv.org (bayesgm)
CIMPLE
Analyzes longitudinal Electronic Health Record (EHR) data with possibly informative observational time. These methods are grouped into two classes depending on the inferential task. One group focuses on estimating the effect of an exposure on a longitudinal biomarker while the other group assesses the impact of a longitudinal biomarker on time-to-diagnosis outcomes. The accompanying paper is Du et al (2024) <doi:10.48550/arXiv.2410.13113>.
Faculty: Bhramar Mukherjee, PhD
Download: Cran R / CIMPLE package
Platform: R
Reference: doi.org (CIMPLE)
codependent
An R package for estimating affiliate species richness based on power law scaling with host diversity, using rarefaction on bipartite species association networks. Because some things just go together. Use the function copredict to extrapolate power law curves out to a higher value. Use copredict.ci to fit a series of models to only half of the total curve, and see what happens (for an overestimated confidence bound).
Faculty: Colin J. Carlson, PhD
Download: GitHub / codependent package
Platform: R
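The power law scaling idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the codependent package's implementation: the functions `fit_power_law` and `predict_power_law` are hypothetical names, the fit is an ordinary least-squares line on the log-log scale, and the rarefaction step on the association network is not shown.

```python
import math


def fit_power_law(host_counts, affiliate_counts):
    """Fit affiliates = c * hosts**z by least squares on the log-log scale.

    Assumes all counts are strictly positive and at least two distinct
    host counts are supplied.
    """
    xs = [math.log(h) for h in host_counts]
    ys = [math.log(a) for a in affiliate_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx                      # scaling exponent (slope)
    c = math.exp(mean_y - z * mean_x)  # multiplicative constant
    return c, z


def predict_power_law(c, z, host_count):
    """Extrapolate the fitted curve to a larger number of hosts."""
    return c * host_count ** z


# Hypothetical rarefaction curve: affiliate richness at several host counts.
hosts = [1, 4, 9, 16]
affiliates = [2.0, 4.0, 6.0, 8.0]
c_hat, z_hat = fit_power_law(hosts, affiliates)
extrapolated = predict_power_law(c_hat, z_hat, 100)
```

Fitting on the log-log scale keeps the example short; other fitting choices, such as nonlinear least squares on the original scale, would generally give somewhat different estimates on real data.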
CompMix
Quantitative characterization of the health impacts associated with exposure to chemical mixtures has received considerable attention in current environmental and epidemiological studies. The 'CompMix' package allows practitioners to estimate the health impacts from exposure to chemical mixtures data through various statistical approaches, including Lasso, Elastic net, Bayesian kernel machine regression (BKMR), hierNet, Quantile g-computation, Weighted quantile sum (WQS) and Random forest. Hao W, Cathey A, Aung M, Boss J, Meeker J, Mukherjee B. (2024) "Statistical methods for chemical mixtures: a practitioners guide". <doi:10.1101/2024.03.03.24303677>.
Faculty: Bhramar Mukherjee, PhD
Download: Cran R / CompMix package
Platform: R
Reference: doi.org (CompMix)
CorBin
We design algorithms that generate correlated binary data in time linear in the dimension for three commonly studied correlation structures (exchangeable, decaying-product, and K-dependent), and extend the algorithms to generate binary data with general non-negative correlation matrices in quadratic time.
Faculty: Hongyu Zhao, PhD
Download: Cran R / CorBin package
Platform: R
Reference: doi.org (CorBin)
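For the exchangeable case, linear-time generation can be illustrated with a simple shared-component mixture, sketched below in Python. This is not necessarily the algorithm CorBin implements: the function name `exchangeable_binary` is hypothetical, and the sketch assumes a common marginal probability `p` and a non-negative correlation `rho`. Each coordinate copies one shared Bernoulli draw with probability sqrt(rho) and otherwise draws independently, which gives pairwise correlation rho.

```python
import math
import random


def exchangeable_binary(n_dim, p, rho, rng):
    """Draw one binary vector with marginal mean p and pairwise correlation rho.

    Runs in time linear in n_dim. Assumes 0 <= p <= 1 and 0 <= rho <= 1.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= rho <= 1.0):
        raise ValueError("p and rho must both lie in [0, 1]")
    shared = 1 if rng.random() < p else 0   # component common to all coordinates
    mix = math.sqrt(rho)                    # probability of copying the shared draw
    vector = []
    for _ in range(n_dim):
        if rng.random() < mix:
            vector.append(shared)
        else:
            vector.append(1 if rng.random() < p else 0)
    return vector


# Example draw: a 5-dimensional vector with mean 0.3 and correlation 0.4.
example = exchangeable_binary(5, 0.3, 0.4, random.Random(1))
```

Two coordinates share a value through the common component only when both copy it, which happens with probability rho, so their covariance is rho * p * (1 - p) while each marginal variance stays p * (1 - p).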
hdmed
A suite of functions for performing mediation analysis with high-dimensional mediators. In addition to centralizing code from several existing packages for high-dimensional mediation analysis, we provide organized, well-documented functions for a handful of methods which, though programmed by their original authors, have not previously been formalized into R packages or made presentable for public use. The methods we include cover a broad array of approaches and objectives, and are described in detail by both our companion manuscript, "Methods for Mediation Analysis with High-Dimensional DNA Methylation Data: Possible Choices and Comparison", and the original publications that proposed them. The specific methods offered by our package include the Bayesian sparse linear mixed model (BSLMM) by Song et al. (2019); high-dimensional mediation analysis (HDMA) by Gao et al. (2019); high-dimensional multivariate mediation (HDMM) by Chén et al. (2018); high-dimensional linear mediation analysis (HILMA) by Zhou et al. (2020); high-dimensional mediation analysis (HIMA) by Zhang et al. (2016); latent variable mediation analysis (LVMA) by Derkach et al. (2019); mediation by fixed-effect model (MedFix) by Zhang (2021); pathway LASSO by Zhao & Luo (2022); principal component mediation analysis (PCMA) by Huang & Pan (2016); and sparse principal component mediation analysis (SPCMA) by Zhao et al. (2020). Citations for the corresponding papers can be found in their respective functions.
Faculty: Bhramar Mukherjee, PhD
Download: Cran R / hdmed package
Platform: R
Reference: doi.org (hdmed)
lodi
Impute observed values below the limit of detection (LOD) via censored likelihood multiple imputation (CLMI) in single-pollutant models, developed by Boss et al (2019) <doi:10.1097/EDE.0000000000001052>. CLMI handles exposure detection limits that may change throughout the course of exposure assessment. 'lodi' provides functions for imputing and pooling for this method.
Faculty: Bhramar Mukherjee, PhD
Download: Cran R / lodi package
Platform: R
Reference: doi.org (lodi)
medScan
A collection of methods for large scale single mediator hypothesis testing. The six included methods for testing the mediation effect are Sobel's test, Max P test, joint significance test under the composite null hypothesis, high dimensional mediation testing, divide-aggregate composite null test, and Sobel's test under the composite null hypothesis. Du et al (2023) <doi:10.1002/gepi.22510>.
Faculty: Bhramar Mukherjee, PhD
Download: Cran R / medScan package
Platform: R
Reference: doi.org (medScan)
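Two of the listed tests, Sobel's test and the Max P test, have simple closed forms, sketched here in Python. This is not the medScan package's API: the function names and the inputs (an exposure-mediator coefficient `alpha`, a mediator-outcome coefficient `beta`, and their standard errors) are hypothetical, both statistics are referred to a standard normal distribution, and the composite-null corrections that the package offers are not shown.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_test(alpha, se_alpha, beta, se_beta):
    """Sobel z statistic and p-value for the mediation effect alpha * beta.

    Assumes the first-order standard error of the product is nonzero.
    """
    se_product = math.sqrt(alpha ** 2 * se_beta ** 2 + beta ** 2 * se_alpha ** 2)
    z = alpha * beta / se_product
    return z, normal_two_sided_p(z)


def max_p_test(alpha, se_alpha, beta, se_beta):
    """Max P test: the larger of the two path-specific p-values."""
    p_alpha = normal_two_sided_p(alpha / se_alpha)
    p_beta = normal_two_sided_p(beta / se_beta)
    return max(p_alpha, p_beta)


# Hypothetical estimates for a single candidate mediator.
z_sobel, p_sobel = sobel_test(0.5, 0.1, 0.4, 0.1)
p_max = max_p_test(0.5, 0.1, 0.4, 0.1)
```

In a large-scale scan, these functions would be applied to each mediator in turn, with a multiple-testing adjustment applied to the resulting p-values.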
MMBeans
Faculty: Yize Zhao, PhD, Hongyu Zhao, PhD
Download: GitHub / MMBeans package
Platform: R
Reference: doi.org (MMBeans)
r-hpc-reproducible-repo
The goal of this repository is to provide a reproducible R modelling pipeline that runs an analysis geared towards a Bayesian statistical modeling framework, using the targets package for reproducible workflows. The pipeline is designed to be run in a Docker container, which can be executed locally or on a high-performance computing (HPC) cluster. It is only Bayesian in the sense that it uses rstanarm for model fitting, but the pipeline can be adapted to other modelling frameworks like NIMBLE or brms. Rstan/rstanarm is used here as an example since it requires a C++ compiler and is a common choice for Bayesian modelling in R.
Faculty: Colin J. Carlson, PhD
Download: GitHub / r-hpc-reproducible-repo package
Platform: R
Roundtrip
Roundtrip is a deep generative neural density estimator which exploits the advantage of GANs for generating samples and estimates density by either importance sampling or Laplace approximation. This repository provides source code and instructions for using Roundtrip on both simulation data and real data.
Faculty: Qiao Liu, PhD
Download: Liu Lab / Roundtrip package
Platform: Python
Reference: doi.org (Roundtrip)