Modeling

ABESS

Best-subset selection aims to find a small subset of predictors such that the resulting linear model has the best expected prediction accuracy. ABESS is an R package that implements a polynomial-time algorithm for identifying the best-subset model in linear regression.

Faculty: Heping Zhang, PhD

Download: ABESS package

Platform: R

Reference: doi.org (ABESS)


Bweight

Bayesian Modeling of Cancer Outcomes Using Genetic Variables Assisted by Pathological Imaging Data

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / Bweight package

Platform: Julia


Bootpenal

Bootstrap penalization for variable selection with big data.

Faculty: Shuangge Steven Ma, PhD

Download: Bootpenal package

Platform: R

Reference: doi.org (Bootpenal)


dame-gideon

"The goal of this project is to come up with a reasonable way to predict pathogen sharing patterns between countries, using the DAME (dynamic additive and multiplicative effects model) for dyadic networks.

This follows on two previous preprints:
Poisot, T., Nunn, C., & Morand, S. (2014). Ongoing worldwide homogenization of human pathogens. bioRxiv, 009977.
Dallas, T. A., Carlson, C. J., & Poisot, T. (2018). Leveraging pathogen community distributions to understand outbreak and emergence potential. bioRxiv, 336065.
The GIDEON data is available as a list of countries, years, and pathogens. The goal is currently to come up with appropriate dyadic predictors for pairs of countries."

Faculty: Colin J. Carlson, PhD

Download: GitHub / dame-gideon package

Platform: R

Reference: doi.org (dame-gideon)


eLASSO

This is a Matlab implementation of eLASSO. The zipped file contains 14 M-files; 7 of them relate to the optimization problem and are translated from the block coordinate gradient descent (BCGD) method proposed by Paul Tseng and Sangwoon Yun. They are cgdsq.m, dirq.m, nz.m, signx.m, fnc.m, and grad.m. The program implements the method described in: Wang X., Jiang Y., Huang M., and Zhang H. Robust Variable Selection with Exponential Squared Loss. Journal of the American Statistical Association, 108: 632-643, 2013.

Faculty: Heping Zhang, PhD

Download: eLASSO package

Platform: Matlab

Reference: doi.org (eLASSO)


elton

Bayesian joint species distribution models with networks.

Faculty: Colin J. Carlson, PhD

Download: GitHub / elton package

Platform: R


embarcadero

"This package is basically a wrapper around 'dbarts' with a few tools: basic model summary statistics and diagnostics; spatial prediction with raster data; credible interval draws from the posterior distribution; visualization of how posterior draws learn over time; variable importance measures and plots; stepwise variable elimination; automatic Nice Plots for partials, including multiple ways to visualize posterior draws; spatial projection of partials ('spartials'); compatibility with random intercept BART models (riBART); and plots for random intercepts.

In future versions I'd hope to include compatibility with: explicitly-spatial adaptations of BART (spatial priors), smoothed BART models (softBART), and sparse BART models with Dirichlet priors (DART)."

Faculty: Colin J. Carlson, PhD

Download: GitHub / embarcadero package

Platform: R


epizootic

epizootic is an extension to poems, a spatially-explicit, process-explicit, pattern-oriented framework for modeling population dynamics. This extension adds functionality for modeling disease dynamics in wildlife. It also adds capability for seasonality and for unique dispersal dynamics for each life cycle stage.

Faculty: Colin J. Carlson, PhD

Download: Cran R / epizootic package

Platform: R


ExcessILI

ExcessILI facilitates formatting line list data from syndromic surveillance datasets into time series, and enables analysis of these data to detect increases in reporting above the seasonal baseline. For US data, there is an option to automatically adjust the data for state-specific flu activity (using data from NREVSS) and/or state-specific RSV activity (based on Google search volume). The user can start with either line list data or formatted time series data. An R Shiny app is provided to examine data products.
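To make the first step concrete, the sketch below shows one way a line list (one record per visit) might be collapsed into a weekly time series. It is written in Python for illustration only and is not the ExcessILI API; the function names and the sample visit dates are hypothetical, and weeks are assumed to start on Monday.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d):
    """Return the Monday of the week containing date d."""
    return d - timedelta(days=d.weekday())


def line_list_to_weekly_counts(visit_dates):
    """Aggregate one-date-per-visit records into counts per week."""
    counts = Counter(week_start(d) for d in visit_dates)
    # Sort by week so the result reads as a time series.
    return dict(sorted(counts.items()))


# Hypothetical line list: one date per recorded visit.
visits = [date(2020, 3, 2), date(2020, 3, 4), date(2020, 3, 9), date(2020, 3, 15)]
weekly = line_list_to_weekly_counts(visits)
for week, n in weekly.items():
    print(week.isoformat(), n)
```

A series in this shape is what a seasonal baseline would then be fitted to; that modeling step is not shown here.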

Faculty: Daniel Weinberger, PhD

Download: GitHub / ExcessILI package

Platform: R


Fabs

Penalization is a popular tool for multi- and high-dimensional data. Most of the existing computational algorithms have been developed for convex loss functions. Nonconvex loss functions can sometimes generate more robust results and have important applications. Motivated by the BLasso algorithm, this study develops the Forward and Backward Stagewise (Fabs) algorithm for nonconvex loss functions with the adaptive Lasso (aLasso) penalty. It is shown that each point along the Fabs paths is an approximate solution to the aLasso problem and that the Fabs paths converge to the stationary points of the aLasso problem when the step size goes to zero, given that the loss function has second-order derivatives bounded from above.

Faculty: Shuangge Steven Ma, PhD; Yuan Huang, PhD

Download: GitHub / Fabs package

Platform: R

Reference: doi.org (Fabs)


fmrGI

In this article, our goal is to conduct the first and more effective FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established. An effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / fmrGI package

Platform: R

Reference: doi.org (fmrGI)


greHD

For complex diseases, the interactions between genetic and environmental risk factors can have important implications beyond the main effects. Many of the existing interaction analyses conduct marginal analysis and cannot accommodate the joint effects of multiple main effects and interactions. In this study, we conduct joint analysis which can simultaneously accommodate a large number of effects. Significantly different from the existing studies, we adopt loss functions based on relative errors, which offer a useful alternative to the "classic" methods such as least squares and least absolute deviation. Further, to accommodate censoring in the response variable, we adopt a weighted approach. Penalization is used for identification and regularized estimation. Computationally, we develop an effective algorithm which combines majorize-minimization and coordinate descent. Simulation shows that the proposed approach has satisfactory performance. We also analyze lung cancer prognosis data with gene expression measurements.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / greHD package

Platform: Matlab

Reference: doi.org (greHD)


HierNetwork

1. Function.R: a function that implements the proposed model.

2. Simulation_settings.R: code for generating simulated data.

3. case_study.R: applies the method to the LUAD dataset and visualizes the results.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / HierNetwork package

Platform: R

Reference: doi.org (HierNetwork)


mdpd

This package estimates coefficients of a high-dimensional linear regression model. Significantly different from the existing studies, we adopt loss functions based on minimum density power divergence (MDPD) criteria. Multiple published studies have shown that this approach outperforms alternatives in low-dimensional situations, especially when the normality assumption is violated. We extend this method to the high-dimensional situation and also observe robust performance. Penalization is used for identification and regularized estimation. Computationally, we develop an effective algorithm which utilizes coordinate descent. Simulation shows that the proposed approach has satisfactory performance.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / mdpd package

Platform: Matlab

Reference: doi.org (mdpd)


miselect

Penalized regression methods, such as lasso and elastic net, are used in many biomedical applications when simultaneous regression coefficient estimation and variable selection is desired. However, missing data complicates the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm on each imputed dataset will likely lead to different sets of selected predictors, making it difficult to ascertain a final active set without resorting to ad hoc combination rules. 'miselect' presents Stacked Adaptive Elastic Net (saenet) and Grouped Adaptive LASSO (galasso) for continuous and binary outcomes, developed by Du et al (2022) <doi:10.1080/10618600.2022.2035739>. They, by construction, force selection of the same variables across multiply imputed data. 'miselect' also provides cross validated variants of these methods.

Faculty: Bhramar Mukherjee, PhD

Download: Cran R / miselect package

Platform: R

Reference: doi.org (miselect)


ModelDB

ModelDB provides an accessible location for storing and efficiently retrieving computational neuroscience models. Models in ModelDB can be coded in any language for any environment. Model code can be viewed before downloading, and browsers can be set to auto-launch the models.

Faculty: Robert A McDougal, PhD

Access: ModelDB Website

Platform: Website

Reference: doi.org (ModelDB)


NEURON

NEURON is a simulator for neurons and networks of neurons that runs efficiently on your local machine, in the cloud, or on an HPC. Build and simulate models using Python, HOC, and/or NEURON’s graphical interface.

Faculty: Robert A McDougal, PhD

Download: GitHub / NEURON package

Platform: Python

Reference: doi.org (NEURON)


Nowcaster

Every notification system has an intrinsic delay between the date of onset of an event and the date of report. nowcaster estimates counts of any epidemiological data of interest (e.g., daily case and death counts) by fitting a negative binomial model to the time steps of delay between the onset date of the event (i.e., date of first symptoms for cases or date of occurrence of death) and the date of report (i.e., date of notification of the case or death).
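The delay described above can be illustrated with a short sketch that tabulates records by onset date and by the number of days between onset and report. This is an illustrative Python sketch only, not the nowcaster API; the function name and the sample records are hypothetical, and the negative binomial fit itself is not shown.

```python
from collections import Counter
from datetime import date


def delay_table(records):
    """Count records by (onset date, delay in days between onset and report)."""
    table = Counter()
    for onset, report in records:
        delay = (report - onset).days
        if delay < 0:
            raise ValueError("report date precedes onset date")
        table[(onset, delay)] += 1
    return table


# Hypothetical records: (onset date, report date) pairs.
records = [
    (date(2021, 1, 4), date(2021, 1, 5)),
    (date(2021, 1, 4), date(2021, 1, 5)),
    (date(2021, 1, 4), date(2021, 1, 7)),
    (date(2021, 1, 5), date(2021, 1, 5)),
]
table = delay_table(records)
for (onset, delay), n in sorted(table.items()):
    print(onset.isoformat(), delay, n)
```

For recent onset dates, the longer delays have not yet been observed; those are the cells a nowcasting model has to estimate.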

Faculty: Colin J. Carlson, PhD

Download: GitHub / Nowcaster package

Platform: R


pLASSO

pLASSO is a statistical method which incorporates prior information into L1-penalized generalized linear models. We distribute here two R functions (function_linear.R and function_logistic.R) related to pLASSO. These two functions are for linear regression and logistic regression, respectively. Both functions can find all six estimators compared in Jiang, He, and Zhang (2014), i.e., LASSO, p, pLASSO; LASSO-A, p-A, pLASSO-A. The functions use cross validation to select the optimal tuning parameters. See the following paper for more details: Jiang Y, He Y, and Zhang H. (2014). Variable selection with prior information for generalized linear models via the pLASSO method.

Faculty: Heping Zhang, PhD

Download: pLASSO package

Platform: R

Reference: doi.org (pLASSO)


robHD

This is a function to compute the regression coefficients for robust penalized regression. The exponential squared loss function is used to provide the robustness, and penalization is employed to enforce the sparsity of estimators.
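As a rough illustration of why an exponential squared loss is robust, the sketch below compares it with the ordinary squared loss on a small and a large residual. It assumes the form 1 - exp(-r^2 / gamma) used in Wang et al. (2013), cited in the eLASSO entry; the function names and the value gamma = 1.0 are illustrative assumptions and are not taken from the robHD code, which is written in R.

```python
import math


def exp_squared_loss(residual, gamma=1.0):
    """Exponential squared loss 1 - exp(-r^2 / gamma); never exceeds 1."""
    return 1.0 - math.exp(-(residual ** 2) / gamma)


def squared_loss(residual):
    """Ordinary squared loss r^2; grows without bound."""
    return residual ** 2


# A small residual and an outlier-sized residual (hypothetical values).
for r in (0.1, 10.0):
    print(r, squared_loss(r), exp_squared_loss(r))
```

Because the exponential squared loss is capped at 1, a single large residual contributes far less to the objective than it would under squared loss.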

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / robHD package

Platform: R


smog

Structural Modeling by using Overlapped Group Penalty. Fits a model with linear non-penalized phenotype (demographic) variables and penalized groups of prognostic and predictive effects, satisfying the hierarchy structure that if a predictive effect exists, its prognostic effect must also exist. This package can deal with continuous, binomial or multinomial, and survival response variables, under the assumptions of Gaussian, binomial (multinomial), and Cox proportional hazard models, respectively. It is implemented by combining the iterative shrinkage-thresholding algorithm and the alternating direction method of multipliers algorithm. The main method is built in C++, and the complementary methods are written in R.

Faculty: Shuangge Steven Ma, PhD

Download: Cran R / smog package

Platform: R

Reference: doi.org (smog)


snif

snif is an R package that implements "Selection of Nonlinear Interactions by a Forward Stepwise Algorithm". snif is currently being tested and polished, and as such it is BETA software.

Faculty: Bhramar Mukherjee, PhD

Download: GitHub / snif package

Platform: R

Reference: doi.org (snif)


sure

An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017) <doi:10.1080/01621459.2017.1292915>. These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an 'autoplot' function for producing standard diagnostic plots using 'ggplot2' graphics. The package currently supports cumulative link models from packages 'MASS', 'ordinal', 'rms', and 'VGAM'. Support for binary regression models using the standard 'glm' function is also available.

Faculty: Heping Zhang, PhD

Download: Cran R / sure package

Platform: R

Reference: doi.org (sure)


svycdiff

Estimates the population average controlled difference for a given outcome between levels of a binary treatment, exposure, or other group membership variable of interest for clustered, stratified survey samples where sample selection depends on the comparison group. Provides three methods for estimation, namely outcome modeling and two factorizations of inverse probability weighting. Under stronger assumptions, these methods estimate the causal population average treatment effect. Salerno et al., (2024) <doi:10.48550/arXiv.2406.19597>.

Faculty: Bhramar Mukherjee, PhD

Download: Cran R / svycdiff package

Platform: R

Reference: doi.org (svycdiff)


SynDI

There is a growing need for flexible general frameworks that integrate individual-level data with external summary information for improved statistical inference. External information relevant for a risk prediction model may come in multiple forms, through regression coefficient estimates or predicted values of the outcome variable. Different external models may use different sets of predictors, and the algorithms they use to predict the outcome Y given these predictors may or may not be known. The underlying populations corresponding to each external model may be different from each other and from the internal study population. Motivated by a prostate cancer risk prediction problem where novel biomarkers are measured only in the internal study, this paper proposes an imputation-based methodology, where the goal is to fit a target regression model with all available predictors in the internal study while utilizing summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each external population and uses stacked multiple imputation to create a long dataset with complete covariate information. The final analysis of the stacked imputed data is conducted by weighted regression. This flexible and unified approach can improve statistical efficiency of the estimated coefficients in the internal study, improve predictions by utilizing even partial information available from models that use a subset of the full set of covariates used in the internal study, and provide statistical inference for the external population with potentially different covariate effects from the internal population.

Faculty: Bhramar Mukherjee, PhD

Download: GitHub / SynDI package

Platform: R

Reference: doi.org (SynDI)