Shuangge (Steven) Ma, PhD

GEInfo

Implements three approaches for gene-environment (G-E) interaction analysis. All of them adopt the sparse group minimax concave penalty to identify important G variables and G-E interactions while respecting the hierarchy between main G effects and G-E interaction effects. All three approaches are available for linear, logistic, and Poisson regression. The package also provides functions to mine and construct prior information for G variables and G-E interactions.
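As a rough illustration of the penalty named above, the standard minimax concave penalty (MCP) for a single coefficient can be sketched in Python as follows. This is not GEInfo's R code, and the sparse group version used by the package is not reproduced here; the function name `mcp_penalty` and the parameter names `lam` and `gamma` are illustrative assumptions.

```python
def mcp_penalty(t, lam=1.0, gamma=3.0):
    """Standard scalar MCP (illustrative names, not the package API).

    Equals lam*|t| - t^2 / (2*gamma) while |t| <= gamma*lam,
    and stays constant at gamma*lam^2 / 2 beyond that point.
    """
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


if __name__ == "__main__":
    # The penalty grows like the Lasso near zero and flattens for large |t|.
    for t in (0.0, 0.5, 3.0, 10.0):
        print(t, mcp_penalty(t))
```

Because the penalty is flat for large coefficients, strong effects are shrunk less than under the Lasso, which is the usual motivation for choosing MCP.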

Faculty: Shuangge Steven Ma, PhD

Download: Cran R / GEInfo package

Platform: R

Release Date: less than 5 years

Reference: doi.org (GEInfo)


NCutYX

Omics data come in different forms: gene expression, methylation, copy number, protein measurements and more. 'NCutYX' allows clustering of variables, of samples, and both variables and samples (biclustering), while incorporating the dependencies across multiple types of Omics data.
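For readers unfamiliar with the normalized cut (NCut) criterion that the package name refers to, the sketch below evaluates the classical NCut value of a given partition from a similarity matrix. It is a minimal Python illustration, not the NCutYX R interface; the function name `ncut_value` and the toy data are assumptions, and the package may use a different normalization or a multi-layer extension.

```python
def ncut_value(weights, labels):
    """Classical normalized cut: sum over clusters of cut(A, rest) / vol(A).

    weights: symmetric similarity matrix as a list of lists.
    labels:  cluster label for each node.
    """
    n = len(labels)
    total = 0.0
    for k in set(labels):
        cut = 0.0  # similarity leaving cluster k
        vol = 0.0  # all similarity attached to members of cluster k
        for i in range(n):
            if labels[i] != k:
                continue
            for j in range(n):
                vol += weights[i][j]
                if labels[j] != k:
                    cut += weights[i][j]
        total += cut / vol
    return total


if __name__ == "__main__":
    # Two tightly connected pairs joined by weak links (toy data).
    w = [[0.0, 1.0, 0.1, 0.0],
         [1.0, 0.0, 0.0, 0.1],
         [0.1, 0.0, 0.0, 1.0],
         [0.0, 0.1, 1.0, 0.0]]
    print(ncut_value(w, [0, 0, 1, 1]))  # partition along the weak links
    print(ncut_value(w, [0, 1, 0, 1]))  # partition that splits the tight pairs
```

A smaller value indicates a partition whose clusters are well separated relative to their total connectivity.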

Faculty: Shuangge Steven Ma, PhD

Download: Cran R / NCutYX package

Platform: R

Reference: doi.org (NCutYX)


HeteroGGM

This package provides a user-friendly implementation of Gaussian graphical model-based heterogeneity analysis. Several Gaussian graphical model-based heterogeneity analysis techniques have been developed recently. A common methodological limitation is that the number of subgroups is assumed to be known a priori, which is not realistic. Ren et al. (2022) developed an approach based on the penalized fusion technique that determines the number and structure of subgroups in a fully data-dependent way, which opens the door to using the Gaussian graphical model technique in more practical settings. Beyond Ren et al. (2022), the package adds further estimation methods and functions, so that it is self-contained and more comprehensive, and its visualization function gives practitioners more direct insights.
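As background for the Gaussian graphical model mentioned above: in such a model, two variables are conditionally independent given all the others exactly when the corresponding off-diagonal entry of the precision (inverse covariance) matrix is zero. The minimal Python sketch below reads a graph off a given precision matrix. It is not HeteroGGM's R interface and does not perform subgroup detection; the names `ggm_edges`, `partial_correlation`, and `tol` are illustrative assumptions.

```python
import math


def partial_correlation(precision, i, j):
    """Partial correlation of variables i and j given all the others."""
    return -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])


def ggm_edges(precision, tol=1e-8):
    """Edges (i, j) with i < j whose precision entry exceeds tol in absolute value."""
    p = len(precision)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i][j]) > tol]


if __name__ == "__main__":
    # Chain graph 0 - 1 - 2: variables 0 and 2 are conditionally independent.
    omega = [[2.0, -1.0, 0.0],
             [-1.0, 2.0, -1.0],
             [0.0, -1.0, 2.0]]
    print(ggm_edges(omega))
    print(partial_correlation(omega, 0, 1))
```

In a heterogeneity analysis, each subgroup would have its own precision matrix and hence its own graph of this kind.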

Faculty: Shuangge Steven Ma, PhD

Download: Cran R / HeteroGGM package

Platform: R

Reference: doi.org (HeteroGGM)


iSFun

An implementation of integrative analysis methods based on a two-part penalization, which performs dimension reduction and mines the heterogeneity and association of multiple studies with compatible designs. The package provides integrative sparse principal component analysis (Fang et al., 2018), integrative sparse partial least squares (Liang et al., 2021), and integrative sparse canonical correlation analysis, as well as the corresponding individual analysis and meta-analysis versions.

Faculty: Shuangge Steven Ma, PhD

Download: Cran R / iSFun package

Platform: R

Reference: doi.org (iSFun)


GEInter

For the risk, progression, and response to treatment of many complex diseases, it has been increasingly recognized that gene-environment interactions play important roles beyond the main genetic and environmental effects. In practical interaction analyses, outliers in response variables and covariates are not uncommon. In addition, missingness in environmental factors is routinely encountered in epidemiological studies. The package provides five robust approaches to address the outlier problem, two of which can also accommodate missingness in environmental factors. Both continuous and right-censored responses are considered. The approaches are based on penalization and sparse boosting techniques for identifying important interactions and are realized using efficient algorithms. Beyond gene-environment analysis, the package can also be used to analyze interactions between other types of low-dimensional and high-dimensional data.

Faculty: Shuangge Steven Ma, PhD

Download: Cran R / GEInter package

Platform: R

Reference: doi.org (GEInter)


FunctanSNP

An implementation of revised functional regression models for multiple genetic variation data, such as single nucleotide polymorphism (SNP) data. It provides revised functional linear regression models, partially functional interaction regression analysis with penalty-based techniques, and corresponding plotting functions, among others.

Faculty: Shuangge Steven Ma, PhD

Download: Cran R / FunctanSNP package

Platform: R

Release Date: less than 5 years

Reference: doi.org (FunctanSNP)


Bweight

Bayesian Modeling of Cancer Outcomes Using Genetic Variables Assisted by Pathological Imaging Data

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / Bweight package

Platform: Julia


CrossNetworks

Network-based analysis for cross-platform communications.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / CrossNetworks package

Platform: R


ANNI

Aligned Deep Neural Network for Integrative Analysis with High-dimensional Input.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / ANNI package

Platform: Python


HierNetwork

1: Function.R: contains a function that implements the proposed model.

2: Simulation_settings.R: contains code for generating simulated data.

3: case_study.R: applies the method to the LUAD dataset and visualizes the results.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / HierNetwork package

Platform: R

Reference: doi.org (HierNetwork)


fmrGI

The goal of this work is to conduct the first, and a more effective, FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established. An effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / fmrGI package

Platform: R

Reference: doi.org (fmrGI)


AWNCut

Through regulation relationships, regulators carry important information on the properties of gene expressions (GEs). We develop a novel assisted clustering method, which effectively uses regulator information to improve clustering analysis using GE data. To account for the fact that not all GEs are informative, we propose a weighted strategy, where the weights are determined data-dependently and can discriminate informative GEs from noise. The proposed method is built on the NCut technique and effectively realized using a simulated annealing algorithm.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / AWNCut package

Platform: R

Reference: doi.org (AWNCut)


CQPCorr

Gene-environment (G-E) interactions have important implications for the etiology and progression of many complex diseases. Compared to continuous markers and categorical disease status, prognosis has been less investigated, with the additional challenges brought by the unique characteristics of survival outcomes. Most of the existing G-E interaction approaches for prognosis data share the limitation that they cannot accommodate long-tailed or contaminated outcomes. In this study, for prognosis data, we develop a robust G-E interaction identification approach using the censored quantile partial correlation (CQPCorr) technique. The proposed approach is built on the quantile regression technique (and hence has a solid statistical basis), uses weights to easily accommodate censoring, and adopts partial correlation to identify important interactions while properly controlling for the main genetic and environmental effects. In simulation, it outperforms multiple competitors with more accurate identification. In the analysis of TCGA data on lung cancer and melanoma, biologically sensible findings different from using the alternatives are made.
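The quantile regression technique that this approach builds on rests on the check loss. The minimal Python sketch below shows that loss and the fact that minimizing it over a constant recovers a sample quantile. It does not include the censoring weights or the partial correlation step of CQPCorr; the function names `check_loss` and `best_constant` are illustrative assumptions.

```python
def check_loss(u, tau):
    """Quantile check loss: u * (tau - 1) for u < 0, u * tau otherwise."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(values, tau):
    """Sample value that minimizes the total check loss (a tau-th sample quantile)."""
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))


if __name__ == "__main__":
    # A single extreme value barely moves the median-type fit (toy data).
    data = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(best_constant(data, 0.5))
```

Because the loss grows only linearly in the residual, long-tailed or contaminated outcomes have limited influence on the fit, which is the robustness property the description refers to.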

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / CQPCorr package

Platform: R

Reference: doi.org (CQPCorr)


PTensor

Gene-gene (G×G) interactions have been shown to be critical for the fundamental mechanisms and development of complex diseases beyond main genetic effects. The commonly adopted marginal analysis is limited by considering only a small number of G factors at a time. With the “main effects, interactions” hierarchical constraint, many of the existing joint analysis methods suffer from prohibitively high computational cost. In this study, we propose a new method for identifying important G×G interactions under joint modeling. The proposed method adopts tensor regression to accommodate high data dimensionality and the penalization technique for selection. It naturally accommodates the strong hierarchical structure without imposing additional constraints, making optimization much simpler and faster than in the existing studies. It outperforms multiple alternatives in simulation. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer and melanoma demonstrates that it can identify markers with important implications and better prediction performance.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / PTensor package

Platform: R

Reference: doi.org (PTensor)


Fabs

Penalization is a popular tool for multi- and high-dimensional data. Most of the existing computational algorithms have been developed for convex loss functions. Nonconvex loss functions can sometimes generate more robust results and have important applications. Motivated by the BLasso algorithm, this study develops the Forward and Backward Stagewise (Fabs) algorithm for nonconvex loss functions with the adaptive Lasso (aLasso) penalty. It is shown that each point along the Fabs paths is an approximate solution to the aLasso problem and that the Fabs paths converge to the stationary points of the aLasso problem when the step size goes to zero, given that the loss function has second-order derivatives bounded from above.

Faculty: Shuangge Steven Ma, PhD; Yuan Huang, PhD

Download: GitHub / Fabs package

Platform: R

Reference: doi.org (Fabs)


ARMI

Gene expression studies have been playing a critical role in cancer research. Despite tremendous effort, the analysis results are still often unsatisfactory because of weak signals and high data dimensionality. Analysis is often further challenged by the long-tailed distributions of the outcome variables. In recent multidimensional studies, data have been collected on gene expressions as well as their regulators (for example, copy number alterations, methylation, and microRNAs), which can provide additional information on the associations between gene expressions and cancer outcomes. In this study, we develop an ARMI (Assisted Robust Marker Identification) approach for analyzing cancer studies with measurements on gene expressions as well as regulators. The proposed approach borrows information from regulators and can be more effective than analyzing gene expression data alone. A robust objective function is adopted to accommodate long-tailed distributions. Marker identification is effectively realized using penalization. The proposed approach has an intuitive formulation and is computationally affordable.

Faculty: Shuangge Steven Ma, PhD; Yuan Huang, PhD

Download: GitHub / ARMI package

Platform: R

Reference: doi.org (ARMI)


mdpd

This package estimates the coefficients of a high-dimensional linear regression model. Significantly different from the existing studies, we adopt loss functions based on minimum density power divergence (MDPD) criteria. Multiple published studies have shown that this approach outperforms alternatives in low-dimensional settings, especially when the normality assumption is violated. We extend this method to the high-dimensional setting and also observe robust performance. Penalization is used for identification and regularized estimation. Computationally, we develop an effective algorithm based on coordinate descent. Simulation shows that the proposed approach has satisfactory performance.
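To make the loss concrete, the sketch below evaluates the standard density power divergence objective for a linear model with normal errors, given a vector of residuals. It is a minimal Python illustration under that assumed error model, not the package's Matlab code, and it omits the penalty term; the function name `dpd_loss` and the parameter names `sigma` and `alpha` are illustrative assumptions.

```python
import math


def dpd_loss(residuals, sigma, alpha):
    """Density power divergence objective for N(0, sigma^2) errors, alpha > 0.

    Uses the integral of f^(1 + alpha) minus (1 + 1/alpha) times the
    average of f^alpha evaluated at the residuals, with f the normal density.
    """
    scale = (2.0 * math.pi * sigma * sigma) ** (-alpha / 2.0)
    integral_term = 1.0 / math.sqrt(1.0 + alpha)
    data_term = sum(math.exp(-alpha * r * r / (2.0 * sigma * sigma))
                    for r in residuals) / len(residuals)
    return scale * (integral_term - (1.0 + 1.0 / alpha) * data_term)


if __name__ == "__main__":
    clean = [0.1, -0.2, 0.05, 0.3]
    contaminated = clean + [50.0]  # one gross outlier (toy data)
    print(dpd_loss(clean, sigma=1.0, alpha=0.5))
    print(dpd_loss(contaminated, sigma=1.0, alpha=0.5))
```

Each observation enters only through a bounded exponential term, so a gross outlier contributes almost nothing to the data term, in contrast to the unbounded squared-error loss.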

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / mdpd package

Platform: Matlab

Reference: doi.org (mdpd)


group_lapl

SGLS implements a penalization method for the integrative analysis of multiple high-throughput cancer prognosis studies that incorporates network structures. The method is based on a combination of the group MCP penalty and a Laplacian penalty. The group MCP is adopted for gene selection, and the Laplacian penalty is applied to smooth the differences between regression coefficients of tightly connected genes.
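The smoothing effect of a Laplacian penalty can be illustrated with the unnormalized graph Laplacian L = D - A, for which the quadratic form beta' L beta equals the sum over connected pairs of a_jk * (beta_j - beta_k)^2. The minimal Python sketch below computes that quadratic form. It is not the package's R code, the package may use a normalized or sign-adjusted Laplacian, and the function name `laplacian_penalty` and the toy network are assumptions.

```python
def laplacian_penalty(beta, adjacency):
    """Quadratic form beta' L beta with L = D - A (unnormalized graph Laplacian).

    adjacency: symmetric matrix of nonnegative edge weights with zero diagonal.
    """
    p = len(beta)
    total = 0.0
    for j in range(p):
        degree = sum(adjacency[j])
        for k in range(p):
            l_jk = (degree if j == k else 0.0) - adjacency[j][k]
            total += beta[j] * l_jk * beta[k]
    return total


if __name__ == "__main__":
    # Genes 0 and 1 are connected; gene 2 is isolated (toy network).
    a = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
    print(laplacian_penalty([2.0, 2.0, 5.0], a))  # connected genes agree
    print(laplacian_penalty([1.0, 3.0, 5.0], a))  # connected genes differ
```

The penalty is zero when connected genes share the same coefficient and grows with their squared difference, while the isolated gene is left unconstrained.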

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / group_lapl package

Platform: R

Reference: doi.org (group_lapl)


sglasso

SGL implements a penalization method for group variable selection that can properly accommodate the correlation between adjacent groups. The method is based on a combination of the group Lasso penalty and a quadratic penalty on the difference of regression coefficients of adjacent groups. It encourages group sparsity and smooths regression coefficients for adjacent groups. Canonical correlations are used as the weights between groups in the quadratic difference penalty.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / sglasso package

Platform: R

Reference: doi.org (sglasso)


greHD

For complex diseases, the interactions between genetic and environmental risk factors can have important implications beyond the main effects. Many of the existing interaction analyses conduct marginal analysis and cannot accommodate the joint effects of multiple main effects and interactions. In this study, we conduct joint analysis, which can simultaneously accommodate a large number of effects. Significantly different from the existing studies, we adopt loss functions based on relative errors, which offer a useful alternative to "classic" methods such as least squares and least absolute deviation. Further, to accommodate censoring in the response variable, we adopt a weighted approach. Penalization is used for identification and regularized estimation. Computationally, we develop an effective algorithm that combines majorize-minimization and coordinate descent. Simulation shows that the proposed approach has satisfactory performance. We also analyze lung cancer prognosis data with gene expression measurements.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / greHD package

Platform: Matlab

Reference: doi.org (greHD)


robHD

This is a function to compute the regression coefficients for robust penalized regression. The exponential squared loss function is used to provide the robustness, and penalization is employed to enforce the sparsity of estimators.
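For reference, the exponential squared loss is commonly written as 1 - exp(-r^2 / gamma) for a residual r and a tuning parameter gamma > 0. The minimal Python sketch below evaluates it. It is not the robHD function itself, and the name `exp_squared_loss` and the default `gamma=1.0` are illustrative assumptions.

```python
import math


def exp_squared_loss(residual, gamma=1.0):
    """Exponential squared loss: 1 - exp(-residual^2 / gamma), bounded above by 1."""
    return 1.0 - math.exp(-residual * residual / gamma)


if __name__ == "__main__":
    # Small residuals behave like a scaled squared error; large ones saturate.
    for r in (0.0, 0.1, 1.0, 10.0):
        print(r, exp_squared_loss(r))
```

Because the loss is bounded, a single extreme residual cannot dominate the objective, which is the source of the robustness mentioned above; a smaller gamma makes the loss saturate sooner.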

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / robHD package

Platform: R


Bootpenal

Bootstrap penalization for variable selection with big data.

Faculty: Shuangge Steven Ma, PhD

Download: Bootpenal package

Platform: R

Reference: doi.org (Bootpenal)


smog

Structural Modeling by using Overlapped Group Penalty. Fits a linear model with non-penalized phenotype (demographic) variables and penalized groups of prognostic and predictive effects, subject to the hierarchy that if a predictive effect exists, its prognostic effect must also exist. This package can deal with continuous, binomial or multinomial, and survival response variables, under the assumptions of Gaussian, binomial (multinomial), and Cox proportional hazards models, respectively. It is implemented by combining the iterative shrinkage-thresholding algorithm and the alternating direction method of multipliers algorithm. The main method is built in C++, and the complementary methods are written in R.

Faculty: Shuangge Steven Ma, PhD

Download: Cran R / smog package

Platform: R

Reference: doi.org (smog)