Leying Guan, PhD

msCCA

Faculty: Leying Guan, PhD

Download: GitHub / msCCA package

Platform: R


PREGS

A conformal test for a non-zero coefficient in linear models via permutation augmentation.

Faculty: Leying Guan, PhD

Download: GitHub / PREGS package

Platform: R

Reference: doi.org (PREGS)


SPACO

As immunological and clinical studies become more complex, there is an increasing need to analyze temporal immunophenotypes alongside demographic and clinical covariates, where each subject receives matrix-valued time series observations for potentially high-dimensional longitudinal features, as well as other static characterizations. Researchers aim to find the low-dimensional embedding of subjects using matrix-valued time series observations and investigate relationships between static clinical responses and the embedding. However, constructing these embeddings can be challenging due to high dimensionality, sparsity, and irregularity in sample collection over time. In addition, the incorporation of static auxiliary covariates is frequently desired during such a construction. To address these issues, we propose a smoothed probabilistic PARAFAC model with covariates (SPACO) that uses auxiliary covariates of interest. We provide extensive simulations to test different aspects of SPACO and demonstrate its application to an immunological dataset from patients with SARS-CoV-2 infection.

Faculty: Leying Guan, PhD

Download: GitHub / SPACO package

Platform: Python

Reference: doi.org (SPACO)


LCP

We propose a new inference framework called localized conformal prediction. It generalizes the framework of conformal prediction by offering a single-test-sample adaptive construction that emphasizes a local region around this test sample, and can be combined with different conformal score constructions. The proposed framework enjoys an assumption-free finite sample marginal coverage guarantee, and it also offers additional local coverage guarantees under suitable assumptions. We demonstrate how to change from conformal prediction to localized conformal prediction using several conformal scores, and we illustrate a potential gain via numerical examples.
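For orientation, the baseline that localized conformal prediction generalizes can be sketched in a few lines. The following is a minimal split conformal interval using absolute residuals as the conformal score; it is not the LCP package's API, and the function names, the toy model `predict`, and the calibration data are all hypothetical. The localization step itself (emphasizing calibration points near the test sample) is not reproduced here.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Returns infinity when that rank exceeds n (too few calibration points).
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Prediction interval for x_new from absolute-residual conformal scores."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q


# Hypothetical fitted model and calibration data, for illustration only.
predict = lambda x: 2.0 * x
calib_x = [0.0, 1.0, 2.0, 3.0, 4.0]
calib_y = [0.5, 1.0, 4.0, 8.0, 8.0]

interval = split_conformal_interval(predict, calib_x, calib_y, x_new=10.0, alpha=0.25)
print(interval)
```

Every calibration score gets equal weight in this baseline, so the interval has the same width at every test point. A localized variant would instead emphasize calibration points near `x_new`, which is the step the description above refers to.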

Faculty: Leying Guan, PhD

Download: GitHub / LCP package

Platform: R

Reference: doi.org (LCP)


BCOPS

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class as often as possible while also detecting outliers x, for which the method returns no prediction (corresponding to C(x) equal to the empty set). The proposed method combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given method. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
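The idea of a prediction set C(x) that may be empty can be illustrated with a class-wise conformal construction. The sketch below assumes a conformity score is already available for each class (higher means more typical of that class); how BCOPS learns scores optimized for the out-of-sample distribution is not reproduced. The function names and the toy scores are hypothetical, and this is not the BCOPS package's interface.

```python
def conformal_p_value(calib_scores, test_score):
    """Conformal p-value: share of scores (calibration plus the test point
    itself) that are no larger than the test score."""
    n_le = sum(1 for s in calib_scores if s <= test_score)
    return (n_le + 1) / (len(calib_scores) + 1)


def prediction_set(calib_scores_by_class, test_scores_by_class, alpha=0.05):
    """Keep every class whose conformal p-value exceeds alpha.

    An empty set means the point looks atypical for all classes (an outlier).
    """
    return {
        label
        for label, calib in calib_scores_by_class.items()
        if conformal_p_value(calib, test_scores_by_class[label]) > alpha
    }


# Hypothetical calibration conformity scores for two classes.
calib = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "B": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
}

# A point that looks typical for class A but not for class B.
typical_for_a = prediction_set(calib, {"A": 5.0, "B": 0.5}, alpha=0.2)

# A point that looks atypical for both classes: no prediction is returned.
outlier = prediction_set(calib, {"A": 0.5, "B": 0.5}, alpha=0.2)

print(typical_for_a, outlier)
```

Because each class is tested separately against its own calibration scores, a point can be rejected by every class at once, which is what produces the empty set for outliers.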

Faculty: Leying Guan, PhD

Download: GitHub / BCOPS package

Platform: R

Reference: doi.org (BCOPS)


NextDoor

We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are "close" to the chosen "base model," and compares the error rate of the base model with those of the nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure "Next-Door analysis" since it examines models "next" to the base model. It can be applied to supervised learning problems with ℓ1 penalization and stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library.
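The delete-and-refit loop described above can be sketched as follows. This is a rough illustration in plain Python, not the NextDoor R package: it uses a small coordinate-descent lasso without an intercept, a fixed penalty `lam`, and a raw comparison of validation errors, and it omits the significance assessment that decides whether a predictor is indispensable. All function names and the toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, allowed, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Only the columns listed in `allowed` may be non-zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j in allowed:
            zj = sum(X[i][j] ** 2 for i in range(n)) / n
            if zj == 0.0:
                continue
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / zj
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            beta[j] = new
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of the linear model beta on (X, y)."""
    preds = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def next_door(X, y, X_val, y_val, lam):
    """Fit the base lasso, then refit once per selected predictor with that
    predictor removed; report each neighbor's increase in validation error."""
    p = len(X[0])
    base = lasso_cd(X, y, lam, list(range(p)))
    base_err = mse(X_val, y_val, base)
    increases = {}
    for j in [k for k in range(p) if base[k] != 0.0]:
        neighbor = lasso_cd(X, y, lam, [k for k in range(p) if k != j])
        increases[j] = mse(X_val, y_val, neighbor) - base_err
    return base, increases


# Toy data: orthogonal design with y = 3 * x0 + 0.5 * x1 (hypothetical).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.5, 2.5, -2.5, -3.5]
X_val = [[1.0, 1.0], [-1.0, -1.0]]
y_val = [3.5, -3.5]

base, increases = next_door(X, y, X_val, y_val, lam=0.1)
print(base, increases)
```

In this toy example, removing the strong predictor `x0` raises the validation error far more than removing the weak predictor `x1`, so `x0` is the natural candidate for "indispensable" while the model without `x1` is a plausible alternative.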

Faculty: Leying Guan, PhD

Download: GitHub / NextDoor package

Platform: R

Reference: doi.org (NextDoor)


ITEB

Iterated & truncated empirical Bayes for strong signal detection (ITEB) is a modified two-group model in which the null group corresponds to genes that are not direct targets but can have small non-zero effects.

Faculty: Leying Guan, PhD

Download: GitHub / ITEB package

Platform: R

Reference: doi.org (ITEB)


SPEAR

Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful. SPEAR is a sparse supervised Bayesian factor model for multi-omics analysis.

Faculty: Leying Guan, PhD

Download: Bitbucket / SPEAR package

Platform: R

Reference: doi.org (SPEAR)


puddlr

puddlr is a general-purpose set of tools for the analysis of datasets with relatively few observations compared to the total number of features. These datasets are often called "shallow" and "wide", which is the inspiration for the "puddlr" name.

Faculty: Leying Guan, PhD

Download: GitHub / puddlr package

Platform: R

Reference: doi.org (puddlr)


PAC

Implements partition-assisted clustering and multiple alignments of networks. It 1) utilizes partition-assisted clustering to find robust and accurate clusters and 2) discovers coherent relationships of clusters across multiple samples. It is particularly useful for analyzing single-cell data sets.

Faculty: Leying Guan, PhD

Download: CRAN / PAC package

Platform: R

Reference: doi.org (PAC)