
Genetics

BayesMEModel

A Bayesian Approach to Correcting the Attenuation Bias of Regression Using Polygenic Risk Score.

Faculty: Hongyu Zhao, PhD

Download: GitHub / BayesMEModel package

Platform: R

Reference: doi.org (BayesMEModel)
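The attenuation bias in this title is the classical regression-dilution effect. Suppose a noisy score S = T + U (with independent error U) stands in for the true genetic value T. Then the OLS slope of an outcome on S shrinks toward zero by the reliability ratio lambda = Var(T)/Var(S). The sketch below simulates this effect and applies the classical method-of-moments correction (dividing by lambda, which is assumed known here). It is not the Bayesian estimator implemented in BayesMEModel. The function names and simulation settings are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x, fitted with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=0.5, var_t=1.0, var_u=1.0, seed=1):
    """Simulate y = beta * T + e, observe S = T + U, and regress y on S."""
    rng = random.Random(seed)
    t = [rng.gauss(0.0, var_t ** 0.5) for _ in range(n)]    # true genetic value T
    s = [ti + rng.gauss(0.0, var_u ** 0.5) for ti in t]      # noisy score S = T + U
    y = [beta * ti + rng.gauss(0.0, 1.0) for ti in t]        # outcome
    naive = ols_slope(s, y)                                  # attenuated toward 0
    reliability = var_t / (var_t + var_u)                    # lambda, assumed known here
    return naive, naive / reliability                        # classical correction


if __name__ == "__main__":
    naive, corrected = simulate()
    print(f"naive slope:     {naive:.3f}")      # expected near beta * lambda = 0.25
    print(f"corrected slope: {corrected:.3f}")  # expected near beta = 0.5
```

In practice lambda is unknown and must be estimated, and propagating that uncertainty is the motivation for a Bayesian treatment.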


BV-LDER-GE

BV-LDER-GE harnesses both correlations with additive genetic effects and full LD information to enhance the statistical power to detect genome-scale G×E interactions.

Faculty: Hongyu Zhao, PhD

Download: BV-LDER-GE package

Platform: R


CASE

CASE is an R package designed for multi-trait fine-mapping analysis, with a particular focus on single-cell eQTL fine-mapping.

Faculty: Hongyu Zhao, PhD

Download: GitHub / CASE package

Platform: R


Composite-trait LDSC

Estimating correlation between composite phenotypes and traits.

Faculty: Hongyu Zhao, PhD

Download: GitHub / Composite-trait LDSC package

Platform: Python

Reference: doi.org (Composite-trait LDSC)


CQPCorr

Gene-environment (G-E) interactions have important implications for the etiology and progression of many complex diseases. Compared with continuous markers and categorical disease status, prognosis has been less investigated, and the unique characteristics of survival outcomes bring additional challenges. Most existing G-E interaction approaches for prognosis data share a limitation: they cannot accommodate long-tailed or contaminated outcomes. In this study, we develop a robust G-E interaction identification approach for prognosis data using the censored quantile partial correlation (CQPCorr) technique. The proposed approach is built on quantile regression (and hence has a solid statistical basis), uses weights to easily accommodate censoring, and adopts partial correlation to identify important interactions while properly controlling for the main genetic and environmental effects. In simulation, it outperforms multiple competitors with more accurate identification. In the analysis of TCGA data on lung cancer and melanoma, it yields biologically sensible findings that differ from those of the alternatives.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / CQPCorr package

Platform: R

Reference: doi.org (CQPCorr)
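Quantile partial correlation builds on the quantile correlation. This measure replaces y with the sign-type score psi_tau(y - Q_tau(y)) = tau - 1{y < Q_tau(y)}, so long-tailed or contaminated outcomes have bounded influence. The sketch below shows only the marginal, uncensored quantile correlation. CQPCorr's censoring weights and its adjustment for main G and E effects are not shown, and the function names are hypothetical.

```python
import math
import random


def empirical_quantile(values, tau):
    """Smallest observed value v with empirical CDF F_n(v) >= tau (0 < tau < 1)."""
    ordered = sorted(values)
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[k]


def quantile_correlation(y, x, tau):
    """Sample quantile correlation between y and x at quantile level tau.

    psi = tau - 1{y < Q_tau(y)}; qcov = mean(psi * (x - mean(x)));
    qcor = qcov / sqrt((tau - tau^2) * var(x)).
    """
    n = len(y)
    q = empirical_quantile(y, tau)
    psi = [tau - (1.0 if yi < q else 0.0) for yi in y]
    mx = sum(x) / n
    qcov = sum(p * (xi - mx) for p, xi in zip(psi, x)) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    return qcov / math.sqrt((tau - tau * tau) * var_x)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    # Cauchy noise: y enters only through an indicator, so outliers in y
    # have bounded influence on the estimate.
    y = [xi + math.tan(math.pi * (rng.random() - 0.5)) for xi in x]
    print(f"median quantile correlation: {quantile_correlation(y, x, 0.5):.3f}")
```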


cWAS

cWAS is a statistical framework to identify cell types whose genetically regulated proportions are associated with complex diseases.

Faculty: Hongyu Zhao, PhD

Download: GitHub / cWAS package

Platform: R

Reference: journals.plos.org (cWAS)


EB-PRS

EB-PRS is a novel method that leverages information on effect sizes across all markers to improve prediction accuracy. It requires neither parameter tuning nor external information. This R package computes polygenic risk scores (PRS) from given training summary statistics and testing data. EB-PRS can be used to extract the main information from the input data, estimate the empirical Bayes parameters, derive a PRS for each individual in the testing data, and evaluate the PRS by AUC and predictive r².

Faculty: Hongyu Zhao, PhD

Download: CRAN / EB-PRS Package

Platform: R

Reference: doi.org (EB-PRS)
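The last two steps described for EB-PRS are scoring individuals and evaluating the scores. Both can be sketched generically. A PRS is a weighted sum of allele counts. AUC is the probability that a case outscores a control (the Mann-Whitney form), and predictive r² is the squared correlation with a quantitative trait. In this sketch the per-SNP weights are taken as given; the empirical Bayes shrinkage that EB-PRS applies to obtain them is not shown. The function names are hypothetical and are not the package's R API.

```python
def polygenic_scores(genotypes, weights):
    """PRS_i = sum_j w_j * g_ij for each individual i (allele counts 0/1/2)."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def auc(scores, labels):
    """AUC as P(case score > control score); ties count 1/2 (labels 1 = case)."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            wins += 1.0 if c > d else (0.5 if c == d else 0.0)
    return wins / (len(cases) * len(controls))


def predictive_r2(scores, phenotype):
    """Squared Pearson correlation between PRS and a quantitative phenotype."""
    n = len(scores)
    ms, mp = sum(scores) / n, sum(phenotype) / n
    sxy = sum((s - ms) * (p - mp) for s, p in zip(scores, phenotype))
    sxx = sum((s - ms) ** 2 for s in scores)
    syy = sum((p - mp) ** 2 for p in phenotype)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    prs = polygenic_scores([[0, 1, 2], [2, 2, 0], [1, 0, 1]], [0.2, -0.1, 0.3])
    print(prs, auc(prs, [1, 0, 1]))
```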


Geneverse

A collection of open-source multimodal large language models for genomic and proteomic research.

Faculty: Hongyu Zhao, PhD

Download: GitHub / Geneverse package

Platform: Python

Reference: aclanthology.org (Geneverse)


GENJI

Estimating genetic correlation jointly using individual-level and summary-level GWAS data.

Faculty: Hongyu Zhao, PhD

Download: GitHub / GENJI package

Platform: Python

Reference: biorxiv.org (GENJI)


GenoCanyon

GenoCanyon is a whole-genome functional annotation approach based on unsupervised statistical learning. It integrates genomic conservation measures and biochemical annotation data to predict the functional potential at each nucleotide. More details about the method can be found in our paper.

Faculty: Hongyu Zhao, PhD

Download: zhaocenter.com / GenoCanyon Package

Platform: R; RShiny

Reference: doi.org (GenoCanyon)


GenoSkyline and GenoSkyline Plus

GenoSkyline is a principled framework to predict tissue-specific functional regions through integrating high-throughput epigenomic annotations. Integrative analysis of GenoSkyline annotations with GWAS summary statistics could systematically identify biologically relevant tissue types and provide novel insights into the genetic basis of human complex traits.

Faculty: Hongyu Zhao, PhD

Download: zhaocenter.org / GenoSkyline Package

Platform: BED

Reference: doi.org (GenoSkyline) and doi.org (GenoSkyline Plus)


GPA

Implements three approaches for gene-environment (G-E) interaction analysis. All three adopt the sparse group minimax concave penalty (MCP) to identify important G variables and G-E interactions while respecting the hierarchy between main G effects and G-E interaction effects. All three approaches are available for linear, logistic, and Poisson regression. The package can also mine and construct prior information for G variables and G-E interactions.

Faculty: Hongyu Zhao, PhD

Download: GitHub / GPA Package

Platform: R and RStudio

Reference: doi.org (GPA)
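The minimax concave penalty used in GPA applies the lasso rate near zero and then tapers off, so large effects are left unshrunk. The sketch below shows the scalar MCP, its closed-form ("firm thresholding") solution for a single standardized predictor, and the group-level analogue that applies the same rule to a group's L2 norm. The package combines the group and individual levels into a sparse group MCP and encodes the G-E hierarchy and prior information; none of that is shown here. The function names are hypothetical.

```python
import math


def mcp_penalty(theta, lam, gamma):
    """Minimax concave penalty rho(theta; lam, gamma), assuming gamma > 1."""
    a = abs(theta)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return gamma * lam * lam / 2.0  # constant beyond gamma * lam: no extra shrinkage


def soft_threshold(z, lam):
    """Lasso soft-thresholding operator."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def mcp_threshold(z, lam, gamma):
    """MCP solution for one standardized predictor with unpenalized estimate z."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z


def group_mcp_threshold(z_vec, lam, gamma):
    """Group-level rule for an orthonormalized group: rescale z by MCP of its norm."""
    norm = math.sqrt(sum(v * v for v in z_vec))
    if norm == 0.0:
        return [0.0] * len(z_vec)
    scale = mcp_threshold(norm, lam, gamma) / norm
    return [scale * v for v in z_vec]
```

A coordinate-descent solver would repeatedly apply rules like these to partial residuals; the closed forms above hold only for standardized (orthonormal) designs.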


HBI

A hierarchical Bayesian interaction model to estimate cell-type-specific methylation quantitative trait loci.

Faculty: Hongyu Zhao, PhD

Download: GitHub / HBI package

Platform: R

Reference: genomebiology.biomedcentral.com (HBI)


hicGAN

Hi-C is a genome-wide technology for investigating 3D chromatin conformation by measuring physical contacts between pairs of genomic regions. The resolution of Hi-C data directly affects the effectiveness and accuracy of downstream analyses such as identifying topologically associating domains (TADs) and meaningful chromatin loops. High-resolution Hi-C data are valuable resources for elucidating the relationship between 3D genome conformation and function, especially for linking distal regulatory elements to their target genes. However, because sequencing is costly, high-resolution Hi-C data are not always available across tissues and cell types. It is therefore important to develop computational approaches for enhancing the resolution of Hi-C data. hicGAN addresses this with generative adversarial networks (GANs) that infer high-resolution Hi-C data from low-resolution input.

Faculty: Qiao Liu, PhD

Download: Liu Lab / hicGAN package

Platform: Python

Reference: doi.org (hicGAN)
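One common way to build training pairs for Hi-C resolution enhancement is to simulate a low-coverage matrix by randomly downsampling the read pairs of a high-coverage one. The sketch below performs binomial thinning: each read pair is kept independently with probability keep_prob, and the result is mirrored so the matrix stays symmetric. This is an illustrative assumption, not hicGAN's actual preprocessing code, and the function name and ratio are hypothetical.

```python
import random


def downsample_contacts(matrix, keep_prob, seed=0):
    """Binomially thin a symmetric Hi-C contact matrix of integer read counts.

    Only the upper triangle (including the diagonal) is read; each read pair
    there is kept with probability keep_prob, and the result is mirrored.
    """
    rng = random.Random(seed)
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < keep_prob)
            out[i][j] = kept
            out[j][i] = kept
    return out


if __name__ == "__main__":
    high = [[100, 20, 5], [20, 80, 10], [5, 10, 60]]
    low = downsample_contacts(high, 1 / 16)  # hypothetical 1/16 downsampling ratio
    print(low)
```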


JointPRS

A statistical model for multi-population PRS calculation.

Faculty: Hongyu Zhao, PhD

Download: GitHub / JointPRS package

Platform: Python


LDER-GE

LDER-GE improves the accuracy of estimating the phenotypic variance component explained by genome-wide G×E interactions using large-scale biobank association summary statistics.

Faculty: Hongyu Zhao, PhD

Download: LDER-GE package

Platform: R

Reference: academic.oup.com (LDER-GE)


MAJAR

A statistical model to assess replicability of biomarkers.

Faculty: Hongyu Zhao, PhD

Download: GitHub / MAJAR package

Platform: R

Reference: journals.sagepub.com (MAJAR)


M-DATA

A statistical model to jointly analyze de novo mutations for multiple traits.

Faculty: Hongyu Zhao, PhD

Download: GitHub / M-DATA package

Platform: R

Reference: journals.plos.org (M-DATA)


N-DATA

A network-assisted model of de novo variants using protein-protein interaction information.

Faculty: Hongyu Zhao, PhD

Download: GitHub / N-DATA package

Platform: R

Reference: journals.plos.org (N-DATA)


PTensor

Gene-gene (G×G) interactions have been shown to be critical for the fundamental mechanisms and development of complex diseases beyond main genetic effects. The commonly adopted marginal analysis is limited by considering only a small number of G factors at a time. With the “main effects, interactions” hierarchical constraint, many of the existing joint analysis methods suffer from prohibitively high computational cost. In this study, we propose a new method for identifying important G×G interactions under joint modeling. The proposed method adopts tensor regression to accommodate high data dimensionality and the penalization technique for selection. It naturally accommodates the strong hierarchical structure without imposing additional constraints, making optimization much simpler and faster than in the existing studies. It outperforms multiple alternatives in simulation. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer and melanoma demonstrates that it can identify markers with important implications and better prediction performance.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / PTensor package

Platform: R

Reference: doi.org (PTensor)


REML-mediation

REML-mediation is a restricted maximum likelihood (REML)-based mediation analysis framework that adjusts for genetic confounding effects.

Faculty: Hongyu Zhao, PhD

Download: REML-mediation package

Platform: R

Reference: www.nature.com (REML-mediation)


sglasso

SGL implements a penalization method for group variable selection that properly accommodates the correlation between adjacent groups. The method combines the group Lasso penalty with a quadratic penalty on the difference of regression coefficients of adjacent groups. It encourages group sparsity and smooths the regression coefficients of adjacent groups. Canonical correlations between adjacent groups are used as the weights in the quadratic difference penalty.

Faculty: Shuangge Steven Ma, PhD

Download: GitHub / sglasso package

Platform: R

Reference: doi.org (sglasso)


SDPR

A fast and robust Bayesian nonparametric method for prediction of complex traits using GWAS summary statistics.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SDPR package

Platform: C++

Reference: doi.org (SDPR)


SDPRX

A statistical method for cross-population prediction of complex traits.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SDPRX package

Platform: Python

Reference: doi.org (SDPRX)


SDPR_admix

A statistical method to calculate PRS in admixed populations.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SDPR_admix package

Platform: C++


SNVler

This repository contains a standalone Python pipeline for assembling segmented Hantavirus genomes from Nanopore sequencing data. The pipeline performs quality control, mapping, consensus sequence generation, and report creation. An optional masking step, which requires a BED file containing primer sequences and coordinates, is planned but has not yet been implemented.

Faculty: Colin J. Carlson, PhD

Download: GitHub / SNVler package

Platform: Python


SUPERGNOVA

Local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SUPERGNOVA package

Platform: Python

Reference: genomebiology.biomedcentral.com (SUPERGNOVA)


SuSiE²

Integration of expression QTLs with fine mapping via SuSiE.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SuSiE² package

Platform: R

Reference: pubmed.ncbi.nlm.nih.gov (SuSiE²)
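SuSiE-style fine-mapping is built from single-effect regressions. In each one, exactly one variant is assumed causal, and posterior inclusion probabilities (PIPs) are proportional to per-variant Bayes factors. The sketch below computes these PIPs from marginal z-scores using Wakefield's approximate Bayes factor under a uniform prior over variants. The prior effect variance (0.04 here) is an assumed value. SuSiE's iterative fitting of multiple effects and SuSiE²'s integration of expression QTLs are not shown, and the function names are hypothetical.

```python
import math


def log_abf(z, se, prior_var):
    """Log of Wakefield's approximate Bayes factor for one variant.

    ABF = sqrt(V / (V + W)) * exp(z^2 / 2 * W / (V + W)), with V = se^2, W = prior_var.
    """
    v = se * se
    r = prior_var / (v + prior_var)
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def single_effect_pips(z_scores, std_errors, prior_var=0.04):
    """PIPs for a single-effect model with a uniform prior over variants."""
    logs = [log_abf(z, s, prior_var) for z, s in zip(z_scores, std_errors)]
    m = max(logs)                                # subtract max for numerical stability
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(single_effect_pips([1.0, 5.0, -2.0], [0.05, 0.05, 0.05]))
```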


tatat

Transcriptome Assembly, Thinning, and Annotation Tool

Faculty: Colin J. Carlson, PhD

Download: GitHub / tatat package

Platform: Python


T-GEN

T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) is a framework to identify disease-associated genes leveraging epigenetic information.

Faculty: Hongyu Zhao, PhD

Download: GitHub / T-GEN package

Platform: R

Reference: journals.plos.org (T-GEN)


TWASKnockoff

Knockoff procedure improves identification of candidate causal genes in conditional transcriptome-wide association studies.

Faculty: Hongyu Zhao, PhD

Download: TWASKnockoff package

Platform: R
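The knockoff procedure controls the false discovery rate by comparing each candidate with a synthetic "knockoff" copy. A statistic W_j that is large and positive favors the real feature, and the selection threshold is chosen from the data. The sketch below shows the generic knockoff+ threshold at target level q. It assumes the W_j are already computed (for example, the difference between a gene's importance score and that of its knockoff). It does not show how TWASKnockoff constructs knockoffs for conditional TWAS, and the function names are hypothetical.

```python
import math


def knockoff_plus_threshold(w, q):
    """Smallest t in {|W_j| : W_j != 0} with
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q; infinity if none exists."""
    for t in sorted({abs(v) for v in w if v != 0}):
        false_est = 1 + sum(1 for v in w if v <= -t)   # estimated false selections
        selected = max(1, sum(1 for v in w if v >= t))
        if false_est / selected <= q:
            return t
    return math.inf


def knockoff_select(w, q):
    """Indices of features selected by the knockoff+ filter."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, v in enumerate(w) if v >= t]


if __name__ == "__main__":
    print(knockoff_select([5, 4, 3, 2, 1.5, 1, -0.5], 0.2))
```

The "+1" in the numerator makes the procedure conservative. With few features, a small q can be unattainable, and the filter then selects nothing.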


UTMOST

Transcriptome-wide association analysis is a powerful approach to studying the genetic architecture of complex traits. A key component of this approach is to build a model to impute gene expression levels from genotypes by using samples with matched genotypes and gene expression data in a given tissue. However, it is challenging to develop robust and accurate imputation models with a limited sample size for any single tissue. Here, we first introduce a multi-task learning method to jointly impute gene expression in 44 human tissues. Compared with single-tissue methods, our approach achieved an average of 39% improvement in imputation accuracy and generated effective imputation models for an average of 120% more genes. We describe a summary-statistic-based testing framework that combines multiple single-tissue associations into a powerful metric to quantify the overall gene–trait association. We applied our method, called UTMOST (unified test for molecular signatures), to multiple genome-wide-association results and demonstrate its advantages over single-tissue strategies.

Faculty: Hongyu Zhao, PhD

Download: zhaocenter.com / UTMOST Package

Platform: R

Reference: doi.org (UTMOST)


UKin

UKin is an improved kinship estimation method that can reduce both bias and root mean square error (RMSE) in the estimation of the genomic relationship matrix.

Faculty: Hongyu Zhao, PhD

Download: GitHub / UKin package

Platform: Python; R

Reference: bmcbioinformatics.biomedcentral.com (UKin)
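For context, a standard estimator of the genomic relationship matrix standardizes each SNP by its allele frequency: A_jk = (1/M) * sum_i (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i)), where allele frequencies p_i are estimated from the same sample. Estimators of this form are the usual baseline for kinship methods. The sketch below computes that baseline; UKin's own bias and RMSE corrections are not shown, and the function name is hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Allele-frequency-standardized GRM.

    genotypes: one list per individual of 0/1/2 allele counts over M SNPs.
    Monomorphic SNPs (sample p = 0 or 1) are skipped.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for i in range(m):
        col = [row[i] for row in genotypes]
        p = sum(col) / (2 * n)                   # sample allele frequency
        if p in (0.0, 1.0):
            continue
        scale = 2 * p * (1 - p)
        z = [g - 2 * p for g in col]             # centered allele counts
        for j in range(n):
            for k in range(j, n):
                val = z[j] * z[k] / scale
                grm[j][k] += val
                if k != j:
                    grm[k][j] += val
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    return [[v / used for v in row] for row in grm]


if __name__ == "__main__":
    print(genomic_relationship_matrix([[0, 2, 1], [2, 0, 1], [1, 1, 2]]))
```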