Hongyu Zhao, PhD

GenoCanyon

GenoCanyon is a whole-genome functional annotation approach based on unsupervised statistical learning. It integrates genomic conservation measures and biochemical annotation data to predict the functional potential at each nucleotide. More details about the method can be found in our PAPER.

Faculty: Hongyu Zhao, PhD

Download: zhaocenter.com / GenoCanyon Package

Platform: R; RShiny

Reference: doi.org (GenoCanyon)


GenoSkyline and GenoSkyline Plus

GenoSkyline is a principled framework to predict tissue-specific functional regions through integrating high-throughput epigenomic annotations. Integrative analysis of GenoSkyline annotations with GWAS summary statistics could systematically identify biologically relevant tissue types and provide novel insights into the genetic basis of human complex traits.

Faculty: Hongyu Zhao, PhD

Download: zhaocenter.org / GenoSkyline Package

Platform: BED

Reference: doi.org (GenoSkyline) and doi.org (GenoSkyline Plus)


UTMOST

Transcriptome-wide association analysis is a powerful approach to studying the genetic architecture of complex traits. A key component of this approach is to build a model to impute gene expression levels from genotypes by using samples with matched genotypes and gene expression data in a given tissue. However, it is challenging to develop robust and accurate imputation models with a limited sample size for any single tissue. Here, we first introduce a multi-task learning method to jointly impute gene expression in 44 human tissues. Compared with single-tissue methods, our approach achieved an average of 39% improvement in imputation accuracy and generated effective imputation models for an average of 120% more genes. We describe a summary-statistic-based testing framework that combines multiple single-tissue associations into a powerful metric to quantify the overall gene–trait association. We applied our method, called UTMOST (unified test for molecular signatures), to multiple genome-wide-association results and demonstrate its advantages over single-tissue strategies.

Faculty: Hongyu Zhao, PhD

Download: zhaocenter.com / UTMOST Package

Platform: R

Reference: doi.org (UTMOST)


GPA

This package implements three approaches for gene-environment (G-E) interaction analysis. All three adopt the sparse group minimax concave penalty to identify important G variables and G-E interactions while respecting the hierarchy between main G effects and G-E interaction effects. All three approaches are available for linear, logistic, and Poisson regression. The package also provides tools to mine and construct prior information for G variables and G-E interactions.

Faculty: Hongyu Zhao, PhD

Download: GitHub / GPA Package

Platform: R and RStudio

Reference: doi.org (GPA)


GRAPE

Gene-Ranking Analysis of Pathway Expression (GRAPE) is a tool for summarizing the consensus behavior of biological pathways in the form of a template, and for quantifying the extent to which individual samples deviate from the template. GRAPE templates are based only on the relative rankings of the genes within the pathway and can be used for classification of tissue types or disease subtypes. GRAPE can be used to represent gene-expression samples as vectors of pathway scores, where each pathway score indicates the departure from a given collection of reference samples. The resulting pathway-space representation can be used as the feature set for various applications, including survival analysis and drug-response prediction.

Faculty: Hongyu Zhao, PhD

Download: Cran R / GRAPE Package

Platform: R

Reference: doi.org (GRAPE)
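
The idea of a rank-based template can be illustrated with a minimal Python sketch. It is not the GRAPE package's code or API: the function names are hypothetical, and the majority-vote template and the "fraction of disagreeing gene pairs" score are simplifying assumptions chosen to show how a sample's departure from reference samples can be measured using only relative gene rankings.

```python
import numpy as np

def pairwise_order(sample):
    """Binary matrix B with B[i, j] = 1 when gene i is expressed below gene j."""
    sample = np.asarray(sample, dtype=float)
    return (sample[:, None] < sample[None, :]).astype(int)

def build_template(reference):
    """Majority vote of pairwise orderings across reference samples (rows = samples)."""
    reference = np.asarray(reference, dtype=float)
    votes = np.mean([pairwise_order(s) for s in reference], axis=0)
    return (votes > 0.5).astype(int)

def deviation_score(sample, template):
    """Fraction of gene pairs whose ordering in the sample disagrees with the template."""
    order = pairwise_order(sample)
    upper = np.triu_indices(len(sample), k=1)  # each unordered gene pair once
    return float(np.mean(order[upper] != template[upper]))

# Hypothetical expression values for one 4-gene pathway in 3 reference samples.
reference = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 3.0, 2.0, 4.0],
    [2.0, 1.0, 3.0, 4.0],
])
template = build_template(reference)
score_same = deviation_score([1.0, 2.0, 3.0, 4.0], template)
score_reversed = deviation_score([4.0, 3.0, 2.0, 1.0], template)
```

Because only orderings enter the score, any monotone rescaling of a sample's expression values leaves its deviation unchanged.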


EB-PRS

EB-PRS is a novel method that leverages information for effect sizes across all the markers to improve the prediction accuracy. No parameter tuning is needed in the method, and no external information is needed. This R package calculates polygenic risk scores from the given training summary statistics and testing data. EB-PRS can be used to extract main information, estimate Empirical Bayes parameters, derive polygenic risk scores for each individual in the testing data, and evaluate the PRS according to AUC and predictive r².

Faculty: Hongyu Zhao, PhD

Download: Cran R / EB-PRS Package

Platform: R

Reference: doi.org (EB-PRS)
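
The two evaluation metrics named above can be sketched in Python as follows. This is an illustration under stated assumptions, not the EB-PRS R package: the function names, genotypes, and effect sizes are hypothetical, the effect sizes are plain numbers rather than Empirical Bayes estimates, and predictive r² is taken here to be the squared Pearson correlation between score and phenotype.

```python
import numpy as np

def polygenic_score(genotypes, effect_sizes):
    """Weighted sum of allele counts; rows are individuals, columns are markers."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(effect_sizes, dtype=float)

def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def predictive_r2(scores, phenotype):
    """Squared Pearson correlation between the score and a quantitative phenotype."""
    r = np.corrcoef(scores, phenotype)[0, 1]
    return float(r ** 2)

# Hypothetical testing data: 4 individuals, 3 markers coded as 0/1/2 allele counts.
genotypes = np.array([[0, 1, 2], [2, 0, 1], [1, 1, 0], [2, 2, 2]])
effect_sizes = np.array([0.5, -0.2, 0.1])  # assumed values, not estimated ones
labels = np.array([0, 1, 0, 1])            # 1 = case, 0 = control

scores = polygenic_score(genotypes, effect_sizes)
case_control_auc = auc(scores, labels)
```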


CorBin

CorBin generates correlated binary data. We design algorithms with time complexity linear in the dimension for three commonly studied correlation structures (exchangeable, decaying-product, and K-dependent), and extend them to generate binary data from general non-negative correlation matrices with quadratic time complexity.

Faculty: Hongyu Zhao, PhD

Download: Cran R / CorBin Package

Platform: R

Reference: doi.org (CorBin)
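
To make the exchangeable case concrete, here is a minimal Python sketch of one well-known linear-time construction, the shared-component mixture (often attributed to Lunn and Davies). It is an assumption for illustration and is not claimed to be the algorithm in the CorBin package; the function name is hypothetical, and the sketch requires a common marginal probability `p` for every coordinate.

```python
import numpy as np

def exchangeable_binary(n_samples, dim, p, rho, rng=None):
    """Draw binary vectors with P(X_i = 1) = p and corr(X_i, X_j) = rho for i != j.

    Each coordinate copies a shared Bernoulli(p) draw with probability sqrt(rho)
    and otherwise uses its own independent Bernoulli(p) draw, so the work per
    sample grows linearly with dim.
    """
    if not 0.0 < p < 1.0 or not 0.0 <= rho <= 1.0:
        raise ValueError("need 0 < p < 1 and 0 <= rho <= 1")
    rng = np.random.default_rng(rng)
    shared = rng.random((n_samples, 1)) < p                   # one shared draw per sample
    own = rng.random((n_samples, dim)) < p                    # independent draws
    use_shared = rng.random((n_samples, dim)) < np.sqrt(rho)  # mixing indicators
    return np.where(use_shared, shared, own).astype(int)

# Usage: 20,000 samples of a 5-dimensional vector with marginal 0.3 and correlation 0.4.
samples = exchangeable_binary(20000, 5, p=0.3, rho=0.4, rng=0)
empirical_corr = np.corrcoef(samples, rowvar=False)
```

Two coordinates are correlated only when both copy the shared draw, which happens with probability rho, so their covariance is rho * p * (1 - p) and their correlation is rho.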


dcGSA

dcGSA performs distance-correlation-based gene set analysis for longitudinal gene expression profiles. In longitudinal studies, gene expression profiles are collected from each subject at each visit, so each subject has multiple measurements. The dcGSA package can be used to assess associations between gene sets and clinical outcomes of interest by taking full advantage of the longitudinal nature of both the gene expression profiles and the clinical outcomes.

Faculty: Hongyu Zhao, PhD

Download: Bioconductor / dcGSA Package

Platform: R

Reference: doi.org (dcGSA)
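
The statistic underlying this kind of analysis, the sample distance correlation, can be sketched in Python as below. This is the standard biased (V-statistic) estimator applied to two data matrices, given only as an illustration: the function names are hypothetical, and the sketch does not reproduce how the dcGSA package handles repeated visits per subject or significance testing.

```python
import numpy as np

def _centered_distances(x):
    """Double-centered matrix of pairwise Euclidean distances between rows of x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; rows of x and y are matched samples."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = (a * b).mean()                      # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()     # product of squared distance variances
    if dvar2 == 0.0:
        return 0.0                              # one input is constant
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))

# Usage: a purely quadratic relationship that linear correlation would miss.
x = np.linspace(-1.0, 1.0, 50)
y = x ** 2
pearson = float(np.corrcoef(x, y)[0, 1])
dcor = distance_correlation(x, y)
```

Unlike Pearson correlation, the distance correlation is zero in the population only under independence, which is why it can pick up non-linear associations between a gene set and an outcome.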


SUPERGNOVA

Local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SUPERGNOVA package

Platform: Python

Reference: genomebiology.biomedcentral.com (SUPERGNOVA)


GENJI

Estimating genetic correlation jointly using individual-level and summary-level GWAS data.

Faculty: Hongyu Zhao, PhD

Download: GitHub / GENJI package

Platform: Python

Reference: biorxiv.org (GENJI)


Composite-trait LDSC

Estimating correlation between composite phenotypes and traits.

Faculty: Hongyu Zhao, PhD

Download: GitHub / Composite-trait LDSC package

Platform: Python

Reference: doi.org (Composite-trait LDSC)


SDPR

A fast and robust Bayesian nonparametric method for prediction of complex traits using GWAS summary statistics.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SDPR package

Platform: C++

Reference: doi.org (SDPR)


SDPRX

A statistical method for cross-population prediction of complex traits.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SDPRX package

Platform: Python

Reference: doi.org (SDPRX)


SDPR_admix

A statistical method to calculate PRS in admixed populations.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SDPR_admix package

Platform: C++


BayesMEModel

A Bayesian Approach to Correcting the Attenuation Bias of Regression Using Polygenic Risk Score.

Faculty: Hongyu Zhao, PhD

Download: GitHub / BayesMEModel package

Platform: R

Reference: doi.org (BayesMEModel)


JointPRS

A statistical model for multi-population PRS calculation.

Faculty: Hongyu Zhao, PhD

Download: GitHub / JointPRS package

Platform: Python


M-DATA

A statistical model to jointly analyze de novo mutations for multiple traits.

Faculty: Hongyu Zhao, PhD

Download: GitHub / M-DATA package

Platform: R

Reference: journals.plos.org (M-DATA)


N-DATA

A network-assisted model of de novo variants using protein-protein interaction information.

Faculty: Hongyu Zhao, PhD

Download: GitHub / N-DATA package

Platform: R

Reference: journals.plos.org (N-DATA)


MAJAR

A statistical model to assess replicability of biomarkers.

Faculty: Hongyu Zhao, PhD

Download: GitHub / MAJAR package

Platform: R

Reference: journals.sagepub.com (MAJAR)


ResPAN

A powerful batch correction model for scRNA-seq data through residual adversarial networks.

Faculty: Hongyu Zhao, PhD

Download: GitHub / ResPAN package

Platform: Python

Reference: doi.org (ResPAN)


scAAnet

Non-linear archetypal analysis of single-cell RNA-seq data by deep autoencoders.

Faculty: Hongyu Zhao, PhD

Download: GitHub / scAAnet package

Platform: Python

Reference: doi.org (scAAnet)


MuSe-GNN

Learning Unified Gene Representation From Multimodal Biological Graph Data.

Faculty: Hongyu Zhao, PhD

Download: GitHub / MuSe-GNN package

Platform: Python

Reference: proceedings.neurips.cc (MuSe-GNN)


CosGeneGate

CosGeneGate selects multi-functional and credible biomarkers for single-cell analysis.

Faculty: Hongyu Zhao, PhD

Download: GitHub / CosGeneGate package

Platform: Python

Reference: academic.oup.com (CosGeneGate)


Geneverse

A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research.

Faculty: Hongyu Zhao, PhD

Download: GitHub / Geneverse package

Platform: Python

Reference: aclanthology.org (Geneverse)


HBI

A hierarchical Bayesian interaction model to estimate cell-type-specific methylation quantitative trait loci.

Faculty: Hongyu Zhao, PhD

Download: GitHub / HBI package

Platform: R

Reference: genomebiology.biomedcentral.com (HBI)


UKin

UKin is an improved kinship estimation method that reduces both bias and root mean square error (RMSE) in the estimation of the genomic relationship matrix.

Faculty: Hongyu Zhao, PhD

Download: GitHub / UKin package

Platform: Python; R

Reference: bmcbioinformatics.biomedcentral.com (UKin)


SuSiE²

Integration of expression QTLs with fine mapping via SuSiE.

Faculty: Hongyu Zhao, PhD

Download: GitHub / SuSiE² package

Platform: R

Reference: pubmed.ncbi.nlm.nih.gov (SuSiE²)


TWASKnockoff

Knockoff procedure improves identification of candidate causal genes in conditional transcriptome-wide association studies.

Faculty: Hongyu Zhao, PhD

Download: TWASKnockoff package

Platform: R


scNAT

A deep learning method for integrating paired single-cell RNA and T cell receptor sequencing profiles.

Faculty: Hongyu Zhao, PhD

Download: GitHub / scNAT package

Platform: Python

Reference: genomebiology.biomedcentral.com (scNAT)


MARBLES

A Markov random field model-based approach for differentially expressed gene detection from single-cell RNA-seq data.

Faculty: Hongyu Zhao, PhD

Download: GitHub / MARBLES package

Platform: R

Reference: academic.oup.com (MARBLES)


T-GEN

T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) is a framework to identify disease-associated genes leveraging epigenetic information.

Faculty: Hongyu Zhao, PhD

Download: GitHub / T-GEN package

Platform: R

Reference: journals.plos.org (T-GEN)


cWAS

cWAS is a statistical framework to identify cell types whose genetically regulated proportions are associated with complex diseases.

Faculty: Hongyu Zhao, PhD

Download: GitHub / cWAS package

Platform: R

Reference: journals.plos.org (cWAS)


REML-mediation

REML-mediation is a restricted-maximum-likelihood (REML)-based mediation analysis framework that adjusts for genetic confounding effects.

Faculty: Hongyu Zhao, PhD

Download: REML-mediation package

Platform: R

Reference: www.nature.com (REML-mediation)


LDER-GE

LDER-GE improves the accuracy of estimating the phenotypic variance component explained by genome-wide GE interactions using large-scale biobank association summary statistics.

Faculty: Hongyu Zhao, PhD

Download: LDER-GE package

Platform: R

Reference: academic.oup.com (LDER-GE)


BV-LDER-GE

BV-LDER-GE harnesses both correlations with additive genetic effects and full LD information to enhance the statistical power to detect genome-scale G×E interactions.

Faculty: Hongyu Zhao, PhD

Download: BV-LDER-GE package

Platform: R


CASE

CASE is an R package designed for multi-trait fine-mapping analysis, with a particular focus on single-cell eQTL fine-mapping.

Faculty: Hongyu Zhao, PhD

Download: GitHub / CASE package

Platform: R


PERADIGM

Phenotype Embedding Similarity-based Rare Disease Gene Mapping.

Faculty: Hongyu Zhao, PhD

Download: PERADIGM package

Platform: R