Yale-BI Joint Research Committee (JRC) and Fellows

Current Fellows

  • 2025 Fellows

    • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '25

      Topic: Transcriptional-Perturbation Integration for Therapeutic Response Prediction

      Project Summary: Accurate prediction of therapeutic response remains a critical challenge in precision oncology. Existing gene expression foundation models, while powerful, lack cancer-specific fine-tuning, limiting their ability to capture tumor-relevant regulatory patterns essential for therapeutic response prediction. Additionally, the translational gap between preclinical models and clinical patient outcomes continues to limit their application in therapeutic decision-making. We propose developing a computational framework that integrates cancer-adapted transcriptional modeling with perturbation-derived drug signatures to predict patient-level therapeutic responses. The approach combines foundation model adaptation, multi-modal embedding integration, and transfer learning strategies to capture drug sensitivity patterns across cancer contexts. This framework aims to bridge preclinical drug characterization with clinical decision-making, offering potential for therapeutic prioritization, biomarker identification, and discovery of novel treatment opportunities for under-targeted patient populations.

      Biography: Manyan Huang, Ph.D., is a Postdoctoral Associate in the Gerstein Lab at Yale University. Her research focuses on integrative computational analysis of gene expression regulation in brain cancer and neuropsychiatric disorders. She leverages multi-omic datasets and AI-driven foundation models to uncover disease mechanisms and predict therapeutic responses. Dr. Huang earned her Ph.D. from Indiana University under the supervision of Dr. Ming Li, specializing in statistical genetics and genetic epidemiology. Her doctoral research focused on genome-wide association studies and gene-based methods for detecting rare variant effects in complex diseases.

      Yale-Mentor Professor Mark Gerstein, BI-Mentor Dr. Youli Xia

    • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '25

      Topic: Integrative Construction of Biomedical Knowledge Graph for Explainable Drug Discovery Using Large Language Models

      Project Summary: Biological pathways are crucial for understanding diseases and discovering therapeutics, but the traditional reliance on manual curation limits scalability and contextual specificity. With the rapid increase in biomedical literature, there is a pressing need for automated, context-sensitive approaches. As such, our goal is to develop a scalable, context-aware biomedical knowledge graph. This graph will integrate data mined from approximately 38 million PubMed articles, enriched by large language models (LLMs) and curated biomedical databases such as KEGG and Reactome. It will enable precise querying for biological mechanisms and support drug discovery efforts. Additionally, we will introduce a graph-augmented LLM agent designed for mechanism discovery and therapeutic biomarker identification. This agent will perform context-driven subgraph retrieval and prioritize novel, biologically plausible connections. By providing transparent and interpretable reasoning backed by structured biomedical knowledge, our integrated framework will enhance context-aware mechanism discovery and biomarker identification, significantly advancing explainable drug discovery and therapeutic targeting.

      Biography: Dr. Chia-Hsuan Chang is a postdoctoral associate in the Clinical NLP lab, led by Dr. Hua Xu, at the Department of Biomedical Informatics and Data Science at Yale School of Medicine. His research focuses on natural language processing (NLP) and large language models (LLMs), particularly their applications in enhancing healthcare and literature-based discovery. He obtained a Ph.D. in Information Systems from National Sun Yat-sen University, under the supervision of Dr. San-Yih Hwang. Previously, he was a postdoctoral researcher in the College of Computing & Informatics at Drexel University, mentored by Dr. Christopher Yang.

      Yale-Mentor Professor Hua Xu, BI-Mentor Dr. Jon Hill

    • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '25

      Topic: AI-driven Virtual Cell Models for Drug Discovery

      Project Summary: Cell2Sentence (C2S) is a multimodal virtual cell foundation model that integrates single-cell, perturbational, and textual data to enable natural-language exploration of cellular mechanisms. This proposal will leverage C2S to target age-related macular degeneration (AMD), a leading cause of irreversible vision loss marked by angiogenesis and inflammation, where treatment responses remain heterogeneous. The three aims are: (1) integrate AMD single-cell, spatial, bulk and literature data to define cell states, networks and disease drivers; (2) deploy a multibillion-parameter C2S model for rapid virtual screening of more than 30,000 public and Boehringer compounds, predicting cell-specific responses; and (3) experimentally validate key perturbations in ex vivo human retinal cultures with a lab-in-the-loop AI agent. Expected outcomes include novel targets, prioritized compounds, and a generalizable platform extendable to other chronic diseases.

      Biography: Daniel Levine is the Yale–Boehringer Ingelheim Postdoctoral Fellow in the Van Dijk Lab at Yale School of Medicine. He earned his PhD in Mathematics from Penn State University, specializing in algebraic geometry and moduli spaces of vector bundles, then worked as a machine learning engineer at a cybersecurity startup. Since joining the lab in 2023, he has developed the Cell2Sentence (C2S) virtual cell platform and pursued research in flow matching and operator learning. His current work explores AI agents and multimodal foundation models for virtual cells and drug discovery.

      Yale-Mentor Professor David van Dijk, BI-Mentor Dr. Boris Alexander Bartholdy

  • 2023 Fellows

    • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '23

      Topic: Moving from GWAS to Causal Genes and Variants

      Project Summary: The size and ethnic diversity of emerging sequencing datasets are growing rapidly. Combining these data with emerging single-cell omic datasets and AI models for predicting gene activity (e.g., expression) offers an unprecedented opportunity to uncover the causal genes and cell types that drive human traits and disease. However, in emerging sequencing datasets, the strong, often perfect, linkage among associated ultra-rare variants can yield an unwieldy list of candidate causal variants. This problem is exacerbated by the presence of multiple causal variants (allelic heterogeneity) and migration events, both of which are more common in ethnically diverse datasets. This fine-mapping enigma motivates our current research. Using novel statistical methods, we aim to develop an automated yet interpretable approach that does not seek to isolate causal variants, but rather to directly identify target genes and pathways from phenotypic and single-cell xQTL data across different cohorts.

      Yale-Mentor Professor Ira Hall, BI-Mentor Dr. Daniel Lam

    • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '23

      Topic: Multi-omics Analytics for Personalized Medicine

      Project Summary: First, we will develop a framework that integrates spatial transcriptomics, single-cell RNA-seq, single-cell ATAC-seq, high-resolution imaging, and single-cell targeted protein data to identify tissue microenvironments. By utilizing network-based variable selection and regression of cell morphology, we will aggregate selected features using cell adjacency matrices to cluster tissue areas into microenvironments. This multi-modal integration promises to uncover new microenvironment characteristics for targeted therapeutics. Second, we will focus on identifying disease progression-associated changes in tissue microenvironments. Using known biomarker genes, we will differentiate microenvironments and assess disease severity and progression. We will analyze changes in cell compositions, expression profiles, gene regulatory networks, and cell-cell communication networks. Deconvolved spatial transcriptomics and causal network approaches will aid in constructing gene regulatory networks, while Connectome and graph attention network methods will establish cell-cell communication networks. Correlations with disease progression will be examined independently and combined using neural networks to gain a comprehensive understanding for precise therapeutic development.

      Yale-Mentor Professor Xiting Yan, BI-Mentor Dr. Alexandra Popa

  • 2022 Fellows

    • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22

      Topic: Mechanism-based identification of biomarkers and intervention targets from multi-omics datasets

      Project Summary: Recent technological advances allowing for the global characterization of genomic variants, transcription profiles, epigenomic profiles, and protein markers, often down to the single-cell level, have provided unprecedented insights into the homeostatic and perturbed states of biological systems. Analyzing these vast multi-omics datasets to obtain clinically actionable biomarkers and promising intervention targets remains a formidable challenge. Prediction of the individual immune response quality and quantity in health and disease is one quintessential case. Our proposed research will combine statistical analyses with causal inference and multi-scale mathematical modeling to develop a multi-omics data analysis pipeline that (a) provides mechanistic insights into the underlying biological process, (b) captures the diversity seen across individuals, and (c) identifies complex features and rules that are predictive of the response to perturbations. We will apply our approach to datasets characterizing the vaccination response to identify predictive biomarkers and intervention targets to improve vaccine efficacy.

      Yale-Mentor Professor John Tsang, BI-Mentor Dr. Katja Koeppen

    • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22

      Topic: Multi-omics Analytics and Emerging Technologies

      Project Summary: Recent efforts have used CRISPR knockout or activation screens to identify targets that boost T cell effector function and further leverage immune killing. However, manipulating a single gene may still fail to overcome resistance, because other genes can compromise its function in immune cell signaling. Paralogs derived from the same ancestral gene have been reported to show synthetic lethal interactions and might function jointly in augmenting cancer immunity. In this project, we will establish a computational model for predicting paralog pairs that act together in cancer immunotherapy, by integrating genome-wide CRISPR screens and perturbation molecular profiles from the Cancer Dependency Map (DepMap) and Connectivity Map (CMap), and cancer datasets from patients receiving immunotherapy. This research will deliver in silico tools for screening paralog pairs that can boost the immune response, which could inspire effective combination therapeutic strategies toward precision treatment.

      Yale-Mentor Professor Sidi Chen, BI-Mentor Dr. Di Feng

    • Rong Li

      Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22

      Topic: Multimodal network-based cancer heterogeneity analysis

      Project Summary: In the past decade, the maturity of profiling techniques has led to the discovery that previously defined cancer types/subtypes, which are based on pathological images, can be further classified into sub-subtypes. These sub-subtypes have different omics landscapes and clinical paths and demand different treatment strategies. Accordingly, the first guiding principle of this study is that effectively integrating multimodal data, in particular pathological imaging and multi-omics data, can lead to more refined cancer heterogeneity structures. In heterogeneity analysis, incorporating the interconnections among variables can further reveal more subtle cancer heterogeneity structures. As such, the second guiding principle is that utilizing cutting-edge methods to incorporate interconnections can further improve cancer heterogeneity analysis. Our overarching goal is to develop more effective statistical learning methods for cancer heterogeneity analysis, which can deepen our understanding of cancer biology and facilitate more personalized treatment.

      JRC: Yale-Mentor Professor Shuangge Ma, BI-Mentor Zuojian Tang

  • Alumni Fellows - 2022

    • Dylan Duchen, PhD, MPH

      Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22

      Topic: Multi-omics Analytics and Emerging Technologies

      Project Summary: Graph genome-based models can characterize genetic variation across both microbial organisms and diverse regions of the human genome. We aim to investigate whether these models can also be used to characterize the extensive genetic diversity observed within immunogenetic sequencing datasets (e.g., B cell receptor (BCR) repertoire sequencing). We will develop graph-based approaches to 1) analyze high-throughput immunogenetic sequencing (e.g., BCR repertoire profiling) and 2) perform genetic association tests focused on phenotypes related to the host immune response to vaccines, infection, therapeutics developed by Boehringer Ingelheim, and autoimmune diseases. We will also assess whether graph structure/topology is clinically informative and, by annotating regions across the graph using external multi-modal data, assess whether annotated genome graphs can facilitate immunogenetic-focused genome-wide association studies.

      Yale-Mentor Professor Steven Kleinstein, BI-Mentor Dr. Ingrid Braenne

  • Alumni Fellows - 2021

    • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '21

      Topic: A bioinformatics journey: from EHR to genetic data

      Project Summary: As a postdoc at the Yale Center for Biomedical Data Science, Xiayuan is going to work on high-throughput biomedical data, including electronic health records (EHRs) and genetic data. His research will focus on extending state-of-the-art machine learning approaches in health using EHRs, developing machine learning algorithms for drug discovery and adverse drug effects, and applying statistical methods to investigate challenging problems in genetic data. Based on his PhD research, he believes family-history-linked EHRs succinctly encompass shared genetic, epigenetic, and environmental features that enhance the analysis of human disease. He plans to apply machine learning algorithms in the healthcare domain, for tasks such as disease risk prediction, precision medicine, and clinical applications using family-history-linked EHRs. On the genetic data side, his research is devoted to addressing challenging problems in single-cell RNA sequencing data and developing innovative statistical models for analyzing the impact of genetic variants in human disease.

      Yale-Mentor Professor Zuoheng Wang, BI-Mentor Dr. Zhihao Ding

    • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '21

      Dhananjay Bhaskar is a postdoc in Genetics at the School of Medicine and a Yale-Boehringer Ingelheim Biomedical Data Science Fellow. His interdisciplinary research combines topological data analysis, machine learning, and mathematical modeling with applications in biophysics and biomedical research. Previously he worked on quantitative analysis of pattern formation and phase transitions in active matter, automated embryo selection for IVF, and unsupervised methods for analyzing cell shape and motility in time-lapse microscopy.

      Dhananjay received his Ph.D. in Biomedical Engineering and Sc.M. in Data Science from Brown University. Prior to Brown, he studied Computer Science and Applied Mathematics at the University of British Columbia.

      Yale-Mentor Smita Krishnaswamy, BI-Mentor Gregorio Alanis-Lobato

    • Zhe Sun

      Yale-Boehringer Ingelheim Biomedical Data Science Fellow '21

      Topic: Multi-modal and multi-source data integration for brain imaging

      Project Summary: With data collected from various brain imaging techniques, there is a need for neurobiologically meaningful analytical tools to integrate imaging modalities across techniques and trait types. To this end, we have proposed a series of neurobiologically interpretable models to achieve complex data integration, with applications to neurodegenerative diseases and mental health.

      Yale-Mentor Professor Yize Zhao, BI-Mentor Dr. Gregorio Alanis-Lobato