    Q+A

    Using AI Tools to Detect Misfolding Proteins Behind Neurological Disease and Diabetes

    Proteins that do not hold a fixed shape are one of biology's most stubborn challenges.

    Conventional AI tools for protein research assume proteins have a fixed structure, which leaves a major gap in the ability to study or treat the diseases these unstable proteins can cause.

    Bridging that gap is the focus of María Rodríguez Martínez, PhD, MSc, associate professor of biomedical informatics and data science at Yale School of Medicine. Her research sits at the intersection of computational biology, immunology, and therapeutic design, with the long-term goal of turning complex biological data into tools that can guide biological discovery and inform new strategies for diagnosis and treatment.

    How has your research evolved?

    Much of my group’s work has focused on modeling and understanding B cell and T cell receptors—the proteins the immune system uses to recognize and respond to threats. Parts of these receptors are highly flexible—they bend and shift as they scan for and engage targets—and that flexibility makes them genuinely difficult to model with current computational tools. In the past, we have developed algorithms to predict their important properties, such as binding partners and activation, which are critical to understanding how the immune system works.

    T Cell Receptor in Motion

    The green regions show its stable core, while the colored lines capture the many shapes its flexible loop can adopt to recognize and bind different targets.

    You are currently expanding to model intrinsically disordered proteins. What are these, and why are they so difficult to study?

    Most proteins fold into a fixed shape—like a key designed to fit one specific lock. That shape determines how they work. But intrinsically disordered proteins are different. They do not settle into one form. Instead, they shift constantly between many different shapes depending on their environment, the molecules nearby, or small changes in their genetic code. This flexibility is not a flaw. It is important for how these proteins function in a healthy body. But it also makes them very difficult to study.

    Most tools in biology and AI assume that proteins have one stable structure. When that assumption does not hold, those tools fall short. That leaves a major gap in our ability to understand and treat diseases driven by intrinsically disordered proteins (IDPs).

    How is your approach different from existing AI tools for protein research?

    Most AI tools in this area are built to predict one "best" static structure for a protein. We are asking a different question: not "what shape does this protein have?" but "what range of shapes can it take and which ones can lead to disease?"

    We use large-scale computer simulations, generative AI, and experimental data to build models that capture the full range of conformational states an IDP can adopt, including rare forms that may be the most harmful.

    This is a significant shift from how most tools in the field currently work. We believe it is essential for making progress on diseases rooted in protein misfolding and aggregation, a process by which proteins clump together into toxic deposits that can damage cells, driving conditions such as Alzheimer’s and Parkinson’s.

    What is GRAMMAR-IDP?

    Building on this foundation, we are designing a landmark project: GRAMMAR-IDP. The goal is to decode the “molecular grammar”—the underlying rules—that govern how IDPs behave and translate those rules into tools for early detection and targeted treatment.

    To do this, we are bringing together a large consortium spanning physics, molecular biology, and computational science, including Janghoo Lim, PhD; Corey O'Hern, PhD; Jiangbing Zhou, PhD; Sathish Ramakrishnan, PhD; Jonathan Bogan, MD; and Christian Schlieker, PhD, at Yale; Marc Vidal, PhD, at Dana-Farber; and Sung Yun Jung, PhD, at Baylor College of Medicine, alongside industry collaborators.

    Ultimately, we want to detect the earliest signs of harmful protein misfolding and aggregation before major tissue damage occurs, and intervene while there is still time to change the course of disease.

    What diseases is your research focused on, and what could it mean for patients?

    We are starting to investigate highly disordered proteins, including those involved in neurodegenerative diseases. One important example is our ongoing collaboration with Janghoo Lim, PhD, professor of genetics and neuroscience at Yale, on spinocerebellar ataxia type 1, a condition in which a key protein misfolds and aggregates in ways that remain poorly understood. A second disease we are investigating is type 2 diabetes, one of the most common chronic diseases in the world.

    In both cases, part of what drives the disease is a protein that misfolds—meaning it fails to adopt the shape it needs to work properly—and aggregates, meaning that it clumps together in harmful ways. Those harmful protein forms are currently very hard to detect and target with existing treatments.

    If we can build tools that identify these dangerous protein forms earlier and more precisely, it could open the door to earlier diagnosis, better ways to track disease progression, and treatments that address the root cause, not just the symptoms.

    You've made rigorous benchmarking a central part of your work. Why does that matter?

    As AI tools become more common in medicine and research, a critical question is: Do they actually work, not just in ideal conditions, but in the complex and noisy environments where biology really happens? That is what benchmarking helps us answer.

    Our group has spent significant effort building evaluation frameworks that put AI methods to the test under realistic conditions. For instance, we built a benchmark, using data from more than 380,000 lab experiments, for predicting how well antibodies bind to their targets.

    In a separate study on peptide engineering in immune receptor complexes, we showed that some widely used AI methods fail in predictable ways when tested under real biological constraints. This kind of rigorous testing is not just an academic exercise—it is what allows researchers and clinicians to trust the tools they use. If we are going to bring AI into medicine, we have to know where it works—and where it does not.

    "One century ago, Einstein famously said that 'God does not play dice' to express his resistance to the probabilistic nature of quantum mechanics. One century later, I sometimes think nature may do exactly that: using highly disordered proteins to orchestrate remarkably precise molecular processes."

    María Rodríguez Martínez, PhD, MSc
    Associate Professor of Biomedical Informatics and Data Science

    Author

    Sooyoun Tan
    Web Design and Communications Officer
