ORIGINAL RESEARCH article

Front. Chem.
Sec. Theoretical and Computational Chemistry
Volume 13 - 2025 | doi: 10.3389/fchem.2025.1545136
This article is part of the Research Topic: AI for Molecular Design and Synthesis

NanoAbLLaMA: construction of nanobody libraries with protein large language models

Provisionally accepted
Xin Wang 1, Haotian Chen 1*, Bo Chen 2*, Lixin Liang 1*, Fengcheng Mei 1*, Bingding Huang 1*
  • 1 College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
  • 2 Chengdu NBbiolab. CO., LTD, SME Incubation Park, 319 Qingpi Avenue, Chengdu, China

The final, formatted version of the article will be published soon.

    Traditional methods for constructing synthetic nanobody libraries are laborious and time-consuming. In this work, we introduce a novel approach to building nanobody libraries: protein large language models generate nanobody sequences, and the library is then constructed through statistical analysis. The process involves constructing training datasets, applying efficient fine-tuning to accelerate training, and creating the nanobody libraries. Specifically, we further trained the LLaMA2 model using low-rank adaptation targeted at germline, obtaining NanoAbLLaMA, a large language model that generates nanobody sequences for a specified germline. Experiments show that NanoAbLLaMA achieves promising results in generating nanobody sequences for specified germlines. Finally, we use this model to generate the required nanobody sequences for two germlines and construct our synthetic nanobody library based on statistical analysis. Code, data and model are available at https://github.com/WangLabforComputationalBiology/NanoAbLLaMA.
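
    The abstract names low-rank adaptation (LoRA) as the fine-tuning technique. As a rough illustration of the general idea only, and not the authors' training code, the sketch below shows how a frozen weight matrix W is combined with a trainable low-rank update, W + (alpha / r) * B A. The matrix sizes, the rank r, the scaling factor alpha, and every function name here are hypothetical choices made for the example; the actual NanoAbLLaMA settings are in the linked repository.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A.

    W stays frozen during fine-tuning; only the small matrices
    A (r x d_in) and B (d_out x r) would be trained.
    """
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical toy dimensions, chosen only to keep the example small.
d_out, d_in, r, alpha = 4, 6, 2, 4
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weights
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # small random init
b = [[0.0] * r for _ in range(d_out)]                               # zero init

# With B initialised to zero, the adapted layer starts identical to the base layer.
w_eff = lora_effective_weight(w, a, b, alpha, r)

# Parameters trained by LoRA versus a full update of W.
trainable = r * (d_in + d_out)
full = d_in * d_out
```

    At this toy size the saving is negligible, but because the trainable count grows as r * (d_in + d_out) rather than d_in * d_out, it becomes substantial for the large projection matrices of a model such as LLaMA2, which is the usual motivation for using LoRA to accelerate fine-tuning.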

    Keywords: reinforcement learning, Generative AI, nanobodies, Libraries, Protein Large Language Models

    Received: 14 Dec 2024; Accepted: 31 Jan 2025.

    Copyright: © 2025 Wang, Chen, Chen, Liang, Mei and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Haotian Chen, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
    Bo Chen, Chengdu NBbiolab. CO., LTD, SME Incubation Park, 319 Qingpi Avenue, Chengdu, China
    Lixin Liang, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
    Fengcheng Mei, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
    Bingding Huang, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.