ORIGINAL RESEARCH article
Front. Chem.
Sec. Theoretical and Computational Chemistry
Volume 13 - 2025 | doi: 10.3389/fchem.2025.1545136
NanoAbLLaMA: construction of nanobody libraries with protein large language models
Provisionally accepted
- 1 College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
- 2 Chengdu NBbiolab. CO., LTD, SME Incubation Park, 319 Qingpi Avenue, Chengdu, China
Traditional methods for constructing synthetic nanobody libraries are laborious and time-consuming. In this work, we introduce an approach that uses protein large language models to generate nanobody sequences and then constructs the library through statistical analysis. The process comprises three steps: constructing the training datasets, efficient fine-tuning to accelerate training, and building the nanobody libraries. Specifically, we further train the LLaMA2 model using low-rank adaptation targeted at germline, obtaining NanoAbLLaMA, a large language model that generates nanobody sequences for a specified germline. Experiments show that NanoAbLLaMA achieves promising results in generating nanobody sequences for specified germlines. Finally, we use the model to generate the required nanobody sequences for two germlines and construct our synthetic nanobody library based on statistical analysis. Code, data and model are available at https://github.com/WangLabforComputationalBiology/NanoAbLLaMA.
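The abstract does not say which statistics are used to turn generated sequences into a library. As a minimal sketch only, one common choice is a per-position residue frequency profile over aligned sequences, from which the positions to diversify are selected. Everything below is an assumption for illustration and is not taken from the NanoAbLLaMA repository: the function names `position_frequencies` and `variable_positions`, the 0.9 threshold, and the toy input strings (which are placeholders, not real nanobody sequences).

```python
from collections import Counter


def position_frequencies(aligned_seqs):
    """Return, for each column, a dict mapping residue -> relative frequency.

    Assumes the sequences are already aligned to equal length (hypothetical
    preprocessing step; the abstract does not describe the alignment).
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be pre-aligned to equal length")
    profile = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        profile.append({aa: n / len(aligned_seqs) for aa, n in counts.items()})
    return profile


def variable_positions(profile, threshold=0.9):
    """Indices where the most common residue is below `threshold` frequency.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    return [i for i, freqs in enumerate(profile) if max(freqs.values()) < threshold]


# Toy placeholder strings standing in for generated, aligned sequences.
toy = ["ACDE", "ACDF", "ACGE", "ACDE"]
profile = position_frequencies(toy)
diverse = variable_positions(profile)
print(diverse)
```

In a sketch like this, conserved columns would be kept fixed in the library scaffold, while the columns returned by `variable_positions` would be candidates for randomization weighted by the observed frequencies.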
Keywords: reinforcement learning, generative AI, nanobodies, libraries, protein large language models
Received: 14 Dec 2024; Accepted: 31 Jan 2025.
Copyright: © 2025 Wang, Chen, Chen, Liang, Mei and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Haotian Chen, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
Bo Chen, Chengdu NBbiolab. CO., LTD, SME Incubation Park, 319 Qingpi Avenue, Chengdu, China
Lixin Liang, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
Fengcheng Mei, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
Bingding Huang, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.