AffinityLM : Binding-Site Informed Multitask Language Model for Drug-Target Affinity Prediction

Tyler Rose *

Bindwell

Navvye Anand *

Bindwell

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

*All authors contributed equally to this work

Abstract

Drug-target affinity (DTA) prediction is crucial for drug discovery and repurposing. Current DTA prediction models are trained on datasets biased toward an abundance of unique molecules relative to target proteins. Such models excel at molecule screening but struggle with inverse drug screening and generalization to novel targets. This data bias hinders drug repurposing efforts and the development of truly general solutions capable of understanding interactions between any given drug and target. To address these challenges, we present AffinityLM, a multitask transformer model that creates comprehensive joint-feature representations of drug-target compounds. Our approach leverages both molecule-biased binding affinity data and protein-biased binding site data, thereby balancing the protein-molecule data distribution. This enables learning from a significantly more diverse dataset with more drug-target pairs, with the potential to improve accuracy and generalization. Despite being trained on a relatively small subset of available data, comparative evaluations against state-of-the-art models on standard benchmarks such as DAVIS and KIBA show that AffinityLM matches or surpasses existing models for binding affinity prediction. Our results demonstrate that models forced to learn rich feature spaces for drug-target interactions offer superior performance and versatility in DTA prediction tasks, paving the way for more effective and broadly applicable drug discovery strategies.

Introduction

Drug-target affinity (DTA) prediction is crucial for drug discovery, but traditional experimental methods are costly and time-consuming. While earlier computational methods relied on 3D structural information, the limited availability of such data has led to increased interest in 1D sequence representations for large-scale DTA prediction. Machine learning, particularly transformer models, has emerged as a promising approach for DTA prediction.

Transformers excel at processing 1D sequential data like SMILES and protein sequences, and can leverage pre-training on large-scale datasets. However, generalizability remains a challenge, especially for novel compounds or proteins.

Figure: The transformer deep learning architecture.

Current DTA prediction models are trained on datasets biased towards molecules, hindering inverse drug screening and generalization to new targets. Previous efforts to address this issue have not fully leveraged available molecular and protein data.

We present AffinityLM, a novel multitask transformer model for DTA prediction. By combining molecule-biased binding affinity data with protein-biased binding site data, AffinityLM balances the protein-molecule distribution. This exposes the model to a substantially larger and more diverse set of unique drug-target pairs, potentially enhancing generalization. AffinityLM's multitask architecture forces the model to learn comprehensive representations of drug-target compounds, improving performance on novel drugs and targets.

Methodology

Preliminaries

AffinityLM is designed to address the challenges in DTA prediction by leveraging a multitask learning framework that integrates both DTA prediction and binding site prediction (BSP). This dual-task approach enables the model to learn richer and more nuanced representations of drug-target interactions. By employing pretrained transformers for the initial feature extraction of protein and molecule sequences, AffinityLM capitalizes on the transformers' ability to capture intricate sequence dependencies. This setup not only enhances the model's predictive accuracy but also improves its generalization across diverse drug-target pairs (DTPs).

Mathematically, our model $f$ can be defined as a function that takes as input a protein sequence $p$ and a molecule sequence $m$, and outputs both a binding affinity prediction $a$ and a binding site prediction $b$,

$$f: (p, m) \rightarrow (a, b).$$

Here, $p \in \mathbb{R}^{l_p}$, $m \in \mathbb{R}^{l_m}$, $a \in \mathbb{R}$, and $b \in \{0,1\}^{l_p}$, where $l_p$ and $l_m$ denote the lengths of the protein and molecule sequences, respectively.

For batched inputs, we extend this formulation to

$$F: (\mathbf{P}, \mathbf{M}) \rightarrow (\mathbf{A}, \mathbf{B}),$$

where $\mathbf{P} \in \mathbb{R}^{n \times l_p}$, $\mathbf{M} \in \mathbb{R}^{n \times l_m}$, $\mathbf{A} \in \mathbb{R}^n$, and $\mathbf{B} \in \{0,1\}^{n \times l_p}$, with $n$ denoting the batch size.
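To make this batched contract concrete, the following is a minimal PyTorch sketch of a model exposing such an interface. The module body is a placeholder for illustration only; the class name, layers, and pooling are assumptions and not the actual AffinityLM architecture.

```python
import torch
import torch.nn as nn

class MultitaskDTAModel(nn.Module):
    """Illustrative interface only: maps batched protein/molecule token ids
    to one affinity per pair (A) and per-residue binding-site logits (B)."""

    def __init__(self, protein_vocab: int, molecule_vocab: int, d_model: int = 768):
        super().__init__()
        self.protein_embed = nn.Embedding(protein_vocab, d_model)
        self.molecule_embed = nn.Embedding(molecule_vocab, d_model)
        self.affinity_head = nn.Linear(d_model, 1)   # a in R
        self.site_head = nn.Linear(d_model, 1)       # b in {0,1}^{l_p}, as logits

    def forward(self, proteins: torch.Tensor, molecules: torch.Tensor):
        # proteins: (n, l_p) token ids; molecules: (n, l_m) token ids
        p = self.protein_embed(proteins)                  # (n, l_p, d_model)
        m = self.molecule_embed(molecules)                # (n, l_m, d_model)
        joint = p.mean(dim=1) + m.mean(dim=1)             # placeholder joint feature
        affinity = self.affinity_head(joint).squeeze(-1)  # A: (n,)
        site_logits = self.site_head(p).squeeze(-1)       # B: (n, l_p)
        return affinity, site_logits
```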

Model

Building upon the multitask specification, we now detail the architecture of AffinityLM, consisting of three main components: individual feature extraction, joint feature extraction, and task-specific prediction heads.

Pretrained transformers have shown remarkable generalization and accuracy in natural language processing tasks, and recent studies have successfully applied them to DTA prediction. We leverage these pretrained models for individual feature extraction of proteins and molecules, yielding rich representations. This approach offers two key benefits: it allows for caching individual embeddings (thus reducing computational costs for repeated analyses) and enhances generalization through models pretrained on extensive datasets.

Figure: The PLAPT pre-trained transformer learning framework.

For protein feature extraction, we use the Ankh transformer, while for molecule feature extraction, we use MolFormer. The process for each is as follows:

    Protein Embedding:
    1. Tokenize a batch of $n$ protein sequences $\mathbf{P}$ using Ankh's tokenizer with padding enabled, resulting in a tensor of shape $\mathbb{R}^{n \times L_p}$, where $L_p$ is the maximum protein sequence length in the batch.
    2. Feed the padded tokens through the Ankh encoder, producing protein embeddings $\mathbf{E}_p \in \mathbb{R}^{n \times L_p \times 1536}$.

    Molecule Embedding:
    1. Tokenize a batch of $n$ molecule SMILES strings $\mathbf{M}$ using MolFormer's tokenizer with padding enabled, resulting in a tensor of shape $\mathbb{R}^{n \times L_m}$, where $L_m$ is the maximum molecule sequence length in the batch.
    2. Feed the padded tokens through the MolFormer encoder, producing molecule embeddings $\mathbf{E}_m \in \mathbb{R}^{n \times L_m \times 512}$.

    Masking Padded Embeddings:
    1. To prevent padded positions from contributing to downstream computations, we apply a mask to the batched embeddings: $\mathbf{E}_{\text{masked}} = \mathbf{E} \odot \mathbf{M}_{\text{mask}}$, where $\mathbf{E}$ is the embedding tensor and $\mathbf{M}_{\text{mask}}$ is a binary mask tensor of the same shape as $\mathbf{E}$, equal to 1 for valid tokens and 0 for padding tokens. A minimal code sketch of this embedding-and-masking step follows below.
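The sketch below illustrates the tokenize-encode-mask pipeline described above, assuming Hugging Face-style tokenizer and encoder objects for Ankh and MolFormer (loading the specific checkpoints is left to the caller). The returned masked embeddings can also be cached per protein or molecule to avoid recomputation across repeated analyses.

```python
import torch

def embed_and_mask(sequences, tokenizer, encoder):
    """Tokenize a padded batch, encode it, and zero out padding positions.

    `tokenizer` and `encoder` stand in for the Ankh (protein) or MolFormer
    (molecule) tokenizer/encoder pair; any encoder exposing
    `last_hidden_state` works the same way.
    """
    tokens = tokenizer(sequences, padding=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(input_ids=tokens["input_ids"],
                      attention_mask=tokens["attention_mask"])
    embeddings = out.last_hidden_state                # E: (n, L, d)
    mask = tokens["attention_mask"].unsqueeze(-1)     # M_mask: (n, L, 1); 1 = token, 0 = pad
    return embeddings * mask                          # E_masked = E (elementwise) M_mask
```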

Multitask Specification

Based on previous research, we hypothesize that a multitask approach for DTA models can lead to a more comprehensive understanding of drug-target interactions. By predicting binding sites, the model may develop a better grasp of the spatial and chemical properties that contribute to binding, which could in turn improve its binding affinity predictions. Furthermore, because binding site prediction can be formulated as a token classification problem, it is well suited to sequence-based transformer models, which have exhibited high performance in recent literature. A multitask architecture may also enhance the model's value as a feature extractor for other drug-target pair attributes, providing a rich internal representation of drug-target compounds that is useful for downstream tasks in drug discovery.

The multitask learning (MTL) approach in AffinityLM is therefore designed to enhance the model's understanding of drug-target interactions by simultaneously predicting binding affinity and identifying binding sites, motivated by the hypothesis that learning related tasks in parallel can improve both performance and generalization. The two tasks are:

Binding Affinity Prediction: A regression task to predict the binding affinity $a$ between a protein $p$ and a molecule $m$.

Binding Site Prediction: A sequence labeling task to predict which amino acids in the protein sequence $p$ are part of the binding site for the molecule $m$.

Mathematically, we can formulate our multitask learning objective as follows:

$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{DTA} + \lambda_2 \mathcal{L}_{BS},$$

where
- $\mathcal{L}_{DTA}$ is the loss for the binding affinity prediction task (e.g., mean squared error),
- $\mathcal{L}_{BS}$ is the loss for the binding site identification task (e.g., binary cross-entropy),
- $\lambda_1$ and $\lambda_2$ are hyperparameters that control the relative importance of each task.

For the binding affinity prediction task, we define the loss as the mean squared error (MSE) between the true affinity value $y$ and the predicted affinity $\hat{y}$.
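As a concrete illustration, the following is a minimal PyTorch sketch of the combined objective. The masking of padded residues in the binding-site term and the exact reduction scheme are assumptions; the default weights match the loss weights reported in the hyperparameter table below.

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred_affinity, true_affinity,
                   site_logits, site_labels, site_mask,
                   lambda_dta=2.0, lambda_bs=0.5):
    """Weighted sum of task losses: L_total = lambda_1 * L_DTA + lambda_2 * L_BS."""
    # L_DTA: mean squared error between true and predicted affinities.
    loss_dta = F.mse_loss(pred_affinity, true_affinity)

    # L_BS: binary cross-entropy over residues, ignoring padded positions.
    bce = F.binary_cross_entropy_with_logits(site_logits, site_labels.float(),
                                             reduction="none")
    loss_bs = (bce * site_mask).sum() / site_mask.sum().clamp(min=1)

    return lambda_dta * loss_dta + lambda_bs * loss_bs
```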

Figure: AffinityLM framework.

Model Configuration and Hyperparameters

The key hyperparameters and training settings used to train AffinityLM are summarized in the following table:

| Parameter | Value |
| --- | --- |
| Embedding dimension | 768 |
| Number of attention layers | 4 |
| Number of attention heads | 8 |
| Feedforward expansion rate | 4 |
| Dropout rate | 0.2 |
| Maximum sequence length | 4200 |
| Optimizer | AdamW (fused) |
| Learning rate | 5e-4 |
| Batch size (affinity and binding site) | 56 |
| Number of epochs | 3 |
| Affinity loss weight | 2.0 |
| Binding site loss weight | 0.5 |
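For convenience, these settings could be bundled into a single configuration object. The sketch below is illustrative only; the field names are chosen here and are not taken from the AffinityLM codebase.

```python
from dataclasses import dataclass

@dataclass
class AffinityLMConfig:
    # Architecture (values from the table above)
    embedding_dim: int = 768
    num_attention_layers: int = 4
    num_attention_heads: int = 8
    feedforward_expansion: int = 4
    dropout: float = 0.2
    max_sequence_length: int = 4200
    # Optimization
    learning_rate: float = 5e-4        # AdamW (fused)
    batch_size: int = 56               # affinity and binding site batches
    num_epochs: int = 3
    # Multitask loss weights (lambda_1, lambda_2)
    affinity_loss_weight: float = 2.0
    binding_site_loss_weight: float = 0.5
```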

Results

We evaluated AffinityLM’s performance on the DAVIS and KIBA datasets, comparing it with several state-of-the-art models.

| Method | MSE ↓ | CI ↑ | $r_m^2$ ↑ |
| --- | --- | --- | --- |
| KronRLS | 0.379 | 0.871 | 0.407 |
| SimBoost | 0.282 | 0.872 | 0.644 |
| DeepDTA | 0.261 | 0.878 | 0.630 |
| GANsDTA | 0.276 | 0.881 | 0.653 |
| GraphDTA | 0.229 | 0.893 | 0.649 |
| TF-DTA | 0.231 | 0.886 | 0.670 |
| MGraphDTA | 0.207 | 0.900 | 0.710 |
| WGNN-DTA | 0.208 | 0.900 | 0.692 |
| DGraphDTA | 0.202 | 0.904 | 0.700 |
| AffinityLM | 0.205 | 0.899 | 0.712 |

Table: Performance Comparison on DAVIS Dataset. Best values are in bold and second-best values are underlined.
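For reference, the two headline metrics can be computed as follows. This is a minimal NumPy sketch for measured versus predicted affinities; the $r_m^2$ index, which additionally requires a regression-through-origin term, is omitted here.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error (lower is better)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def concordance_index(y_true, y_pred):
    """Concordance index (higher is better): among pairs with different true
    affinities, the fraction whose predictions are ordered the same way;
    tied predictions count as 0.5."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5
            elif np.sign(diff_pred) == np.sign(y_true[i] - y_true[j]):
                concordant += 1.0
    return concordant / comparable if comparable else 0.0
```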

BibTeX citation

  
@article{Bindwell2025,
  author    = {Bindwell},
  title     = {AffinityLM: Binding-Site Informed Multitask Language Model for Drug-Target Affinity Prediction},
  year      = {2025},
  publisher = {Cold Spring Harbor Laboratory},
  journal   = {bioRxiv}
}

@article{Rose2024.02.08.575577,
  author       = {Rose, Tyler and Monti, Nicol{\`o} and Anand, Navvye and Shen, Tianyu},
  title        = {PLAPT: Protein-Ligand Binding Affinity Prediction Using Pretrained Transformers},
  elocation-id = {2024.02.08.575577},
  year         = {2024},
  doi          = {10.1101/2024.02.08.575577},
  publisher    = {Cold Spring Harbor Laboratory},
  URL          = {https://www.biorxiv.org/content/early/2024/02/09/2024.02.08.575577},
  eprint       = {https://www.biorxiv.org/content/early/2024/02/09/2024.02.08.575577.full.pdf},
  journal      = {bioRxiv}
}