AffinityLM : Binding-Site Informed Multitask Language Model for Drug-Target Affinity Prediction

Tyler Rose *

Bindwell

Navvye Anand *

Bindwell

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

*All authors contributed equally to this work

Abstract

Drug-target affinity (DTA) prediction is crucial for drug discovery and repurposing. Current DTA prediction models are trained on datasets biased toward an abundance of unique molecules relative to target proteins. Such models excel at molecule screening but struggle with inverse drug screening and generalization to novel targets. This data bias hinders drug repurposing efforts and the development of truly general solutions capable of understanding interactions between any given drug and target. To address these challenges, we present AffinityLM, a multitask transformer model that creates comprehensive joint-feature representations of drug-target compounds. Our approach leverages both molecule-biased binding affinity data and protein-biased binding site data, thereby balancing the protein-molecule data distribution. This enables learning from a significantly more diverse dataset with more drug-target pairs, with the potential to improve accuracy and generalization. Despite being trained on a relatively small subset of available data, comparative evaluations against state-of-the-art models on standard benchmarks such as DAVIS and KIBA show that AffinityLM matches or surpasses existing models for binding affinity prediction. Our results demonstrate that models forced to learn rich feature spaces for drug-target interactions offer superior performance and versatility in DTA prediction tasks, paving the way for more effective and broadly applicable drug discovery strategies.

Introduction

Drug-target affinity (DTA) prediction is crucial for drug discovery, but traditional experimental methods are costly and time-consuming. While earlier computational methods relied on 3D structural information, the limited availability of such data has led to increased interest in 1D sequence representations for large-scale DTA prediction. Machine learning, particularly transformer models, has emerged as a promising approach for DTA prediction.

Transformers excel at processing 1D sequential data like SMILES and protein sequences, and can leverage pre-training on large-scale datasets. However, generalizability remains a challenge, especially for novel compounds or proteins.

Figure: The transformer deep learning architecture.

Current DTA prediction models are trained on datasets biased towards molecules, hindering inverse drug screening and generalization to new targets. Previous efforts to address this issue have not fully leveraged available molecular and protein data.

We present AffinityLM, a novel multitask transformer model for DTA prediction. By combining molecule-biased binding affinity data with protein-biased binding site data, AffinityLM balances the protein-molecule distribution. This exposes the model to a substantially larger and more diverse set of unique drug-target pairs, potentially enhancing generalization. AffinityLM's multitask architecture forces the model to learn comprehensive representations of drug-target compounds, improving performance on novel drugs and targets.

Methodology

Preliminaries

AffinityLM is designed to address the challenges in DTA prediction by leveraging a multitask learning framework that integrates both DTA prediction and binding site prediction (BSP). This dual-task approach enables the model to learn richer and more nuanced representations of drug-target interactions. By employing pretrained transformers for the initial feature extraction of protein and molecule sequences, AffinityLM capitalizes on the transformers' ability to capture intricate sequence dependencies. This setup not only enhances the model's predictive accuracy but also improves its generalization across diverse drug-target pairs (DTPs).

Mathematically, our model $f$ can be defined as a function that takes as input a protein sequence $p$ and a molecule sequence $m$, and outputs both a binding affinity prediction $a$ and a binding site prediction $b$,

$$f: (p, m) \rightarrow (a, b).$$

Here, $p \in \mathbb{R}^{l_p}$, $m \in \mathbb{R}^{l_m}$, $a \in \mathbb{R}$, and $b \in \{0,1\}^{l_p}$, where $l_p$ and $l_m$ denote the lengths of the protein and molecule sequences, respectively.

For batched inputs, we extend this formulation to

$$F: (\mathbf{P}, \mathbf{M}) \rightarrow (\mathbf{A}, \mathbf{B}),$$

where $\mathbf{P} \in \mathbb{R}^{n \times l_p}$, $\mathbf{M} \in \mathbb{R}^{n \times l_m}$, $\mathbf{A} \in \mathbb{R}^n$, and $\mathbf{B} \in \{0,1\}^{n \times l_p}$, with $n$ denoting the batch size.
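To make this batched contract concrete, the following is a minimal PyTorch sketch of a model exposing such an interface. The module body is a placeholder for illustration only; the class name, layers, and pooling are assumptions and not the actual AffinityLM architecture.

```python
import torch
import torch.nn as nn

class MultitaskDTAModel(nn.Module):
    """Illustrative interface only: maps batched protein/molecule token ids
    to one affinity per pair (A) and per-residue binding-site logits (B)."""

    def __init__(self, protein_vocab: int, molecule_vocab: int, d_model: int = 768):
        super().__init__()
        self.protein_embed = nn.Embedding(protein_vocab, d_model)
        self.molecule_embed = nn.Embedding(molecule_vocab, d_model)
        self.affinity_head = nn.Linear(d_model, 1)   # a in R
        self.site_head = nn.Linear(d_model, 1)       # b in {0,1}^{l_p}, as logits

    def forward(self, proteins: torch.Tensor, molecules: torch.Tensor):
        # proteins: (n, l_p) token ids; molecules: (n, l_m) token ids
        p = self.protein_embed(proteins)                  # (n, l_p, d_model)
        m = self.molecule_embed(molecules)                # (n, l_m, d_model)
        joint = p.mean(dim=1) + m.mean(dim=1)             # placeholder joint feature
        affinity = self.affinity_head(joint).squeeze(-1)  # A: (n,)
        site_logits = self.site_head(p).squeeze(-1)       # B: (n, l_p)
        return affinity, site_logits
```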

Model

Building upon the multitask specification, we now detail the architecture of AffinityLM, consisting of three main components: individual feature extraction, joint feature extraction, and task-specific prediction heads.

Pretrained transformers have shown remarkable generalization and accuracy in natural language processing tasks, and recent studies have successfully applied them to DTA prediction. We leverage these pretrained models for individual feature extraction of proteins and molecules, yielding rich representations. This approach offers two key benefits: it allows for caching individual embeddings (thus reducing computational costs for repeated analyses) and enhances generalization through models pretrained on extensive datasets.

Figure: The PLAPT pre-trained transformer learning framework.

For protein feature extraction, we use the Ankh transformer, while for molecule feature extraction, we use MolFormer. The process for each is as follows:

    Protein Embedding:
    1. Tokenize a batch of $n$ protein sequences $\mathbf{P}$ using Ankh's tokenizer with padding enabled, resulting in a tensor of shape $\mathbb{R}^{n \times L_p}$, where $L_p$ is the maximum protein sequence length in the batch.
    2. Feed the padded tokens through the Ankh encoder, producing protein embeddings $\mathbf{E}_p \in \mathbb{R}^{n \times L_p \times 1536}$.

    Molecule Embedding:
    1. Tokenize a batch of $n$ molecule SMILES strings $\mathbf{M}$ using MolFormer's tokenizer with padding enabled, resulting in a tensor of shape $\mathbb{R}^{n \times L_m}$, where $L_m$ is the maximum molecule sequence length in the batch.
    2. Feed the padded tokens through the MolFormer encoder, producing molecule embeddings $\mathbf{E}_m \in \mathbb{R}^{n \times L_m \times 512}$.

    Masking Padded Embeddings:
    1. To prevent padded positions from contributing to downstream computations, we apply a mask to the batched embeddings: $\mathbf{E}_{\text{masked}} = \mathbf{E} \odot \mathbf{M}_{\text{mask}}$, where $\mathbf{E}$ is the embedding tensor and $\mathbf{M}_{\text{mask}}$ is a binary mask tensor of the same shape as $\mathbf{E}$, equal to 1 for valid tokens and 0 for padding tokens. A minimal code sketch of this embedding-and-masking step follows below.
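The sketch below illustrates the tokenize-encode-mask pipeline described above, assuming Hugging Face-style tokenizer and encoder objects for Ankh and MolFormer (loading the specific checkpoints is left to the caller). The returned masked embeddings can also be cached per protein or molecule to avoid recomputation across repeated analyses.

```python
import torch

def embed_and_mask(sequences, tokenizer, encoder):
    """Tokenize a padded batch, encode it, and zero out padding positions.

    `tokenizer` and `encoder` stand in for the Ankh (protein) or MolFormer
    (molecule) tokenizer/encoder pair; any encoder exposing
    `last_hidden_state` works the same way.
    """
    tokens = tokenizer(sequences, padding=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(input_ids=tokens["input_ids"],
                      attention_mask=tokens["attention_mask"])
    embeddings = out.last_hidden_state                # E: (n, L, d)
    mask = tokens["attention_mask"].unsqueeze(-1)     # M_mask: (n, L, 1); 1 = token, 0 = pad
    return embeddings * mask                          # E_masked = E (elementwise) M_mask
```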

Multitask Specification

Based on previous research, we hypothesize that a multitask approach for DTA models can lead to a more comprehensive understanding of drug-target interactions. By predicting binding sites, the model may develop a better grasp of the spatial and chemical properties that contribute to binding, which could in turn improve its binding affinity predictions. Furthermore, because binding site prediction can be formulated as a token classification problem, it is well suited to sequence-based transformer models, which have exhibited high performance in recent literature. A multitask architecture may also enhance the model's value as a feature extractor for other drug-target pair attributes, providing a rich internal representation of drug-target compounds that is useful for downstream tasks in drug discovery.

The multitask learning (MTL) approach in AffinityLM is therefore designed to enhance the model's understanding of drug-target interactions by simultaneously predicting binding affinity and identifying binding sites, motivated by the hypothesis that learning related tasks in parallel can improve both performance and generalization. The two tasks are:

Binding Affinity Prediction: A regression task to predict the binding affinity $a$ between a protein $p$ and a molecule $m$.

Binding Site Prediction: A sequence labeling task to predict which amino acids in the protein sequence $p$ are part of the binding site for the molecule $m$.

Mathematically, we can formulate our multitask learning objective as follows:

$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{DTA} + \lambda_2 \mathcal{L}_{BS},$$

where
- $\mathcal{L}_{DTA}$ is the loss for the binding affinity prediction task (e.g., mean squared error),
- $\mathcal{L}_{BS}$ is the loss for the binding site identification task (e.g., binary cross-entropy),
- $\lambda_1$ and $\lambda_2$ are hyperparameters that control the relative importance of each task.

For the binding affinity prediction task, we define the loss as the mean squared error (MSE) between the true affinity value $y$ and the predicted affinity $\hat{y}$.
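As a concrete illustration, the following is a minimal PyTorch sketch of the combined objective. The masking of padded residues in the binding-site term and the exact reduction scheme are assumptions; the default weights match the loss weights reported in the hyperparameter table below.

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred_affinity, true_affinity,
                   site_logits, site_labels, site_mask,
                   lambda_dta=2.0, lambda_bs=0.5):
    """Weighted sum of task losses: L_total = lambda_1 * L_DTA + lambda_2 * L_BS."""
    # L_DTA: mean squared error between true and predicted affinities.
    loss_dta = F.mse_loss(pred_affinity, true_affinity)

    # L_BS: binary cross-entropy over residues, ignoring padded positions.
    bce = F.binary_cross_entropy_with_logits(site_logits, site_labels.float(),
                                             reduction="none")
    loss_bs = (bce * site_mask).sum() / site_mask.sum().clamp(min=1)

    return lambda_dta * loss_dta + lambda_bs * loss_bs
```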

Figure: AffinityLM framework.

Model Configuration and Hyperparameters

The key hyperparameters and training settings used to train AffinityLM are summarized in the following table:

| Parameter | Value |
| --- | --- |
| Embedding dimension | 768 |
| Number of attention layers | 4 |
| Number of attention heads | 8 |
| Feedforward expansion rate | 4 |
| Dropout rate | 0.2 |
| Maximum sequence length | 4200 |
| Optimizer | AdamW (fused) |
| Learning rate | 5e-4 |
| Batch size (affinity and binding site) | 56 |
| Number of epochs | 3 |
| Affinity loss weight | 2.0 |
| Binding site loss weight | 0.5 |
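For convenience, these settings could be bundled into a single configuration object. The sketch below is illustrative only; the field names are chosen here and are not taken from the AffinityLM codebase.

```python
from dataclasses import dataclass

@dataclass
class AffinityLMConfig:
    # Architecture (values from the table above)
    embedding_dim: int = 768
    num_attention_layers: int = 4
    num_attention_heads: int = 8
    feedforward_expansion: int = 4
    dropout: float = 0.2
    max_sequence_length: int = 4200
    # Optimization
    learning_rate: float = 5e-4        # AdamW (fused)
    batch_size: int = 56               # affinity and binding site batches
    num_epochs: int = 3
    # Multitask loss weights (lambda_1, lambda_2)
    affinity_loss_weight: float = 2.0
    binding_site_loss_weight: float = 0.5
```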

Results

We evaluated AffinityLM’s performance on the DAVIS and KIBA datasets, comparing it with several state-of-the-art models.

| Method | MSE ↓ | CI ↑ | $r_m^2$ ↑ |
| --- | --- | --- | --- |
| KronRLS | 0.379 | 0.871 | 0.407 |
| SimBoost | 0.282 | 0.872 | 0.644 |
| DeepDTA | 0.261 | 0.878 | 0.630 |
| GANsDTA | 0.276 | 0.881 | 0.653 |
| GraphDTA | 0.229 | 0.893 | 0.649 |
| TF-DTA | 0.231 | 0.886 | 0.670 |
| MGraphDTA | 0.207 | 0.900 | 0.710 |
| WGNN-DTA | 0.208 | 0.900 | 0.692 |
| DGraphDTA | 0.202 | 0.904 | 0.700 |
| AffinityLM | 0.205 | 0.899 | 0.712 |

Table: Performance Comparison on DAVIS Dataset. Best values are in bold and second-best values are underlined.
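For reference, the two headline metrics can be computed as follows. This is a minimal NumPy sketch for measured versus predicted affinities; the $r_m^2$ index, which additionally requires a regression-through-origin term, is omitted here.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error (lower is better)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def concordance_index(y_true, y_pred):
    """Concordance index (higher is better): among pairs with different true
    affinities, the fraction whose predictions are ordered the same way;
    tied predictions count as 0.5."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5
            elif np.sign(diff_pred) == np.sign(y_true[i] - y_true[j]):
                concordant += 1.0
    return concordant / comparable if comparable else 0.0
```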

BibTeX citation

  
@article{Bindwell2025,
  author    = {Bindwell},
  title     = {AffinityLM: Binding-Site Informed Multitask Language Model for Drug-Target Affinity Prediction},
  year      = {2025},
  publisher = {Cold Spring Harbor Laboratory},
  journal   = {bioRxiv}
}

@article{Rose2024.02.08.575577,
  author       = {Rose, Tyler and Monti, Nicol{\`o} and Anand, Navvye and Shen, Tianyu},
  title        = {PLAPT: Protein-Ligand Binding Affinity Prediction Using Pretrained Transformers},
  elocation-id = {2024.02.08.575577},
  year         = {2024},
  doi          = {10.1101/2024.02.08.575577},
  publisher    = {Cold Spring Harbor Laboratory},
  URL          = {https://www.biorxiv.org/content/early/2024/02/09/2024.02.08.575577},
  eprint       = {https://www.biorxiv.org/content/early/2024/02/09/2024.02.08.575577.full.pdf},
  journal      = {bioRxiv}
}