Drug-target affinity (DTA) prediction is crucial for drug discovery and repurposing. Current DTA prediction models are trained on datasets biased toward an abundance of unique molecules relative to target proteins. Such models excel at molecule screening but struggle with inverse drug screening and generalization to novel targets. This data bias hinders drug repurposing efforts and the development of truly general solutions capable of understanding interactions between any given drug and target. To address these challenges, we present AffinityLM, a multitask transformer model that creates comprehensive joint-feature representations of drug-target compounds. Our approach leverages both molecule-biased binding affinity data and protein-biased binding site data, thus balancing the protein-molecule data distribution. The combined dataset is significantly more diverse and contains far more unique drug-target pairs, with the potential to improve both accuracy and generalization. Despite being trained on a relatively small subset of the available data, comparative evaluations against state-of-the-art models on standard benchmarks such as DAVIS and KIBA show that AffinityLM matches or surpasses existing models for binding affinity prediction. Our results demonstrate that models forced to learn rich feature spaces for drug-target interactions offer superior performance and versatility in DTA prediction tasks, paving the way for more effective and broadly applicable drug discovery strategies.
Drug-target affinity (DTA) prediction is crucial for drug discovery, but traditional experimental methods are costly and time-consuming. While earlier computational methods relied on 3D structural information, the limited availability of such data has led to increased interest in 1D sequence representations for large-scale DTA prediction. Machine learning, particularly transformer models, has emerged as a promising approach for DTA prediction.
Transformers excel at processing 1D sequential data like SMILES and protein sequences, and can leverage pre-training on large-scale datasets. However, generalizability remains a challenge, especially for novel compounds or proteins.
Current DTA prediction models are trained on datasets biased towards molecules, hindering inverse drug screening and generalization to new targets. Previous efforts to address this issue have not fully leveraged available molecular and protein data.
We present AffinityLM, a novel multitask transformer model for DTA prediction. By combining molecule-biased binding affinity data with protein-biased binding site data, AffinityLM balances the protein-molecule distribution. This exposes the model to a far larger and more diverse set of unique drug-target pairs, potentially enhancing generalization. AffinityLM's multitask architecture forces the model to learn comprehensive representations of drug-target compounds, improving performance on novel drugs and targets.
AffinityLM is designed to address the challenges in DTA prediction by leveraging a multitask learning framework that integrates DTA prediction with binding site prediction (BSP). This dual-task approach enables the model to learn richer and more nuanced representations of drug-target interactions. By employing pretrained transformers for the initial feature extraction of protein and molecule sequences, AffinityLM capitalizes on their ability to capture intricate sequence dependencies. This setup not only enhances the model's predictive accuracy but also improves its generalization across diverse drug-target pairs (DTPs).
Mathematically, our model can be defined as a function $f$ that takes as input a protein sequence $P$ and a molecule sequence $M$, and outputs both a binding affinity prediction $\hat{y}$ and a binding site prediction $\hat{s}$:

$$f(P, M) = (\hat{y}, \hat{s}).$$

Here, $P \in \mathcal{A}^{L_p}$, $M \in \mathcal{V}^{L_m}$, $\hat{y} \in \mathbb{R}$, and $\hat{s} \in [0, 1]^{L_p}$, where $\mathcal{A}$ is the amino acid vocabulary, $\mathcal{V}$ is the SMILES token vocabulary, and $L_p$ and $L_m$ represent the lengths of the protein and molecule sequences, respectively.

For batched inputs, we extend this formulation:

$$f(\mathbf{P}, \mathbf{M}) = (\hat{\mathbf{y}}, \hat{\mathbf{S}}),$$

where $\mathbf{P} \in \mathcal{A}^{B \times L_p}$, $\mathbf{M} \in \mathcal{V}^{B \times L_m}$, $\hat{\mathbf{y}} \in \mathbb{R}^{B}$, and $\hat{\mathbf{S}} \in [0, 1]^{B \times L_p}$, with $B$ representing the batch size.
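To make the input and output shapes concrete, the following is a minimal PyTorch-style sketch of this interface. The tensor names, toy dimensions, and the stand-in function body are illustrative assumptions, not the released implementation.

```python
import torch

# Toy dimensions for illustration (not the values used in training).
B, L_p, L_m = 4, 512, 128   # batch size, protein length, molecule length

# Batched inputs: integer token indices for protein and molecule sequences.
proteins = torch.randint(0, 25, (B, L_p))    # amino acid tokens
molecules = torch.randint(0, 100, (B, L_m))  # SMILES tokens

def f(P: torch.Tensor, M: torch.Tensor):
    """Stand-in for the model: maps (P, M) to (y_hat, S_hat) with the shapes above."""
    y_hat = torch.zeros(P.shape[0])                  # (B,)      predicted affinities
    S_hat = torch.zeros(P.shape[0], P.shape[1])      # (B, L_p)  per-residue site probabilities
    return y_hat, S_hat

y_hat, S_hat = f(proteins, molecules)
assert y_hat.shape == (B,) and S_hat.shape == (B, L_p)
```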
Building upon the multitask specification, we now detail the architecture of AffinityLM, consisting of three main components: individual feature extraction, joint feature extraction, and task-specific prediction heads.
Pretrained transformers have shown remarkable generalization and accuracy in natural language processing tasks, and recent studies have successfully applied them to DTA prediction. We leverage these pretrained models for individual feature extraction of proteins and molecules, yielding rich representations. This approach offers two key benefits: it allows for caching individual embeddings (thus reducing computational costs for repeated analyses) and enhances generalization through models pretrained on extensive datasets.
For protein feature extraction, we use the Ankh transformer, while for molecule feature extraction, we use MolFormer. Each sequence is tokenized and passed through its respective pretrained encoder, yielding a matrix of per-token embeddings that can be cached and reused across drug-target pairs.
To handle padding in batched inputs, we apply a mask to the embeddings: $E' = E \odot m$, where $E$ is the embedding tensor and $m$ is a binary mask tensor of the same shape as $E$. The mask is 1 for valid tokens and 0 for padding tokens.
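As a concrete illustration, the following is a minimal sketch of individual feature extraction and padding masks using Hugging Face checkpoints for Ankh and MolFormer. The specific checkpoint names and helper functions are assumptions made for this example; the text above only names the Ankh and MolFormer models.

```python
import torch
from transformers import AutoModel, AutoTokenizer, T5EncoderModel

# Assumed public checkpoints for the two pretrained encoders.
prot_tok = AutoTokenizer.from_pretrained("ElnaggarLab/ankh-base")
prot_enc = T5EncoderModel.from_pretrained("ElnaggarLab/ankh-base").eval()
mol_tok = AutoTokenizer.from_pretrained("ibm/MoLFormer-XL-both-10pct", trust_remote_code=True)
mol_enc = AutoModel.from_pretrained("ibm/MoLFormer-XL-both-10pct", trust_remote_code=True).eval()

@torch.no_grad()
def embed_proteins(seqs):
    # Ankh tokenizes per residue, so each sequence is split into characters.
    batch = prot_tok([list(s) for s in seqs], is_split_into_words=True,
                     padding=True, return_tensors="pt")
    emb = prot_enc(**batch).last_hidden_state       # (B, L_p, D_p)
    mask = batch["attention_mask"].unsqueeze(-1)    # (B, L_p, 1), 1 = valid token
    return emb * mask, batch["attention_mask"]      # zero out padded positions

@torch.no_grad()
def embed_molecules(smiles):
    batch = mol_tok(smiles, padding=True, return_tensors="pt")
    emb = mol_enc(**batch).last_hidden_state        # (B, L_m, D_m)
    mask = batch["attention_mask"].unsqueeze(-1)
    return emb * mask, batch["attention_mask"]

prot_emb, prot_mask = embed_proteins(["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"])
mol_emb, mol_mask = embed_molecules(["CC(=O)Oc1ccccc1C(=O)O"])  # aspirin
```

Because the encoders are used only for feature extraction, these embeddings can be computed once per unique protein or molecule and cached, as noted above.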
Based on previous research, we hypothesize that a multitask approach can give DTA models a more comprehensive understanding of drug-target interactions. By predicting binding sites, the model may develop a better grasp of the spatial and chemical properties that drive binding, which in turn can improve its affinity predictions. Because binding site prediction can be formulated as a token classification problem, it is well suited to sequence-based transformers, which have shown strong performance on such tasks in recent literature. A multitask architecture may also enhance the model's value as a feature extractor, providing a rich internal representation of drug-target compounds that can support other downstream tasks in drug discovery. Concretely, AffinityLM is trained on two tasks simultaneously:
Binding Affinity Prediction: A regression task to predict the binding affinity $y$ between a protein $P$ and a molecule $M$.
Binding Site Prediction: A sequence labeling task to predict which amino acids in the protein sequence $P$ are part of the binding site for the molecule $M$.
Mathematically, we can formulate our multitask learning objective as follows:
For the binding affinity prediction task, we define the loss as the mean squared error (MSE) between the true affinity $y$ and the predicted affinity $\hat{y}$:

$$\mathcal{L}_{\text{affinity}} = \frac{1}{B} \sum_{i=1}^{B} \left( y_i - \hat{y}_i \right)^2.$$
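The sketch below shows one way the two task losses can be combined into a single training objective. The use of binary cross-entropy for the binding-site term and the default weights (taken from the hyperparameter table that follows) are assumptions for illustration; the text above specifies only the MSE affinity loss.

```python
import torch
import torch.nn.functional as F

def multitask_loss(y_hat, y, s_hat, s, s_mask, w_affinity=2.0, w_site=0.5):
    """Weighted sum of the two task losses.

    Assumptions: the binding-site head outputs per-residue logits trained with
    binary cross-entropy, and the weights default to the table values below.
    """
    affinity_loss = F.mse_loss(y_hat, y)  # regression loss for binding affinity
    # Binding site prediction: per-residue binary labels, padding excluded via mask.
    site_loss = F.binary_cross_entropy_with_logits(s_hat, s, reduction="none")
    site_loss = (site_loss * s_mask).sum() / s_mask.sum().clamp(min=1)
    return w_affinity * affinity_loss + w_site * site_loss

# Toy example: batch of 4 pairs, proteins padded to length 10.
y_hat, y = torch.randn(4), torch.randn(4)
s_hat = torch.randn(4, 10)                       # per-residue logits
s = torch.randint(0, 2, (4, 10)).float()         # 1 = binding-site residue
s_mask = torch.ones(4, 10)                       # 1 = real residue, 0 = padding
loss = multitask_loss(y_hat, y, s_hat, s, s_mask)
```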
The key hyperparameters and training settings used to train AffinityLM are summarized in the following table:
| Parameter | Value |
|---|---|
| Embedding dimension | 768 |
| Number of attention layers | 4 |
| Number of attention heads | 8 |
| Feedforward expansion rate | 4 |
| Dropout rate | 0.2 |
| Maximum sequence length | 4200 |
| Optimizer | AdamW (fused) |
| Learning rate | 5e-4 |
| Batch size (affinity and binding site) | 56 |
| Number of epochs | 3 |
| Affinity loss weight | 2.0 |
| Binding site loss weight | 0.5 |
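For readers reproducing the setup, the following is a small configuration sketch mirroring the table above; the field names, the stand-in module, and the optimizer construction are illustrative assumptions rather than the released training code.

```python
from dataclasses import dataclass

import torch

@dataclass
class AffinityLMConfig:
    # Values mirror the hyperparameter table above; field names are illustrative.
    embed_dim: int = 768
    num_layers: int = 4
    num_heads: int = 8
    ffn_expansion: int = 4          # feedforward dim = ffn_expansion * embed_dim
    dropout: float = 0.2
    max_seq_len: int = 4200
    learning_rate: float = 5e-4
    batch_size: int = 56            # shared by affinity and binding-site batches
    num_epochs: int = 3
    affinity_loss_weight: float = 2.0
    site_loss_weight: float = 0.5

cfg = AffinityLMConfig()
model = torch.nn.Linear(cfg.embed_dim, 1)  # stand-in for the joint model and heads
# The table lists fused AdamW; set fused=True when parameters live on a CUDA device.
optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate, fused=False)
```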
We evaluated AffinityLM’s performance on the DAVIS and KIBA datasets, comparing it with several state-of-the-art models.
| Method | MSE ↓ | CI ↑ | $r_m^2$ ↑ |
|---|---|---|---|
| KronRLS | 0.379 | 0.871 | 0.407 |
| SimBoost | 0.282 | 0.872 | 0.644 |
| DeepDTA | 0.261 | 0.878 | 0.630 |
| GANsDTA | 0.276 | 0.881 | 0.653 |
| GraphDTA | 0.229 | 0.893 | 0.649 |
| TF-DTA | 0.231 | 0.886 | 0.670 |
| MGraphDTA | 0.207 | *0.900* | *0.710* |
| WGNN-DTA | 0.208 | *0.900* | 0.692 |
| DGraphDTA | **0.202** | **0.904** | 0.700 |
| AffinityLM | *0.205* | 0.899 | **0.712** |
Table: Performance comparison on the DAVIS dataset. Best values are in bold; second-best values are in italics.