# PRODIGY Nextflow Pipeline

A Nextflow pipeline for predicting binding affinity of protein-protein complexes using PRODIGY (PROtein binDIng enerGY prediction).

## Overview

PRODIGY is a contact-based method for predicting the binding affinity of protein-protein complexes from their 3D structures. This pipeline containerizes PRODIGY using Docker and orchestrates execution through Nextflow, enabling reproducible, scalable analysis of protein-protein interactions.

### Key Features

- **Automated binding affinity prediction** from PDB/mmCIF structures
- **Batch processing** of multiple protein complexes
- **Docker containerization** for reproducibility
- **Configurable parameters** for distance cutoffs, temperature, and chain selection
- **Optional outputs** including contact lists and PyMOL visualization scripts

## Scientific Background

PRODIGY predicts binding affinity by analyzing intermolecular contacts (ICs) at protein-protein interfaces. The method:

1. Identifies residue-residue contacts within a distance threshold (default: 5.5 Å)
2. Classifies contacts by residue type (charged, polar, apolar)
3. Analyzes the non-interacting surface (NIS) composition
4. Predicts binding free energy (ΔG) and dissociation constant (Kd)

The 5.5 Å distance cutoff was optimized to capture various non-bonded interactions including salt bridges, hydrogen bonds, and hydrophobic contacts.

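The ΔG and Kd reported in step 4 are linked by the standard thermodynamic relation ΔG = RT ln(Kd). The sketch below illustrates that relation only. It is not PRODIGY's own code, and the function name `dg_to_kd` and the gas-constant value are assumptions made for this example.

```python
import math

# Gas constant in kcal/(mol*K); assumed value for this illustration.
R_KCAL = 0.0019872


def dg_to_kd(dg_kcal_mol: float, temperature_c: float = 25.0) -> float:
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M)."""
    temperature_k = temperature_c + 273.15
    return math.exp(dg_kcal_mol / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Same ΔG evaluated at two temperatures.
    for temp in (25.0, 37.0):
        print(f"dG = -14.7 kcal/mol at {temp} C -> Kd = {dg_to_kd(-14.7, temp):.1e} M")
```

For ΔG = -14.7 kcal/mol at 25 °C this gives a Kd on the order of 10⁻¹¹ M. The same ΔG at 37 °C gives a larger Kd, which is why the temperature setting matters when comparing dissociation constants.
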
## Requirements

### Software Dependencies

- [Nextflow](https://www.nextflow.io/) (≥21.04.0)
- [Docker](https://www.docker.com/) (≥20.10) or [Singularity](https://sylabs.io/singularity/) (≥3.0)

### Hardware Requirements

- CPU: 1+ cores per process
- Memory: at least 4 GB recommended
- Storage: ~2 GB for Docker image

## Installation

### 1. Clone or Download the Pipeline

```bash
# Create pipeline directory
mkdir -p /path/to/prodigy_pipeline
cd /path/to/prodigy_pipeline

# Copy pipeline files (Dockerfile, main.nf, nextflow.config, params.json)
```

### 2. Build the Docker Image

```bash
docker build -t prodigy:latest .
```

### 3. Verify Installation

```bash
# Test Docker image
docker run --rm prodigy:latest prodigy --help

# Test Nextflow
nextflow run main.nf --help
```

## Usage

### Basic Usage

```bash
# Run on a single PDB file
nextflow run main.nf --pdb /path/to/complex.pdb --outdir /path/to/output

# Run on multiple PDB files
nextflow run main.nf --pdb '/path/to/structures/*.pdb' --outdir /path/to/output
```

### With Custom Parameters

```bash
nextflow run main.nf \
    --pdb '/path/to/structures/*.pdb' \
    --outdir /path/to/output \
    --distance_cutoff 5.5 \
    --acc_threshold 0.05 \
    --temperature 37.0 \
    --contact_list true \
    --pymol_selection true
```

### Chain Selection for Complex Interfaces

For antibody-antigen complexes or multi-chain proteins:

```bash
# Contacts between chains A and B only
nextflow run main.nf --pdb complex.pdb --selection 'A B'

# Heavy (H) and Light (L) chains as one molecule vs Antigen (A)
nextflow run main.nf --pdb antibody_antigen.pdb --selection 'H,L A'

# Three-way interface calculation
nextflow run main.nf --pdb complex.pdb --selection 'A B C'
```

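In the selection strings above, spaces separate the interacting molecules and commas join chains that are treated as one molecule. The sketch below spells out that grouping, assuming this reading of the examples. The helper `parse_selection` is hypothetical and is not part of the pipeline or of PRODIGY.

```python
def parse_selection(selection: str) -> list[list[str]]:
    """Split a selection such as 'H,L A' into groups of chain IDs.

    An empty selection returns an empty list, used here to mean "all chains".
    """
    groups = [
        [chain for chain in group.split(",") if chain]
        for group in selection.split()
    ]
    groups = [group for group in groups if group]
    if len(groups) == 1:
        # An interface needs two sides, so a single group is rejected.
        raise ValueError("a selection needs at least two groups of chains")
    return groups


if __name__ == "__main__":
    for sel in ("A B", "H,L A", "A B C"):
        print(f"{sel!r} -> {parse_selection(sel)}")
```

Under this reading, `'H,L A'` yields two groups, so contacts are counted between the combined H and L chains on one side and chain A on the other.
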
### Using Singularity

```bash
nextflow run main.nf -profile singularity --pdb /path/to/complex.pdb
```

## Parameters

### Required Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--pdb` | Path to input PDB/mmCIF file(s). Supports glob patterns. | `/mnt/OmicNAS/private/old/olamide/Prodigy/input/*.pdb` |
| `--outdir` | Output directory for results | `/mnt/OmicNAS/private/old/olamide/Prodigy/output` |

### Analysis Parameters

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| `--distance_cutoff` | Distance threshold (Å) for defining intermolecular contacts | `5.5` | 1.0 to 20.0 |
| `--acc_threshold` | Relative accessibility threshold for surface residue identification | `0.05` | 0.0 to 1.0 |
| `--temperature` | Temperature (°C) for Kd calculation | `25.0` | -273.15 to 100.0 |
| `--selection` | Chain selection for interface calculation | `''` (all chains) | See examples |

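The ranges in the Analysis Parameters table can be checked before launching a run. The sketch below is a hypothetical pre-flight helper, not something the pipeline is known to ship. The name `check_params` and the choice of inclusive bounds are assumptions.

```python
# Allowed ranges copied from the Analysis Parameters table (bounds assumed inclusive).
PARAM_RANGES = {
    "distance_cutoff": (1.0, 20.0),
    "acc_threshold": (0.0, 1.0),
    "temperature": (-273.15, 100.0),
}


def check_params(params: dict[str, float]) -> list[str]:
    """Return one message per out-of-range value; an empty list means all values pass."""
    problems = []
    for name, value in params.items():
        if name not in PARAM_RANGES:
            continue  # parameters without a documented range are not checked
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            problems.append(f"--{name} {value} is outside {low} to {high}")
    return problems


if __name__ == "__main__":
    print(check_params({"distance_cutoff": 5.5, "acc_threshold": 0.05, "temperature": 37.0}))
    print(check_params({"distance_cutoff": 25.0, "temperature": 25.0}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every bad value at once.
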
### Output Control Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--contact_list` | Generate detailed contact list file | `false` |
| `--pymol_selection` | Generate PyMOL visualization script | `false` |
| `--quiet` | Output only affinity values (minimal output) | `false` |

## Output Files

### Standard Output

For each input structure `<name>.pdb`, the pipeline generates:

| File | Description |
|------|-------------|
| `<name>_prodigy.txt` | Main results file with binding affinity prediction |

### Optional Output (when enabled)

| File | Description | Parameter |
|------|-------------|-----------|
| `<name>_contacts.txt` | List of all interface contacts | `--contact_list true` |
| `<name>_interface.pml` | PyMOL script for interface visualization | `--pymol_selection true` |

### Example Output

```
[!] Structure contains gaps:
E ILE16 < Fragment 0 > E ALA183
E TYR184 < Fragment 1 > E GLY187

[+] Executing 1 task(s) in total
##########################################
[+] Processing structure 1ppe_model0
[+] No. of intermolecular contacts: 86
[+] No. of charged-charged contacts: 5.0
[+] No. of charged-polar contacts: 10.0
[+] No. of charged-apolar contacts: 27.0
[+] No. of polar-polar contacts: 0.0
[+] No. of apolar-polar contacts: 20.0
[+] No. of apolar-apolar contacts: 24.0
[+] Percentage of apolar NIS residues: 34.10
[+] Percentage of charged NIS residues: 18.50
[++] Predicted binding affinity (kcal.mol-1): -14.7
[++] Predicted dissociation constant (M) at 25.0˚C: 1.6e-11
```

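When many structures are processed, the key values can be pulled out of each results file with a few regular expressions. The sketch below assumes the file contains text in the format of the example output above. The function name `parse_prodigy_report` and the dictionary keys are hypothetical, and the patterns may need adjusting for other PRODIGY versions.

```python
import re

# Patterns written against the example output shown in this README.
CONTACTS_RE = re.compile(r"No\. of intermolecular contacts:\s*(\d+)")
AFFINITY_RE = re.compile(r"Predicted binding affinity \(kcal\.mol-1\):\s*(-?\d+(?:\.\d+)?)")
KD_RE = re.compile(r"Predicted dissociation constant \(M\) at\s*(-?\d+(?:\.\d+)?)\D*C:\s*(\S+)")

SAMPLE = """\
[+] No. of intermolecular contacts: 86
[++] Predicted binding affinity (kcal.mol-1): -14.7
[++] Predicted dissociation constant (M) at 25.0˚C: 1.6e-11
"""


def parse_prodigy_report(text: str) -> dict:
    """Extract contact count, ΔG, temperature, and Kd; missing fields are left out."""
    result = {}
    if (m := CONTACTS_RE.search(text)):
        result["contacts"] = int(m.group(1))
    if (m := AFFINITY_RE.search(text)):
        result["dg_kcal_mol"] = float(m.group(1))
    if (m := KD_RE.search(text)):
        result["temperature_c"] = float(m.group(1))
        result["kd_molar"] = float(m.group(2))
    return result


if __name__ == "__main__":
    print(parse_prodigy_report(SAMPLE))
```

To use it on real output, read a `<name>_prodigy.txt` file with `pathlib.Path(...).read_text()` and pass the text to the function.
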
### Output Interpretation

| Metric | Description |
|--------|-------------|
| **Intermolecular contacts** | Total number of residue-residue contacts at interface |
| **Contact types** | Breakdown by residue character (charged/polar/apolar) |
| **NIS residues** | Composition of non-interacting surface |
| **Binding affinity (ΔG)** | Predicted free energy of binding (kcal/mol). More negative = stronger binding |
| **Dissociation constant (Kd)** | Predicted Kd at specified temperature. Lower = tighter binding |

### Binding Affinity Scale

| ΔG (kcal/mol) | Kd (M) | Binding Strength |
|---------------|--------|------------------|
| -6 to -8 | 10⁻⁵ to 10⁻⁶ | Moderate |
| -8 to -10 | 10⁻⁶ to 10⁻⁷ | Strong |
| -10 to -12 | 10⁻⁷ to 10⁻⁹ | Very Strong |
| < -12 | < 10⁻⁹ | Extremely Strong |

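The scale above can be applied programmatically when summarizing many predictions. This is a minimal sketch with a hypothetical function name. The table does not say which label applies exactly at -8, -10, or -12, so the boundary handling here is an assumption.

```python
def binding_strength(dg_kcal_mol: float) -> str:
    """Map a predicted ΔG (kcal/mol) to a label from the Binding Affinity Scale table."""
    # Boundary values are assigned to the weaker-ΔG-range side listed first
    # in the table row (assumption; the table leaves this open).
    if dg_kcal_mol < -12:
        return "Extremely Strong"
    if dg_kcal_mol <= -10:
        return "Very Strong"
    if dg_kcal_mol <= -8:
        return "Strong"
    if dg_kcal_mol <= -6:
        return "Moderate"
    return "Not covered by the table"


if __name__ == "__main__":
    for dg in (-14.7, -9.4, -6.2):
        print(f"{dg} kcal/mol -> {binding_strength(dg)}")
```

Values weaker than -6 kcal/mol fall outside the table, so the sketch reports them as not covered rather than inventing a label.
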
## Test Data

Download example protein complexes from the RCSB PDB:

```bash
# Create input directory
mkdir -p /mnt/OmicNAS/private/old/olamide/Prodigy/input

# Download test structures
wget -O /mnt/OmicNAS/private/old/olamide/Prodigy/input/3bzd.pdb https://files.rcsb.org/download/3BZD.pdb
wget -O /mnt/OmicNAS/private/old/olamide/Prodigy/input/2oob.pdb https://files.rcsb.org/download/2OOB.pdb
wget -O /mnt/OmicNAS/private/old/olamide/Prodigy/input/1ppe.pdb https://files.rcsb.org/download/1PPE.pdb
```

### Expected Results

| Structure | Description | Expected ΔG (kcal/mol) |
|-----------|-------------|------------------------|
| 3BZD | Protein-protein complex | -9.4 |
| 2OOB | Protein-protein complex | -6.2 |
| 1PPE | Trypsin-inhibitor complex | -14.7 |

## Pipeline Structure

```
prodigy_pipeline/
├── Dockerfile          # Docker image definition
├── main.nf             # Nextflow pipeline script
├── nextflow.config     # Pipeline configuration
├── params.json         # Parameter documentation
└── README.md           # This file
```

## Docker Image Details

The Docker image is based on Python 3.12 and includes:

- **prodigy-prot** (v2.4.0) - Main PRODIGY package
- **biopython** (≥1.80) - PDB structure parsing
- **freesasa** (≥2.2.1) - Solvent accessible surface area calculation
- **numpy** (≥2) - Numerical computations

### Building the Image

```bash
docker build -t prodigy:latest .
```

### Running Standalone

```bash
# Run PRODIGY directly
docker run --rm -v /path/to/data:/data prodigy:latest prodigy /data/complex.pdb

# Get help
docker run --rm prodigy:latest prodigy --help
```

## Troubleshooting

### Common Issues

**1. Docker Hub Rate Limit Error**
```
ERROR: toomanyrequests: You have reached your pull rate limit
```
Solution: Log in to Docker Hub with `docker login` or wait and retry.

**2. Structure Contains Gaps Warning**
```
[!] Structure contains gaps
```
This is informational, not an error. PRODIGY handles missing residues automatically.

**3. No Intermolecular Contacts Found**
- Verify the structure contains multiple chains
- Check chain selection parameters
- Ensure chains are in contact (within distance cutoff)

**4. Permission Denied Errors**
```bash
# Run the container as the current user and group
docker run --rm -u "$(id -u):$(id -g)" -v /path/to/data:/data prodigy:latest prodigy /data/complex.pdb
```

### Getting Help

```bash
# PRODIGY help
docker run --rm prodigy:latest prodigy --help

# Nextflow pipeline help
nextflow run main.nf --help
```

## Citation

If you use this pipeline, please cite the following publications:

### PRODIGY Method

1. **Xue LC, Rodrigues JP, Kastritis PL, Bonvin AM, Vangone A.** (2016)
   PRODIGY: a web server for predicting the binding affinity of protein-protein complexes.
   *Bioinformatics*, 32(23):3676-3678.
   [DOI: 10.1093/bioinformatics/btw514](https://doi.org/10.1093/bioinformatics/btw514)

2. **Vangone A, Bonvin AM.** (2015)
   Contacts-based prediction of binding affinity in protein-protein complexes.
   *eLife*, 4:e07454.
   [DOI: 10.7554/eLife.07454](https://doi.org/10.7554/eLife.07454)

3. **Kastritis PL, Rodrigues JP, Folkers GE, Boelens R, Bonvin AM.** (2014)
   Proteins feel more than they see: Fine-tuning of binding affinity by properties of the non-interacting surface.
   *Journal of Molecular Biology*, 426(14):2632-2652.
   [DOI: 10.1016/j.jmb.2014.04.017](https://doi.org/10.1016/j.jmb.2014.04.017)

### Software Dependencies

- **Nextflow**: Di Tommaso P, et al. (2017) Nextflow enables reproducible computational workflows. *Nature Biotechnology*, 35:316-319.
- **Biopython**: Cock PJ, et al. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. *Bioinformatics*, 25(11):1422-1423.
- **FreeSASA**: Mitternacht S. (2016) FreeSASA: An open source C library for solvent accessible surface area calculations. *F1000Research*, 5:189.

## License

This pipeline is distributed under the Apache License 2.0, consistent with the PRODIGY software license.

## Links

- **PRODIGY Web Server**: [https://wenmr.science.uu.nl/prodigy/](https://wenmr.science.uu.nl/prodigy/)
- **PRODIGY GitHub**: [https://github.com/haddocking/prodigy](https://github.com/haddocking/prodigy)
- **BonvinLab**: [https://www.bonvinlab.org/](https://www.bonvinlab.org/)
- **Nextflow**: [https://www.nextflow.io/](https://www.nextflow.io/)

## Support

For questions about:
- **PRODIGY method**: Contact the BonvinLab team at [ask.bioexcel.eu](https://ask.bioexcel.eu/)
- **This pipeline**: Open an issue in the repository

---

*Pipeline version: 2.4.0 | Last updated: January 2026*