# ImmuneBuilder Nextflow Pipeline

A Nextflow pipeline for predicting the structures of immune proteins using ImmuneBuilder, including antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2), and T-cell receptors (TCRBuilder2).

## Overview

ImmuneBuilder is a set of deep learning models trained to accurately predict the structure of immune receptor proteins. This pipeline packages ImmuneBuilder for use with Nextflow and Docker for reproducible and scalable structure predictions.

## Features

- **ABodyBuilder2**: Predicts antibody structures with state-of-the-art accuracy (CDR-H3 RMSD: 2.81Å)
- **NanoBodyBuilder2**: Predicts nanobody structures (CDR-H3 RMSD: 2.89Å)
- **TCRBuilder2/TCRBuilder2+**: Predicts T-cell receptor structures with updated weights

## Requirements

- [Nextflow](https://www.nextflow.io/) (>=21.04.0)
- [Docker](https://www.docker.com/) or [Singularity](https://sylabs.io/singularity/)

## Installation

### Build Docker Image

**Important**: Use `--no-cache` to ensure model weights are downloaded properly.

```bash
docker build --no-cache -t immunebuilder:latest .
```

The build process will:
1. Install all dependencies (PyTorch, OpenMM, pdbfixer, ANARCI)
2. Pre-download all model weights (~500MB) from Zenodo to avoid rate limiting at runtime

Build time: approximately 10-15 minutes depending on network speed.

## Usage

### Setup Directories

```bash
# Create input/output directories
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/input
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/output
```

### Basic Usage

```bash
# Predict antibody structure (default mode)
nextflow run main.nf --fasta /path/to/antibody.fasta --mode antibody --outdir ./results

# Predict nanobody structure
nextflow run main.nf --fasta /path/to/nanobody.fasta --mode nanobody --outdir ./results

# Predict TCR structure
nextflow run main.nf --fasta /path/to/tcr.fasta --mode tcr --outdir ./results
```

### With GPU Support

```bash
nextflow run main.nf -profile gpu --fasta /path/to/sequences.fasta --mode antibody
```

### Process Multiple Files

```bash
nextflow run main.nf --fasta '/path/to/sequences/*.fasta' --mode antibody --outdir ./results
```

## Parameters

| Parameter | Description | Default | Required |
|-----------|-------------|---------|----------|
| `--fasta` | Input FASTA file(s) | null | Yes |
| `--outdir` | Output directory | ./results | Yes |
| `--mode` | Prediction mode: `antibody`, `nanobody`, or `tcr` | antibody | Yes |
| `--verbose` | Enable verbose output | true | No |
| `--original_weights` | Use original TCRBuilder2 weights (TCR mode only) | false | No |
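
The parameters in the table can also be assembled into a command line from a script. The following is a minimal sketch, not part of the pipeline itself: `build_command` and `VALID_MODES` are hypothetical names, only the flags listed above are used, and it assumes Nextflow's usual convention that a bare `--flag` evaluates to true.

```python
import shlex

# Modes listed in the parameter table above.
VALID_MODES = ("antibody", "nanobody", "tcr")


def build_command(fasta, mode="antibody", outdir="./results",
                  profile=None, original_weights=False):
    """Return the argv list for one pipeline run (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if original_weights and mode != "tcr":
        raise ValueError("--original_weights applies to TCR mode only")

    cmd = ["nextflow", "run", "main.nf"]
    if profile:
        # Single-dash options belong to Nextflow itself, not the pipeline.
        cmd += ["-profile", profile]
    cmd += ["--fasta", fasta, "--mode", mode, "--outdir", outdir]
    if original_weights:
        # Assumption: a bare flag is read as true by Nextflow.
        cmd.append("--original_weights")
    return cmd


# shlex.join quotes the glob so the shell passes it to Nextflow unexpanded.
print(shlex.join(build_command("/path/to/sequences/*.fasta", profile="gpu")))
```

The returned list can be passed to `subprocess.run` directly; building a list rather than a string avoids shell-quoting mistakes with glob patterns.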

## Input FASTA Format Requirements

### Important: Variable Region Sequences Only

ImmuneBuilder expects **variable region (Fv) sequences only**, not full-length antibody sequences. Each chain should be approximately:
- **Heavy chain (H)**: 110-130 amino acids
- **Light chain (L)**: 105-115 amino acids
- **Alpha chain (A)**: 105-115 amino acids
- **Beta chain (B)**: 110-120 amino acids

If you have full-length sequences from RCSB/PDB, extract only the variable region.
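
A quick length check before launching a run can catch full-length sequences early. The sketch below is illustrative only and is not part of the pipeline: `parse_fasta`, `length_warnings`, and `EXPECTED_LENGTHS` are hypothetical names, and the ranges are the approximate figures quoted above, so a warning is a hint to inspect the sequence rather than a hard failure.

```python
# Approximate per-chain length ranges quoted in this README (amino acids).
EXPECTED_LENGTHS = {
    "H": (110, 130),
    "L": (105, 115),
    "A": (105, 115),
    "B": (110, 120),
}


def parse_fasta(text):
    """Return {header: sequence}; headers are the text after '>'."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            records[current] = ""
        elif current is not None:
            # Sequences may be wrapped over several lines.
            records[current] += line
    return records


def length_warnings(records):
    """List chains whose length falls outside the approximate range."""
    warnings = []
    for chain, seq in records.items():
        bounds = EXPECTED_LENGTHS.get(chain)
        if bounds is None:
            continue  # unknown label: not a length problem
        low, high = bounds
        if not low <= len(seq) <= high:
            warnings.append(
                f"chain {chain}: {len(seq)} aa is outside the "
                f"approximate {low}-{high} aa range"
            )
    return warnings
```

For example, `length_warnings(parse_fasta(open("antibody.fasta").read()))` returns an empty list when every labelled chain is within range.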

### Antibody (ABodyBuilder2)

```
>H
EVQLVESGGGVVQPGGSLRLSCAASGFTFNSYGMHWVRQAPGKGLEWVAFIRYDGGNKYYADSVKGRFTISRDNSKNTLYLQMKSLRAEDTAVYYCANLKDSRYSGSYYDYWGQGTLVTVSS
>L
DIQMTQSPSSLSASVGDRVTITCQASQDIRFYLNWYQQKPGKAPKLLISDASNMETGVPSRFSGSGSGTDFTFTISSLQPEDIATYYCQQYDNLPFTFGPGTKVDFK
```

### Nanobody (NanoBodyBuilder2)

```
>H
QVQLVESGGGLVQPGESLRLSCAASGSIFGIYAVHWFRMAPGKEREFTAGFGSHGSTNYAASVKGRFTMSRDNAKNTTYLQMNSLKPADTAVYYCHALIKNELGFLDYWGPGTQVTVSS
```

### TCR (TCRBuilder2)

```
>A
AQSVTQLGSHVSVSEGALVLLRCNYSSSVPPYLFWYVQYPNQGLQLLLKYTSAATLVKGINGFEAEFKKSETSFHLTKPSAHMSDAAEYFCAVSEQDDKIIFGKGTRLHILP
>B
ADVTQTPRNRITKTGKRIMLECSQTKGHDRMYWYRQDPGLGLRLIYYSFDVKDINKGEISDGYSVSRQAQAKFSLSLESAIPNQTALYFCATSDESYGYTFGSGTRLTVV
```

## Example Input Files

Example FASTA files are provided in the `input/` directory:
- `antibody_test.fasta` - Example antibody with H and L chains
- `nanobody_test.fasta` - Example nanobody with H chain only
- `tcr_test.fasta` - Example TCR with A and B chains

## Output

The pipeline produces:

| File | Description |
|------|-------------|
| `{sample_name}.pdb` | Predicted 3D structure in PDB format |
| `run.log` | Execution log with prediction details |
| `pipeline_info/report.html` | Detailed execution report |
| `pipeline_info/timeline.html` | Timeline visualization |
| `pipeline_info/dag.html` | Workflow DAG visualization |
| `pipeline_info/trace.txt` | Execution trace |
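
Before opening a structure in a viewer, it can be useful to confirm that the predicted PDB contains the chains you expect. The sketch below is a hedged illustration, not part of the pipeline: `residues_per_chain` is a hypothetical helper that relies only on the fixed-column layout of PDB `ATOM` records, and it makes no assumption about which chain IDs ImmuneBuilder writes.

```python
def residues_per_chain(pdb_text):
    """Count distinct residues per chain from ATOM records."""
    seen = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain = line[21]          # column 22: chain identifier
        residue_id = line[22:27]  # columns 23-27: residue number + insertion code
        seen.setdefault(chain, set()).add(residue_id)
    return {chain: len(ids) for chain, ids in seen.items()}
```

Calling `residues_per_chain(open("sample.pdb").read())` on a predicted structure returns a mapping from chain ID to residue count, which can be compared against the lengths of the input sequences.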

### Visualizing Output Structures

```bash
# Using PyMOL
pymol output.pdb

# Using ChimeraX
chimerax output.pdb
```

Or upload to online viewers:
- [Mol* Viewer](https://molstar.org/viewer/)
- [NGL Viewer](https://nglviewer.org/)

## Profiles

| Profile | Description |
|---------|-------------|
| `standard` | Default Docker execution (CPU) |
| `gpu` | Docker execution with GPU support |
| `singularity` | Singularity container execution |
| `conda` | Conda environment execution |

## Performance

Typical prediction times on CPU:

| Mode | Duration | Output Size |
|------|----------|-------------|
| Antibody | ~3 min | ~280 KB |
| Nanobody | ~5 min | ~140 KB |
| TCR | ~5 min | ~280 KB |

## Troubleshooting

### Error: 429 Too Many Requests

If you see rate limiting errors from Zenodo:
```
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests
```

**Solution**: Rebuild the Docker image with `--no-cache`:
```bash
docker build --no-cache -t immunebuilder:latest .
```

### Error: KeyError 'H' or 'A' or 'L' or 'B'

The FASTA file has incorrect chain labels.

**Solution**: Ensure correct labels:
- Antibody: `>H` and `>L`
- Nanobody: `>H` only
- TCR: `>A` and `>B`
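
The label rules above can be checked before a run. This is a minimal sketch under stated assumptions, not the pipeline's own validation: `REQUIRED_LABELS`, `fasta_labels`, and `check_labels` are hypothetical names, and the label sets are taken directly from the list above.

```python
# Chain labels required for each mode, as listed above.
REQUIRED_LABELS = {
    "antibody": {"H", "L"},
    "nanobody": {"H"},
    "tcr": {"A", "B"},
}


def fasta_labels(text):
    """Return the header labels (text after '>') in file order."""
    return [line[1:].strip() for line in text.splitlines()
            if line.startswith(">")]


def check_labels(text, mode):
    """Return (missing, unexpected) labels for the given mode."""
    expected = REQUIRED_LABELS[mode]
    found = set(fasta_labels(text))
    missing = sorted(expected - found)
    unexpected = sorted(found - expected)
    return missing, unexpected
```

Both lists are empty when the file matches the chosen mode; a non-empty `missing` list corresponds to the labels that would trigger the `KeyError`.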

### Error: Sequences too long

ImmuneBuilder expects variable region sequences only (roughly 105-130 aa per chain, depending on the chain type).

**Solution**: Extract only the Fv (variable fragment) portion from full-length sequences.

### Error: Missing output file

Check `run.log` in the output directory for detailed error messages:
```bash
cat /path/to/output/run.log
```

## Citation

If you use this pipeline, please cite:

```bibtex
@article{Abanades2023,
  author = {Abanades, Brennan and Wong, Wing Ki and Boyles, Fergus and Georges, Guy and Bujotzek, Alexander and Deane, Charlotte M.},
  doi = {10.1038/s42003-023-04927-7},
  journal = {Communications Biology},
  number = {1},
  pages = {575},
  title = {ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins},
  volume = {6},
  year = {2023}
}
```

For TCRBuilder2+:
```bibtex
@article{Quast2024,
  author = {Quast, Nele P. and Abanades, Brennan and Guloglu, Bora and Karuppiah, Vijaykumar and Harper, Stephen and Raybould, Matthew I. J. and Deane, Charlotte M.},
  title = {T-cell receptor structures and predictive models reveal comparable alpha and beta chain structural diversity despite differing genetic complexity},
  year = {2024},
  doi = {10.1101/2024.05.20.594940},
  journal = {bioRxiv}
}
```

## License

BSD 3-clause license (same as ImmuneBuilder)

## Links

- [ImmuneBuilder GitHub](https://github.com/oxpig/ImmuneBuilder)
- [ImmuneBuilder Paper](https://doi.org/10.1038/s42003-023-04927-7)
- [Google Colab Demo](https://colab.research.google.com/github/brennanaba/ImmuneBuilder/blob/main/notebook/ImmuneBuilder.ipynb)