# ImmuneBuilder Nextflow Pipeline
A Nextflow pipeline that uses ImmuneBuilder to predict the structures of immune proteins: antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2), and T-cell receptors (TCRBuilder2).
## Overview
ImmuneBuilder is a set of deep learning models trained to accurately predict the structure of immune receptor proteins. This pipeline packages ImmuneBuilder for use with Nextflow and Docker for reproducible and scalable structure predictions.
## Features
- **ABodyBuilder2**: Predicts antibody structures with state-of-the-art accuracy (CDR-H3 RMSD: 2.81 Å)
- **NanoBodyBuilder2**: Predicts nanobody structures (CDR-H3 RMSD: 2.89 Å)
- **TCRBuilder2/TCRBuilder2+**: Predicts T-cell receptor structures with updated weights
## Requirements
- [Nextflow](https://www.nextflow.io/) (>=21.04.0)
- [Docker](https://www.docker.com/) or [Singularity](https://sylabs.io/singularity/)
## Installation
### Build Docker Image
**Important**: Build with `--no-cache` so that the step that downloads the model weights is re-run instead of being reused from a stale cached layer.
```bash
docker build --no-cache -t immunebuilder:latest .
```
The build process will:
1. Install all dependencies (PyTorch, OpenMM, pdbfixer, ANARCI)
2. Pre-download all model weights (~500MB) from Zenodo to avoid rate limiting at runtime
Build time: approximately 10-15 minutes depending on network speed.
## Usage
### Setup Directories
```bash
# Create input/output directories
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/input
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/output
```
### Basic Usage
```bash
# Predict antibody structure (default mode)
nextflow run main.nf --fasta /path/to/antibody.fasta --mode antibody --outdir ./results
# Predict nanobody structure
nextflow run main.nf --fasta /path/to/nanobody.fasta --mode nanobody --outdir ./results
# Predict TCR structure
nextflow run main.nf --fasta /path/to/tcr.fasta --mode tcr --outdir ./results
```
### With GPU Support
```bash
nextflow run main.nf -profile gpu --fasta /path/to/sequences.fasta --mode antibody
```
### Process Multiple Files
```bash
nextflow run main.nf --fasta '/path/to/sequences/*.fasta' --mode antibody --outdir ./results
```
## Parameters
| Parameter | Description | Default | Required |
|-----------|-------------|---------|----------|
| `--fasta` | Input FASTA file(s) | null | Yes |
| `--outdir` | Output directory | ./results | No |
| `--mode` | Prediction mode: `antibody`, `nanobody`, or `tcr` | antibody | No |
| `--verbose` | Enable verbose output | true | No |
| `--original_weights` | Use original TCRBuilder2 weights (TCR mode only) | false | No |
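The same parameters can be kept in a file instead of being typed on the command line. The sketch below assumes the keys mirror the parameter names in the table above and relies on Nextflow's standard `-params-file` option; the file name `params.yaml` and the FASTA path are placeholders.

```bash
# Write the pipeline parameters to a YAML file.
# Keys are assumed to match the parameter names in the table above.
cat > params.yaml <<'EOF'
fasta: "/path/to/antibody.fasta"
outdir: "./results"
mode: "antibody"
verbose: true
original_weights: false
EOF

# Launch the pipeline with the file (not executed here):
#   nextflow run main.nf -params-file params.yaml
```

Values given on the command line take precedence over those in the params file, so a single file can serve as a baseline for several runs.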
## Input FASTA Format Requirements
### Important: Variable Region Sequences Only
ImmuneBuilder expects **variable region (Fv) sequences only**, not full-length antibody sequences. Each chain should be approximately:
- **Heavy chain (H)**: 110-130 amino acids
- **Light chain (L)**: 105-115 amino acids
- **Alpha chain (A)**: 105-115 amino acids
- **Beta chain (B)**: 110-120 amino acids
If you have full-length sequences from RCSB/PDB, extract only the variable region.
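The length guidance above can be checked before launching a run. The following is a minimal sketch and not part of the pipeline: `fasta_lengths` and `flag_long_chains` are hypothetical helper names, and the 140-residue cutoff is an assumed threshold chosen to sit just above the listed ranges, not a limit enforced by ImmuneBuilder.

```bash
# Print "<label> <length>" for every record in a FASTA file.
# Sequence lines are concatenated, so line-wrapped FASTA files are handled too.
fasta_lengths() {
    awk '
        /^>/ {
            if (label != "") print label, len
            header = substr($0, 2)
            sub(/\r$/, "", header)            # tolerate Windows line endings
            split(header, parts, /[ \t]+/)    # keep only the first word of the header
            label = parts[1]
            len = 0
            next
        }
        {
            gsub(/[ \t\r]/, "")               # ignore stray whitespace in sequence lines
            len += length($0)
        }
        END { if (label != "") print label, len }
    ' "$1"
}

# Print the records whose length exceeds a cutoff.
# The default of 140 residues is an assumption, not an ImmuneBuilder limit.
flag_long_chains() {
    fasta_lengths "$1" | awk -v max="${2:-140}" '$2 > max { print $1, $2 }'
}
```

Running `flag_long_chains /path/to/antibody.fasta` prints nothing when every chain is at or below the cutoff; any line it does print names a chain that probably still contains constant-region residues.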
### Antibody (ABodyBuilder2)
```
>H
EVQLVESGGGVVQPGGSLRLSCAASGFTFNSYGMHWVRQAPGKGLEWVAFIRYDGGNKYYADSVKGRFTISRDNSKNTLYLQMKSLRAEDTAVYYCANLKDSRYSGSYYDYWGQGTLVTVSS
>L
DIQMTQSPSSLSASVGDRVTITCQASQDIRFYLNWYQQKPGKAPKLLISDASNMETGVPSRFSGSGSGTDFTFTISSLQPEDIATYYCQQYDNLPFTFGPGTKVDFK
```
### Nanobody (NanoBodyBuilder2)
```
>H
QVQLVESGGGLVQPGESLRLSCAASGSIFGIYAVHWFRMAPGKEREFTAGFGSHGSTNYAASVKGRFTMSRDNAKNTTYLQMNSLKPADTAVYYCHALIKNELGFLDYWGPGTQVTVSS
```
### TCR (TCRBuilder2)
```
>A
AQSVTQLGSHVSVSEGALVLLRCNYSSSVPPYLFWYVQYPNQGLQLLLKYTSAATLVKGINGFEAEFKKSETSFHLTKPSAHMSDAAEYFCAVSEQDDKIIFGKGTRLHILP
>B
ADVTQTPRNRITKTGKRIMLECSQTKGHDRMYWYRQDPGLGLRLIYYSFDVKDINKGEISDGYSVSRQAQAKFSLSLESAIPNQTALYFCATSDESYGYTFGSGTRLTVV
```
## Example Input Files
Example FASTA files are provided in the `input/` directory:
- `antibody_test.fasta` - Example antibody with H and L chains
- `nanobody_test.fasta` - Example nanobody with H chain only
- `tcr_test.fasta` - Example TCR with A and B chains
## Output
The pipeline produces:
| File | Description |
|------|-------------|
| `{sample_name}.pdb` | Predicted 3D structure in PDB format |
| `run.log` | Execution log with prediction details |
| `pipeline_info/report.html` | Detailed execution report |
| `pipeline_info/timeline.html` | Timeline visualization |
| `pipeline_info/dag.html` | Workflow DAG visualization |
| `pipeline_info/trace.txt` | Execution trace |
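A quick way to sanity-check a predicted structure is to count residues per chain. The sketch below assumes the output follows the standard fixed-column PDB layout (atom name in columns 13-16, chain ID in column 22) and counts one alpha carbon per residue; `pdb_chain_summary` is a hypothetical helper name.

```bash
# Print "<chain_id> <n_residues>" for a PDB file by counting CA atoms
# in ATOM records. Relies on the fixed-column PDB layout.
pdb_chain_summary() {
    awk '
        substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
            count[substr($0, 22, 1)]++
        }
        END { for (chain in count) print chain, count[chain] }
    ' "$1" | LC_ALL=C sort
}
```

The residue counts should match the lengths of the sequences in the input FASTA file; a missing chain or a shorter count suggests the prediction did not complete as expected.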
### Visualizing Output Structures
```bash
# Using PyMOL
pymol output.pdb
# Using ChimeraX
chimerax output.pdb
```
Or upload to online viewers:
- [Mol* Viewer](https://molstar.org/viewer/)
- [NGL Viewer](https://nglviewer.org/)
## Profiles
| Profile | Description |
|---------|-------------|
| `standard` | Default Docker execution (CPU) |
| `gpu` | Docker execution with GPU support |
| `singularity` | Singularity container execution |
| `conda` | Conda environment execution |
## Performance
Typical prediction times on CPU:
| Mode | Duration | Output Size |
|------|----------|-------------|
| Antibody | ~3 min | ~280 KB |
| Nanobody | ~5 min | ~140 KB |
| TCR | ~5 min | ~280 KB |
## Troubleshooting
### Error: 429 Too Many Requests
If you see rate limiting errors from Zenodo:
```
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests
```
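**Solution**: The model weights are meant to be downloaded once, at build time. Rebuild the Docker image with `--no-cache` so they are fetched again and stored in the image: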
```bash
docker build --no-cache -t immunebuilder:latest .
```
### Error: KeyError 'H' or 'A' or 'L' or 'B'
The FASTA file has incorrect chain labels.
**Solution**: Ensure correct labels:
- Antibody: `>H` and `>L`
- Nanobody: `>H` only
- TCR: `>A` and `>B`
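The label rules above can be checked automatically before a run. This is a minimal sketch that assumes the headers must be exactly the labels listed for each mode; `expected_labels` and `check_labels` are hypothetical helper names, not part of the pipeline.

```bash
# Print the chain labels expected for a given --mode (per the list above).
expected_labels() {
    case "$1" in
        antibody) echo "H L" ;;
        nanobody) echo "H" ;;
        tcr)      echo "A B" ;;
        *)        echo "unknown mode: $1" >&2; return 2 ;;
    esac
}

# Succeed only if the FASTA headers are exactly the expected labels.
check_labels() {
    local mode="$1"
    local fasta="$2"
    local want have
    want=$(expected_labels "$mode") || return 2
    # Collect the first word of each header, sorted and space-separated.
    have=$(grep '^>' "$fasta" \
        | sed 's/^>//; s/[[:space:]].*$//' \
        | LC_ALL=C sort | tr '\n' ' ' | sed 's/ $//')
    if [ "$have" = "$want" ]; then
        return 0
    fi
    echo "expected headers [$want] but found [$have] in $fasta" >&2
    return 1
}
```

For example, `check_labels tcr /path/to/tcr.fasta && nextflow run main.nf --fasta /path/to/tcr.fasta --mode tcr` only starts the pipeline when the headers match the selected mode.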
### Error: Sequences too long
ImmuneBuilder expects variable region sequences only (roughly 105-130 aa per chain, depending on the chain type).
**Solution**: Extract only the Fv (variable fragment) portion from full-length sequences.
### Error: Missing output file
Check `run.log` in the output directory for detailed error messages:
```bash
cat /path/to/output/run.log
```
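For long logs, filtering for error-like lines is often faster than reading the whole file. The sketch below assumes failures appear as lines containing "error", "exception", or "traceback", as in the Zenodo example earlier in this section; the exact contents of `run.log` are not specified here, and `log_errors` is a hypothetical helper name.

```bash
# Print numbered log lines that look like errors.
# Exits with status 1 when no such line is found (standard grep behaviour).
log_errors() {
    grep -n -i -E 'error|exception|traceback' "$1"
}
```

Calling `log_errors /path/to/output/run.log` prints each matching line prefixed with its line number, which makes it easy to jump to the surrounding context in the full log.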
## Citation
If you use this pipeline, please cite:
```bibtex
@article{Abanades2023,
author = {Abanades, Brennan and Wong, Wing Ki and Boyles, Fergus and Georges, Guy and Bujotzek, Alexander and Deane, Charlotte M.},
doi = {10.1038/s42003-023-04927-7},
journal = {Communications Biology},
number = {1},
pages = {575},
title = {ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins},
volume = {6},
year = {2023}
}
```
For TCRBuilder2+:
```bibtex
@article{Quast2024,
author = {Quast, Nele P. and Abanades, Brennan and Guloglu, Bora and Karuppiah, Vijaykumar and Harper, Stephen and Raybould, Matthew I. J. and Deane, Charlotte M.},
title = {T-cell receptor structures and predictive models reveal comparable alpha and beta chain structural diversity despite differing genetic complexity},
year = {2024},
doi = {10.1101/2024.05.20.594940},
journal = {bioRxiv}
}
```
## License
BSD 3-clause license (same as ImmuneBuilder)
## Links
- [ImmuneBuilder GitHub](https://github.com/oxpig/ImmuneBuilder)
- [ImmuneBuilder Paper](https://doi.org/10.1038/s42003-023-04927-7)
- [Google Colab Demo](https://colab.research.google.com/github/brennanaba/ImmuneBuilder/blob/main/notebook/ImmuneBuilder.ipynb)