ImmuneBuilder Nextflow Pipeline
A Nextflow pipeline for predicting the structures of immune proteins using ImmuneBuilder, including antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2), and T-cell receptors (TCRBuilder2).
Overview
ImmuneBuilder is a set of deep learning models trained to accurately predict the structure of immune receptor proteins. This pipeline packages ImmuneBuilder for use with Nextflow and Docker for reproducible and scalable structure predictions.
Features
- ABodyBuilder2: Predicts antibody structures with state-of-the-art accuracy (CDR-H3 RMSD: 2.81Å)
- NanoBodyBuilder2: Predicts nanobody structures (CDR-H3 RMSD: 2.89Å)
- TCRBuilder2/TCRBuilder2+: Predicts T-cell receptor structures (TCRBuilder2+ uses the updated weights)
Requirements
- Nextflow (>=21.04.0)
- Docker or Singularity
Installation
Build Docker Image
Important: Use --no-cache to ensure model weights are downloaded properly.
docker build --no-cache -t immunebuilder:latest .
The build process will:
- Install all dependencies (PyTorch, OpenMM, pdbfixer, ANARCI)
- Pre-download all model weights (~500MB) from Zenodo to avoid rate limiting at runtime
Build time: approximately 10-15 minutes depending on network speed.
Usage
Setup Directories
# Create input/output directories
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/input
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/output
Basic Usage
# Predict antibody structure (default mode)
nextflow run main.nf --fasta /path/to/antibody.fasta --mode antibody --outdir ./results
# Predict nanobody structure
nextflow run main.nf --fasta /path/to/nanobody.fasta --mode nanobody --outdir ./results
# Predict TCR structure
nextflow run main.nf --fasta /path/to/tcr.fasta --mode tcr --outdir ./results
With GPU Support
nextflow run main.nf -profile gpu --fasta /path/to/sequences.fasta --mode antibody
Process Multiple Files
nextflow run main.nf --fasta '/path/to/sequences/*.fasta' --mode antibody --outdir ./results
Parameters
| Parameter | Description | Default | Required |
|---|---|---|---|
| --fasta | Input FASTA file(s) | null | Yes |
| --outdir | Output directory | ./results | Yes |
| --mode | Prediction mode: antibody, nanobody, or tcr | antibody | Yes |
| --verbose | Enable verbose output | true | No |
| --original_weights | Use original TCRBuilder2 weights (TCR mode only) | false | No |
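When runs are launched from a script, it can help to assemble the command from the parameters in the table above and reject an invalid mode early. The following is a minimal sketch, not part of the pipeline: `build_command` and `VALID_MODES` are hypothetical names, and only flags that appear in this README are used.

```python
import shlex

# Modes accepted by --mode, as listed in the parameter table.
VALID_MODES = ("antibody", "nanobody", "tcr")


def build_command(fasta, mode="antibody", outdir="./results", gpu=False):
    """Assemble a `nextflow run main.nf` command as a list of arguments.

    Hypothetical helper: it only arranges the documented flags and does not
    launch anything.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    cmd = ["nextflow", "run", "main.nf"]
    if gpu:
        # The gpu profile is described in the Profiles section.
        cmd += ["-profile", "gpu"]
    cmd += ["--fasta", fasta, "--mode", mode, "--outdir", outdir]
    return cmd


if __name__ == "__main__":
    # shlex.join quotes arguments such as glob patterns for safe pasting into a shell.
    print(shlex.join(build_command("/path/to/sequences/*.fasta", mode="tcr")))
```

The returned list can be passed to `subprocess.run` unchanged; keeping the glob pattern as a single argument leaves expansion to Nextflow rather than to the shell.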
Input FASTA Format Requirements
Important: Variable Region Sequences Only
ImmuneBuilder expects variable region (Fv) sequences only, not full-length antibody sequences. Each chain should be approximately:
- Heavy chain (H): 110-130 amino acids
- Light chain (L): 105-115 amino acids
- Alpha chain (A): 105-115 amino acids
- Beta chain (B): 110-120 amino acids
If you have full-length sequences from RCSB/PDB, extract only the variable region.
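A quick pre-flight check can catch full-length sequences before a run. The sketch below is not part of the pipeline: `read_fasta`, `length_warnings`, and `EXPECTED_LENGTHS` are hypothetical names, and the bounds simply restate the approximate ranges listed above rather than any limit enforced by ImmuneBuilder.

```python
# Approximate variable-region length ranges in amino acids, taken from the list above.
EXPECTED_LENGTHS = {
    "H": (110, 130),
    "L": (105, 115),
    "A": (105, 115),
    "B": (110, 120),
}


def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def length_warnings(records):
    """Return one message per known chain whose length is outside its range."""
    warnings = []
    for chain, seq in records.items():
        bounds = EXPECTED_LENGTHS.get(chain)
        if bounds is None:
            continue  # unknown label; label checking is a separate concern
        low, high = bounds
        if not low <= len(seq) <= high:
            warnings.append(
                f"chain {chain}: {len(seq)} aa, expected about {low}-{high}"
            )
    return warnings
```

Because the ranges are approximate, a warning is a prompt to inspect the sequence, not proof that it is invalid. A chain of several hundred residues, however, almost certainly includes constant-region sequence.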
Antibody (ABodyBuilder2)
>H
EVQLVESGGGVVQPGGSLRLSCAASGFTFNSYGMHWVRQAPGKGLEWVAFIRYDGGNKYYADSVKGRFTISRDNSKNTLYLQMKSLRAEDTAVYYCANLKDSRYSGSYYDYWGQGTLVTVSS
>L
DIQMTQSPSSLSASVGDRVTITCQASQDIRFYLNWYQQKPGKAPKLLISDASNMETGVPSRFSGSGSGTDFTFTISSLQPEDIATYYCQQYDNLPFTFGPGTKVDFK
Nanobody (NanoBodyBuilder2)
>H
QVQLVESGGGLVQPGESLRLSCAASGSIFGIYAVHWFRMAPGKEREFTAGFGSHGSTNYAASVKGRFTMSRDNAKNTTYLQMNSLKPADTAVYYCHALIKNELGFLDYWGPGTQVTVSS
TCR (TCRBuilder2)
>A
AQSVTQLGSHVSVSEGALVLLRCNYSSSVPPYLFWYVQYPNQGLQLLLKYTSAATLVKGINGFEAEFKKSETSFHLTKPSAHMSDAAEYFCAVSEQDDKIIFGKGTRLHILP
>B
ADVTQTPRNRITKTGKRIMLECSQTKGHDRMYWYRQDPGLGLRLIYYSFDVKDINKGEISDGYSVSRQAQAKFSLSLESAIPNQTALYFCATSDESYGYTFGSGTRLTVV
Example Input Files
Example FASTA files are provided in the input/ directory:
- antibody_test.fasta - Example antibody with H and L chains
- nanobody_test.fasta - Example nanobody with H chain only
- tcr_test.fasta - Example TCR with A and B chains
Output
The pipeline produces:
| File | Description |
|---|---|
| {sample_name}.pdb | Predicted 3D structure in PDB format |
| run.log | Execution log with prediction details |
| pipeline_info/report.html | Detailed execution report |
| pipeline_info/timeline.html | Timeline visualization |
| pipeline_info/dag.html | Workflow DAG visualization |
| pipeline_info/trace.txt | Execution trace |
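One way to sanity-check a predicted structure is to count residues per chain in the PDB file and compare the counts with the input sequence lengths. The sketch below is an illustration only: `residues_per_chain` is a hypothetical helper that relies on the fixed-column layout of PDB ATOM records and assumes one C-alpha atom per residue with no alternate locations.

```python
from collections import Counter


def residues_per_chain(pdb_text):
    """Count residues per chain by counting C-alpha ATOM records."""
    counts = Counter()
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name in columns 13-16, chain ID in column 22.
        if (
            line.startswith("ATOM")
            and len(line) > 21
            and line[12:16].strip() == "CA"
        ):
            counts[line[21]] += 1
    return dict(counts)


if __name__ == "__main__":
    import sys

    # Usage: python count_residues.py path/to/structure.pdb
    with open(sys.argv[1]) as handle:
        for chain, n in sorted(residues_per_chain(handle.read()).items()):
            print(f"chain {chain}: {n} residues")
```

For an antibody run, two chains are expected, with counts matching the lengths of the H and L sequences in the input FASTA file.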
Visualizing Output Structures
# Using PyMOL
pymol output.pdb
# Using ChimeraX
chimerax output.pdb
Alternatively, upload the PDB file to an online structure viewer.
Profiles
| Profile | Description |
|---|---|
| standard | Default Docker execution (CPU) |
| gpu | Docker execution with GPU support |
| singularity | Singularity container execution |
| conda | Conda environment execution |
Performance
Typical prediction times on CPU:
| Mode | Duration | Output Size |
|---|---|---|
| Antibody | ~3 min | ~280 KB |
| Nanobody | ~5 min | ~140 KB |
| TCR | ~5 min | ~280 KB |
Troubleshooting
Error: 429 Too Many Requests
If you see rate limiting errors from Zenodo:
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests
Solution: Rebuild the Docker image with --no-cache so that the model weights are downloaded at build time rather than at runtime:
docker build --no-cache -t immunebuilder:latest .
Error: KeyError 'H' or 'A' or 'L' or 'B'
The FASTA file has incorrect chain labels.
Solution: Ensure correct labels:
- Antibody: >H and >L
- Nanobody: >H only
- TCR: >A and >B
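The label rules above can be checked before launching a run. This is a minimal sketch under stated assumptions: `REQUIRED_CHAINS`, `chain_labels`, and `check_labels` are hypothetical names, and the function only reports what it finds. Whether an extra label causes a failure in a given mode is not stated here, so unexpected labels are returned separately from missing ones.

```python
# Chain labels expected for each --mode value, per the list above.
REQUIRED_CHAINS = {
    "antibody": {"H", "L"},
    "nanobody": {"H"},
    "tcr": {"A", "B"},
}


def chain_labels(fasta_text):
    """Return the header labels (text after '>') in file order."""
    return [
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    ]


def check_labels(fasta_text, mode):
    """Return (missing, unexpected) labels for the given mode, each sorted."""
    required = REQUIRED_CHAINS[mode]
    found = set(chain_labels(fasta_text))
    missing = sorted(required - found)
    unexpected = sorted(found - required)
    return missing, unexpected
```

A header such as `>heavy` in antibody mode is reported as a missing `H` plus an unexpected `heavy`, which matches the situation that produces the KeyError.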
Error: Sequences too long
ImmuneBuilder expects variable region sequences only (roughly 105-130 aa per chain).
Solution: Extract only the Fv (variable fragment) portion from full-length sequences.
Error: Missing output file
Check run.log in the output directory for detailed error messages:
cat /path/to/output/run.log
Citation
If you use this pipeline, please cite:
@article{Abanades2023,
author = {Abanades, Brennan and Wong, Wing Ki and Boyles, Fergus and Georges, Guy and Bujotzek, Alexander and Deane, Charlotte M.},
doi = {10.1038/s42003-023-04927-7},
journal = {Communications Biology},
number = {1},
pages = {575},
title = {ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins},
volume = {6},
year = {2023}
}
For TCRBuilder2+:
@article{Quast2024,
author = {Quast, Nele P. and Abanades, Brennan and Guloglu, Bora and Karuppiah, Vijaykumar and Harper, Stephen and Raybould, Matthew I. J. and Deane, Charlotte M.},
title = {T-cell receptor structures and predictive models reveal comparable alpha and beta chain structural diversity despite differing genetic complexity},
year = {2024},
doi = {10.1101/2024.05.20.594940},
journal = {bioRxiv}
}
License
BSD 3-clause license (same as ImmuneBuilder)