ImmuneBuilder Nextflow Pipeline

A Nextflow pipeline for predicting the structures of immune proteins using ImmuneBuilder, including antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2), and T-cell receptors (TCRBuilder2).

Overview

ImmuneBuilder is a set of deep learning models trained to accurately predict the structure of immune receptor proteins. This pipeline wraps ImmuneBuilder in Nextflow and Docker so that structure predictions are reproducible and scalable.

Features

  • ABodyBuilder2: Predicts antibody structures with state-of-the-art accuracy (CDR-H3 RMSD: 2.81Å)
  • NanoBodyBuilder2: Predicts nanobody structures (CDR-H3 RMSD: 2.89Å)
  • TCRBuilder2/TCRBuilder2+: Predicts T-cell receptor structures. The updated TCRBuilder2+ weights are used unless --original_weights is set.

Requirements

Installation

Build Docker Image

Important: Use --no-cache to ensure model weights are downloaded properly.

docker build --no-cache -t immunebuilder:latest .

The build process will:

  1. Install all dependencies (PyTorch, OpenMM, pdbfixer, ANARCI)
  2. Pre-download all model weights (~500MB) from Zenodo to avoid rate limiting at runtime

Build time: approximately 10-15 minutes depending on network speed.

Usage

Setup Directories

# Create input/output directories
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/input
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/output

Basic Usage

# Predict antibody structure (default mode)
nextflow run main.nf --fasta /path/to/antibody.fasta --mode antibody --outdir ./results

# Predict nanobody structure
nextflow run main.nf --fasta /path/to/nanobody.fasta --mode nanobody --outdir ./results

# Predict TCR structure
nextflow run main.nf --fasta /path/to/tcr.fasta --mode tcr --outdir ./results

With GPU Support

nextflow run main.nf -profile gpu --fasta /path/to/sequences.fasta --mode antibody

Process Multiple Files

# Quote the glob so that Nextflow, not the shell, expands it
nextflow run main.nf --fasta '/path/to/sequences/*.fasta' --mode antibody --outdir ./results

Parameters

Parameter           Description                                        Default     Required
--fasta             Input FASTA file(s)                                null        Yes
--outdir            Output directory                                   ./results   No
--mode              Prediction mode: antibody, nanobody, or tcr        antibody    No
--verbose           Enable verbose output                              true        No
--original_weights  Use original TCRBuilder2 weights (TCR mode only)   false       No
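
As a rough illustration of how these parameters combine, the following Python sketch assembles a nextflow command line from them. The helper name build_command is hypothetical and not part of this repository; only flags listed in the table above are used, and the bare --original_weights form relies on Nextflow treating a value-less parameter as true.

```python
import shlex

VALID_MODES = ("antibody", "nanobody", "tcr")


def build_command(fasta, mode="antibody", outdir="./results",
                  gpu=False, original_weights=False):
    """Assemble a `nextflow run main.nf` argument list (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if original_weights and mode != "tcr":
        raise ValueError("--original_weights applies to TCR mode only")

    cmd = ["nextflow", "run", "main.nf"]
    if gpu:
        cmd += ["-profile", "gpu"]
    cmd += ["--fasta", fasta, "--mode", mode, "--outdir", outdir]
    if original_weights:
        # A parameter given without a value is set to true by Nextflow.
        cmd.append("--original_weights")
    return cmd


if __name__ == "__main__":
    # shlex.join quotes the glob so the shell passes it through unexpanded.
    print(shlex.join(build_command("/path/to/sequences/*.fasta", mode="tcr")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with glob patterns.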

Input FASTA Format Requirements

Important: Variable Region Sequences Only

ImmuneBuilder expects variable region (Fv) sequences only, not full-length antibody sequences. Each chain should be approximately:

  • Heavy chain (H): 110-130 amino acids
  • Light chain (L): 105-115 amino acids
  • Alpha chain (A): 105-115 amino acids
  • Beta chain (B): 110-120 amino acids

If you have full-length sequences from RCSB/PDB, extract only the variable region.
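
A quick pre-flight check can catch full-length sequences before a run. The Python sketch below is not part of the pipeline; the function names are hypothetical, and the ranges are simply the approximate lengths listed above, so a warning is a hint to inspect the sequence rather than proof that it is wrong.

```python
# Approximate variable-region lengths listed above (amino acids, inclusive).
EXPECTED_LENGTHS = {
    "H": (110, 130),
    "L": (105, 115),
    "A": (105, 115),
    "B": (110, 120),
}


def parse_fasta(text):
    """Return {label: sequence}; the label is the text after '>'."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            # Sequences may be wrapped over several lines.
            records[name] += line
    return records


def check_lengths(records):
    """Return one warning per chain whose length is outside the expected range."""
    warnings = []
    for chain, seq in records.items():
        if chain not in EXPECTED_LENGTHS:
            continue  # unknown labels are not a length problem
        low, high = EXPECTED_LENGTHS[chain]
        if not low <= len(seq) <= high:
            warnings.append(
                f"chain {chain}: {len(seq)} aa, expected about {low}-{high}"
            )
    return warnings


if __name__ == "__main__":
    with open("input/antibody_test.fasta") as handle:
        for warning in check_lengths(parse_fasta(handle.read())):
            print(warning)
```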

Antibody (ABodyBuilder2)

>H
EVQLVESGGGVVQPGGSLRLSCAASGFTFNSYGMHWVRQAPGKGLEWVAFIRYDGGNKYYADSVKGRFTISRDNSKNTLYLQMKSLRAEDTAVYYCANLKDSRYSGSYYDYWGQGTLVTVSS
>L
DIQMTQSPSSLSASVGDRVTITCQASQDIRFYLNWYQQKPGKAPKLLISDASNMETGVPSRFSGSGSGTDFTFTISSLQPEDIATYYCQQYDNLPFTFGPGTKVDFK

Nanobody (NanoBodyBuilder2)

>H
QVQLVESGGGLVQPGESLRLSCAASGSIFGIYAVHWFRMAPGKEREFTAGFGSHGSTNYAASVKGRFTMSRDNAKNTTYLQMNSLKPADTAVYYCHALIKNELGFLDYWGPGTQVTVSS

TCR (TCRBuilder2)

>A
AQSVTQLGSHVSVSEGALVLLRCNYSSSVPPYLFWYVQYPNQGLQLLLKYTSAATLVKGINGFEAEFKKSETSFHLTKPSAHMSDAAEYFCAVSEQDDKIIFGKGTRLHILP
>B
ADVTQTPRNRITKTGKRIMLECSQTKGHDRMYWYRQDPGLGLRLIYYSFDVKDINKGEISDGYSVSRQAQAKFSLSLESAIPNQTALYFCATSDESYGYTFGSGTRLTVV

Example Input Files

Example FASTA files are provided in the input/ directory:

  • antibody_test.fasta - Example antibody with H and L chains
  • nanobody_test.fasta - Example nanobody with H chain only
  • tcr_test.fasta - Example TCR with A and B chains

Output

The pipeline produces:

File                         Description
{sample_name}.pdb            Predicted 3D structure in PDB format
run.log                      Execution log with prediction details
pipeline_info/report.html    Detailed execution report
pipeline_info/timeline.html  Timeline visualization
pipeline_info/dag.html       Workflow DAG visualization
pipeline_info/trace.txt      Execution trace
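
To sanity-check a predicted structure without opening a viewer, the residues in each chain can be counted from the PDB text. The sketch below is a minimal example, not part of the pipeline: it relies only on the fixed-width PDB ATOM record layout (atom name in columns 13-16, chain ID in column 22) and assumes the file contains no alternate locations, which would otherwise be counted twice. Whether the chain IDs in the output match the FASTA labels is also an assumption here.

```python
def residues_per_chain(pdb_text):
    """Count residues per chain by counting C-alpha ATOM records."""
    counts = {}
    for line in pdb_text.splitlines():
        # Skip anything that is not an ATOM record long enough to hold a chain ID.
        if not line.startswith("ATOM") or len(line) < 22:
            continue
        # Columns 13-16 (0-based slice 12:16) hold the atom name.
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]  # column 22 holds the chain identifier
        counts[chain] = counts.get(chain, 0) + 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for chain, n in residues_per_chain(handle.read()).items():
            print(f"chain {chain}: {n} residues")
```

The counts should roughly match the lengths of the input sequences for each chain.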

Visualizing Output Structures

# Using PyMOL
pymol output.pdb

# Using ChimeraX
chimerax output.pdb

Alternatively, upload the PDB file to an online structure viewer.

Profiles

Profile      Description
standard     Default Docker execution (CPU)
gpu          Docker execution with GPU support
singularity  Singularity container execution
conda        Conda environment execution

Performance

Typical prediction times on CPU:

Mode      Duration  Output Size
Antibody  ~3 min    ~280 KB
Nanobody  ~5 min    ~140 KB
TCR       ~5 min    ~280 KB

Troubleshooting

Error: 429 Too Many Requests

If you see rate limiting errors from Zenodo:

requests.exceptions.HTTPError: 429 Client Error: Too Many Requests

Solution: Rebuild the Docker image with --no-cache so that the model weights are downloaded again at build time instead of at runtime:

docker build --no-cache -t immunebuilder:latest .

Error: KeyError 'H' or 'A' or 'L' or 'B'

The FASTA file has incorrect chain labels.

Solution: Ensure correct labels:

  • Antibody: >H and >L
  • Nanobody: >H only
  • TCR: >A and >B
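
The label rules above can be checked before launching a run. This Python sketch is illustrative only (check_labels is a hypothetical helper, not something the pipeline provides); it compares the ">" header labels in a FASTA string with the set each mode expects.

```python
REQUIRED_LABELS = {
    "antibody": {"H", "L"},
    "nanobody": {"H"},
    "tcr": {"A", "B"},
}


def check_labels(fasta_text, mode):
    """Report labels that are missing or unexpected for the given mode."""
    found = {
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    }
    expected = REQUIRED_LABELS[mode]
    return {
        "missing": sorted(expected - found),
        "unexpected": sorted(found - expected),
    }


if __name__ == "__main__":
    with open("input/tcr_test.fasta") as handle:
        print(check_labels(handle.read(), "tcr"))
```

An empty "missing" list means every chain the mode needs is present; anything under "unexpected" is a header the mode does not use, such as a descriptive name copied from a database entry.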

Error: Sequences too long

ImmuneBuilder expects variable region sequences only (roughly 105-130 aa per chain; see Input FASTA Format Requirements).

Solution: Extract only the Fv (variable fragment) portion from full-length sequences.

Error: Missing output file

Check run.log in the output directory for detailed error messages:

cat /path/to/output/run.log

Citation

If you use this pipeline, please cite:

@article{Abanades2023,
    author = {Abanades, Brennan and Wong, Wing Ki and Boyles, Fergus and Georges, Guy and Bujotzek, Alexander and Deane, Charlotte M.},
    doi = {10.1038/s42003-023-04927-7},
    journal = {Communications Biology},
    number = {1},
    pages = {575},
    title = {ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins},
    volume = {6},
    year = {2023}
}

For TCRBuilder2+:

@article{Quast2024,
    author = {Quast, Nele P. and Abanades, Brennan and Guloglu, Bora and Karuppiah, Vijaykumar and Harper, Stephen and Raybould, Matthew I. J. and Deane, Charlotte M.},
    title = {T-cell receptor structures and predictive models reveal comparable alpha and beta chain structural diversity despite differing genetic complexity},
    year = {2024},
    doi = {10.1101/2024.05.20.594940},
    journal = {bioRxiv}
}

License

BSD 3-clause license (same as ImmuneBuilder)
