# ImmuneBuilder Nextflow Pipeline

A Nextflow pipeline for predicting the structures of immune proteins using ImmuneBuilder, including antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2), and T-cell receptors (TCRBuilder2).

## Overview

ImmuneBuilder is a set of deep learning models trained to accurately predict the structure of immune receptor proteins. This pipeline packages ImmuneBuilder for use with Nextflow and Docker for reproducible and scalable structure predictions.

## Features

- **ABodyBuilder2**: Predicts antibody structures with state-of-the-art accuracy (CDR-H3 RMSD: 2.81Å)
- **NanoBodyBuilder2**: Predicts nanobody structures (CDR-H3 RMSD: 2.89Å)
- **TCRBuilder2/TCRBuilder2+**: Predicts T-cell receptor structures with updated weights

## Requirements

- [Nextflow](https://www.nextflow.io/) (>=21.04.0)
- [Docker](https://www.docker.com/) or [Singularity](https://sylabs.io/singularity/)

## Installation

### Build Docker Image

**Important**: Use `--no-cache` to ensure model weights are downloaded properly.

```bash
docker build --no-cache -t immunebuilder:latest .
```

The build process will:

1. Install all dependencies (PyTorch, OpenMM, pdbfixer, ANARCI)
2. Pre-download all model weights (~500MB) from Zenodo to avoid rate limiting at runtime

Build time: approximately 10-15 minutes depending on network speed.

## Usage

### Setup Directories

```bash
# Create input/output directories
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/input
mkdir -p /mnt/OmicNAS/private/old/olamide/ImmuneBuilder/output
```

### Basic Usage

```bash
# Predict antibody structure (default mode)
nextflow run main.nf --fasta /path/to/antibody.fasta --mode antibody --outdir ./results

# Predict nanobody structure
nextflow run main.nf --fasta /path/to/nanobody.fasta --mode nanobody --outdir ./results

# Predict TCR structure
nextflow run main.nf --fasta /path/to/tcr.fasta --mode tcr --outdir ./results
```

### With GPU Support

```bash
nextflow run main.nf -profile gpu --fasta /path/to/sequences.fasta --mode antibody
```

### Process Multiple Files

```bash
nextflow run main.nf --fasta '/path/to/sequences/*.fasta' --mode antibody --outdir ./results
```

## Parameters

| Parameter | Description | Default | Required |
|-----------|-------------|---------|----------|
| `--fasta` | Input FASTA file(s) | null | Yes |
| `--outdir` | Output directory | ./results | No |
| `--mode` | Prediction mode: `antibody`, `nanobody`, or `tcr` | antibody | No |
| `--verbose` | Enable verbose output | true | No |
| `--original_weights` | Use original TCRBuilder2 weights (TCR mode only) | false | No |

## Input FASTA Format Requirements

### Important: Variable Region Sequences Only

ImmuneBuilder expects **variable region (Fv) sequences only**, not full-length antibody sequences. Each chain should be approximately:

- **Heavy chain (H)**: 110-130 amino acids
- **Light chain (L)**: 105-115 amino acids
- **Alpha chain (A)**: 105-115 amino acids
- **Beta chain (B)**: 110-120 amino acids

If you have full-length sequences from RCSB/PDB, extract only the variable region.

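The chain-label and length guidance above can be checked before launching a run. The following is a minimal sketch and not part of the pipeline or of ImmuneBuilder: `parse_fasta`, `check_chains`, `EXPECTED_CHAINS`, and `LENGTH_RANGES` are hypothetical names, and the bounds are simply the approximate ranges listed above, so an out-of-range result is best treated as a warning rather than a hard failure.

```python
# Hypothetical pre-flight check; not part of the pipeline or of ImmuneBuilder.

# Chain labels each mode expects, as described in this README.
EXPECTED_CHAINS = {
    "antibody": ("H", "L"),
    "nanobody": ("H",),
    "tcr": ("A", "B"),
}

# Approximate variable-region lengths (amino acids), copied from the list above.
LENGTH_RANGES = {
    "H": (110, 130),
    "L": (105, 115),
    "A": (105, 115),
    "B": (110, 120),
}


def parse_fasta(text):
    """Return {header: sequence} for a FASTA string; wrapped sequences are joined."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def check_chains(records, mode):
    """Return a list of problems; an empty list means labels and lengths look plausible."""
    problems = []
    expected = EXPECTED_CHAINS[mode]
    # Missing labels are what typically surface as a KeyError at prediction time.
    for label in expected:
        if label not in records:
            problems.append(f"missing chain >{label} for mode '{mode}'")
    for label in records:
        if label not in expected:
            problems.append(f"unexpected chain >{label} for mode '{mode}'")
    # Length check against the approximate ranges for the chains that are present.
    for label in expected:
        if label in records:
            low, high = LENGTH_RANGES[label]
            length = len(records[label])
            if not low <= length <= high:
                problems.append(
                    f"chain >{label} is {length} aa, outside the approximate {low}-{high} aa range"
                )
    return problems
```

Reading a file and calling `check_chains(parse_fasta(text), "antibody")` returns an empty list when both `>H` and `>L` are present with plausible lengths. A heavy chain of several hundred residues usually means a full-length sequence was supplied instead of the variable region.
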
### Antibody (ABodyBuilder2)

```
>H
EVQLVESGGGVVQPGGSLRLSCAASGFTFNSYGMHWVRQAPGKGLEWVAFIRYDGGNKYYADSVKGRFTISRDNSKNTLYLQMKSLRAEDTAVYYCANLKDSRYSGSYYDYWGQGTLVTVSS
>L
DIQMTQSPSSLSASVGDRVTITCQASQDIRFYLNWYQQKPGKAPKLLISDASNMETGVPSRFSGSGSGTDFTFTISSLQPEDIATYYCQQYDNLPFTFGPGTKVDFK
```

### Nanobody (NanoBodyBuilder2)

```
>H
QVQLVESGGGLVQPGESLRLSCAASGSIFGIYAVHWFRMAPGKEREFTAGFGSHGSTNYAASVKGRFTMSRDNAKNTTYLQMNSLKPADTAVYYCHALIKNELGFLDYWGPGTQVTVSS
```

### TCR (TCRBuilder2)

```
>A
AQSVTQLGSHVSVSEGALVLLRCNYSSSVPPYLFWYVQYPNQGLQLLLKYTSAATLVKGINGFEAEFKKSETSFHLTKPSAHMSDAAEYFCAVSEQDDKIIFGKGTRLHILP
>B
ADVTQTPRNRITKTGKRIMLECSQTKGHDRMYWYRQDPGLGLRLIYYSFDVKDINKGEISDGYSVSRQAQAKFSLSLESAIPNQTALYFCATSDESYGYTFGSGTRLTVV
```

## Example Input Files

Example FASTA files are provided in the `input/` directory:

- `antibody_test.fasta` - Example antibody with H and L chains
- `nanobody_test.fasta` - Example nanobody with H chain only
- `tcr_test.fasta` - Example TCR with A and B chains

## Output

The pipeline produces:

| File | Description |
|------|-------------|
| `{sample_name}.pdb` | Predicted 3D structure in PDB format |
| `run.log` | Execution log with prediction details |
| `pipeline_info/report.html` | Detailed execution report |
| `pipeline_info/timeline.html` | Timeline visualization |
| `pipeline_info/dag.html` | Workflow DAG visualization |
| `pipeline_info/trace.txt` | Execution trace |

### Visualizing Output Structures

```bash
# Using PyMOL
pymol output.pdb

# Using ChimeraX
chimerax output.pdb
```

Or upload to online viewers:

- [Mol* Viewer](https://molstar.org/viewer/)
- [NGL Viewer](https://nglviewer.org/)

## Profiles

| Profile | Description |
|---------|-------------|
| `standard` | Default Docker execution (CPU) |
| `gpu` | Docker execution with GPU support |
| `singularity` | Singularity container execution |
| `conda` | Conda environment execution |

## Performance

Typical prediction times on CPU:

| Mode | Duration | Output Size |
|------|----------|-------------|
| Antibody | ~3 min | ~280 KB |
| Nanobody | ~5 min | ~140 KB |
| TCR | ~5 min | ~280 KB |

## Troubleshooting

### Error: 429 Too Many Requests

If you see rate limiting errors from Zenodo:

```
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests
```

**Solution**: Rebuild the Docker image with `--no-cache`:

```bash
docker build --no-cache -t immunebuilder:latest .
```

### Error: KeyError 'H' or 'A' or 'L' or 'B'

The FASTA file has incorrect chain labels.

**Solution**: Ensure correct labels:

- Antibody: `>H` and `>L`
- Nanobody: `>H` only
- TCR: `>A` and `>B`

### Error: Sequences too long

ImmuneBuilder expects variable region sequences only (~110-130 aa).

**Solution**: Extract only the Fv (variable fragment) portion from full-length sequences.

### Error: Missing output file

Check `run.log` in the output directory for detailed error messages:

```bash
cat /path/to/output/run.log
```

## Citation

If you use this pipeline, please cite:

```bibtex
@article{Abanades2023,
  author = {Abanades, Brennan and Wong, Wing Ki and Boyles, Fergus and Georges, Guy and Bujotzek, Alexander and Deane, Charlotte M.},
  doi = {10.1038/s42003-023-04927-7},
  journal = {Communications Biology},
  number = {1},
  pages = {575},
  title = {ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins},
  volume = {6},
  year = {2023}
}
```

For TCRBuilder2+:

```bibtex
@article{Quast2024,
  author = {Quast, Nele P. and Abanades, Brennan and Guloglu, Bora and Karuppiah, Vijaykumar and Harper, Stephen and Raybould, Matthew I. J. and Deane, Charlotte M.},
  title = {T-cell receptor structures and predictive models reveal comparable alpha and beta chain structural diversity despite differing genetic complexity},
  year = {2024},
  doi = {10.1101/2024.05.20.594940},
  journal = {bioRxiv}
}
```

## License

BSD 3-clause license (same as ImmuneBuilder)

## Links

- [ImmuneBuilder GitHub](https://github.com/oxpig/ImmuneBuilder)
- [ImmuneBuilder Paper](https://doi.org/10.1038/s42003-023-04927-7)
- [Google Colab Demo](https://colab.research.google.com/github/brennanaba/ImmuneBuilder/blob/main/notebook/ImmuneBuilder.ipynb)