{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "G4yBrceuFbf3" }, "source": [ "\n", "---\n", "\n", "
\n", " \n", "# ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins \n", "\n", "
\n", "\n", "---\n", "\n", "Immune receptor proteins play a key role in the immune system and have shown great promise as biotherapeutics. The structure of these proteins is critical for understanding their antigen binding properties. Here, we present ImmuneBuilder, a set of deep learning models trained to accurately predict the structure of antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2) and T-cell receptors (TCRBuilder2). We show that ImmuneBuilder generates structures with state-of-the-art accuracy while being far faster than AlphaFold2. For example, on a benchmark of 34 recently solved antibodies, ABodyBuilder2 predicts CDR-H3 loops with an RMSD of 2.81Å, a 0.09Å improvement over AlphaFold-Multimer, while being over a hundred times faster. Similar results are also achieved for nanobodies (NanoBodyBuilder2 predicts CDR-H3 loops with an average RMSD of 2.89Å, a 0.55Å improvement over AlphaFold2) and TCRs. By predicting an ensemble of structures, ImmuneBuilder also gives an error estimate for every residue in its final prediction.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "id": "kOblAo-xetgx" }, "outputs": [], "source": [ "#@title Input chain sequence(s), then hit `Runtime` -> `Run all`\n", "import sys\n", "python_version = f\"{sys.version_info.major}.{sys.version_info.minor}\"\n", "\n", "#@markdown Select what type of immune protein you are modelling\n", "\n", "protein_type = \"Antibody\" #@param [\"Antibody\", \"Nanobody\",\"TCR\"]\n", "\n", "#@markdown Insert the sequence for the variable domain. 
If modelling a Nanobody, only use one of the fields\n", "\n", "sequence_1 = 'VKLLEQSGAEVKKPGASVKVSCKASGYSFTSYGLHWVRQAPGQRLEWMGWISAGTGNTKYSQKFRGRVTFTRDTSATTAYMGLSSLRPEDTAVYYCARDQAGYTGGKSEFDYWGQGTLVTVSS' #@param {type:\"string\"}\n", "sequence_2 = 'ELVMTQSPSSLSASVGDRVNIACRASQGISSALAWYQQKPGKAPRLLIYDASNLESGVPSRFSGSGSGTDFTLTISSLQPEDFAIYYCQQFNSYPLTFGGGTKVEIKRTV' #@param {type:\"string\"}\n", "\n", "# remove whitespaces\n", "sequence_1 = \"\".join(sequence_1.split())\n", "sequence_2 = \"\".join(sequence_2.split())\n", "\n", "#@markdown Insert the output file name\n", "\n", "filename = 'ImmuneBuilder_model.pdb' #@param {type:\"string\"}\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "cellView": "form", "id": "iccGdbe_Pmt9", "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "#@title Install dependencies\n", "%%capture\n", "%%bash -s $python_version\n", "\n", "#@markdown This script will download and install the ImmuneBuilder code, ANARCI and OpenMM\n", "\n", "PYTHON_VERSION=$1\n", "set -e\n", "\n", "if [ ! -f CODE_READY ]; then\n", " # install dependencies\n", " pip install ImmuneBuilder 2>&1 1>/dev/null\n", " pip install py3Dmol 2>&1 1>/dev/null\n", " touch CODE_READY\n", "fi\n", "\n", "# setup conda\n", "if [ ! -f CONDA_READY ]; then\n", " wget -qnc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\n", " bash Miniconda3-latest-Linux-x86_64.sh -bfp /usr/local 2>&1 1>/dev/null\n", " rm Miniconda3-latest-Linux-x86_64.sh\n", " touch CONDA_READY\n", "fi\n", "\n", "# setup openmm for amber refinement\n", "if [ ! -f AMBER_READY ]; then\n", " conda install -y -q -c conda-forge openmm=7.7.0 python=\"${PYTHON_VERSION}\" pdbfixer 2>&1 1>/dev/null\n", " touch AMBER_READY\n", "fi\n", "\n", "# setup anarci\n", "if [ ! 
-f ANARCI_READY ]; then\n", " conda install -y -q anarci hmmer biopython -c bioconda --no-deps --solver=classic 2>&1 1>/dev/null\n", " touch ANARCI_READY\n", "fi" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "cellView": "form", "id": "LZM_0K0amcrP" }, "outputs": [], "source": [ "#@title Download the model weights\n", "%%capture\n", "\n", "#@markdown This will take a few seconds the first time\n", "\n", "if f\"/usr/local/lib/python{python_version}/site-packages/\" not in sys.path:\n", " sys.path.insert(0, f\"/usr/local/lib/python{python_version}/site-packages/\")\n", "\n", "from ImmuneBuilder import ABodyBuilder2, NanoBodyBuilder2, TCRBuilder2\n", "\n", "if protein_type == \"Antibody\":\n", " predictor = ABodyBuilder2()\n", "elif protein_type == \"Nanobody\":\n", " predictor = NanoBodyBuilder2()\n", "elif protein_type == \"TCR\":\n", " predictor = TCRBuilder2()\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "cellView": "form", "id": "EbRzRvQkVKJs" }, "outputs": [], "source": [ "#@title Predict the structure\n", "\n", "from anarci import number\n", "\n", "# Use ANARCI to find which chain type each input sequence is\n", "_, chain1 = number(sequence_1)\n", "_, chain2 = number(sequence_2)\n", "\n", "# Map chain type to sequence, skipping empty or unrecognised inputs\n", "sequences = dict()\n", "if chain1:\n", " sequences[chain1] = sequence_1\n", "if chain2:\n", " sequences[chain2] = sequence_2\n", "\n", "try:\n", " predictor.predict(sequences).save(filename)\n", "except KeyError as e:\n", " print(f\"ERROR: Missing sequence for chain {e}\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 514 }, "id": "uJg31EJjIz7L", "outputId": "1a176433-4d4e-46ab-a574-35cbf5b8bbd5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The error is calculated by comparing how much different models agree or disagree on the placement of each residue\n" ] }, { "data": { "application/3dmoljs_load.v0": "
\n


\n
\n", "text/html": [ "
\n", "


\n", "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#@title Visualise the prediction\n", "\n", "import py3Dmol\n", "\n", "#@markdown Choose visualization settings (rerun this cell to update):\n", "\n", "colour_by = \"predicted_error\" #@param [\"predicted_error\", \"chain\", \"rainbow\"]\n", "\n", "show_sidechains = True #@param {type:\"boolean\"}\n", "show_mainchains = False #@param {type:\"boolean\"}\n", "\n", "\n", "# Create the py3Dmol viewer\n", "view = py3Dmol.view()\n", "# Load the predicted structure from the PDB file written in the previous step\n", "with open(filename, 'r') as pdb_file:\n", " view.addModel(pdb_file.read(), 'pdb')\n", "# Zoom to fit the whole structure\n", "view.zoomTo()\n", "# Use a white background\n", "view.setBackgroundColor('white')\n", "\n", "\n", "if colour_by == \"chain\":\n", " # Colour chain H purple and chain L green\n", " view.setStyle({'chain':'H'},{'cartoon': {'color':'purple'}})\n", " view.setStyle({'chain':'L'},{'cartoon': {'color':'green'}})\n", "elif colour_by == \"rainbow\":\n", " view.setStyle({'cartoon': {'color':'spectrum'}})\n", "elif colour_by == \"predicted_error\":\n", " # Colour by the B-factor field, which this notebook treats as the per-residue predicted error\n", " print(\"The error is calculated by comparing how much different models agree or disagree on the placement of each residue\")\n", " view.setStyle({'cartoon': {'colorscheme': {'prop':'b','gradient': 'roygb','min':5,'max':0}}})\n", "\n", "if show_sidechains:\n", " BB = ['C','O','N']\n", " view.addStyle({'and':[{'resn':[\"GLY\",\"PRO\"],'invert':True},{'atom':BB,'invert':True}]},\n", " {'stick':{'colorscheme':\"WhiteCarbon\",'radius':0.3}})\n", " view.addStyle({'and':[{'resn':\"GLY\"},{'atom':'CA'}]},\n", " {'sphere':{'colorscheme':\"WhiteCarbon\",'radius':0.3}})\n", " view.addStyle({'and':[{'resn':\"PRO\"},{'atom':['C','O'],'invert':True}]},\n", " {'stick':{'colorscheme':\"WhiteCarbon\",'radius':0.3}}) \n", "if show_mainchains:\n", " BB = 
['C','O','N','CA']\n", " view.addStyle({'atom':BB},{'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", "\n", "\n", "\n", "#And we finally visualize the structures using the command below\n", "view.zoomTo()\n", "view.show()\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 17 }, "id": "33g5IIegij5R", "outputId": "3fe3d5f4-f743-4bd7-950e-ddbf91e25fec" }, "outputs": [ { "data": { "application/javascript": "\n async function download(id, filename, size) {\n if (!google.colab.kernel.accessAllowed) {\n return;\n }\n const div = document.createElement('div');\n const label = document.createElement('label');\n label.textContent = `Downloading \"${filename}\": `;\n div.appendChild(label);\n const progress = document.createElement('progress');\n progress.max = size;\n div.appendChild(progress);\n document.body.appendChild(div);\n\n const buffers = [];\n let downloaded = 0;\n\n const channel = await google.colab.kernel.comms.open(id);\n // Send a message to notify the kernel that we're ready.\n channel.send({})\n\n for await (const message of channel.messages) {\n // Send a message to notify the kernel that we're ready.\n channel.send({})\n if (message.buffers) {\n for (const buffer of message.buffers) {\n buffers.push(buffer);\n downloaded += buffer.byteLength;\n progress.value = downloaded;\n }\n }\n }\n const blob = new Blob(buffers, {type: 'application/binary'});\n const a = document.createElement('a');\n a.href = window.URL.createObjectURL(blob);\n a.download = filename;\n div.appendChild(a);\n a.click();\n div.remove();\n }\n ", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "download(\"download_87fea1f8-22e6-467f-b3c0-fed8746d67c5\", \"ImmuneBuilder_model.pdb\", 280386)", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#@title Download predicted structure\n", 
"#@markdown If you are having issues downloading the result archive, try disabling your adblocker and run this cell again. If that fails click on the little folder icon to the left, navigate to the file, right-click and select \\\"Download\\\".\n", "\n", "from google.colab import files\n", "\n", "files.download(filename)" ] }, { "cell_type": "markdown", "metadata": { "id": "UGUBLzB3C6WN", "pycharm": { "name": "#%% md\n" } }, "source": [ "# Instructions \n", "**Quick start**\n", "1. Paste the sequence(s) of your antibody, nanobody or TCR in the input field.\n", "2. Select what type of protein it is.\n", "3. Press \"Runtime\" -> \"Run all\".\n", "4. The pipeline consists of 5 steps. The currently running step is indicated by a circle with a stop sign next to it.\n", "\n", "**Troubleshooting**\n", "* Check your input sequences. They should be antibody, nanobody or TCRs sequences. ImmuneBuilder is not capable of predicting the structure of general proteins.\n", "* Check that the runtime type is set to GPU at \"Runtime\" -> \"Change runtime type\".\n", "* Try to restart the session \"Runtime\" -> \"Factory reset runtime\".\n", "\n", "**Aknowledgements**\n", "* This colab notebook was heavily inspired by [ColabFold](https://github.com/sokrypton/ColabFold)." ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "provenance": [] }, "gpuClass": "standard", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 0 }