Initial commit: Chai-1 protein structure prediction pipeline for WES

- Nextflow pipeline using chai1 Docker image from Harbor
- S3-based input/output paths (s3://omic/eureka/chai-lab/)
- GPU-accelerated protein folding with MSA support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 12:55:08 +01:00
commit f971fd0e21
26 changed files with 1289 additions and 0 deletions

input/.nextflow.log Executable file

@@ -0,0 +1,14 @@
Jan-27 12:03:24.006 [main] DEBUG nextflow.cli.Launcher - $> nextflow run main.nf
Jan-27 12:03:24.041 [main] DEBUG nextflow.cli.CmdRun - N E X T F L O W ~ version 24.10.3
Jan-27 12:03:24.052 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/root/.nextflow/plugins; core-plugins: nf-amazon@2.9.2,nf-azure@1.10.2,nf-cloudcache@0.4.2,nf-codecommit@0.2.2,nf-console@1.1.4,nf-google@1.15.3,nf-tower@1.9.3,nf-wave@1.7.4
Jan-27 12:03:24.067 [main] INFO o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Jan-27 12:03:24.067 [main] INFO o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Jan-27 12:03:24.069 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.12.0 in 'deployment' mode
Jan-27 12:03:24.074 [main] INFO org.pf4j.AbstractPluginManager - No plugins
Jan-27 12:03:24.083 [main] DEBUG nextflow.scm.ProviderConfig - Using SCM config path: /root/.nextflow/scm
Jan-27 12:03:24.089 [main] DEBUG nextflow.cli.Launcher - Operation aborted
nextflow.exception.AbortOperationException: Cannot find script file: main.nf
at nextflow.cli.CmdRun.getScriptFile(CmdRun.groovy:536)
at nextflow.cli.CmdRun.run(CmdRun.groovy:325)
at nextflow.cli.Launcher.run(Launcher.groovy:503)
at nextflow.cli.Launcher.main(Launcher.groovy:658)


@@ -0,0 +1,4 @@
>protein|name=growth-hormone
FPTIPLSRLFDNAMLRAHRLHQLAFDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQKSNLELLRISLLLIQSWLEPVQFLRSVFANSLVYGASDSNVYDLLKDLEEGIQTLMGRLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVETFLRIVQCRSVEGSCGF
>protein|name=growth-hormone-receptor
FSGSEATPGPLIFKWNHHSVFFDGYTSGGLQRFVHLHFGVSNKQLISICRKRANSKEPSSPIVPVPVGGQLLVDCSFRKLSGEGLHTYYYAAGQEEKTSDRSHRHGPGVGSCFRKTFEDGVYQCTARNEGYAYGHSITKSHRTSHQVCSRDGVPVLTENQAHLPEDFKEFTLRLKQKRQLLERGSPAMQDTFPAPSPETTVQEITSQHPGGTESPTVLRVKTEKSHQVYAGLSKYFHYAGQRGLRVLYLHKGESLARGTVTVPVKRDRGVLADRMVEAVDVQRWVGYLRNVYLTGQK
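
The FASTA headers in this file follow the pattern `>protein|name=<label>`: an entity type, a pipe, then a `name=` field. A minimal Python sketch for reading such records is given below. It assumes every header has exactly this shape; the helper names `parse_entity_fasta` and `_split_header` are hypothetical and are not part of the committed pipeline.

```python
def _split_header(header: str) -> tuple[str, str]:
    """Split 'protein|name=growth-hormone' into ('protein', 'growth-hormone')."""
    entity_type, sep, rest = header.partition("|")
    # ASSUMPTION: the only supported header layout is '<type>|name=<label>'.
    if not sep or not rest.startswith("name=") or not entity_type:
        raise ValueError(f"unexpected FASTA header: {header!r}")
    return entity_type, rest[len("name="):]


def parse_entity_fasta(text: str) -> list[tuple[str, str, str]]:
    """Return (entity_type, name, sequence) for each record, in file order."""
    records: list[tuple[str, str, str]] = []
    header = None
    chunks: list[str] = []

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information
        if line.startswith(">"):
            if header is not None:
                records.append((*_split_header(header), "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)  # sequences may wrap over several lines

    if header is not None:
        records.append((*_split_header(header), "".join(chunks)))
    return records
```

Calling `parse_entity_fasta` on the file's contents yields one tuple per chain in file order, which makes it easy to check sequence lengths before submitting a GPU job.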


@@ -0,0 +1,12 @@
# Restraints for growth hormone complex based on known binding interface
# Format: chain1 resid1 chain2 resid2 distance_lower distance_upper confidence
# Key interface contacts between growth hormone and its receptor
A 14 B 43 4.0 8.0 0.8
A 167 B 57 4.0 8.0 0.8
A 171 B 62 4.0 8.0 0.8
A 175 B 102 4.0 8.0 0.8
A 178 B 166 4.0 8.0 0.8
# Additional stabilizing contacts
A 65 B 150 4.0 9.0 0.7
A 164 B 191 4.0 9.0 0.7
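
The `# Format:` comment in this file describes seven whitespace-separated fields per restraint. The sketch below parses that layout in Python. It covers only the text format as written in this file; whether the pipeline converts it into another restraint format before calling Chai-1 is not shown in this commit. The names `Restraint` and `parse_restraints` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restraint:
    chain1: str
    resid1: int
    chain2: str
    resid2: int
    distance_lower: float
    distance_upper: float
    confidence: float


def parse_restraints(text: str) -> list[Restraint]:
    """Parse 'chain1 resid1 chain2 resid2 lower upper confidence' lines."""
    restraints: list[Restraint] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) != 7:
            raise ValueError(f"line {lineno}: expected 7 fields, got {len(fields)}")
        c1, r1, c2, r2, lower, upper, conf = fields
        restraint = Restraint(c1, int(r1), c2, int(r2),
                              float(lower), float(upper), float(conf))
        # ASSUMPTION: sanity rules inferred from the field names only.
        if restraint.distance_lower > restraint.distance_upper:
            raise ValueError(f"line {lineno}: lower bound exceeds upper bound")
        if not 0.0 <= restraint.confidence <= 1.0:
            raise ValueError(f"line {lineno}: confidence outside [0, 1]")
        restraints.append(restraint)
    return restraints
```

Rejecting malformed lines up front is cheaper than discovering a bad restraint after a folding run has already been scheduled.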

input/insulin_complex.fasta Executable file

@@ -0,0 +1,6 @@
>protein|name=insulin-a-chain
GIVEQCCTSICSLYQLENYCN
>protein|name=insulin-b-chain
FVNQHLCGSHLVEALYLVCGERGFFYTPKT
>protein|name=insulin-receptor-l1
PQAFVNWLRGGSQQVEVFVSDLPKLRNLLQGEELLGRGSFGVVYEGNARDIIKGEAETRVAVKTVNESASLRERIEFLNEASVMKGFTCHHVVRLLGVVSKGQPTLVVMELMAHGDLKSYLRSLRPEAENNPGRPPPTLQEMIQMAAEIADGMAYLNAKKFVHRDLAARNCMVAHDFTVKIGDFGMTRDIYETDYYRKGGKGLLPVRWMAPESLKDGVFTTSSDMWSFGVVLWEITSLAEQPYQGLSNEQVLKFVMDGGYLDQPDNCPERVTDLMRMCWQFNPKMRPTFLEIVNLLKDDLHPSFPEVSFFHSEENKAPESEELEMEFEDMENVPLDRSSHCQREEAGGRDGGSSLGFKRSYEEHIPYTHMNGGKKNGRILTLPRSNPS

input/insulin_restraints.txt Executable file

@@ -0,0 +1,10 @@
# Restraints for insulin complex based on known interaction sites
# Format: chain1 resid1 chain2 resid2 distance_lower distance_upper confidence
# Insulin A chain to B chain contacts (disulfide bonds)
A 7 B 7 3.0 5.0 0.9
A 20 B 19 3.0 5.0 0.9
# Insulin (A+B) to receptor contacts
A 12 C 155 4.0 8.0 0.8
B 24 C 210 4.0 8.0 0.8
B 25 C 215 4.0 8.0 0.8
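
The restraints refer to chains by letter (A, B, C) and to residues by index, while the FASTA file only gives named records. A quick consistency check can catch indices that fall outside a chain. The sketch below rests on two assumptions that this commit does not confirm: chains are lettered A, B, C, ... in FASTA record order, and residue indices are 1-based. The function names are hypothetical.

```python
from string import ascii_uppercase


def chain_lengths(sequences: list[str]) -> dict[str, int]:
    """Map chain letters to sequence lengths.

    ASSUMPTION: chains are lettered A, B, C, ... in FASTA record order.
    Supports at most 26 chains.
    """
    if len(sequences) > len(ascii_uppercase):
        raise ValueError("more than 26 chains is not supported by this sketch")
    return {ascii_uppercase[i]: len(seq) for i, seq in enumerate(sequences)}


def out_of_range(restraints: list[tuple[str, int, str, int]],
                 lengths: dict[str, int]) -> list[str]:
    """Return a message for every residue reference that cannot be resolved.

    ASSUMPTION: residue indices are 1-based.
    """
    problems: list[str] = []
    for chain1, resid1, chain2, resid2 in restraints:
        for chain, resid in ((chain1, resid1), (chain2, resid2)):
            length = lengths.get(chain)
            if length is None:
                problems.append(f"unknown chain {chain}")
            elif not 1 <= resid <= length:
                problems.append(f"{chain}{resid} outside 1..{length}")
    return problems


# Usage with the two insulin chains and the disulfide restraints from this commit.
lengths = chain_lengths([
    "GIVEQCCTSICSLYQLENYCN",            # insulin-a-chain
    "FVNQHLCGSHLVEALYLVCGERGFFYTPKT",   # insulin-b-chain
])
problems = out_of_range([("A", 7, "B", 7), ("A", 20, "B", 19)], lengths)
```

An empty `problems` list means every referenced residue exists under the stated assumptions; it says nothing about whether the contact is biologically meaningful.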