digital-patients

Olamide Isreal 9e6a16c19b Initial commit: digital-patients pipeline (clean, no large files)

Large reference/model files excluded from repo - to be staged to S3 or baked into Docker images.

2026-03-26 15:15:23 +01:00

app_filter
documentaion
k8s
.gitattributes
.gitignore
.gitmodules
create_LM22_sourceGEP_ref_file.py
docker-compose.yml
Dockerfile_borzoi
Dockerfile_cibersortx
Dockerfile_corto
Dockerfile_rna2proteinexpression
Dockerfile_synthea
Dockerfile_vcf2prot
Dockerfile_vep
Download_write_healthy_m_f_txt_file.ipynb
ensg2number.joblib
main_borzoi.nf
main_cibersortx.nf
main_corto.nf
main_filter_outputs.nf
main_no_mutations.nf
main_rna2proteinexpression.nf
main_synthea.nf
main_synthea.nf.bk
main_synthea.nf.bk.2
main_vcf2prot.nf
ncbiRefSeq_bigger.csv
nextflow.config
Notes.txt
params.json
prot_bigger.csv
README.md
rna2protexpression.py
test_gen_patient.nf
test_no_mutations.nf
test.nf
test.nf.bk
tissue2number.joblib

| Step | Status | Tool/Method | Input | Output | Location | Validation Data | Dependencies/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Medical Records | ✓ | Synthea | - | Demographics, records | /Workspace/next/registry/tools/synthea | Target: 1000 patients/disease | - |
| 2. Disease Genome | ✓ | Omic-UKBB | Alleles, positions, frequencies | VCF (hg38) | Part of Synthea repo | - | Only storing variants |
| 3. Protein Variants | ✓ | vcf2prot | VCF | Protein fasta | Part of Synthea repo (tbc) | - | Multi-tissue support needed |
| 4. Transcriptome | 🚧 | borzoi | Genomic sequences | RNAseq (TPM) | - | ENCODE, GTEx (E-MTAB-6814) | NIH ENCODE standards |
| 5. Proteome | 🚧 | clei2block | RNAseq (log2-FC) | Fold-change | github.com/stasaki/clei2block | CellModelPassport, TCGA | Requires GTEx training |
| 6. Metabolome | ⏳ | corto | RNAseq (TPM) | Metabolite profiles | github.com/federicogiorgi/corto | CCLE, NCI-60 | - |
| 7. Immunome | ⏳ | Ecotyper | RNAseq | Cell type profiles | - | SPICA30, SPICA17 | - |
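
Step 4 reports RNAseq as TPM while step 5 takes log2 fold-change as input, so some conversion has to sit between them. The following is a minimal sketch of one common way to do that, not the method used in this repository: it assumes a per-gene reference profile (for example a healthy baseline) is available, and the function name, the pseudocount of 1.0, and the gene identifiers are all illustrative assumptions.

```python
import math


def log2_fold_change(sample_tpm, reference_tpm, pseudocount=1.0):
    """Per-gene log2 fold-change of a sample against a reference profile.

    Both arguments map gene ID -> TPM. Only genes present in both profiles
    are returned. The pseudocount (an assumption, not a pipeline setting)
    avoids division by zero and damps noise for lowly expressed genes.
    """
    shared = sample_tpm.keys() & reference_tpm.keys()
    return {
        gene: math.log2(
            (sample_tpm[gene] + pseudocount) / (reference_tpm[gene] + pseudocount)
        )
        for gene in sorted(shared)
    }


# Hypothetical gene IDs and values, for illustration only.
patient = {"ENSG_A": 7.0, "ENSG_B": 0.0, "ENSG_C": 3.0}
healthy = {"ENSG_A": 3.0, "ENSG_B": 1.0, "ENSG_D": 5.0}

lfc = log2_fold_change(patient, healthy)
print(lfc)
```

Genes missing from either profile are dropped rather than imputed, which keeps the hand-off explicit about what the proteome step actually receives.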

| Category | Component | Status | Notes |
| --- | --- | --- | --- |
| External Data | BindingDB | ✓ Available | Binding affinities |
| External Data | LINCS | ✓ Available | Compound effects |
| External Data | PharmGKB | ⏳ Pending | Variant annotations |
| External Data | Human Cell Atlas | ⏳ Pending | Tissue-specific data |
| Compute | GPU Cluster | 🚧 Scaling | For enformer/basenji |
| Compute | Storage | ✓ Configured | For variant data |
| Compute | Distribution | ⏳ Planned | For processing |

| Dataset | Usage | Status | Notes |
| --- | --- | --- | --- |
| ENCODE | Transcriptomics | ✓ Ready | Primary validation |
| GTEx | Tissue-specific | ✓ Ready | E-MTAB-6814 |
| CCLE/GDSC2 | Cell lines | 🚧 In Progress | Cancer validation |
| TDC | ADMET | ⏳ Planned | Benchmark data |
| Cross-species | Conservation | ⏳ Planned | Evolutionary validation |
| Time-series | Metabolics | ⏳ Planned | Kinetic validation |
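
The table lists which datasets serve as validation targets but not how agreement is scored. As a hedged illustration only, one simple option for the transcriptomics rows is a Pearson correlation between predicted and measured expression on log-transformed TPM. The metric choice, the function name, and the example values are assumptions and are not taken from the pipeline.

```python
import math


def pearson_log_tpm(predicted, measured):
    """Pearson correlation of log1p(TPM) over genes shared by both profiles.

    Both arguments map gene ID -> TPM. Raises ValueError when fewer than two
    genes are shared or when either profile is constant.
    """
    genes = sorted(predicted.keys() & measured.keys())
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")

    x = [math.log1p(predicted[g]) for g in genes]
    y = [math.log1p(measured[g]) for g in genes]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, for illustration only.
predicted = {"ENSG_A": 10.0, "ENSG_B": 0.5, "ENSG_C": 120.0}
measured = {"ENSG_A": 12.0, "ENSG_B": 0.2, "ENSG_C": 95.0}

r = pearson_log_tpm(predicted, measured)
print(round(r, 3))
```

The log transform keeps a handful of highly expressed genes from dominating the score. A rank-based metric such as Spearman correlation would be a reasonable alternative.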

| Scenario | Implementation Status | Handling Strategy |
| --- | --- | --- |
| Rare variants | 🚧 In Progress | Population frequency weighting |
| Multi-drug combinations | ⏳ Planned | Interaction matrix modeling |
| Time-dependent effects | ⏳ Planned | PK/PD time series modeling |
| Population specificity | 🚧 In Progress | Demographic stratification |
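
"Population frequency weighting" for rare variants can be read as drawing each synthetic patient's genotype with probability proportional to the variant's population allele frequency. The sketch below shows that idea under stated assumptions: the two haplotypes are drawn independently (Hardy-Weinberg equilibrium), variants are treated as unlinked, and only carried variants are kept, in line with the "only storing variants" note for the genome step. The function name, the tuple layout, and the example variants are hypothetical.

```python
import random


def sample_carried_variants(variants, rng):
    """Sample a diploid genotype per variant from its allele frequency.

    `variants` is an iterable of (chrom, pos, ref, alt, allele_freq) tuples.
    Each haplotype carries the alt allele with probability `allele_freq`.
    Returns only variants where at least one haplotype carries the alt
    allele, with a phased genotype string such as "0|1".
    """
    carried = []
    for chrom, pos, ref, alt, allele_freq in variants:
        if not 0.0 <= allele_freq <= 1.0:
            raise ValueError(f"allele frequency out of range: {allele_freq}")
        # One independent draw per haplotype; 1 means the alt allele.
        haplotypes = [1 if rng.random() < allele_freq else 0 for _ in range(2)]
        if any(haplotypes):
            genotype = f"{haplotypes[0]}|{haplotypes[1]}"
            carried.append((chrom, pos, ref, alt, genotype))
    return carried


# Hypothetical variants, for illustration only.
panel = [
    ("chr1", 1000, "A", "G", 0.30),    # common variant
    ("chr2", 2000, "C", "T", 0.001),   # rare variant
    ("chr3", 3000, "G", "A", 0.0),     # never sampled
]

rng = random.Random(42)  # fixed seed so a patient can be regenerated
patient_variants = sample_carried_variants(panel, rng)
print(patient_variants)
```

Seeding the generator per patient makes each synthetic genome reproducible, and rare variants appear at roughly their population rate across a large cohort rather than being dropped outright.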

| Class | Special Requirements | Status |
| --- | --- | --- |
| Biologics | Membrane modeling, immunogenicity | ⏳ Planned |
| Prodrugs | Metabolite prediction, activation | 🚧 In Progress |
| Combination therapy | Interaction prediction, timing | ⏳ Planned |
| PROTACs | Protein degradation modeling | ⏳ Planned |

README.md

Digital Patient and Drug Response Pipeline - Comprehensive Implementation Plan

Pipeline Overview

Part 1: Digital Patient Generation

Implementation Status Overview

Active Implementation Tasks

Transcriptome Generation

Multi-omic Integration

Part 2: Drug Discovery and Response

Drug Development Tools

Binding Site Prediction

Drug-Target Analysis

Chemical Property Prediction

Toxicity Prediction Pipeline

Drug Response Analysis

Critical Dependencies & Requirements

Validation Framework

Edge Cases & Special Considerations

Complex Scenarios

Special Drug Classes

Case Studies & Validation Examples

| Drug | Outcome | Learning Points | Implementation Status |
| --- | --- | --- | --- |
| Amcenestrant | Efficacy failure | Target validation importance | ✓ Integrated |
| Flupirtine | Liver toxicity | Metabolite prediction crucial | 🚧 In Progress |
| Ranitidine | NDMA formation | Chemical stability prediction | ⏳ Planned |
| Multi-drug Examples | Variable | Interaction modeling needed | ⏳ Planned |