# Digital Patient and Drug Response Pipeline - Comprehensive Implementation Plan

## Pipeline Overview

```mermaid
flowchart TB
    subgraph Patient["Patient Profile Generation"]
        A1[1. Medical Records Creation] --> A2[2. Disease-specific Genome]
        A2 --> A3[3. Disease-specific Protein Variants]
        A3 --> A4[4. Disease-specific Transcriptome]
        A4 --> A5[5. Disease-specific Proteome]
        A5 --> A6[6. Disease-specific Metabolome]
        A6 --> A7[7. Disease-specific Immunome]
    end

    subgraph Drug["Drug Analysis & Modeling"]
        B1[8. Drug-Target PK] --> B1a[8a. Binding Site Prediction]
        B1a --> B1b[8b. Drug-Target Docking]
        B1b --> B2[9. Drug-Proteome Screening]
        B2 --> B3[10. Off-target Analysis]
        B3 --> B4[11. Drug-Compound Screening]
        B4 --> B5[12. Drug-Genome Sensitivity]
    end

    subgraph Response["Response Prediction"]
        C1[13. Transcriptomic Changes] --> C2[14. Disease Stage Evaluation]
        C2 --> C3[15-16. Proteomic & Metabolomic Changes]
        C3 --> C4[17-19. Biological & Immune Response]
        C4 --> C5[20-21. ADMET & Toxicity]
    end

    Patient --> Drug
    Drug --> Response
```
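
The diagram can also be encoded as an explicit dependency graph so that an orchestrator has a single source of truth for stage order. This is an illustrative sketch only: the stage identifiers are hypothetical snake_case versions of the diagram labels, and it assumes that a subgraph-to-subgraph arrow such as `Patient --> Drug` means the first stage of the downstream group waits for the last stage of the upstream group. It uses only the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stage identifiers, one per node in the pipeline diagram.
PATIENT = ["medical_records", "genome", "protein_variants", "transcriptome",
           "proteome", "metabolome", "immunome"]
DRUG = ["drug_target_pk", "binding_site_prediction", "drug_target_docking",
        "drug_proteome_screening", "off_target_analysis",
        "drug_compound_screening", "drug_genome_sensitivity"]
RESPONSE = ["transcriptomic_changes", "disease_stage_evaluation",
            "proteomic_metabolomic_changes", "biological_immune_response",
            "admet_toxicity"]


def build_graph(groups: list[list[str]]) -> dict[str, set[str]]:
    """Map each stage to the set of stages that must finish before it starts."""
    graph: dict[str, set[str]] = {}
    upstream_tail = None
    for chain in groups:
        for i, stage in enumerate(chain):
            if i > 0:
                graph[stage] = {chain[i - 1]}    # edge inside a subgraph
            elif upstream_tail is not None:
                graph[stage] = {upstream_tail}   # assumed meaning of Patient --> Drug
            else:
                graph[stage] = set()             # pipeline entry point
        upstream_tail = chain[-1]
    return graph


if __name__ == "__main__":
    order = TopologicalSorter(build_graph([PATIENT, DRUG, RESPONSE])).static_order()
    print(" -> ".join(order))
```

Because every group is currently a linear chain, the resulting order is unique. The benefit of the explicit graph is that relaxing a dependency later only means editing an edge set, and a scheduler can then run independent stages in parallel.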

## Part 1: Digital Patient Generation

### Implementation Status Overview

Status key: ✓ done, 🚧 in progress, ⏳ planned.

| Step | Status | Tool/Method | Input | Output | Location | Validation Data | Dependencies/Notes |
|------|--------|-------------|-------|--------|----------|-----------------|--------------------|
| 1. Medical Records | ✓ | Synthea | - | Demographics, records | /Workspace/next/registry/tools/synthea | Target: 1000 patients/disease | - |
| 2. Disease Genome | ✓ | Omic-UKBB | Alleles, positions, frequencies | VCF (hg38) | Part of Synthea repo | - | Only storing variants |
| 3. Protein Variants | ✓ | vcf2prot | VCF | Protein FASTA | Part of Synthea repo (tbc) | - | Multi-tissue support needed |
| 4. Transcriptome | 🚧 | borzoi | Genomic sequences | RNAseq (TPM) | - | ENCODE, GTEx (E-MTAB-6814) | NIH ENCODE standards |
| 5. Proteome | 🚧 | clei2block | RNAseq (log2-FC) | Protein fold-change | github.com/stasaki/clei2block | CellModelPassport, TCGA | Requires GTEx training |
| 6. Metabolome | ⏳ | corto | RNAseq (TPM) | Metabolite profiles | github.com/federicogiorgi/corto | CCLE, NCI-60 | - |
| 7. Immunome | ⏳ | Ecotyper | RNAseq | Cell type profiles | - | SPICA30, SPICA17 | - |
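
Steps 4 and 5 use different units: borzoi output is listed as RNA-seq TPM, while clei2block is listed as taking RNA-seq log2 fold-changes. The bridging sketch below assumes fold-changes are taken against a per-gene median TPM from a reference cohort (for example healthy GTEx samples of the same tissue) with a pseudocount of 1. Whether clei2block expects exactly this definition should be checked against its repository; the function names and gene values are hypothetical.

```python
import math
from statistics import median


def reference_profile(samples: list[dict[str, float]]) -> dict[str, float]:
    """Per-gene median TPM across reference samples (same genes in every sample)."""
    return {gene: median(s[gene] for s in samples) for gene in samples[0]}


def tpm_to_log2fc(sample: dict[str, float], reference: dict[str, float],
                  pseudocount: float = 1.0) -> dict[str, float]:
    """log2((TPM_sample + c) / (TPM_reference + c)) for genes present in both."""
    return {
        gene: math.log2((tpm + pseudocount) / (reference[gene] + pseudocount))
        for gene, tpm in sample.items()
        if gene in reference
    }


if __name__ == "__main__":
    # Hypothetical reference cohort and patient profile.
    healthy = [{"GENE_A": 30.0, "GENE_B": 10.0},
               {"GENE_A": 34.0, "GENE_B": 14.0},
               {"GENE_A": 32.0, "GENE_B": 12.0}]
    patient = {"GENE_A": 32.0, "GENE_B": 51.0}
    print(tpm_to_log2fc(patient, reference_profile(healthy)))
    # {'GENE_A': 0.0, 'GENE_B': 2.0}
```

The pseudocount keeps genes with zero TPM finite; its size changes fold-changes for lowly expressed genes, so it is worth fixing once and recording alongside the validation runs.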

### Active Implementation Tasks

#### Transcriptome Generation

Current goal: Establish accurate transcriptome prediction pipeline

- [x] Implement and evaluate primary models:
  * ~~enformer~~
  * ~~basenji~~
  * borzoi ~~for RNAseq profiles~~, built on basenji and enformer
- [x] Add SequenceModelBenchmark ridge regression - built into borzoi (tbc)
- [ ] Validate against ENCODE standards
- [ ] Implement GTEx validation pipeline
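
A ridge-regression benchmark of this kind fits a linear map from sequence-model features to measured expression and reports agreement on held-out genes. The NumPy sketch below is not the borzoi or SequenceModelBenchmark implementation; it assumes per-gene feature vectors (for example predicted coverage summarised over each gene) and log-scale measured expression from ENCODE or GTEx, and scores with Pearson correlation. The synthetic data only shows that the code runs.

```python
import numpy as np


def _with_intercept(X: np.ndarray) -> np.ndarray:
    """Append a column of ones so the last weight acts as an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression, solving (X^T X + alpha*I) w = X^T y.

    X: (n_genes, n_features) sequence-model features per gene (assumed input).
    y: (n_genes,) measured expression, e.g. log(TPM + 1).
    The intercept (last weight) is not penalised.
    """
    Xb = _with_intercept(X)
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    return _with_intercept(X) @ w


def pearson_r(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))                # synthetic stand-in features
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=500)
    w = fit_ridge(X[:400], y[:400], alpha=1.0)   # first 400 genes for training
    print(f"held-out Pearson r = {pearson_r(predict(X[400:], w), y[400:]):.3f}")
```

For GTEx validation the split would more sensibly be by chromosome than by row index, so that held-out genes do not share regulatory context with training genes; that choice is a suggestion, not part of the original plan.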

#### Multi-omic Integration

Current goal: Create robust data transformation pipeline

- [ ] Proteome prediction (clei2block):
  * Implement GTEx training pipeline
  * Add multi-tissue support
  * Create validation framework against CellModelPassport
- [ ] Metabolome generation (corto):
  * Set up CCLE data integration
  * Implement NCI-60 validation
- [ ] Immunome profiling:
  * ~~Evaluate Ecotyper vs CIBERSORTx~~ CIBERSORTx is incorporated within Ecotyper
  * Integrate SPICA datasets
  * Set up immune cell validation pipeline
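
Immune cell fraction estimation in tools such as CIBERSORTx reduces to expressing a bulk profile as a non-negative mixture of cell-type signature profiles. CIBERSORTx itself uses ν-support vector regression; the sketch below swaps in plain non-negative least squares (NNLS), solved by projected gradient descent so it depends only on NumPy, as a simple baseline for the immune cell validation pipeline. The signature matrix is a toy assumption; real signatures would be derived from reference data such as the SPICA datasets.

```python
import numpy as np


def nnls_projected_gradient(A: np.ndarray, b: np.ndarray,
                            n_iter: int = 5000) -> np.ndarray:
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0."""
    # Step 1/L with L = ||A||_2^2 (Lipschitz constant of the gradient)
    # keeps the objective non-increasing.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = np.maximum(x - step * grad, 0.0)  # gradient step, then project onto x >= 0
    return x


def estimate_fractions(signature: np.ndarray, bulk: np.ndarray) -> np.ndarray:
    """Cell-type fractions from bulk expression.

    signature: (n_genes, n_cell_types) reference expression per cell type.
    bulk: (n_genes,) mixture expression on the same genes and scale.
    """
    weights = nnls_projected_gradient(signature, bulk)
    total = weights.sum()
    return weights / total if total > 0 else weights


if __name__ == "__main__":
    # Toy 6-gene x 3-cell-type signature (hypothetical values).
    S = np.array([[10, 1, 0], [8, 0, 1], [1, 9, 0],
                  [0, 7, 1], [1, 0, 12], [0, 1, 9]], dtype=float)
    print(estimate_fractions(S, S @ np.array([0.5, 0.3, 0.2])).round(3))  # [0.5 0.3 0.2]
```

Recovering known fractions from synthetic mixtures like this is a cheap regression test to keep next to the real Ecotyper outputs.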

## Part 2: Drug Discovery and Response

### Drug Development Tools

Current goal: Establish comprehensive drug analysis pipeline

- [ ] Molecule Processing:
  * SELFIES library for biologics/peptides conversion
  * Implement molecule validation checks
  * Set up standardization pipeline
- [ ] Structure Analysis:
  * DreamDock + ConPlex score pipeline
  * LightDock for membrane binding
  * Validation framework with crystal structures
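
For the crystal-structure validation framework, the customary success criterion for a predicted ligand pose is a heavy-atom RMSD below 2 Å to the co-crystallised ligand. The sketch below assumes both poses are already in the crystal frame with matching atom order; it ignores molecular symmetry, so a symmetry-aware RMSD (for example the spyrmsd package) would be needed for production use. Function names are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def ligand_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Heavy-atom RMSD between two poses with identical atom ordering.

    No superposition and no symmetry correction: both poses are assumed to be
    in the crystal structure's frame, as is usual when scoring docked poses.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must have the same, non-zero number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


def success_rate(rmsds: Sequence[float], threshold: float = 2.0) -> float:
    """Fraction of complexes whose top-ranked pose is below `threshold` Å."""
    return sum(r < threshold for r in rmsds) / len(rmsds)
```

Reporting the success rate per tool over the same set of complexes gives a direct way to compare the docking options listed in this plan.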

### Binding Site Prediction

Current goal: Create consensus model for binding site prediction

- [ ] Benchmark tools:
  * DiffDock implementation and testing
  * Qvina2 evaluation
  * P2Rank integration
  * FPocket analysis
- [ ] Specific considerations:
  * Allosteric site detection
  * Multiple binding site handling
  * Protein flexibility modeling
- [ ] Validation:
  * BindingDB integration
  * Crystal structure comparison pipeline
  * Edge case testing suite
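
One simple way to build the consensus binding site model is to pool pocket centres predicted by each tool, cluster centres that fall within a distance cutoff, and keep clusters supported by several tools. This greedy, order-dependent clustering is a sketch rather than an established method; the 4 Å radius and two-tool quorum are assumptions to tune against BindingDB and crystal structures, and parsing each tool's output (P2Rank, FPocket, DiffDock poses) into centre coordinates is left out.

```python
import math
from dataclasses import dataclass, field

Point = tuple[float, float, float]


@dataclass
class PocketCluster:
    center: Point
    tools: set[str] = field(default_factory=set)
    members: list[Point] = field(default_factory=list)


def consensus_pockets(predictions: dict[str, list[Point]],
                      radius: float = 4.0,
                      min_votes: int = 2) -> list[PocketCluster]:
    """Cluster pocket centres from several tools and keep well-supported clusters.

    predictions maps a tool name to the pocket centres it predicted (in Å).
    A centre joins the first cluster whose current centre lies within `radius`;
    otherwise it seeds a new cluster. Results are sorted by number of tools.
    """
    clusters: list[PocketCluster] = []
    for tool, centers in predictions.items():
        for c in centers:
            for cl in clusters:
                if math.dist(cl.center, c) <= radius:
                    cl.tools.add(tool)  # a tool votes at most once per cluster
                    cl.members.append(c)
                    n = len(cl.members)
                    # Re-centre on the mean of all member points.
                    cl.center = tuple(sum(axis) / n for axis in zip(*cl.members))
                    break
            else:
                clusters.append(PocketCluster(c, {tool}, [c]))
    kept = [cl for cl in clusters if len(cl.tools) >= min_votes]
    return sorted(kept, key=lambda cl: len(cl.tools), reverse=True)
```

Sites found by only one tool are not necessarily wrong (allosteric pockets are a likely example), so it may be worth logging the dropped clusters for the edge case suite rather than discarding them silently.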

### Drug-Target Analysis

Current goal: Robust docking and interaction prediction

- [ ] Primary docking pipeline:
  * Uni-mol integration
  * DreamDock implementation
  * Path4Drug integration for pathways
- [ ] Molecule type-specific handling:
  * Small molecule pipeline
  * Biologics pathway
  * PROTAC-specific analysis
  * Prodrug processing
- [ ] Interaction analysis:
  * Agonist vs antagonist classification
  * Protein-protein interaction integration
  * Chemical Checker for bioactivity signatures
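
Molecule type-specific handling can be kept as data rather than branching logic: each class maps to its own ordered stages, followed by shared steps. The stage names below are hypothetical placeholders derived from this plan (membrane modeling and immunogenicity for biologics, metabolite prediction and activation for prodrugs, degradation modeling for PROTACs, as listed under Special Drug Classes), not calls into any existing tool.

```python
from enum import Enum


class MoleculeType(Enum):
    SMALL_MOLECULE = "small_molecule"
    BIOLOGIC = "biologic"
    PROTAC = "protac"
    PRODRUG = "prodrug"


# Hypothetical stage names; each would be backed by a tool listed in this plan.
SHARED_STAGES = ("interaction_analysis", "property_prediction", "toxicity")
TYPE_STAGES: dict[MoleculeType, tuple[str, ...]] = {
    MoleculeType.SMALL_MOLECULE: ("standardize", "binding_site", "docking"),
    MoleculeType.BIOLOGIC: ("membrane_modeling", "protein_protein_docking",
                            "immunogenicity"),
    MoleculeType.PROTAC: ("target_docking", "e3_ligase_docking",
                          "degradation_modeling"),
    MoleculeType.PRODRUG: ("metabolite_prediction", "activation",
                           "active_metabolite_docking"),
}

# Fail at import time if a molecule class has no registered stages.
_missing = set(MoleculeType) - set(TYPE_STAGES)
if _missing:
    raise RuntimeError(f"no stages registered for {sorted(m.value for m in _missing)}")


def plan_stages(mol_type: MoleculeType) -> list[str]:
    """Ordered stage list: type-specific stages first, then the shared tail."""
    return [*TYPE_STAGES[mol_type], *SHARED_STAGES]
```

Adding a new class (for example peptides) then means one enum member plus one registry entry, and the import-time check fails loudly until both exist.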

### Chemical Property Prediction

Current goal: Comprehensive property prediction system

- [ ] Model implementation:
  * Chemprop evaluation
  * SolTranNet integration
  * Custom ADMET model development
- [ ] Property coverage:
  * Solubility prediction
  * Blood-brain barrier (BBB) penetration
  * Chemical stability
  * Metabolic processing
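
Before the learned models are in place, rule-based filters make a useful sanity baseline for property coverage. The sketch assumes descriptors have already been computed (for example with RDKit) and applies Lipinski's Rule of Five plus a crude BBB heuristic. The BBB thresholds (TPSA below 90 Å², MW below 450 Da, at most 3 H-bond donors) are illustrative assumptions, not validated cut-offs, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolDescriptors:
    """Precomputed descriptors (assumed to come from e.g. RDKit)."""
    mol_weight: float   # Da
    logp: float         # calculated octanol/water logP
    h_donors: int
    h_acceptors: int
    tpsa: float         # topological polar surface area, Å^2


def lipinski_violations(d: MolDescriptors) -> int:
    """Count Rule-of-Five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([d.mol_weight > 500, d.logp > 5,
                d.h_donors > 5, d.h_acceptors > 10])


def likely_bbb_penetrant(d: MolDescriptors, tpsa_max: float = 90.0,
                         mw_max: float = 450.0, hbd_max: int = 3) -> bool:
    """Crude CNS-penetration heuristic; default thresholds are assumptions."""
    return d.tpsa < tpsa_max and d.mol_weight < mw_max and d.h_donors <= hbd_max


if __name__ == "__main__":
    compound = MolDescriptors(mol_weight=320.4, logp=2.1, h_donors=1,
                              h_acceptors=4, tpsa=64.0)  # hypothetical values
    print(lipinski_violations(compound), likely_bbb_penetrant(compound))  # 0 True
```

Disagreements between these rules and Chemprop or SolTranNet predictions are a cheap way to surface compounds worth a closer look.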

### Toxicity Prediction Pipeline

Current goal: Multi-faceted toxicity assessment system

- [ ] Core modules:
  * Cardiotoxicity (ion channel) prediction
  * Hepatotoxicity (Phase I/II drug-metabolism proteins)
  * Nephrotoxicity assessment
  * Lung toxicity prediction
  * Neurotoxicity (BBB criteria)
  * Inflammatory response modeling
  * Bleeding/clotting risk analysis
- [ ] Integration components:
  * Human Protein Atlas tissue proportion estimation
  * Reactome pathway analysis
  * Industry model benchmarking
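
For cardiotoxicity (ion channel) prediction, a common first-pass check is the hERG safety margin: the hERG IC50 divided by the unbound maximum plasma concentration. The sketch assumes the IC50, total Cmax, and fraction unbound are already predicted or measured in consistent units (µM here). The 30-fold default is a frequently cited threshold, not a regulatory rule, and is exposed as a parameter; the function names are hypothetical.

```python
def free_cmax_um(total_cmax_um: float, fraction_unbound: float) -> float:
    """Unbound peak plasma concentration (µM) from total Cmax and fraction unbound."""
    if not 0.0 < fraction_unbound <= 1.0:
        raise ValueError("fraction_unbound must be in (0, 1]")
    return total_cmax_um * fraction_unbound


def herg_safety_margin(herg_ic50_um: float, free_cmax: float) -> float:
    """hERG IC50 divided by unbound Cmax (both in µM)."""
    return herg_ic50_um / free_cmax


def flag_cardiotox(herg_ic50_um: float, total_cmax_um: float,
                   fraction_unbound: float, min_margin: float = 30.0) -> bool:
    """True when the safety margin falls below `min_margin` (assumed default 30x)."""
    free = free_cmax_um(total_cmax_um, fraction_unbound)
    return herg_safety_margin(herg_ic50_um, free) < min_margin
```

hERG is only one of several cardiac ion channels, so this check would sit alongside, not replace, the broader ion channel model.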

### Drug Response Analysis

Current goal: Integrated response prediction system

- [ ] Transcriptomic response:
  * LINCS data integration
  * Expression change prediction
  * Tissue-specific effects
- [ ] Multi-omic response:
  * Proteomic change modeling
  * Metabolomic adjustment prediction
  * Immune response profiling
- [ ] Special cases:
  * Multi-drug combinations
  * Time-dependent effects
  * Population-specific responses
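
A common way to use LINCS-style data for transcriptomic response is signature reversal: a drug whose induced expression changes anti-correlate with the disease signature is a candidate. The official Connectivity Map score is a weighted Kolmogorov-Smirnov enrichment statistic; the sketch below uses plain cosine similarity over shared genes as a simpler stand-in. Signatures are assumed to be per-gene log2 fold-changes, and all names and values are hypothetical.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity over genes present in both signatures (0.0 if undefined)."""
    shared = a.keys() & b.keys()
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = math.sqrt(sum(a[g] ** 2 for g in shared))
    norm_b = math.sqrt(sum(b[g] ** 2 for g in shared))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reversal_score(disease_sig: dict[str, float], drug_sig: dict[str, float]) -> float:
    """In [-1, 1]; higher when the drug pushes expression opposite to the disease."""
    return -cosine_similarity(disease_sig, drug_sig)


def rank_drugs(disease_sig: dict[str, float],
               drug_sigs: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Drugs sorted from strongest to weakest predicted reversal."""
    scores = {name: reversal_score(disease_sig, sig) for name, sig in drug_sigs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because LINCS signatures are cell-line specific, the tissue-specific effects item would likely need signatures matched to the patient's tissue before scoring.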

## Critical Dependencies & Requirements

| Category | Component | Status | Notes |
|----------|-----------|--------|-------|
| **External Data** | BindingDB | ✓ Available | Binding affinities |
| | LINCS | ✓ Available | Compound effects |
| | PharmGKB | ⏳ Pending | Variant annotations |
| | Human Cell Atlas | ⏳ Pending | Tissue-specific data |
| **Compute** | GPU Cluster | 🚧 Scaling | For enformer/basenji |
| | Storage | ✓ Configured | For variant data |
| | Distribution | ⏳ Planned | For processing |

## Validation Framework

| Dataset | Usage | Status | Notes |
|---------|-------|--------|-------|
| ENCODE | Transcriptomics | ✓ Ready | Primary validation |
| GTEx | Tissue-specific | ✓ Ready | E-MTAB-6814 |
| CCLE/GDSC2 | Cell lines | 🚧 In Progress | Cancer validation |
| TDC | ADMET | ⏳ Planned | Benchmark data |
| Cross-species | Conservation | ⏳ Planned | Evolutionary validation |
| Time-series | Metabolomics | ⏳ Planned | Kinetic validation |

## Edge Cases & Special Considerations

### Complex Scenarios

| Scenario | Implementation Status | Handling Strategy |
|----------|-----------------------|-------------------|
| Rare variants | 🚧 In Progress | Population frequency weighting |
| Multi-drug combinations | ⏳ Planned | Interaction matrix modeling |
| Time-dependent effects | ⏳ Planned | PK/PD time series modeling |
| Population specificity | 🚧 In Progress | Demographic stratification |
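
The PK/PD time series strategy can start from the textbook one-compartment model with first-order oral absorption, `C(t) = F*D*ka / (V*(ka - ke)) * (exp(-ke*t) - exp(-ka*t))` with `ke = CL/V`, coupled to a hyperbolic Emax effect model. The parameters in the example are illustrative and not taken from any drug; deriving patient-specific values from the digital patient profile (for example genotype-dependent clearance) is an assumption about how this would be wired in.

```python
import math


def concentration_oral_1cpt(t_h: float, dose_mg: float, bioavailability: float,
                            ka_per_h: float, cl_l_per_h: float, v_l: float) -> float:
    """Plasma concentration (mg/L) at time t (h) after a single oral dose.

    One-compartment model with first-order absorption (ka) and first-order
    elimination (ke = CL / V). The closed form requires ka != ke.
    """
    ke = cl_l_per_h / v_l
    if math.isclose(ka_per_h, ke):
        raise ValueError("ka and ke must differ for this closed form")
    scale = bioavailability * dose_mg * ka_per_h / (v_l * (ka_per_h - ke))
    return scale * (math.exp(-ke * t_h) - math.exp(-ka_per_h * t_h))


def emax_effect(conc: float, emax: float, ec50: float, e0: float = 0.0) -> float:
    """Hyperbolic Emax pharmacodynamic model: E0 + Emax * C / (EC50 + C)."""
    return e0 + emax * conc / (ec50 + conc)


if __name__ == "__main__":
    # Illustrative parameters only.
    for t in (0, 1, 2, 4, 8, 12, 24):
        c = concentration_oral_1cpt(t, dose_mg=100, bioavailability=0.8,
                                    ka_per_h=1.0, cl_l_per_h=5.0, v_l=50.0)
        print(f"t={t:>2} h  C={c:6.3f} mg/L  effect={emax_effect(c, 100.0, 0.5):5.1f}%")
```

Repeated dosing follows by superposition (summing single-dose curves shifted by each dosing time), which keeps the same two functions useful for the time-dependent scenarios.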

### Special Drug Classes

| Class | Special Requirements | Status |
|-------|----------------------|--------|
| Biologics | Membrane modeling, immunogenicity | ⏳ Planned |
| Prodrugs | Metabolite prediction, activation | 🚧 In Progress |
| Combination therapy | Interaction prediction, timing | ⏳ Planned |
| PROTACs | Protein degradation modeling | ⏳ Planned |
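
For combination therapy, Bliss independence is a standard null model: if drugs A and B act independently with fractional effects E_A and E_B, the expected combined effect is E_A + E_B - E_A*E_B. Computing observed minus expected over a dose grid gives the kind of interaction matrix listed under Complex Scenarios. Bliss is one of several reference models (Loewe additivity is the other common one), so choosing it here is an assumption; function names are hypothetical.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected combined fractional effect under Bliss independence."""
    for e in (effect_a, effect_b):
        if not 0.0 <= e <= 1.0:
            raise ValueError("effects must be fractions in [0, 1]")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess_matrix(single_a: list[float], single_b: list[float],
                        observed: list[list[float]]) -> list[list[float]]:
    """Observed minus Bliss-expected effect for each (dose_a, dose_b) cell.

    single_a[i]: effect of drug A alone at dose i.
    single_b[j]: effect of drug B alone at dose j.
    observed[i][j]: measured or predicted effect of the combination.
    Positive values suggest synergy, negative values antagonism.
    """
    return [
        [observed[i][j] - bliss_expected(a, b) for j, b in enumerate(single_b)]
        for i, a in enumerate(single_a)
    ]
```

Summarising the matrix (for example its mean or maximum excess) gives a single synergy score per drug pair, while keeping the full grid preserves dose-dependent interactions relevant to timing.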

## Case Studies & Validation Examples

| Drug | Outcome | Learning Points | Implementation Status |
|------|---------|-----------------|-----------------------|
| Amcenestrant | Efficacy failure | Target validation importance | ✓ Integrated |
| Flupirtine | Liver toxicity | Metabolite prediction crucial | 🚧 In Progress |
| Ranitidine | NDMA formation | Chemical stability prediction | ⏳ Planned |
| Multi-drug Examples | Variable | Interaction modeling needed | ⏳ Planned |