Synthea All Diseases
A comprehensive pipeline for generating Synthea modules and synthetic patient data for any disease.
Overview
This pipeline leverages Nextflow to orchestrate the generation of disease modules and synthetic patient data using Synthea. It supports:
- Automatic generation of disease modules using Claude AI
- Synthetic patient generation with configurable parameters using the actual Synthea engine
- Analysis of generated patient data
Requirements
- Docker
- Docker Compose
- Nextflow (version 20.10.0 or higher)
- Java (required by Nextflow)
- Python 3.6+ (if running scripts directly)
Quick Start
The easiest way to get started is to use our convenience scripts:
# Set up the environment (builds Docker containers and prepares directories)
./scripts/prepare_environment.sh
# Run the pipeline for a specific disease
./scripts/run_pipeline.sh --disease "Parkinson's Disease" --patients --population 50
Manual Setup
- Clone this repository:
  git clone https://github.com/yourusername/synthea-alldiseases.git
  cd synthea-alldiseases
- Create a .env file with your API keys (or copy it from .env.example):
  cp .env.example .env
  # Edit .env with your preferred text editor
- Build and start the Docker containers:
  docker-compose build
  docker-compose up -d synthea
Usage
Basic Command
nextflow run main.nf --disease_name "Disease Name" [options]
Examples
Generate a module for Hypertension and create 100 patients, 60% of them female:
nextflow run main.nf --disease_name "Hypertension" --generate_patients true --population 100 --gender 0.6
Generate a module for Parkinson's Disease, create 50 patients, and analyze the data:
nextflow run main.nf --disease_name "Parkinson's Disease" --generate_patients true --population 50 --analyze_patient_data true
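When you launch runs from Python rather than from a shell, passing an explicit argument list avoids quoting problems with disease names such as "Parkinson's Disease". The sketch below uses a hypothetical helper (build_nextflow_command is not part of this repository). It only emits the --name value pairs shown in the examples above and does not validate them.

```python
import shlex


def build_nextflow_command(disease_name, **options):
    """Build an argv list for `nextflow run main.nf` (hypothetical helper).

    Each keyword becomes a `--name value` pair; Python booleans are written
    as "true"/"false" to match the command-line examples.
    """
    argv = ["nextflow", "run", "main.nf", "--disease_name", disease_name]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv += ["--" + name, str(value)]
    return argv


if __name__ == "__main__":
    cmd = build_nextflow_command(
        "Parkinson's Disease",
        generate_patients=True,
        population=50,
        analyze_patient_data=True,
    )
    # Print a shell-safe rendering for logging; pass `cmd` to
    # subprocess.run(cmd, check=True) to actually launch the pipeline.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because the list is handed directly to the process, no shell ever sees the apostrophe. shlex.quote is used only to print an equivalent command line.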
Parameters
| Parameter | Description | Default |
|---|---|---|
| --disease_name | Name of the disease to model | (required) |
| --modules_dir | Directory containing disease modules | modules |
| --output_dir | Directory for output files | output |
| --generate_patients | Generate synthetic patient data | false |
| --population | Number of patients to generate | 100 |
| --gender | Fraction of patients who are female (0 to 1) | 0.5 |
| --min_age | Minimum patient age (years) | 0 |
| --max_age | Maximum patient age (years) | 90 |
| --seed | Random seed for reproducibility | (random) |
| --analyze_patient_data | Analyze the generated patient data | false |
| --report_format | Format of the analysis report | html |
| --force_generate | Regenerate the module even if one already exists | false |
| --publish_dir | Directory for published output | published_output |
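Synthea's -g option accepts only M or F, so a fractional --gender value cannot be passed to it directly. One possible translation is sketched below under that assumption; it is not necessarily what main.nf does. It omits -g at 0.5, which leaves the mix to Synthea and makes it only approximately even, and otherwise splits --population into a female run and a male run. synthea_runs is a hypothetical name; -p, -g, -a and -s are Synthea's population, gender, age-range and seed options.

```python
def synthea_runs(population, gender=0.5, min_age=0, max_age=90, seed=None):
    """Translate pipeline parameters into one or two Synthea argument lists.

    Hypothetical helper: because Synthea's -g accepts only "M" or "F",
    a mixed fraction is split into a female run and a male run.
    """
    if not 0 <= gender <= 1:
        raise ValueError("gender must be between 0 and 1")
    if min_age > max_age:
        raise ValueError("min_age must not exceed max_age")

    # Options shared by every run: age range and (optionally) the seed.
    common = ["-a", f"{min_age}-{max_age}"]
    if seed is not None:
        common += ["-s", str(seed)]

    if gender == 0.5:
        # Leave -g out and let Synthea choose each patient's gender.
        return [["-p", str(population)] + common]

    female = round(population * gender)
    male = population - female
    runs = []
    for count, flag in ((female, "F"), (male, "M")):
        if count > 0:  # skip an empty run, e.g. when gender is 0 or 1
            runs.append(["-p", str(count), "-g", flag] + common)
    return runs
```

With --population 100 --gender 0.6 this yields a 60-patient female run and a 40-patient male run. Both reuse the same seed, which keeps the pair reproducible.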
Understanding the Data Flow
- Module Generation: The pipeline first looks for an existing module for the specified disease. If not found, it generates one using the module_generator.
- Patient Generation: If requested, the pipeline uses the actual Synthea engine to generate synthetic patient data based on the disease module.
- Analysis: If requested, the pipeline analyzes the generated patient data and produces reports.
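The module lookup in the first step can be sketched as below. The file-naming rule (lowercase, with punctuation collapsed to underscores) and both function names are assumptions made for illustration; the actual module_generator may name files differently.

```python
import re
from pathlib import Path


def module_filename(disease_name):
    """Hypothetical naming rule: "Parkinson's Disease" -> "parkinsons_disease.json"."""
    slug = disease_name.lower().replace("'", "")
    slug = re.sub(r"[^a-z0-9]+", "_", slug).strip("_")
    return slug + ".json"


def resolve_module(disease_name, modules_dir="modules", force_generate=False):
    """Return (path, needs_generation) for a disease module.

    Reuse an existing module file unless it is missing or
    force_generate (the --force_generate parameter) is set.
    """
    path = Path(modules_dir) / module_filename(disease_name)
    needs_generation = force_generate or not path.exists()
    return path, needs_generation
```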
Directory Structure
- modules/: Contains generated disease modules
- module_generator/: Contains the AI-powered module generation scripts
- scripts/: Utility scripts for the pipeline
- output/: Generated patient data (temporary)
- published_output/: Final output data that persists between runs
- published_output/modules/: Contains the generated modules
- published_output/{disease_name}/: Contains patient data for each disease
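To spot-check the gender mix of a finished run, you can count the GENDER column of Synthea's patients.csv. This assumes CSV export is enabled (exporter.csv.export = true in Synthea's properties, since FHIR is the default output) and that the file ends up somewhere under published_output/{disease_name}/. The exact sub-path and both helper names are hypothetical.

```python
import csv
from collections import Counter


def gender_counts(patients_csv):
    """Count the GENDER column ("M" / "F") of a Synthea patients.csv export."""
    with open(patients_csv, newline="") as f:
        return Counter(row["GENDER"] for row in csv.DictReader(f))


def female_fraction(patients_csv):
    """Fraction of patients recorded as "F"; 0.0 for an empty file."""
    counts = gender_counts(patients_csv)
    total = sum(counts.values())
    return counts["F"] / total if total else 0.0
```

Comparing female_fraction against the --gender value you requested is a quick sanity check; with small populations some deviation is expected.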
Convenience Scripts
- scripts/prepare_environment.sh: Sets up the environment and starts containers
- scripts/run_pipeline.sh: Simplified interface for running the pipeline
- scripts/analyze_patient_data.py: Analyzes generated patient data
- scripts/check_condition_structure.py: Validates module JSON structure
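A basic structural check on a module, in the spirit of check_condition_structure.py, can be sketched as follows. This is an independent sketch written against Synthea's Generic Module Framework JSON, not the repository's script. It checks for a name, a non-empty states object, an Initial state, and a type on every state. It also verifies that every direct, distributed, conditional or complex transition points at a state that exists. Function names are hypothetical.

```python
import json


def transition_targets(state):
    """Yield every state name that a GMF state can transition to."""
    if "direct_transition" in state:
        yield state["direct_transition"]
    options = list(state.get("distributed_transition", []))
    options += state.get("conditional_transition", [])
    for option in state.get("complex_transition", []):
        # A complex option has either a single transition or a distribution list.
        options.append(option)
        options += option.get("distributions", [])
    for option in options:
        if "transition" in option:
            yield option["transition"]


def validate_module(module):
    """Return a list of problems in a parsed module dict (empty if none found)."""
    problems = []
    if "name" not in module:
        problems.append("missing 'name'")
    states = module.get("states")
    if not isinstance(states, dict) or not states:
        problems.append("module has no 'states' object")
        return problems
    if "Initial" not in states:
        problems.append("missing 'Initial' state")
    for name, state in states.items():
        if "type" not in state:
            problems.append(f"state {name!r} has no 'type'")
        for target in transition_targets(state):
            if target not in states:
                problems.append(f"state {name!r} transitions to unknown state {target!r}")
    return problems


def validate_file(path):
    """Load a module JSON file and validate it."""
    with open(path) as f:
        return validate_module(json.load(f))
```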
Troubleshooting
If you encounter issues:
- Check that the Docker containers are running:
  docker ps | grep synthea
- Ensure your modules directory contains the required modules:
  ls -la modules/
- Check the logs for detailed error messages:
  tail -f .nextflow.log
- Try rebuilding the Docker containers:
  docker-compose down
  docker-compose build
  docker-compose up -d synthea
- If module generation fails, check that your API keys are set correctly in the .env file.
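To confirm that the .env file actually defines a non-empty key before rerunning module generation, a small parser is enough. The variable name ANTHROPIC_API_KEY is an assumption based on the pipeline using Claude; check .env.example for the name the project really expects. The parser only handles simple KEY=VALUE lines.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and single/double quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(path=".env", required=("ANTHROPIC_API_KEY",)):
    """Return required keys that are absent or empty.

    ANTHROPIC_API_KEY is an assumed default; pass the real names via `required`.
    """
    values = read_env_file(path)
    return [key for key in required if not values.get(key)]
```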
License
This project uses the same license as Synthea.