# Synthea All Diseases

A comprehensive pipeline for generating Synthea modules and synthetic patient data for any disease.

## Overview

This pipeline leverages Nextflow to orchestrate the generation of disease modules and synthetic patient data using Synthea. It supports:

1. Automatic generation of disease modules using Claude AI
2. Synthetic patient generation with configurable parameters using the actual Synthea engine
3. Analysis of generated patient data

## Requirements

- Docker
- Docker Compose
- Nextflow (version 20.10.0 or higher)
- Java (required by Nextflow)
- Python 3.6+ (if running scripts directly)

## Quick Start

The easiest way to get started is to use our convenience scripts:

```bash
# Set up the environment (builds Docker containers and prepares directories)
./scripts/prepare_environment.sh

# Run the pipeline for a specific disease
./scripts/run_pipeline.sh --disease "Parkinson's Disease" --patients --population 50
```

## Manual Setup

1. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/synthea-alldiseases.git
   cd synthea-alldiseases
   ```

2. Create a `.env` file with your API keys (or copy from `.env.example`):

   ```bash
   cp .env.example .env
   # Edit .env with your preferred text editor
   ```

3.
   Build and start the Docker containers:

   ```bash
   docker-compose build
   docker-compose up -d synthea
   ```

## Usage

### Basic Command

```bash
nextflow run main.nf --disease_name "Disease Name" [options]
```

### Examples

Generate a module for Hypertension and create 100 patients:

```bash
nextflow run main.nf --disease_name "Hypertension" --generate_patients true --population 100 --gender 0.6
```

Generate a module for Parkinson's Disease, create 50 patients, and analyze the data:

```bash
nextflow run main.nf --disease_name "Parkinson's Disease" --generate_patients true --population 50 --analyze_patient_data true
```

### Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--disease_name` | Name of the disease to model | (required) |
| `--modules_dir` | Directory for modules | `modules` |
| `--output_dir` | Directory for output files | `output` |
| `--generate_patients` | Generate patient data | `false` |
| `--population` | Number of patients to generate | `100` |
| `--gender` | Fraction of patients who are female (0 to 1) | `0.5` |
| `--min_age` | Minimum patient age | `0` |
| `--max_age` | Maximum patient age | `90` |
| `--seed` | Random seed for reproducibility | (random) |
| `--analyze_patient_data` | Analyze generated data | `false` |
| `--report_format` | Format for analysis report | `html` |
| `--force_generate` | Force regeneration of modules | `false` |
| `--publish_dir` | Directory for published output | `published_output` |

## Understanding the Data Flow

1. **Module Generation**: The pipeline first looks for an existing module for the specified disease. If none is found, it generates one using the `module_generator`.
2. **Patient Generation**: If requested, the pipeline uses the actual Synthea engine to generate synthetic patient data based on the disease module.
3. **Analysis**: If requested, the pipeline analyzes the generated patient data and produces reports.
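
The three stages above can be sketched as a small decision function. This is an illustrative sketch only, not the pipeline's actual Nextflow logic: the names `module_filename` and `plan_steps`, the stage labels, and the file-naming rule (lowercase disease name with non-alphanumeric runs replaced by `_`, plus `.json`) are all assumptions made for the example. It also treats analysis as independent of patient generation, which the real pipeline may not do.

```python
import re
import tempfile
from pathlib import Path
from typing import List


def module_filename(disease_name: str) -> str:
    """Hypothetical naming rule: lowercase, non-alphanumeric runs become '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", disease_name.lower()).strip("_")
    return f"{slug}.json"


def plan_steps(
    disease_name: str,
    modules_dir: Path,
    generate_patients: bool = False,
    analyze_patient_data: bool = False,
    force_generate: bool = False,
) -> List[str]:
    """Return the stages that would run, in order, for the given options."""
    steps = []
    module_path = modules_dir / module_filename(disease_name)

    # Stage 1: reuse an existing module unless it is missing or regeneration is forced.
    if force_generate or not module_path.exists():
        steps.append("generate_module")
    else:
        steps.append("reuse_module")

    # Stage 2: patient generation is opt-in (mirrors --generate_patients).
    if generate_patients:
        steps.append("generate_patients")

    # Stage 3: analysis is opt-in (mirrors --analyze_patient_data).
    if analyze_patient_data:
        steps.append("analyze_patient_data")

    return steps


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        modules = Path(tmp)
        # No module on disk yet, so one would be generated first.
        print(plan_steps("Hypertension", modules, generate_patients=True))
        # Once a module file exists, it is reused on the next run.
        (modules / module_filename("Hypertension")).write_text("{}")
        print(plan_steps("Hypertension", modules, generate_patients=True))
```

The optional keyword arguments default to the same values as the corresponding parameters in the table above, so a call with only a disease name plans module generation (or reuse) and nothing else.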
## Directory Structure

- `modules/`: Contains generated disease modules
- `module_generator/`: Contains the AI-powered module generation scripts
- `scripts/`: Utility scripts for the pipeline
- `output/`: Generated patient data (temporary)
- `published_output/`: Final output data that persists between runs
  - `published_output/modules/`: Contains the generated modules
  - `published_output/{disease_name}/`: Contains patient data for each disease

## Convenience Scripts

- `scripts/prepare_environment.sh`: Sets up the environment and starts containers
- `scripts/run_pipeline.sh`: Simplified interface for running the pipeline
- `scripts/analyze_patient_data.py`: Analyzes generated patient data
- `scripts/check_condition_structure.py`: Validates module JSON structure

## Troubleshooting

If you encounter issues:

1. Check that Docker containers are running:

   ```bash
   docker ps | grep synthea
   ```

2. Ensure your modules directory has the required modules:

   ```bash
   ls -la modules/
   ```

3. Check logs for detailed error messages:

   ```bash
   tail -f .nextflow.log
   ```

4. Try rebuilding the Docker containers:

   ```bash
   docker-compose down
   docker-compose build
   docker-compose up -d synthea
   ```

5. If module generation fails, check that your API keys are correctly set in the `.env` file.

## License

This project uses the same license as Synthea.