Trying to fix basic functionality again.
280
SYNTHEA_GUIDE.md
Normal file
@@ -0,0 +1,280 @@
# Synthea Disease Module Generator Guide

This guide explains how to use our Nextflow pipeline to generate Synthea disease modules and synthetic patient data.

## Overview

The pipeline provides three main functions:

1. **Disease Module Generation**: Creates Synthea disease modules using Claude AI
2. **Synthetic Patient Generation**: Uses the generated modules to create synthetic patient data with configurable demographic characteristics
3. **Patient Data Analysis**: Generates statistics and reports from the synthetic patient data

## Prerequisites

- Nextflow installed
- Synthea installed (with Java 8 or 11 compatibility)
- Anthropic API key (for Claude)
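
Before the first run, it can help to confirm that these prerequisites are reachable from the shell. The following is a minimal sketch and not part of the pipeline: `missing_tools` and `check_prereqs` are hypothetical helper names, and `ANTHROPIC_API_KEY` is an assumed environment variable name that this guide does not confirm. It does not check the Synthea installation, since the guide does not say where Synthea is expected to live.

```bash
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report missing prerequisites on stderr; return non-zero if anything is missing.
# ANTHROPIC_API_KEY is an assumed variable name, not confirmed by this guide.
check_prereqs() {
  status=0
  for tool in $(missing_tools nextflow java); do
    echo "missing command: $tool" >&2
    status=1
  done
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    status=1
  fi
  return "$status"
}
```

Calling `check_prereqs` at the top of a wrapper script would stop the run early instead of failing partway through module generation.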

## Basic Usage

### 1. Generating Disease Modules

To generate disease modules for specific diseases:

```bash
# Generate a module for a single disease
nextflow run synthea_module_generator.nf --disease_name "Seasonal Allergies"

# Generate modules for multiple diseases
nextflow run synthea_module_generator.nf --disease_name "Asthma,Diabetes,Hypertension"
```

### 2. Generating Synthetic Patients

To generate synthetic patients with the specified diseases:

```bash
# Generate 100 patients with Asthma (default parameters)
nextflow run synthea_module_generator.nf --disease_name "Asthma" --generate_patients true

# Generate 1000 patients with specific parameters
nextflow run synthea_module_generator.nf \
  --disease_name "Diabetes" \
  --generate_patients true \
  --population 1000 \
  --gender 0.6 \
  --min_age 40 \
  --max_age 80 \
  --seed 12345 \
  --location "Massachusetts"
```

### 3. Analyzing Patient Data

To generate patients and analyze the resulting data:

```bash
# Generate patients and produce an HTML analysis report
nextflow run synthea_module_generator.nf \
  --disease_name "Asthma" \
  --generate_patients true \
  --population 500 \
  --analyze_output true \
  --report_format html
```

## Available Parameters

### Module Generation Parameters

- `--disease_name`: Name of the disease(s) to generate (comma-separated for multiple)
- `--modules_dir`: Directory for modules (default: `src/main/resources/modules`)
- `--batch_size`: Number of modules to generate per batch (default: 1)
- `--max_cost`: Maximum cost for API calls (default: 5.0 USD)
- `--timeout`: Maximum time per batch in seconds (default: 300)
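
To reason about how `--batch_size` and `--timeout` interact, it can be useful to estimate how many batches a given `--disease_name` value produces. The sketch below assumes that the comma-separated list is split on commas and that batches are filled in order, so the batch count is the ceiling of the disease count divided by the batch size. That behavior is an assumption about the pipeline, and `count_diseases` and `batch_count` are hypothetical helper names.

```bash
# Count the comma-separated entries in a --disease_name value.
count_diseases() {
  awk -v names="$1" 'BEGIN { print split(names, parts, ",") }'
}

# Number of batches needed: ceil(count / batch_size) in integer arithmetic.
# Usage: batch_count COUNT BATCH_SIZE
batch_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

Under these assumptions, three diseases with `--batch_size 2` would need two batches, and with the default 300-second `--timeout` the whole run would be bounded by roughly 600 seconds of generation time.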

### Patient Generation Parameters

#### Basic Patient Parameters
- `--generate_patients`: Set to `true` to generate patients (default: `false`)
- `--population`: Number of patients to generate (default: 100)
- `--gender`: Gender distribution: `M`, `F`, or a decimal fraction of female patients (e.g., `0.7` = 70% female)
- `--min_age`: Minimum patient age (default: 0)
- `--max_age`: Maximum patient age (default: 100)
- `--seed`: Random seed for reproducibility (default: 12345)
- `--location`: Location for patients (default: random)
- `--output_dir`: Output directory for patients (default: `output/synthetic_patients`)

#### Enhanced Demographic Parameters
- `--race_ethnicity`: Comma-separated list of `group=fraction` pairs (e.g., `white=0.7,hispanic=0.15,black=0.15`)
- `--socioeconomic`: Socioeconomic status distribution as `level=fraction` pairs (e.g., `high=0.2,middle=0.5,low=0.3`)
- `--zip_codes`: Comma-separated list of ZIP codes to distribute patients across (e.g., `02138,02139,02140`)

#### Disease Prevalence Parameters
- `--prevalence`: Fraction of the population with the disease, between 0.0 and 1.0 (e.g., `0.05` for 5%)
- `--comorbidities`: Whether to include common comorbidities (`true` or `false`, default: `false`)

### Analysis Parameters
- `--analyze_output`: Whether to run analysis on the output (default: `false`)
- `--report_format`: Format for the analysis report: `html`, `csv`, or `json` (default: `html`)

## Detailed Configuration Guide

### Controlling Demographics

You can control the demographic distribution of the patient population:

#### Gender Distribution

```bash
# Generate all male patients
nextflow run synthea_module_generator.nf --disease_name "Prostate Cancer" --generate_patients true --gender M

# Generate all female patients
nextflow run synthea_module_generator.nf --disease_name "Ovarian Cancer" --generate_patients true --gender F

# Generate 60% female, 40% male
nextflow run synthea_module_generator.nf --disease_name "Diabetes" --generate_patients true --gender 0.6
```
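
Because `--gender` accepts three different kinds of value, a small pre-check can catch typos before a long run. This is a sketch of the rule as the guide describes it (`M`, `F`, or a decimal between 0 and 1 read as the female fraction); `parse_gender` is a hypothetical helper, and the pipeline's own parsing may differ.

```bash
# Classify a --gender value: prints "male", "female", or "mixed <fraction>".
# Returns 1 for anything that is not M, F, or a plain decimal in [0, 1].
parse_gender() {
  case "$1" in
    M) echo "male" ;;
    F) echo "female" ;;
    *)
      # awk checks both the shape of the number and its range.
      if awk -v g="$1" 'BEGIN { exit !(g ~ /^[0-9]*\.?[0-9]+$/ && g + 0 >= 0 && g + 0 <= 1) }'; then
        echo "mixed $1"
      else
        echo "invalid --gender value: $1" >&2
        return 1
      fi
      ;;
  esac
}
```

For example, `parse_gender 0.6` prints `mixed 0.6`, while `parse_gender 1.5` fails with an error message on stderr.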

#### Age Distribution

```bash
# Generate pediatric patients (0-18 years)
nextflow run synthea_module_generator.nf --disease_name "Asthma" --generate_patients true --min_age 0 --max_age 18

# Generate elderly patients (65-90 years)
nextflow run synthea_module_generator.nf --disease_name "Parkinsons" --generate_patients true --min_age 65 --max_age 90
```

#### Race and Ethnicity

```bash
# Generate patients with specific racial distribution
nextflow run synthea_module_generator.nf \
  --disease_name "Hypertension" \
  --generate_patients true \
  --race_ethnicity "white=0.6,black=0.2,hispanic=0.15,asian=0.05"
```

#### Socioeconomic Status

```bash
# Generate patients with specific socioeconomic distribution
nextflow run synthea_module_generator.nf \
  --disease_name "Diabetes" \
  --generate_patients true \
  --socioeconomic "high=0.2,middle=0.5,low=0.3"
```
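
The `--race_ethnicity` and `--socioeconomic` examples in this guide all use fractions that add up to 1.0. Whether the pipeline enforces that is not stated here, so the following sketch simply checks it up front. `check_distribution` is a hypothetical helper; it assumes the `name=fraction` format shown above and a small tolerance for floating-point rounding.

```bash
# Succeed when the name=fraction pairs in $1 sum to 1.0 (within a small tolerance).
# Exit status: 0 = valid, 1 = fractions do not sum to 1.0, 2 = malformed pair.
check_distribution() {
  awk -v spec="$1" 'BEGIN {
    n = split(spec, pairs, ",")
    total = 0
    for (i = 1; i <= n; i++) {
      # Each entry must be exactly name=fraction with a plain decimal fraction.
      if (split(pairs[i], kv, "=") != 2 || kv[2] !~ /^[0-9]*\.?[0-9]+$/) exit 2
      total += kv[2]
    }
    diff = total - 1.0
    if (diff < 0) diff = -diff
    exit !(n > 0 && diff < 0.000001)
  }'
}
```

A call such as `check_distribution "high=0.2,middle=0.5,low=0.3" || echo "fractions must sum to 1.0"` can be placed before the `nextflow run` command.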

### Disease Prevalence Simulation

You can control the prevalence of diseases in your synthetic population:

```bash
# Generate 1000 patients with 8% diabetes prevalence (realistic for US population)
nextflow run synthea_module_generator.nf \
  --disease_name "Diabetes" \
  --generate_patients true \
  --population 1000 \
  --prevalence 0.08

# Generate patients with comorbidities
nextflow run synthea_module_generator.nf \
  --disease_name "Hypertension" \
  --generate_patients true \
  --prevalence 0.3 \
  --comorbidities true
```
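
As a sanity check on the output, the expected number of affected patients is the population multiplied by the prevalence. The sketch below does that arithmetic with rounding to the nearest whole patient; `expected_cases` is a hypothetical helper. If the pipeline assigns the disease per patient at random, which this guide does not specify, the observed count will vary around this value rather than match it exactly.

```bash
# Expected number of patients with the disease, rounded to the nearest integer.
# Usage: expected_cases POPULATION PREVALENCE
expected_cases() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%d\n", n * p + 0.5 }'
}
```

With the first example above, `expected_cases 1000 0.08` gives 80 patients, which is a useful figure to compare against the analysis report.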

### Analysis Reports

You can generate analysis reports in various formats:

```bash
# Generate HTML report (default)
nextflow run synthea_module_generator.nf \
  --disease_name "Asthma" \
  --generate_patients true \
  --analyze_output true

# Generate CSV reports
nextflow run synthea_module_generator.nf \
  --disease_name "Diabetes" \
  --generate_patients true \
  --analyze_output true \
  --report_format csv
```

## Example Scenarios

### Realistic Diabetes Population

Generate a realistic U.S. diabetes population with proper demographics:

```bash
nextflow run synthea_module_generator.nf \
  --disease_name "Diabetes" \
  --generate_patients true \
  --population 1000 \
  --prevalence 0.08 \
  --race_ethnicity "white=0.6,black=0.13,hispanic=0.18,asian=0.06,native=0.03" \
  --min_age 18 \
  --max_age 90 \
  --analyze_output true
```

### Pediatric Asthma Study Cohort

Generate a pediatric asthma cohort for a simulated clinical study:

```bash
nextflow run synthea_module_generator.nf \
  --disease_name "Asthma" \
  --generate_patients true \
  --population 500 \
  --min_age 5 \
  --max_age 17 \
  --gender 0.5 \
  --prevalence 0.08 \
  --analyze_output true \
  --report_format html
```

### Multi-Disease Elderly Population

Generate an elderly population with multiple chronic conditions:

```bash
nextflow run synthea_module_generator.nf \
  --disease_name "Hypertension,Arthritis,COPD" \
  --generate_patients true \
  --population 1000 \
  --min_age 65 \
  --max_age 90 \
  --comorbidities true \
  --analyze_output true
```

## Analysis Report Details

The analysis report includes:

1. **Patient Demographics**
   - Gender distribution
   - Age distribution (by age groups)
   - Race/ethnicity distribution

2. **Disease Statistics**
   - Top 10 conditions in the patient population
   - Top 10 medications prescribed

3. **Summary Statistics**
   - Total number of patients
   - Age ranges (min, max, average)
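
To cross-check the summary statistics independently of the report, the same figures can be recomputed from a plain list of ages. This sketch is not the pipeline's analysis code: `age_summary` is a hypothetical helper, and it assumes you have already extracted one age per line from the generated patient data by whatever means suits your output format.

```bash
# Read one age per line on stdin; print patient count, minimum, maximum, and mean age.
age_summary() {
  awk 'NF {
    age = $1 + 0
    if (count == 0 || age < min) min = age
    if (count == 0 || age > max) max = age
    sum += age
    count++
  }
  END {
    if (count == 0) { print "no patients"; exit 1 }
    printf "count=%d min=%d max=%d mean=%.1f\n", count, min, max, sum / count
  }'
}
```

For instance, `printf '5\n17\n11\n' | age_summary` prints `count=3 min=5 max=17 mean=11.0`.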

## Troubleshooting

### Compatibility Issues

If you encounter Java compatibility issues, make sure you are using Java 8 or 11, which are the most compatible with Synthea:

```bash
# Set JAVA_HOME to Java 8 before running
export JAVA_HOME=/path/to/java8
```
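
To see which Java version is actually active, the version string can be reduced to its major number. Java 8 reports itself in the legacy form `1.8.0_…`, while Java 9 and later lead with the major version, so the sketch below handles both. `java_major` and `installed_java_version` are hypothetical helper names.

```bash
# Map a Java version string to its major version.
# Legacy strings such as "1.8.0_292" mean Java 8; newer ones such as "11.0.2" mean Java 11.
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 ;;
  esac
}

# Read the version of whichever java is first on PATH.
# `java -version` writes to stderr, hence the 2>&1.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# Example: warn when the active runtime is neither Java 8 nor Java 11.
# major="$(java_major "$(installed_java_version)")"
# case "$major" in 8|11) ;; *) echo "unexpected Java major version: $major" >&2 ;; esac
```

Note that this inspects the `java` found on `PATH`; if a tool in the pipeline resolves Java through `JAVA_HOME` instead, check `"$JAVA_HOME/bin/java" -version` as well.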

### Debugging Module Generation

If module generation fails:
1. Check the `.error` file in the modules directory
2. Verify your API key is set correctly
3. Try generating a simpler disease first
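
The first step above can be scripted. This sketch assumes the error output is written to files whose names end in `.error` (which also matches a file named exactly `.error`) and that the modules directory is the documented default unless another path is passed. `list_error_files` is a hypothetical helper.

```bash
# List files ending in .error under a modules directory, sorted by path.
# Defaults to the documented --modules_dir value when no argument is given.
list_error_files() {
  find "${1:-src/main/resources/modules}" -type f -name '*.error' 2>/dev/null | sort
}
```

Piping the result through `xargs tail -n 20` is one way to view the end of each error file without opening them individually.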

### Patient Generation Issues

If patient generation fails:
1. Check that Synthea is properly installed
2. Verify the modules exist in the modules directory
3. Check that parameter values are within valid ranges
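
The range check in the last step can be done before launching the pipeline. The sketch below encodes the constraints this guide implies: a positive integer population, integer ages with `min_age` not above `max_age`, and a prevalence between 0.0 and 1.0. `validate_ranges` is a hypothetical helper, and the pipeline may apply additional or different limits.

```bash
# Usage: validate_ranges POPULATION MIN_AGE MAX_AGE PREVALENCE
# Prints the first problem found and returns 1; returns 0 when all checks pass.
validate_ranges() {
  awk -v pop="$1" -v lo="$2" -v hi="$3" -v prev="$4" 'BEGIN {
    if (pop !~ /^[0-9]+$/ || pop + 0 < 1) {
      print "population must be a positive integer"; exit 1
    }
    if (lo !~ /^[0-9]+$/ || hi !~ /^[0-9]+$/) {
      print "ages must be non-negative integers"; exit 1
    }
    if (lo + 0 > hi + 0) {
      print "min_age must not exceed max_age"; exit 1
    }
    if (prev !~ /^[0-9]*\.?[0-9]+$/ || prev + 0 > 1) {
      print "prevalence must be between 0.0 and 1.0"; exit 1
    }
    exit 0
  }'
}
```

For example, `validate_ranges 1000 40 80 0.08` succeeds silently, while `validate_ranges 100 65 40 0.3` reports that `min_age` must not exceed `max_age`.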