Taction Software — FHIR Integration with Mirth Connect
Blog·May 12, 2026·Taction Software

Mirth Connect Disaster Recovery Architecture

RTO/RPO targets, backup strategy, channel exports, multi-region failover patterns, runbook templates, and the testing cadence that makes a Mirth Connect disaster recovery plan actually work when it has to.

Tags: Mirth Connect, Disaster Recovery, HIPAA, AWS, Architecture
TL;DR

A production Mirth Connect disaster recovery architecture requires four components: database replication with cross-region copies, automated nightly channel configuration exports to S3 or equivalent versioned storage, infrastructure-as-code so the entire deployment can be rebuilt from version-controlled templates, and quarterly tested restore procedures. Target RTO under 1 hour and RPO under 5 minutes for most healthcare integration workloads. A backup you have never restored is not a backup.

This guide walks through the HA-vs-DR distinction, the RTO/RPO targets that drive the architecture, the four backup layers every Mirth deployment has, three reference architectures with their cost-vs-recovery-time tradeoffs, the runbook template, the testing cadence, and the ten mistakes we see in nearly every Mirth DR plan we audit. Written by the engineers who deliver Mirth Connect support for US healthcare organizations.

HA vs DR — they solve different problems

Before architecting anything, get the terminology right. These two get conflated constantly.

High Availability (HA) protects against component-level failures within a region or data center. A single EC2 instance dies, but the load balancer routes traffic to a healthy instance. A database primary fails, but RDS automatically promotes the standby. HA failures are common and HA recovery is automatic.

Disaster Recovery (DR) protects against site-level or region-level failures. An entire AWS region becomes unavailable. A data center loses power for an extended period. A ransomware attack encrypts production infrastructure. DR failures are rare but catastrophic, and DR recovery involves human decision-making and procedural execution.

Production Mirth Connect deployments need both. HA gives you near-zero downtime for the failure modes you'll encounter regularly. DR gives you survivability for the rare events that would otherwise be existential.

Concern | HA solves it | DR solves it
Single EC2 instance failure | ✓ | -
Single AZ outage | ✓ | -
Database primary failure | ✓ | -
Region-wide AWS outage | - | ✓
Ransomware encrypts production | - | ✓
Data center fire | - | ✓
Accidental destruction of production | - | ✓
Compromise of all in-region backups | - | ✓ (with cross-region backup)

Setting RTO and RPO targets

Two numbers drive every DR architecture decision.

RTO (Recovery Time Objective) — how long can the system be down before business impact becomes unacceptable. Measured in minutes, hours, or days.

RPO (Recovery Point Objective) — how much data loss is acceptable. Measured in minutes, hours, or days of data.

These are business decisions, not technical decisions. Set them before designing infrastructure.

Typical RTO/RPO targets for healthcare integration workloads:

Workload type | RTO | RPO
Critical clinical (ADT, real-time orders) | Under 15 min | Near-zero
Standard clinical integration | Under 1 hour | Under 5 min
Lab results, scheduling | Under 4 hours | Under 30 min
Analytics / reporting feeds | Under 24 hours | Under 4 hours
Archival / historical | Under 7 days | Under 24 hours

The tighter the targets, the more expensive the architecture. A 15-minute RTO with near-zero RPO requires active-passive multi-region with synchronous replication — meaningfully more expensive than a 4-hour RTO with 30-minute RPO. Match the architecture to the actual business requirement, not aspiration.
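
As a rough illustration, the targets can be kept as data next to drill results, so a drill report states which workloads met their numbers. This is a minimal sketch: the tier keys and the `meets_targets` helper are hypothetical names, and "near-zero" RPO is modelled as one minute purely as an assumption.

```python
from datetime import timedelta

# RTO/RPO targets from the table above, keyed by hypothetical tier names.
# "Near-zero" RPO is modelled as one minute, which is an assumption.
TARGETS = {
    "critical_clinical": (timedelta(minutes=15), timedelta(minutes=1)),
    "standard_clinical": (timedelta(hours=1), timedelta(minutes=5)),
    "lab_scheduling": (timedelta(hours=4), timedelta(minutes=30)),
    "analytics": (timedelta(hours=24), timedelta(hours=4)),
    "archival": (timedelta(days=7), timedelta(hours=24)),
}


def meets_targets(tier, measured_rto, measured_rpo):
    """Return True if measured recovery time and data loss are both under target."""
    rto_target, rpo_target = TARGETS[tier]
    return measured_rto < rto_target and measured_rpo < rpo_target


# A drill that recovered in 42 minutes with 3 minutes of lost data:
drill = (timedelta(minutes=42), timedelta(minutes=3))
print(meets_targets("standard_clinical", *drill))  # True
print(meets_targets("critical_clinical", *drill))  # False
```

Comparing measured values against the table keeps the conversation with the business concrete: the same drill passes for one workload tier and fails for another.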

The four backup layers in a Mirth Connect deployment

Mirth Connect deployments have four distinct things that need backup, each with different mechanisms.

Layer 1 — The Mirth Connect database

The database holds channel definitions, channel statistics, message history, audit logs, and user accounts. Loss of this database means loss of operational history and visibility, even if channels can be reconstructed from exports.

Backup approach:

  • RDS automated backups with 30-day retention (a practical operational minimum; HIPAA itself does not mandate a specific backup retention period)
  • Multi-AZ for synchronous replication within a region
  • Cross-region read replica for multi-region DR (one-way async replication)
  • Periodic manual snapshots before major changes
  • Snapshot copying to a separate AWS account for ransomware protection

Restore considerations:

  • Point-in-time recovery within the retention window
  • Cross-region snapshot copies enable region-failure recovery
  • Test restore quarterly to a non-production environment
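
Snapshot copy jobs can break silently, so it helps to check the age of the newest usable snapshot in the DR region. The sketch below assumes snapshot records shaped like the `DBSnapshots` entries that boto3's `describe_db_snapshots` returns (`Status`, `SnapshotCreateTime`); the identifiers, dates, and thresholds are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def newest_available_snapshot(snapshots):
    """Pick the most recent snapshot whose status is 'available', or None."""
    usable = [s for s in snapshots if s.get("Status") == "available"]
    return max(usable, key=lambda s: s["SnapshotCreateTime"], default=None)


def snapshot_is_fresh(snapshots, max_age, now):
    """True if the newest available snapshot is no older than max_age."""
    newest = newest_available_snapshot(snapshots)
    return newest is not None and now - newest["SnapshotCreateTime"] <= max_age


# Hypothetical records, as they might look for the DR region.
now = datetime(2026, 5, 12, 12, 0, tzinfo=timezone.utc)
snapshots = [
    {"DBSnapshotIdentifier": "mirth-dr-2026-05-11", "Status": "available",
     "SnapshotCreateTime": datetime(2026, 5, 11, 3, 0, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "mirth-dr-2026-05-12", "Status": "creating",
     "SnapshotCreateTime": datetime(2026, 5, 12, 3, 0, tzinfo=timezone.utc)},
]
print(snapshot_is_fresh(snapshots, timedelta(hours=26), now))  # False
print(snapshot_is_fresh(snapshots, timedelta(hours=48), now))  # True
```

A check like this only proves that a recent snapshot exists; the quarterly restore test is still what proves the snapshot is usable.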

For the broader database configuration choices that underpin this layer, see Mirth Connect database configuration.

Layer 2 — Channel configurations (the infrastructure-as-code layer)

Channel configurations are your most important asset for fast recovery. If you have current channel exports, you can rebuild a Mirth Connect deployment in hours. Without them, rebuilding from scratch is a multi-week project.

Backup approach:

  • Scheduled nightly export of all channels to S3
  • S3 bucket with versioning enabled (point-in-time recovery)
  • S3 bucket in a separate AWS account or different region
  • Channel exports committed to a Git repository for change history

Implementation: the Mirth Connect API provides export endpoints. A scheduled Lambda function or cron job calls these endpoints daily, writes the resulting XML to S3 with timestamp-based key naming, and triggers a backup-verified alarm if the export count drops.
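
The key naming and the export-count check could look roughly like the sketch below. It deliberately leaves out the API call and the S3 upload. It assumes the export is an XML document whose root element contains one `<channel>` element per channel; verify that against the output of your Mirth version. The function names and the `mirth/channels` prefix are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def export_key(prefix, taken_at):
    """Build a timestamp-based S3 object key for one nightly export."""
    return f"{prefix}/{taken_at:%Y/%m/%d}/channels-{taken_at:%Y%m%dT%H%M%SZ}.xml"


def count_channels(export_xml):
    """Count <channel> elements directly under the export's root element."""
    return len(ET.fromstring(export_xml).findall("channel"))


def export_count_dropped(current, previous):
    """True if tonight's export holds fewer channels than the previous one."""
    return current < previous


# Assumed export shape: a root element wrapping one <channel> per channel.
sample = (
    "<list>"
    "<channel><id>1</id><name>ADT Inbound</name></channel>"
    "<channel><id>2</id><name>Lab Results</name></channel>"
    "</list>"
)
taken_at = datetime(2026, 5, 12, 2, 0, 0, tzinfo=timezone.utc)
print(export_key("mirth/channels", taken_at))
# mirth/channels/2026/05/12/channels-20260512T020000Z.xml
print(count_channels(sample))  # 2
print(export_count_dropped(current=2, previous=5))  # True
```

Date-partitioned keys make it easy to find the export from a given night, and the count comparison catches an export that ran but came back partially empty.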

Layer 3 — Message store

The message store contains the actual HL7/FHIR messages processed by Mirth. For HIPAA-covered organizations, retention requirements typically run 6 to 7 years, driven mostly by state medical record retention laws rather than by HIPAA itself. At that retention, the message store can grow very large.

Backup approach:

  • Database backups cover recent messages (within the retention window)
  • For long-term retention, archive completed messages to S3 with lifecycle policies
  • S3 lifecycle: standard for 90 days → Glacier Instant Retrieval → Glacier Deep Archive for 7+ year retention
  • Use SSE-KMS with customer-managed keys for PHI
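
The lifecycle described above can be expressed as a configuration dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. This is a sketch under assumptions: the rule ID and prefix are hypothetical, and the day on which objects move to Deep Archive (365 here) is not specified in this guide and should be chosen to fit your access patterns. Encryption settings are configured separately and are not shown.

```python
import json


def message_archive_lifecycle(prefix, deep_archive_after_days):
    """Build an S3 lifecycle configuration for archived Mirth messages."""
    if deep_archive_after_days <= 90:
        raise ValueError("Deep Archive transition must come after the 90-day transition")
    return {
        "Rules": [
            {
                "ID": "mirth-message-archive",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    # Standard for the first 90 days, then Glacier Instant Retrieval.
                    {"Days": 90, "StorageClass": "GLACIER_IR"},
                    # Later move to Deep Archive; the day count is an assumption.
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


config = message_archive_lifecycle("mirth/messages/", deep_archive_after_days=365)
print(json.dumps(config, indent=2))
```

No expiration rule is included, so objects are kept until someone deliberately adds one after confirming the applicable retention period.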

Recovery consideration: message store recovery is typically the slowest part of a full DR scenario. If RTO requires fast restoration, prioritize recent message data and accept that older archives may take hours-to-days longer to make queryable.

Layer 4 — Attachments and large objects

If your channels handle large attachments (DICOM images, PDF documents, large HL7 messages), these are typically stored in S3 (recommended) or in the Mirth database (not recommended at scale).

Backup approach:

  • S3 with cross-region replication for HIPAA-eligible buckets
  • Versioning enabled to protect against accidental deletion
  • Object Lock for compliance-critical attachments that must not be deleted

Holding large attachments in the Mirth database is also one of the most common causes of the heap-space error covered in our Mirth Connect Java heap space error post — another reason to keep attachments in S3.

Three common DR architectures

The right architecture depends on your RTO/RPO targets and budget.

Architecture A — Single-region with backups (basic DR)

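For workloads tolerating an RTO of 4-24 hours and an RPO of about 4 hours. The 4-hour RPO holds only if backups reach the second region at least that often; with daily snapshot copies alone, the effective RPO is closer to 24 hours.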

Region us-east-1:
  - Mirth Connect (Multi-AZ EC2 or ECS)
  - RDS Multi-AZ
  - S3 with versioning (channels, attachments)
  - Daily database snapshots

Cross-region:
  - Snapshots copied to us-west-2 (manual or automated)
  - S3 cross-region replication

DR scenario: a region fails. Engineering team builds new infrastructure in us-west-2 from infrastructure-as-code, restores database from cross-region snapshot, imports channels from S3. Total recovery time: 4-24 hours.

Cost: Lowest of the three patterns. No standby infrastructure running in the second region.

Architecture B — Active-passive multi-region (standard DR)

For workloads requiring RTO of 1 hour and RPO of 5 minutes or less.

Region us-east-1 (primary):
  - Mirth Connect (Multi-AZ, actively processing)
  - RDS Multi-AZ
  - S3 with versioning

Region us-west-2 (passive):
  - Mirth Connect infrastructure deployed but not processing (warm standby)
  - RDS cross-region read replica
  - S3 cross-region replication
  - Route 53 health checks ready to fail over DNS

DR scenario: primary region fails. Health checks detect failure. DNS fails over to us-west-2. Read replica promoted to primary. Mirth instances start processing. Total recovery time: under 1 hour.
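
The DNS part of this failover can be sketched as a pair of Route 53 failover records, built here as the `ChangeBatch` dictionary that boto3's `change_resource_record_sets` accepts. The hostnames, the health check ID, and the `failover_record` helper are hypothetical, and the 60-second TTL is an assumption chosen to keep failover fast.

```python
def failover_record(name, target, role, health_check_id=None, ttl=60):
    """Build one Route 53 failover CNAME change for a PRIMARY or SECONDARY endpoint."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError(f"role must be PRIMARY or SECONDARY, got {role!r}")
    record = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": f"mirth-{role.lower()}",
        "Failover": role,
        "TTL": ttl,  # a low TTL lets clients pick up the failover quickly
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# Hypothetical hostnames and health check ID.
change_batch = {
    "Comment": "Mirth Connect failover records",
    "Changes": [
        failover_record("mirth.example.org.", "mirth-east.example.org", "PRIMARY",
                        health_check_id="00000000-0000-0000-0000-000000000000"),
        failover_record("mirth.example.org.", "mirth-west.example.org", "SECONDARY"),
    ],
}
print([c["ResourceRecordSet"]["Failover"] for c in change_batch["Changes"]])
# ['PRIMARY', 'SECONDARY']
```

DNS is only one step. Promoting the read replica is a separate action (boto3 exposes `promote_read_replica` on the RDS client), and the runbook should state who triggers it and in what order.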

Cost: The standby infrastructure runs 24/7, but typically at 30-50% of the primary region's cost.

Architecture C — Active-active multi-region (advanced DR)

For workloads that require an RTO of seconds to minutes and cannot tolerate meaningful downtime.

Region us-east-1 (active):
  - Mirth Connect actively processing channels for east customers
  - RDS primary with cross-region replication

Region us-west-2 (active):
  - Mirth Connect actively processing channels for west customers
  - RDS primary with cross-region replication

Both regions:
  - GeoDNS routes clients to nearest region
  - Channel state and message stores replicated bidirectionally (complex)

DR scenario: one region fails. Traffic automatically routes to the surviving region. Total recovery time: seconds-to-minutes.

Cost: Highest. Both regions run full production capacity. Operational complexity is significantly higher due to bidirectional replication.

Warning: Active-active is hard to do well. Most healthcare organizations achieve their actual business RTO/RPO targets more reliably with Architecture B. Don't pick active-active for prestige reasons.
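
The cost-versus-recovery tradeoff across the three patterns can be sketched as a small selection function. It encodes only the headline figures quoted above, using the worst case for each pattern, so treat it as a conversation starter rather than a sizing tool. The function name is hypothetical, and a tuned active-passive design can beat the one-hour figure used here for Architecture B.

```python
from datetime import timedelta


def cheapest_architecture(rto, rpo):
    """Return the lowest-cost pattern (A, B, or C) whose quoted figures fit the targets."""
    # Architecture A: recovery may take up to 24 hours, with an RPO of about 4 hours.
    if rto >= timedelta(hours=24) and rpo >= timedelta(hours=4):
        return "A"
    # Architecture B: recovery in under 1 hour, with an RPO of 5 minutes or less.
    if rto >= timedelta(hours=1) and rpo >= timedelta(minutes=5):
        return "B"
    # Anything tighter falls through to active-active.
    return "C"


print(cheapest_architecture(timedelta(hours=24), timedelta(hours=4)))    # A
print(cheapest_architecture(timedelta(hours=4), timedelta(minutes=30)))  # B
print(cheapest_architecture(timedelta(minutes=5), timedelta(0)))         # C
```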

For the underlying AWS deployment patterns these architectures sit on top of, see Mirth Connect on AWS Deployment Guide.

Infrastructure-as-code is non-negotiable

A DR plan that relies on human memory or wiki documentation will fail at exactly the wrong moment. Express your entire Mirth deployment in code.

What to put in code:

  • VPC, subnets, security groups, network ACLs
  • EC2 instances or ECS task definitions
  • RDS instances and configuration
  • S3 buckets and lifecycle policies
  • IAM roles and policies
  • CloudWatch alarms and log groups
  • Route 53 records
  • Load balancers and target groups
  • Secrets Manager secret definitions (encrypted; the secret values themselves stay out of the repository)

What NOT to put in code:

  • Sensitive values themselves (use Secrets Manager references)
  • Manually generated certificates (use ACM)
  • Customer-specific configuration that changes frequently (use parameter store)

Tools: CloudFormation, Terraform, AWS CDK, Pulumi — all work. Pick one and standardize.

Storage: the infrastructure-as-code repository must be backed up like any other critical asset. Keep it in a private Git repository with branch protection and access logging.

The DR runbook — what it actually contains

A runbook is a step-by-step procedure for executing recovery. It exists because the people executing DR are often not the people who built the system, and they're operating under stress at 3am.

Minimum runbook contents:

  1. DR triggers and decision authority. Who declares a DR event. What conditions justify declaration. Who has authority to begin failover.
  2. Communication plan. Who gets notified, in what order, by what mechanism. Status update cadence during recovery.
  3. Pre-failover verification. Confirm primary is actually down (not a false alarm). Confirm DR target is healthy.
  4. Infrastructure deployment commands. Exact commands to deploy infrastructure-as-code to the DR region.
  5. Database restore procedure. Exact commands or console steps to restore from snapshot or promote replica.
  6. Channel import procedure. Commands to import latest channel exports from S3 into the new Mirth instance.
  7. Verification procedure. Specific test messages or queries to verify channels are processing correctly.
  8. DNS failover. Steps to route traffic to the recovered environment.
  9. Stakeholder communication. Template messages to send when recovery is complete.
  10. Post-incident review schedule. When and how to debrief.

Critical: the runbook must be tested. Reading it during a real DR event is too late.
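
One way to make a runbook testable is to encode its technical steps as an ordered list that a script runs and times. The sketch below is illustrative only: the step names follow the list above, the actions are placeholders that always succeed, and `run_runbook` is a hypothetical helper, not part of any tool.

```python
import time


def run_runbook(steps, rto_budget_seconds, clock=time.monotonic):
    """Run (name, action) steps in order and stop at the first one that returns False."""
    started = clock()
    completed = []
    failed_step = None
    for name, action in steps:
        if not action():
            failed_step = name
            break
        completed.append(name)
    elapsed = clock() - started
    return {
        "completed": completed,
        "failed_step": failed_step,
        "elapsed_seconds": elapsed,
        "within_rto": failed_step is None and elapsed <= rto_budget_seconds,
    }


# Placeholder actions; real ones would wrap the exact commands from the runbook.
steps = [
    ("pre-failover verification", lambda: True),
    ("infrastructure deployment", lambda: True),
    ("database restore", lambda: True),
    ("channel import", lambda: True),
    ("verification", lambda: True),
    ("DNS failover", lambda: True),
]
report = run_runbook(steps, rto_budget_seconds=3600)
print(report["failed_step"], report["within_rto"])  # None True
```

Recording the elapsed time on every drill gives a measured recovery time to compare against the RTO, instead of an estimate.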

Testing the plan

A backup you have never restored is not a backup. A DR plan you have never executed is not a plan.

Recommended testing cadence:

Test type | Frequency | Scope
Database restore | Monthly | Restore latest backup to dev environment, verify queryable
Channel restore | Monthly | Import latest channel export to dev, verify channels start
Full DR drill | Quarterly | Build entire stack from code, restore data, verify message flow
Region failure simulation | Annually | Full failover to DR region with full clinical workflow validation

Most organizations discover problems with their DR plan during the first quarterly drill. That's the point — discover problems in drills, not in incidents.
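
Tracking the cadence is easy to automate. The sketch below flags tests that are overdue or have never been run. The test keys are hypothetical, and the day counts are approximations of "monthly", "quarterly", and "annually".

```python
from datetime import date

# Maximum allowed gap between runs, in days (approximating the cadence table).
CADENCE_DAYS = {
    "database_restore": 31,
    "channel_restore": 31,
    "full_dr_drill": 92,
    "region_failure_simulation": 366,
}


def overdue_tests(last_run, today):
    """Return test names that were never run or whose last run is too old."""
    overdue = []
    for name, max_gap in CADENCE_DAYS.items():
        ran = last_run.get(name)
        if ran is None or (today - ran).days > max_gap:
            overdue.append(name)
    return overdue


# Hypothetical history: the region failure simulation has never been run.
last_run = {
    "database_restore": date(2026, 4, 20),
    "channel_restore": date(2026, 3, 1),
    "full_dr_drill": date(2026, 1, 15),
}
print(overdue_tests(last_run, today=date(2026, 5, 12)))
# ['channel_restore', 'full_dr_drill', 'region_failure_simulation']
```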

Common issues found during DR drills:

  • IAM role permissions missing in the DR region
  • Secrets Manager secrets not replicated
  • DNS TTLs too high for fast failover
  • Channel exports older than expected
  • Infrastructure-as-code references hardcoded to the primary region
  • Cross-region replication lag higher than expected RPO
  • Manual snapshot copy schedule broken
  • Test message routes don't exist in the DR environment

Each found issue is a win. Each issue not found is a future incident.
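
Some of these issues can be caught before a drill. As one example, hardcoded region references in infrastructure-as-code can be found with a simple text scan. This is a sketch, assuming region names follow the usual AWS pattern such as `us-east-1`; the function name and the Terraform-flavored sample are made up for illustration.

```python
import re

# Matches AWS-style region names such as us-east-1 or ap-southeast-2,
# including when they appear inside an AZ name or an ARN.
REGION_PATTERN = re.compile(
    r"\b[a-z]{2}(?:-gov)?-"
    r"(?:northeast|northwest|southeast|southwest|north|south|east|west|central)-\d"
)


def hardcoded_regions(text, allowed=frozenset()):
    """Return (line_number, region) pairs for region literals not in `allowed`."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in REGION_PATTERN.finditer(line):
            if match.group() not in allowed:
                hits.append((number, match.group()))
    return hits


# Made-up template text for illustration.
template = "\n".join([
    'provider "aws" {',
    "  region = var.region",
    "}",
    'resource "aws_db_instance" "mirth" {',
    '  availability_zone = "us-east-1a"',
    '  kms_key_id        = "arn:aws:kms:us-east-1:111122223333:key/example"',
    "}",
])
print(hardcoded_regions(template))
# [(5, 'us-east-1'), (6, 'us-east-1')]
```

Running a scan like this in CI turns one of the most common drill findings into a failed build instead of a failed failover.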

Common mistakes in Mirth Connect DR planning

Ten failure modes we see in nearly every Mirth Connect DR plan we audit:

  • Mistake 1 — Confusing HA with DR. Multi-AZ RDS is HA, not DR. It will not save you from a region-wide AWS event or a ransomware attack. Both are needed.
  • Mistake 2 — Backups in the same account as production. If an attacker compromises your AWS account, they can delete backups in that account too. Cross-account backup replication is the defense.
  • Mistake 3 — No infrastructure-as-code. Rebuilding from scratch in a real DR scenario takes days. With IaC, it takes hours. The investment in IaC pays for itself the first time you need it.
  • Mistake 4 — Never testing restore. Backups can fail silently. Snapshot processes can break. Channel export schedules can stop. The only way to know your backups work is to restore them periodically.
  • Mistake 5 — RTO/RPO targets without business validation. Engineering decides “we'll target 1 hour RTO” without confirming whether the business can tolerate 1 hour. Either commit to less, or accept honestly that the business can tolerate more. Misaligned targets lead to over-investment or under-investment.
  • Mistake 6 — Channel exports stored only on the production Mirth instance. When the production instance is gone, so are the exports. Channel exports must live outside the Mirth deployment they describe.
  • Mistake 7 — Forgetting about external dependencies. Mirth doesn't operate alone. EHRs, downstream consumers, identity providers, and partner systems all need consideration in a DR scenario. A Mirth instance that's recovered but can't reach its partners isn't actually recovered.
  • Mistake 8 — Underestimating recovery time for the message store. Restoring 6 years of message history takes meaningfully longer than restoring channel configurations. Architectures should make the recent message store available first and the historical archive available second.
  • Mistake 9 — Treating DR as a one-time project. A DR plan from 2023 may not match production in 2026. The DR architecture needs the same maintenance discipline as production itself.
  • Mistake 10 — No documented decision authority for declaring DR. A real DR scenario is high-pressure. Confusion over “are we actually doing this” wastes hours that the RTO budget doesn't have.

What good DR looks like — a checklist

A production-ready Mirth Connect deployment with proper DR has all of the following:

  • RTO and RPO targets defined and documented with business sign-off
  • HA architecture covers single-instance and single-AZ failures
  • Database has Multi-AZ and cross-region replication
  • Database backups have 30-day minimum retention and tested point-in-time recovery
  • Channel configurations exported nightly to versioned S3 storage
  • Channel exports stored in a separate AWS account or region
  • Entire infrastructure expressed in CloudFormation, Terraform, or equivalent
  • Infrastructure-as-code repository backed up with access controls and change history
  • Secrets Manager entries replicated to DR region
  • Route 53 health checks and failover routing configured
  • DR runbook documented with step-by-step procedures
  • Quarterly DR drills scheduled and tracked
  • Annual region-failure simulation exercise completed
  • Post-drill issues tracked to resolution
  • DR plan reviewed and updated whenever production architecture changes
  • Stakeholders trained on DR procedures
  • DR documentation accessible without production access (e.g., not stored only in Confluence on the production network)

If any items are unchecked, those are your priorities.

When to get help

Mirth Connect DR architecture sits at the intersection of cloud infrastructure, database engineering, healthcare compliance, and operational discipline. Getting it right requires experience across all four. Most teams that build their first DR plan discover gaps during their first drill, regardless of how carefully they planned.

Our free Mirth Connect health check explicitly covers DR posture as one of the audit points. If you're not sure whether your current DR plan would actually survive a regional event, the audit will tell you.

Book a Free Health Check →

For related operational context, see our Mirth Connect on AWS Deployment Guide, our Mirth Connect Performance Tuning post, and our Mirth Connect Security and HIPAA Checklist.

To estimate the cost of building a production-ready DR architecture, run our pricing calculator — select your full scope including the DR region.

Prefer email? info@tactionsoft.com — we reply within 4 business hours.

Frequently Asked Questions

What is the difference between HA and DR for Mirth Connect?
High Availability (HA) and Disaster Recovery (DR) solve different problems. HA protects against single-component failures within a region or data center using redundancy — multiple Mirth nodes, Multi-AZ database. DR protects against region-wide or site-wide failures using cross-region replication and infrastructure-as-code. Production Mirth deployments need both. HA gives you near-zero downtime for common failures. DR gives you survivability for catastrophic events.
What RTO and RPO targets should I set for Mirth Connect?
For typical healthcare integration workloads, target an RTO (Recovery Time Objective) of under 1 hour and an RPO (Recovery Point Objective) of under 5 minutes. Critical clinical systems may require RTO under 15 minutes and near-zero RPO, which requires a multi-region active-passive architecture. Less critical workloads (analytics pipelines, reporting) may tolerate an RTO of up to 24 hours and an RPO of up to 4 hours.
How do you back up Mirth Connect channels?
Back up Mirth Connect channels by scheduling regular exports of all channel configurations to a persistent store outside the Mirth instance, typically S3 with versioning enabled. Use the Mirth API to export channels programmatically, or schedule the export from a backup script. Channel configurations are XML files that capture all source, destination, filter, and transformer logic. They must survive total instance loss.
Does Mirth Connect support multi-region deployment?
Mirth Connect supports multi-region deployment through external infrastructure rather than native clustering across regions. Common patterns include active-passive with cross-region database replication and DNS-based failover, or active-active with regional Mirth deployments each serving their local clinical systems. Multi-region adds significant operational complexity and cost — implement it only when RTO and RPO targets actually require it.
How often should you test Mirth Connect disaster recovery?
Test Mirth Connect disaster recovery at minimum quarterly through a documented drill that restores database, deploys infrastructure from code, imports channels from backup, and verifies end-to-end message flow in a staging environment. Annual full DR exercises that simulate region failure are recommended for critical clinical workloads. Untested backups frequently fail when finally needed — the first restore attempt typically discovers gaps in the DR plan.
What backup retention period should Mirth Connect use for HIPAA compliance?
HIPAA itself does not mandate a specific backup retention period, but the broader Security Rule requires that PHI be recoverable. Practical guidance: 30 days minimum for operational restores, and 6-7 years for long-term message archive in line with typical state medical record retention laws. Use S3 lifecycle policies to move older archives to Glacier and Glacier Deep Archive for cost-effective long-term retention.
Are RDS Multi-AZ backups sufficient for Mirth Connect DR?
No. RDS Multi-AZ is HA, not DR. It protects against single-AZ failures but does not protect against region-wide AWS outages, ransomware, accidental destruction of production, or compromise of the AWS account. Multi-AZ should be combined with cross-region replication (read replica or copied snapshots) and cross-account backup replication for a complete DR posture.
Where should Mirth Connect channel exports be stored?
Outside the Mirth deployment they describe. Production-grade pattern: nightly export via the Mirth API to an S3 bucket with versioning enabled, in a separate AWS account from production, with cross-region replication. Optionally also commit channel XML to a private Git repository so you have full change history. Never rely on channel exports stored only on the production Mirth instance — when production is gone, so are the exports.
How long does Mirth Connect DR recovery actually take?
Depends on architecture. Architecture A (single-region with cross-region snapshots): 4-24 hours, dominated by infrastructure rebuild and snapshot restore time. Architecture B (active-passive multi-region warm standby): under 1 hour, dominated by DNS propagation and read-replica promotion. Architecture C (active-active multi-region): seconds-to-minutes, automatic. Most healthcare organizations choose Architecture B because it hits the RTO target reliably without the complexity of bidirectional replication.
Where can I get help building a Mirth Connect DR architecture?
Our team builds disaster recovery architectures for Mirth Connect deployments at hospitals, labs, and health-tech teams across North America. Book a free Mirth Connect health check — DR posture is one of the audit points — or run our pricing calculator to scope a DR build-out. Email info@tactionsoft.com — we reply within 4 business hours.

Need expert Mirth Connect support?

Whether you have a one-time integration project or need ongoing managed support, every engagement is named, scoped, and priced upfront — productized packages, no hourly billing.
