Quick answer: to integrate an LLM with healthcare data, build a pipeline that (1) extracts HL7 v2 or FHIR R4 data from clinical systems, (2) transforms it into prompt-ready structured JSON or natural-language summaries, and (3) routes it to a HIPAA-eligible LLM endpoint with a signed BAA. Mirth Connect handles steps 1 and 2. Azure OpenAI, AWS Bedrock, and Google Vertex AI handle step 3. Never send PHI to a non-BAA-covered LLM.
This guide pairs with our HL7 v2 to FHIR R4 mapping reference, our Mirth Connect on AWS deployment guide, and the broader Mirth Connect complete guide. If you'd rather hand this off, our Mirth Connect support team works on these architectures regularly.
1. Why This Matters Now
LLMs have moved from interesting demos to production tools in healthcare. Clinical summarization, prior authorization automation, ambient documentation, decision support, and chart abstraction are all being shipped at major health systems and startups.
The integration question that nobody documented well: how do you actually get HL7 v2 and FHIR R4 data into an LLM, and how do you do it without violating HIPAA? This guide answers both. It assumes you're a technical reader (engineer, architect, or product lead) building something real.
2. The Three-Layer Architecture
Every healthcare-LLM integration we've built follows this structure.
Layer 1 — Data ingestion
Clinical data arrives from EHRs and other source systems. Typically HL7 v2 (real-time event stream) and/or FHIR R4 (request-response or bulk export). This is where Mirth Connect or another integration engine sits.
Layer 2 — Transformation and context preparation
Raw HL7/FHIR is converted into the format the LLM needs — usually structured JSON or natural-language summaries. PHI gating, de-identification, and prompt template assembly happen here.
Layer 3 — LLM inference
The prepared context is sent to an LLM endpoint. The model returns its response — summary, classification, extracted data, recommendation. Results flow back to your application.
Each layer has independent HIPAA, security, and operational considerations. The integration engine pattern keeps them separated cleanly.
3. HIPAA Compliance for LLMs — What's Actually Required
Before any architecture decisions, get this part right.
Sending PHI to an LLM requires a Business Associate Agreement (BAA) with the LLM provider. This is non-negotiable under HIPAA. Without a BAA, sending PHI to a model is a regulatory violation regardless of how secure the connection is.
3.1 HIPAA-eligible LLM platforms (with BAA available)
| Platform | Models available | BAA from |
|---|---|---|
| Azure OpenAI Service | GPT-4, GPT-4o, GPT-3.5 | Microsoft |
| AWS Bedrock | Claude, Llama, Titan, others | Amazon |
| Google Vertex AI | Gemini, PaLM, Med-PaLM | Google |
| Self-hosted on HIPAA-eligible infra | Llama, Mistral, Mixtral, Meditron | Your own infrastructure |
3.2 Not HIPAA-eligible (do not send PHI)
- Standard ChatGPT (chat.openai.com)
- Anthropic's direct API without enterprise BAA
- Google Gemini consumer product
- Any free-tier or consumer-facing LLM endpoint
3.3 The de-identification alternative
If you want to use a non-BAA model, you must remove PHI before sending data. HIPAA Safe Harbor de-identification requires removing 18 specific identifier types (names, dates more specific than year, addresses, phone numbers, MRNs, etc.). Done properly, de-identified data is no longer PHI and can flow to any LLM. Done improperly, you've still violated HIPAA. Building a robust de-identification pipeline is a significant investment. For the broader compliance baseline, see our HIPAA compliance for integration engineers primer.
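To make the de-identification step concrete, here is a minimal sketch of a scrub over a structured patient context. The field names (`mrn`, `name`, `dob`, and so on) and the `deidentify` helper are assumptions for illustration, not a standard schema. It covers only a few of the 18 identifier types and does nothing about free-text notes or ages over 89, so treat it as a starting shape rather than a compliant pipeline.

```javascript
// Illustrative scrub over a structured context object.
// Field names are assumptions; a real pipeline must cover all 18 identifier types.
const DIRECT_IDENTIFIER_FIELDS = ['mrn', 'name', 'address', 'phone', 'email', 'ssn'];

function deidentify(context) {
  // Deep-copy so the original message is never mutated.
  const copy = JSON.parse(JSON.stringify(context));
  const patient = copy.patient || {};

  // Drop direct identifiers outright.
  for (const field of DIRECT_IDENTIFIER_FIELDS) {
    delete patient[field];
  }

  // Keep only the year of the birth date, then drop the full date.
  if (typeof patient.dob === 'string' && patient.dob.length >= 4) {
    patient.birthYear = patient.dob.slice(0, 4);
  }
  delete patient.dob;

  copy.patient = patient;
  return copy;
}

// Synthetic example data, not a real record.
const scrubbed = deidentify({
  patient: { mrn: '12345', name: { family: 'Doe', given: 'Jane' }, dob: '19580314', gender: 'F' },
  observations: [{ observation: 'Troponin I', value: '0.04', units: 'ng/mL' }]
});
console.log(JSON.stringify(scrubbed));
```

Working on a deep copy keeps the original message intact for the BAA-covered path, so the same transformer can feed both routes.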
4. Approach 1 — HL7 v2 Directly to LLM
Sometimes the simplest pattern wins. Parse the HL7 message into structured JSON, include it in the prompt, ask the LLM to do something with it.
4.1 When this works well
- Quick prototypes and proofs of concept
- Single-message context (one ADT, one ORU)
- LLMs with strong structured-data parsing (GPT-4, Claude)
- Use cases where preserving raw HL7 detail matters
4.2 Pipeline shape
HL7 v2 Message
↓
Mirth Connect channel
↓
JavaScript/Groovy transformer → structured JSON
↓
HTTP Sender destination → LLM endpoint
↓
Response handler → downstream system
4.3 Example transformer (Mirth, JavaScript)
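// Parse an HL7 v2 message into structured JSON.
// In a Mirth transformer, `msg` is the inbound message as an E4X XML object.
var pid = msg['PID'];
var patient = {
    mrn: pid['PID.3']['PID.3.1'].toString(),
    name: {
        family: pid['PID.5']['PID.5.1'].toString(),
        given: pid['PID.5']['PID.5.2'].toString()
    },
    dob: pid['PID.7']['PID.7.1'].toString(),
    // Read the leaf component: toString() on a field element that has
    // child components returns XML markup rather than the text value.
    gender: pid['PID.8']['PID.8.1'].toString()
};

// Collect one entry per OBX segment.
var obx_results = [];
for (var i = 0; i < msg['OBX'].length(); i++) {
    var obx = msg['OBX'][i];
    obx_results.push({
        observation: obx['OBX.3']['OBX.3.2'].toString(),
        value: obx['OBX.5']['OBX.5.1'].toString(),
        units: obx['OBX.6']['OBX.6.1'].toString(),
        abnormal_flag: obx['OBX.8']['OBX.8.1'].toString()
    });
}

var llm_context = {
    patient: patient,
    observations: obx_results
};

// Expose the serialized context to the destination via the channel map.
channelMap.put('llm_context', JSON.stringify(llm_context));
4.4 Prompt template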
You are a clinical informatics assistant. The following lab results
were just received for a patient:
{{llm_context}}
Identify any results that are clinically significant and warrant
attention. Return your response as JSON with this structure:
{
"significant_findings": [...],
"recommended_actions": [...],
"confidence": "high|medium|low"
}
Limitations: HL7 v2's positional structure is harder for LLMs to reason over than FHIR's semantic structure. For anything beyond simple use cases, Approach 2 is cleaner.
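The transformer output and the prompt template from section 4.4 can be combined roughly as follows. This is a sketch in plain JavaScript: `renderPrompt` and `buildChatRequest` are hypothetical helpers, and the request body follows a generic chat-completions-style shape (a `messages` array with `system` and `user` roles), which is an assumption to check against your provider's API reference. In a Mirth channel the context string would come from `channelMap.get('llm_context')` rather than being built inline.

```javascript
// Shortened version of the section 4.4 template with a {{llm_context}} placeholder.
const PROMPT_TEMPLATE = [
  'You are a clinical informatics assistant. The following lab results',
  'were just received for a patient:',
  '{{llm_context}}',
  'Identify any results that are clinically significant and warrant attention.'
].join('\n');

// Replace every {{key}} placeholder; fail loudly if a value is missing.
function renderPrompt(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, function (match, key) {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error('Missing template value: ' + key);
    }
    return values[key];
  });
}

// Assumed chat-style request body; field names vary by provider.
function buildChatRequest(systemPrompt, userPrompt) {
  return {
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt }
    ],
    temperature: 0
  };
}

// Synthetic context standing in for the transformer output.
const llmContext = JSON.stringify({
  patient: { gender: 'F' },
  observations: [{ observation: 'Troponin I', value: '0.04', units: 'ng/mL', abnormal_flag: '' }]
});

const prompt = renderPrompt(PROMPT_TEMPLATE, { llm_context: llmContext });
const requestBody = buildChatRequest('Respond with JSON only.', prompt);
console.log(JSON.stringify(requestBody, null, 2));
```

Throwing on a missing placeholder is deliberate: a prompt that silently ships with `{{llm_context}}` still in it produces confident output about no data at all.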
5. Approach 2 — HL7 to FHIR, Then FHIR to LLM (Recommended)
This is the pattern we use most in production. Convert HL7 v2 to FHIR R4 first, then send FHIR resources to the LLM.
5.1 Why FHIR works better than raw HL7 for LLMs
- FHIR resources have semantic names (`Patient`, `Observation`) instead of cryptic segments (`PID`, `OBX`)
- Resource references are explicit, so LLMs reason about them naturally
- JSON structure is well-formed and consistent
- LLMs have been trained on far more FHIR examples than raw HL7
5.2 Pipeline shape
HL7 v2 Message
↓
Mirth Connect channel — HL7 listener
↓
HL7 to FHIR R4 transformer
↓
FHIR Bundle stored in clinical data store
↓
LLM service layer queries FHIR data and constructs prompts
↓
LLM inference
↓
Response handling and downstream actions
The decoupling matters. Your HL7-to-FHIR transformation is reusable for non-LLM use cases. Your LLM service layer can evolve independently. Your FHIR data is queryable for purposes beyond LLM context. For the transformation step itself, see our HL7 v2 to FHIR R4 mapping reference.
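As a rough illustration of the HL7-to-FHIR step, the sketch below maps a parsed context object (the shape produced by the Approach 1 transformer) into a small FHIR R4 Bundle. It is deliberately minimal and makes several assumptions: the `patient-1` id is a placeholder, `status` is hard-coded to `final` instead of being read from OBX-11, and observation codes are carried as plain text rather than LOINC codings. A production mapping handles far more fields.

```javascript
// HL7 v2 administrative sex codes mapped to FHIR administrative gender.
const GENDER_MAP = { M: 'male', F: 'female', O: 'other', U: 'unknown' };

// HL7 v2 dates are YYYYMMDD[HHMM...]; FHIR birthDate uses YYYY-MM-DD.
function toFhirDate(hl7Date) {
  const d = String(hl7Date || '');
  return d.length >= 8 ? d.slice(0, 4) + '-' + d.slice(4, 6) + '-' + d.slice(6, 8) : undefined;
}

function toFhirBundle(ctx) {
  const patientId = 'patient-1'; // placeholder local id (assumption)
  const patient = {
    resourceType: 'Patient',
    id: patientId,
    identifier: [{ value: ctx.patient.mrn }],
    name: [{ family: ctx.patient.name.family, given: [ctx.patient.name.given] }],
    gender: GENDER_MAP[ctx.patient.gender] || 'unknown',
    birthDate: toFhirDate(ctx.patient.dob)
  };

  const observations = ctx.observations.map(function (o) {
    const obs = {
      resourceType: 'Observation',
      status: 'final', // assumption: would normally be derived from OBX-11
      code: { text: o.observation },
      subject: { reference: 'Patient/' + patientId }
    };
    // Numeric results become valueQuantity; everything else stays a string.
    const numeric = Number(o.value);
    if (String(o.value).trim() !== '' && !isNaN(numeric)) {
      obs.valueQuantity = { value: numeric, unit: o.units };
    } else {
      obs.valueString = o.value;
    }
    return obs;
  });

  return {
    resourceType: 'Bundle',
    type: 'collection',
    entry: [patient].concat(observations).map(function (r) { return { resource: r }; })
  };
}

// Synthetic example data.
const bundle = toFhirBundle({
  patient: { mrn: '12345', name: { family: 'Doe', given: 'Jane' }, dob: '19580314', gender: 'F' },
  observations: [
    { observation: 'Troponin I', value: '0.04', units: 'ng/mL' },
    { observation: 'Urine culture', value: 'positive', units: '' }
  ]
});
console.log(JSON.stringify(bundle, null, 2));
```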
5.3 Prompt template
You are a clinical informatics assistant. The following FHIR Bundle
contains a patient's recent clinical events:
{{fhir_bundle}}
Summarize the clinical situation in 3-5 sentences. Identify any
conditions, medications, or observations that require follow-up.
Format your response as structured JSON matching the schema in the
system prompt.
6. Approach 3 — Pre-Summarize, Then Send the Summary
For high-volume use cases where token cost or latency matters, generate a natural-language summary from FHIR data first, then feed the summary (not the raw data) to the LLM.
6.1 When this works well
- High-volume scenarios where tokens cost real money
- Use cases where the LLM only needs a clinical narrative, not exact values
- Latency-sensitive applications
- Multi-turn conversations where context window matters
6.2 Pipeline shape
FHIR Bundle
↓
Template-based summarization (rule-based, deterministic)
↓
Natural-language clinical summary (1-2 paragraphs)
↓
LLM inference with summary as context
6.3 Example summary template output
Patient is a 67-year-old male with a history of Type 2 diabetes
mellitus and hypertension. On 2026-05-10, he was admitted via the
emergency department with chief complaint of chest pain. Vital signs
on admission: BP 168/94, HR 96, SpO2 97%. Recent labs include troponin
0.04 ng/mL (normal), BNP 124 pg/mL (elevated), and HbA1c 8.2%
(elevated). Current medications include metformin 1000mg BID,
lisinopril 20mg daily, and atorvastatin 40mg nightly.
This summary is far cheaper to feed into an LLM than the underlying 12-resource FHIR Bundle, and for many use cases (clinical question-answering, decision support reasoning), it's sufficient context.
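A deterministic, template-based summarizer of the kind described above might look like the following sketch. The `summarizeBundle` function and its sentence template are illustrative assumptions. Age is approximated from the birth year only, and the reference year is passed in explicitly so the output is reproducible.

```javascript
// Build a short narrative from a FHIR Bundle using fixed sentence templates.
function summarizeBundle(bundle, asOfYear) {
  const resources = bundle.entry.map(function (e) { return e.resource; });
  const patient = resources.find(function (r) { return r.resourceType === 'Patient'; });
  const observations = resources.filter(function (r) { return r.resourceType === 'Observation'; });

  // Approximate age: year difference only, ignoring month and day.
  const age = asOfYear - Number(patient.birthDate.slice(0, 4));

  const labs = observations.map(function (o) {
    const value = o.valueQuantity
      ? o.valueQuantity.value + ' ' + o.valueQuantity.unit
      : o.valueString;
    return o.code.text + ' ' + value;
  });

  return 'Patient is a ' + age + '-year-old ' + patient.gender +
    '. Recent labs include ' + labs.join(', ') + '.';
}

// Synthetic example bundle.
const exampleBundle = {
  resourceType: 'Bundle',
  type: 'collection',
  entry: [
    { resource: { resourceType: 'Patient', gender: 'male', birthDate: '1959-02-01' } },
    { resource: { resourceType: 'Observation', code: { text: 'troponin' }, valueQuantity: { value: 0.04, unit: 'ng/mL' } } },
    { resource: { resourceType: 'Observation', code: { text: 'BNP' }, valueQuantity: { value: 124, unit: 'pg/mL' } } }
  ]
};

const summary = summarizeBundle(exampleBundle, 2026);
console.log(summary);
// prints "Patient is a 67-year-old male. Recent labs include troponin 0.04 ng/mL, BNP 124 pg/mL."
```

Because the summarizer is rule-based, the same Bundle always produces the same text, which makes downstream prompt caching and regression testing straightforward.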
7. Common Production Use Cases
These are the LLM-in-healthcare patterns we see in real deployments.
- Clinical document summarization. Feed FHIR `DocumentReference` content (or extracted note text) to an LLM with a summarization prompt. Output goes to a clinician dashboard or EHR sidebar.
- Lab result interpretation. Feed Observation resources for new lab results, return clinical significance commentary for clinician review. Generally framed as decision support, not autonomous action.
- Ambient documentation. Audio capture during a patient encounter is transcribed, then summarized into a structured clinical note. The note is written back to the EHR as a FHIR `DocumentReference`. One of the most commercially successful healthcare LLM patterns to date.
- Prior authorization automation. LLM reads patient FHIR data plus payer requirements, generates the prior auth submission packet. Human-in-the-loop reviews before submission.
- Patient-facing Q&A. Patient asks a question via a portal or chatbot. The LLM has access to the patient's own FHIR data (via SMART on FHIR) and answers in plain language. Requires careful guardrails — patient-facing clinical answers carry liability.
- Quality measure abstraction. LLM reads structured and unstructured patient data to determine quality measure compliance. Reduces manual chart abstraction work.
- Coding assistance. LLM reads encounter notes and proposes ICD-10 / CPT codes. Human coder reviews and accepts.
8. Architecture Patterns
8.1 Pattern A — Synchronous request-response
Best for interactive use cases (clinical Q&A, real-time decision support).
User action / event
↓
Application server gathers context (FHIR API or local store)
↓
LLM inference (typically 1-10 seconds)
↓
Response rendered to user
Trade-offs: Latency-bound. Token cost per request. Caching strategies help.
8.2 Pattern B — Asynchronous batch processing
Best for high-volume back-office automation (chart abstraction, coding assistance, summarization).
Trigger event (new encounter completed, scheduled batch job)
↓
Queue task with patient context reference
↓
Worker pool processes tasks
↓
Worker fetches FHIR data, calls LLM, writes results
↓
Downstream consumers read results
Trade-offs: Higher throughput, lower cost per task, but not interactive.
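The batch pattern can be sketched with an in-memory queue and a single synchronous worker loop. Everything here is a stand-in: `fetchFhir` and `callLlm` are injected stubs, and a real deployment would use a durable queue, several workers, and asynchronous calls. The point of the sketch is the shape: tasks carry a reference to patient context rather than the data itself, and one failure does not stop the batch.

```javascript
// Drain the queue, recording a result for every task, including failures.
function runBatch(queue, deps) {
  const results = [];
  while (queue.length > 0) {
    const task = queue.shift();
    try {
      const bundle = deps.fetchFhir(task.patientRef); // worker fetches context
      const output = deps.callLlm(bundle);            // worker calls the model
      results.push({ taskId: task.id, status: 'done', output: output });
    } catch (err) {
      results.push({ taskId: task.id, status: 'failed', error: err.message });
    }
  }
  return results;
}

// Stubbed dependencies for illustration only.
const deps = {
  fetchFhir: function (ref) {
    if (ref === 'Patient/missing') throw new Error('not found');
    return { resourceType: 'Bundle', type: 'collection', entry: [] };
  },
  callLlm: function (bundle) {
    return 'summary of ' + bundle.entry.length + ' resources';
  }
};

const batchResults = runBatch(
  [{ id: 't1', patientRef: 'Patient/1' }, { id: 't2', patientRef: 'Patient/missing' }],
  deps
);
console.log(batchResults);
```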
8.3 Pattern C — Real-time event-driven (HL7 + LLM)
Best for ambient clinical use cases where events trigger LLM reasoning.
HL7 v2 event (ADT, ORU, etc.)
↓
Mirth Connect routes event
↓
Transform to FHIR + enrich with patient history
↓
LLM inference
↓
Result published to downstream systems
Trade-offs: Bridges legacy HL7-based systems with modern LLM workflows. We use this pattern frequently.
9. Security and Operational Considerations
Beyond HIPAA compliance, healthcare LLM integrations have specific operational concerns.
- Prompt injection. If patient-supplied data (e.g., patient-entered text) reaches an LLM, malicious or accidental prompts can break the system. Validate and sanitize any free-text input. Treat user-supplied content as untrusted.
- Hallucination management. LLMs make things up confidently. For any output that informs clinical decisions, design for verification — the LLM is a suggestion engine, not the decision maker. Always log inputs and outputs for retrospective review.
- Cost monitoring. Token costs at scale add up fast. Monitor tokens per request, requests per day, and cost per use case. Set alerts on cost anomalies. Healthcare LLM bills can grow from negligible to material in weeks.
- Latency budgets. Synchronous LLM calls add 1-10 seconds. Plan for this in user-facing flows. Cache aggressively. Consider Pattern B (async) when sync isn't required.
- Model versioning. LLM providers update models. Output behavior changes. Pin model versions where possible (`gpt-4-0613`, not `gpt-4`). Regression-test when versions change.
- Audit logging. Log every LLM request and response with timestamp, user (if applicable), patient context, and full prompt/completion. Required for HIPAA accounting and clinical incident review.
- Rate limits. All managed LLM platforms have rate limits. Plan for them. Use queuing and backoff strategies.
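For the rate-limit item above, one common approach is capped exponential backoff. The sketch below is synchronous so it stays easy to follow. `backoffDelays` and `callWithRetry` are hypothetical helpers, the `retryable` flag on errors is an assumed convention (for example, set when the endpoint returns HTTP 429), and jitter is omitted for brevity.

```javascript
// Delay doubles on each attempt and is capped at capMs.
function backoffDelays(maxRetries, baseMs, capMs) {
  const delays = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, attempt)));
  }
  return delays;
}

// Call fn, retrying on errors marked retryable; sleep is injected by the caller.
function callWithRetry(fn, delays, sleep) {
  let lastError;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      lastError = err;
      // Stop on non-retryable errors or when retries are exhausted.
      if (!err.retryable || attempt === delays.length) break;
      sleep(delays[attempt]);
    }
  }
  throw lastError;
}

// Example: fail twice with a retryable error, then succeed.
const slept = [];
const retryResult = callWithRetry(
  function (attempt) {
    if (attempt < 2) {
      const err = new Error('rate limited');
      err.retryable = true;
      throw err;
    }
    return 'ok';
  },
  backoffDelays(5, 500, 4000),
  function (ms) { slept.push(ms); } // records delays instead of really sleeping
);
console.log(retryResult, slept);
```

In production code the `sleep` function would actually wait, and adding random jitter to each delay helps avoid many workers retrying at the same moment.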
10. How Mirth Connect Fits Into LLM Pipelines
Mirth Connect is well-suited as the integration engine in healthcare LLM architectures because it handles the messy parts.
- HL7 v2 parsing and transformation
- HL7 to FHIR conversion (with custom transformers or off-the-shelf channel templates)
- HTTP/HTTPS integration with cloud LLM endpoints
- Reliable message queuing and retry
- Audit logging
- Error handling
A typical Mirth Connect channel for an LLM pipeline:
- Source: HL7 v2 MLLP listener, FHIR API poller, or scheduled database query.
- Filter: Determine which messages need LLM processing (not all do).
- Transformer: Parse HL7, convert to FHIR or structured JSON, enrich with patient context, assemble the prompt.
- Destination: HTTP Sender to Azure OpenAI / AWS Bedrock / Google Vertex AI endpoint.
- Response handler: Parse LLM response, route to downstream systems (EHR, dashboard, queue).
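The response-handler step deserves a defensive parser, since a model can return malformed or incomplete JSON. The sketch below checks a response against the structure requested in the section 4.4 prompt (`significant_findings`, `recommended_actions`, `confidence`). The `parseLlmResponse` helper and its return shape are assumptions for illustration.

```javascript
const ALLOWED_CONFIDENCE = ['high', 'medium', 'low'];

// Parse and validate the model's text output before routing it downstream.
function parseLlmResponse(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch (err) {
    return { ok: false, reason: 'not valid JSON' };
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { ok: false, reason: 'not a JSON object' };
  }
  if (!Array.isArray(parsed.significant_findings)) {
    return { ok: false, reason: 'significant_findings must be an array' };
  }
  if (!Array.isArray(parsed.recommended_actions)) {
    return { ok: false, reason: 'recommended_actions must be an array' };
  }
  if (ALLOWED_CONFIDENCE.indexOf(parsed.confidence) === -1) {
    return { ok: false, reason: 'confidence must be high, medium, or low' };
  }
  return { ok: true, value: parsed };
}

const good = parseLlmResponse(
  '{"significant_findings":["BNP elevated"],"recommended_actions":["clinician review"],"confidence":"medium"}'
);
const bad = parseLlmResponse('Sure! Here are the findings...');
console.log(good.ok, bad.ok, bad.reason);
```

Responses that fail validation can be routed to an error queue for review instead of reaching the EHR or dashboard.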
For setup guidance on Mirth itself, see our Mirth Connect on AWS deployment guide and the Mirth Connect guide pillar.
11. Common Mistakes in Healthcare LLM Integration
Mistake 1 — Sending PHI to non-BAA LLMs during development
Sending data to a non-BAA model “just for testing” is still a HIPAA violation if the data came from real records, even when the patient name looks synthetic. Use synthetic data generators (Synthea, Mirth-bundled synthetic samples) for dev.
Mistake 2 — No de-identification when it is required
If your architecture depends on a non-BAA model, your de-identification pipeline is part of the critical path. It must be auditable and reliably remove all 18 HIPAA identifier types.
Mistake 3 — Putting raw HL7 v2 in prompts
HL7 v2's pipe-delimited positional format is hard for LLMs to reason over. Convert to FHIR or structured JSON first. Output quality jumps dramatically.
Mistake 4 — No human-in-the-loop for clinical outputs
LLMs are wrong sometimes. Outputs that influence clinical decisions must be reviewed by a clinician before action. Architecture should enforce this.
Mistake 5 — Ignoring token cost at scale
A prompt that costs 2 cents per request in development costs $2,000 per day at 100,000 requests per day. Cost-model your use case before deploying.
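The arithmetic behind that kind of estimate is simple enough to script. In the sketch below, the token counts and per-1,000-token prices are placeholder assumptions chosen to land on roughly 2 cents per request. Substitute your provider's current pricing and your own measured token counts.

```javascript
// Estimate per-request and daily cost from token counts and per-1K-token prices.
function estimateCost(params) {
  const perRequest =
    (params.inputTokens / 1000) * params.inputPricePer1k +
    (params.outputTokens / 1000) * params.outputPricePer1k;
  return {
    perRequestUsd: perRequest,
    perDayUsd: perRequest * params.requestsPerDay
  };
}

// Placeholder numbers for illustration, not real pricing.
const cost = estimateCost({
  inputTokens: 1500,
  outputTokens: 250,
  inputPricePer1k: 0.01,
  outputPricePer1k: 0.02,
  requestsPerDay: 100000
});

console.log('per request: $' + cost.perRequestUsd.toFixed(2));
console.log('per day: $' + cost.perDayUsd.toFixed(2));
```

Running the same function with a pre-summarized prompt (fewer input tokens) shows quickly how much Approach 3 would save at a given volume.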
Mistake 6 — Pinning to model versions you'll later regret
Or, conversely, never pinning and being surprised when behavior changes. Both are problems. Choose a strategy: pin and test on version upgrades, or run continuously against the latest and accept variability.
Mistake 7 — Using the LLM for things rule-based logic can handle
LLMs are expensive and probabilistic. Some healthcare use cases (e.g., “flag any troponin > X”) should be deterministic rules, not LLM calls.
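A deterministic rule of that kind takes only a few lines. In this sketch the `RULES` table, the `applyRules` helper, and the 0.04 ng/mL threshold are all placeholders for illustration. Real thresholds depend on the assay and must come from your clinical team.

```javascript
// Placeholder rule table; the threshold is illustrative, not clinical guidance.
const RULES = [
  { match: 'troponin', unit: 'ng/mL', greaterThan: 0.04 }
];

// Return one flag per observation that exceeds a matching rule's threshold.
function applyRules(observations, rules) {
  const flags = [];
  observations.forEach(function (obs) {
    const value = Number(obs.value);
    rules.forEach(function (rule) {
      const nameMatches = obs.observation.toLowerCase().indexOf(rule.match) !== -1;
      if (nameMatches && obs.units === rule.unit && !isNaN(value) && value > rule.greaterThan) {
        flags.push({
          observation: obs.observation,
          value: value,
          rule: rule.match + ' > ' + rule.greaterThan
        });
      }
    });
  });
  return flags;
}

// Synthetic observations in the shape produced by the section 4.3 transformer.
const flags = applyRules([
  { observation: 'Troponin I', value: '0.09', units: 'ng/mL' },
  { observation: 'Troponin I', value: '0.02', units: 'ng/mL' },
  { observation: 'BNP', value: '124', units: 'pg/mL' }
], RULES);
console.log(flags);
```

A rule like this costs nothing per call, always gives the same answer, and can run in the Mirth filter step before any message is routed to an LLM.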
12. Vendor Selection — Quick Reference
For most healthcare organizations choosing an LLM platform in 2026, the practical decision is between three managed options and self-hosting.
| Need | Recommended choice |
|---|---|
| Already on Azure, want OpenAI models | Azure OpenAI Service |
| Already on AWS, want Claude or Llama | AWS Bedrock |
| Already on GCP, want Gemini or Med-PaLM | Google Vertex AI |
| Want full data sovereignty / no third-party LLM exposure | Self-hosted Llama or Mistral on HIPAA-eligible infrastructure |
| Need lowest cost at high volume | AWS Bedrock with Llama, or self-hosted |
| Need highest reasoning quality | GPT-4-class via Azure OpenAI, or Claude via AWS Bedrock |
| Building patient-facing chat | Whichever has best latency and BAA in your region |
Sign the BAA before any PHI flows. Validate the BAA covers the specific service variants you'll use (some BAAs cover only specific Azure regions or AWS account types).
13. What's Next in Healthcare LLMs
Trends worth tracking through 2026 and beyond:
- Multimodal models — combining text with medical imaging (radiology, pathology, dermatology). Already in clinical pilots.
- Smaller specialized models — fine-tuned medical models with smaller footprints, suitable for self-hosting and lower-cost inference.
- Native FHIR understanding — models trained specifically on FHIR semantics rather than treating FHIR as generic JSON.
- Real-time ambient AI — passive listening during clinical encounters, real-time documentation, real-time decision support.
- Regulatory clarity — the FDA continues to refine its framework for AI/ML in software-as-a-medical-device. Generative AI specifically remains an evolving area.
- Cost compression — inference costs continue to drop. Use cases that aren't economical in 2026 will be in 2027.
14. Frequently Asked Questions
How do you integrate an LLM with healthcare data?
Build a pipeline that extracts HL7 v2 or FHIR R4 data from source systems, transforms it into prompt-ready context (typically structured JSON or natural-language summaries), and routes it to a HIPAA-eligible LLM endpoint. Mirth Connect is commonly used as the integration engine. Use HIPAA-eligible LLM services like Azure OpenAI, AWS Bedrock with Claude, or Google Vertex AI with a signed BAA.
Is ChatGPT HIPAA compliant for healthcare data?
Standard ChatGPT (OpenAI's consumer product) is not HIPAA compliant and should not receive PHI. For HIPAA-compliant LLM use, choose Azure OpenAI Service with a signed BAA from Microsoft, AWS Bedrock with a BAA from Amazon, Google Vertex AI with a BAA from Google, or self-hosted open-source models like Llama running within your own HIPAA-compliant infrastructure.
Which LLM is best for healthcare?
It depends on the use case. For clinical reasoning over structured data, GPT-4-class models (via Azure OpenAI) and Claude (via AWS Bedrock) both perform strongly. For high-volume document summarization, models with longer context windows like Claude offer cost and quality advantages. For self-hosted scenarios, Llama-based fine-tuned medical models like Meditron offer HIPAA-compliant alternatives without third-party data exposure.
Can I send PHI to an LLM?
Only if you have a Business Associate Agreement with the LLM provider and the underlying infrastructure is HIPAA-eligible. Azure OpenAI, AWS Bedrock, and Google Vertex AI all offer BAAs. Without a BAA, sending PHI to an LLM is a HIPAA violation. De-identification is an alternative — strip PHI before sending to non-HIPAA models.
How do you convert HL7 to a format an LLM can use?
Three approaches work: (1) parse HL7 to structured JSON using an integration engine like Mirth Connect, then include the JSON in the prompt; (2) transform HL7 to FHIR R4 resources for cleaner semantic structure; (3) generate natural-language summaries from HL7 data and feed the summaries to the LLM. Approach 2 — HL7 to FHIR — is generally cleanest for production use.
What's the difference between Azure OpenAI and the public ChatGPT API?
Azure OpenAI runs the same OpenAI models inside Microsoft Azure with enterprise controls, regional data residency, and a signed BAA available for healthcare customers. Consumer ChatGPT does not come with a HIPAA BAA, and the public OpenAI API is covered only if you have separately executed a BAA with OpenAI. For any deployment that touches PHI, use only an endpoint that your BAA explicitly covers; for OpenAI models, Azure OpenAI is the path this guide assumes.
Should I use HL7 v2 directly or convert to FHIR before sending to an LLM?
For anything beyond simple prototypes, convert to FHIR first. FHIR resources have semantic names (Patient, Observation) that LLMs reason over far more reliably than HL7 v2's positional pipe-delimited segments. LLMs have also seen much more FHIR than raw HL7 in training data, so output quality is meaningfully better.
How do I keep LLM costs under control in a healthcare integration?
Three levers. First, pre-summarize FHIR data into natural-language clinical summaries before sending — fewer tokens per request. Second, cache aggressively for repeated context (patient history). Third, route only events that need LLM reasoning; use deterministic rules for the rest. Monitor tokens per request and cost per use case from day one; healthcare LLM bills grow fast.
Can Mirth Connect call an LLM API directly?
Yes. Use a Mirth HTTP Sender destination pointed at the LLM endpoint (Azure OpenAI, AWS Bedrock, Google Vertex AI). The transformer assembles the prompt and request body; the response handler parses the LLM output and routes it to downstream systems. This pattern keeps the entire pipeline — HL7 ingestion, FHIR transformation, LLM call, downstream routing — inside one channel with built-in retry and audit logging.
What's the biggest mistake teams make integrating LLMs with healthcare data?
Sending PHI to non-BAA LLMs during development. Even with a synthetic-looking patient name, if the underlying data came from a real record, it's a HIPAA violation. Use synthetic data generators like Synthea for dev. The second-biggest mistake is putting raw HL7 v2 directly in prompts — output quality improves dramatically when you convert to FHIR or structured JSON first.
Next Steps
If you're building a healthcare LLM integration and want senior engineering support, we work on these architectures regularly. The HL7-to-FHIR transformation, the Mirth Connect pipeline, the HIPAA architecture review, and the prompt engineering for clinical data are all areas where experience saves significant time.
To estimate the cost of a healthcare LLM integration project, run our pricing calculator. For broader context, see the FHIR integration guide pillar.
Related Reading
- HL7 v2 to FHIR R4 Mapping Reference
- Mirth Connect on AWS Deployment Guide
- FHIR Bulk Data ($export) Implementation Guide
- HIPAA Compliance for Integration Engineers
- Mirth Connect Security and HIPAA Checklist
- Mirth Connect: The Complete Guide
- FHIR Integration: The Complete Guide
- HL7 Integration: The Complete Guide
- Healthcare Interoperability & Compliance Guide