Healthcare LLM Integration Guide | HL7 & FHIR to GPT, Claude

Quick answer: to integrate an LLM with healthcare data, build a pipeline that (1) extracts HL7 v2 or FHIR R4 data from clinical systems, (2) transforms it into prompt-ready structured JSON or natural-language summaries, and (3) routes it to a HIPAA-eligible LLM endpoint with a signed BAA. Mirth Connect handles steps 1 and 2. Azure OpenAI, AWS Bedrock, and Google Vertex AI handle step 3. Never send PHI to a non-BAA-covered LLM.

This guide pairs with our HL7 v2 to FHIR R4 mapping reference, our Mirth Connect on AWS deployment guide, and the broader Mirth Connect complete guide. If you'd rather hand this off, our Mirth Connect support team works on these architectures regularly.

1. Why This Matters Now

LLMs have moved from interesting demos to production tools in healthcare. Clinical summarization, prior authorization automation, ambient documentation, decision support, and chart abstraction are all being shipped at major health systems and startups.

The integration question that nobody has documented well: how do you actually get HL7 v2 and FHIR R4 data into an LLM, and how do you do it without violating HIPAA? This guide answers both. It assumes you're a technical reader (engineer, architect, or product lead) building something real.

2. The Three-Layer Architecture

Every healthcare-LLM integration we've built follows this structure.

Layer 1 — Data ingestion

Clinical data arrives from EHRs and other source systems. Typically HL7 v2 (real-time event stream) and/or FHIR R4 (request-response or bulk export). This is where Mirth Connect or another integration engine sits.

Layer 2 — Transformation and context preparation

Raw HL7/FHIR is converted into the format the LLM needs — usually structured JSON or natural-language summaries. PHI gating, de-identification, and prompt template assembly happen here.

Layer 3 — LLM inference

The prepared context is sent to an LLM endpoint. The model returns its response — summary, classification, extracted data, recommendation. Results flow back to your application.

Each layer has independent HIPAA, security, and operational considerations. The integration engine pattern keeps them separated cleanly.

3. HIPAA Compliance for LLMs — What's Actually Required

Before any architecture decisions, get this part right.

Sending PHI to an LLM requires a Business Associate Agreement (BAA) with the LLM provider. This is non-negotiable under HIPAA. Without a BAA, sending PHI to a model is a regulatory violation regardless of how secure the connection is.

3.1 HIPAA-eligible LLM platforms (with BAA available)

Platform	Models available	BAA from
Azure OpenAI Service	GPT-4, GPT-4o, GPT-3.5	Microsoft
AWS Bedrock	Claude, Llama, Titan, others	Amazon
Google Vertex AI	Gemini, PaLM, Med-PaLM	Google
Self-hosted on HIPAA-eligible infra	Llama, Mistral, Mixtral, Meditron	Your own infrastructure

3.2 Not HIPAA-eligible (do not send PHI)

Standard ChatGPT (chat.openai.com)
Anthropic's direct API without enterprise BAA
Google Gemini consumer product
Any free-tier or consumer-facing LLM endpoint

3.3 The de-identification alternative

If you want to use a non-BAA model, you must remove PHI before sending data. HIPAA Safe Harbor de-identification requires removing 18 specific identifier types (names, dates more specific than year, addresses, phone numbers, MRNs, etc.). Done properly, de-identified data is no longer PHI and can flow to any LLM. Done improperly, you've still violated HIPAA. Building a robust de-identification pipeline is a significant investment, especially for free-text notes where identifiers can appear anywhere. For the broader compliance baseline see our HIPAA compliance for integration engineers primer.
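
To make the Safe Harbor idea concrete, here is a minimal sketch of an allow-list approach on a structured patient object. The function name, the output field names, and the `referenceYear` parameter are assumptions for illustration. It handles only a handful of structured fields and is nowhere near a complete de-identification pipeline.

```javascript
// Allow-list approach: only fields named here are copied to the output, so
// identifiers such as mrn and name are dropped by default.
// Illustrative only. It covers a few structured fields, not all 18 Safe
// Harbor identifier types, and does nothing about identifiers in free text.
function deidentifyPatient(patient, referenceYear) {
  const birthYear = parseInt(String(patient.dob).slice(0, 4), 10);
  // A plain year difference can overstate age by one, which errs toward
  // the stricter "90 or older" handling.
  const approxAge = referenceYear - birthYear;
  const over89 = approxAge > 89;
  return {
    // Safe Harbor permits the year of a date, but ages over 89 (and years
    // that reveal them) must be aggregated into a "90 or older" category.
    birthYear: over89 ? null : birthYear,
    ageCategory: over89 ? '90+' : 'under 90',
    gender: patient.gender
  };
}

// Hypothetical input in the shape produced by a structured HL7 parse.
const deidentified = deidentifyPatient(
  { mrn: '12345', name: { family: 'Doe', given: 'Jane' }, dob: '19590214', gender: 'F' },
  2026
);
console.log(JSON.stringify(deidentified));
```

Copying only approved fields, rather than deleting known identifiers, means a new field added upstream stays out of the prompt until someone explicitly approves it.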

4. Approach 1 — HL7 v2 Directly to LLM

Sometimes the simplest pattern wins. Parse the HL7 message into structured JSON, include it in the prompt, ask the LLM to do something with it.

4.1 When this works well

Quick prototypes and proofs of concept
Single-message context (one ADT, one ORU)
LLMs with strong structured-data parsing (GPT-4, Claude)
Use cases where preserving raw HL7 detail matters

4.2 Pipeline shape

HL7 v2 Message
    ↓
Mirth Connect channel
    ↓
JavaScript/Groovy transformer → structured JSON
    ↓
HTTP Sender destination → LLM endpoint
    ↓
Response handler → downstream system

4.3 Example transformer (Mirth, JavaScript)

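// Parse an HL7 v2 message (Mirth's E4X "msg" object) into structured JSON.
// Read leaf components (e.g. PID.8.1): calling toString() on a field that
// still has child elements returns XML markup rather than the text value.
var pid = msg['PID'];
var patient = {
    // PID.3 can repeat; take the first identifier
    mrn: pid['PID.3'][0]['PID.3.1'].toString(),
    name: {
        family: pid['PID.5']['PID.5.1'].toString(),
        given: pid['PID.5']['PID.5.2'].toString()
    },
    dob: pid['PID.7']['PID.7.1'].toString(),
    gender: pid['PID.8']['PID.8.1'].toString()
};

var obx_results = [];
// msg['OBX'] is an E4X XMLList, so length is a method call
for (var i = 0; i < msg['OBX'].length(); i++) {
    var obx = msg['OBX'][i];
    obx_results.push({
        observation: obx['OBX.3']['OBX.3.2'].toString(),
        value: obx['OBX.5']['OBX.5.1'].toString(),
        units: obx['OBX.6']['OBX.6.1'].toString(),
        abnormal_flag: obx['OBX.8']['OBX.8.1'].toString()
    });
}

var llm_context = {
    patient: patient,
    observations: obx_results
};

// Store the serialized context so the destination can insert it into the prompt
channelMap.put('llm_context', JSON.stringify(llm_context));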

4.4 Prompt template

You are a clinical informatics assistant. The following lab results
were just received for a patient:

{{llm_context}}

Identify any results that are clinically significant and warrant
attention. Return your response as JSON with this structure:
{
  "significant_findings": [...],
  "recommended_actions": [...],
  "confidence": "high|medium|low"
}
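
One way to fill a template like this is sketched below. The helper names `renderPrompt` and `buildChatRequest` are hypothetical, and the request body assumes a chat-style API that accepts a `messages` array; check the exact shape against your provider's documentation.

```javascript
// Replace {{name}} placeholders with values. Fail loudly on a missing value
// so an unfilled template is never sent to the model.
function renderPrompt(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, function (match, key) {
    if (!(key in values)) {
      throw new Error('Missing template value: ' + key);
    }
    return String(values[key]);
  });
}

// Hypothetical request shape: a "messages" array as used by chat-style APIs.
function buildChatRequest(systemPrompt, userPrompt) {
  return {
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt }
    ],
    temperature: 0 // favor repeatable output for structured extraction
  };
}

const template = 'The following lab results were just received:\n\n{{llm_context}}';
const context = JSON.stringify({
  observations: [{ observation: 'Troponin I', value: '0.04', units: 'ng/mL' }]
});
const request = buildChatRequest(
  'You are a clinical informatics assistant.',
  renderPrompt(template, { llm_context: context })
);
console.log(JSON.stringify(request, null, 2));
```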

Limitations: HL7 v2's positional structure is harder for LLMs to reason over than FHIR's semantic structure. For anything beyond simple use cases, Approach 2 is cleaner.

5. Approach 2 — HL7 to FHIR, Then FHIR to LLM (Recommended)

This is the pattern we use most in production. Convert HL7 v2 to FHIR R4 first, then send FHIR resources to the LLM.

5.1 Why FHIR works better than raw HL7 for LLMs

FHIR resources have semantic names (Patient, Observation) instead of cryptic segments (PID, OBX)
Resource references are explicit — LLMs reason about them naturally
JSON structure is well-formed and consistent
LLMs have likely seen far more JSON, including FHIR examples, than raw pipe-delimited HL7 v2 in their training data

5.2 Pipeline shape

HL7 v2 Message
    ↓
Mirth Connect channel — HL7 listener
    ↓
HL7 to FHIR R4 transformer
    ↓
FHIR Bundle stored in clinical data store
    ↓
LLM service layer queries FHIR data and constructs prompts
    ↓
LLM inference
    ↓
Response handling and downstream actions

The decoupling matters. Your HL7-to-FHIR transformation is reusable for non-LLM use cases. Your LLM service layer can evolve independently. Your FHIR data is queryable for purposes beyond LLM context. For the transformation step itself, see our HL7 v2 to FHIR R4 mapping reference.
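
As a rough illustration of the HL7-to-FHIR step, the sketch below maps the structured JSON from section 4.3 into a small FHIR R4 Bundle. It is a simplification under stated assumptions: the `patient-1` id is made up, every Observation is marked `final` (a real mapping would derive status from OBX-11), and coded values such as LOINC are left out.

```javascript
// HL7 v2 dates are YYYYMMDD[HHMM...]; FHIR dates are YYYY-MM-DD.
function hl7DateToFhir(hl7Date) {
  const d = String(hl7Date);
  return d.slice(0, 4) + '-' + d.slice(4, 6) + '-' + d.slice(6, 8);
}

// HL7 administrative sex codes mapped to FHIR administrative-gender values.
const GENDER_MAP = { M: 'male', F: 'female', O: 'other', U: 'unknown' };

function toFhirBundle(context) {
  const patientId = 'patient-1'; // hypothetical local id for this example
  const patient = {
    resourceType: 'Patient',
    id: patientId,
    identifier: [{ value: context.patient.mrn }],
    name: [{ family: context.patient.name.family, given: [context.patient.name.given] }],
    birthDate: hl7DateToFhir(context.patient.dob),
    gender: GENDER_MAP[context.patient.gender] || 'unknown'
  };
  const observations = context.observations.map(function (o) {
    const obs = {
      resourceType: 'Observation',
      status: 'final', // assumption; derive from OBX-11 in a real mapping
      code: { text: o.observation },
      subject: { reference: 'Patient/' + patientId }
    };
    const numeric = Number(o.value);
    if (String(o.value).trim() !== '' && !isNaN(numeric)) {
      obs.valueQuantity = { value: numeric, unit: o.units };
    } else {
      obs.valueString = o.value;
    }
    return obs;
  });
  return {
    resourceType: 'Bundle',
    type: 'collection',
    entry: [patient].concat(observations).map(function (r) { return { resource: r }; })
  };
}

const bundle = toFhirBundle({
  patient: { mrn: '12345', name: { family: 'Doe', given: 'Jane' }, dob: '19590214', gender: 'F' },
  observations: [{ observation: 'Troponin I', value: '0.04', units: 'ng/mL', abnormal_flag: 'N' }]
});
console.log(JSON.stringify(bundle, null, 2));
```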

5.3 Prompt template

You are a clinical informatics assistant. The following FHIR Bundle
contains a patient's recent clinical events:

{{fhir_bundle}}

Summarize the clinical situation in 3-5 sentences. Identify any
conditions, medications, or observations that require follow-up.
Format your response as structured JSON matching the schema in the
system prompt.

6. Approach 3 — Pre-Summarize, Then Send the Summary

For high-volume use cases where token cost or latency matters, generate a natural-language summary from FHIR data first, then feed the summary (not the raw data) to the LLM.

6.1 When this works well

High-volume scenarios where tokens cost real money
Use cases where the LLM only needs a clinical narrative, not exact values
Latency-sensitive applications
Multi-turn conversations where context window matters

6.2 Pipeline shape

FHIR Bundle
    ↓
Template-based summarization (rule-based, deterministic)
    ↓
Natural-language clinical summary (1-2 paragraphs)
    ↓
LLM inference with summary as context

6.3 Example summary template output

Patient is a 67-year-old male with a history of Type 2 diabetes
mellitus and hypertension. On 2026-05-10, he was admitted via the
emergency department with chief complaint of chest pain. Vital signs
on admission: BP 168/94, HR 96, SpO2 97%. Recent labs include troponin
0.04 ng/mL (normal), BNP 124 pg/mL (elevated), and HbA1c 8.2%
(elevated). Current medications include metformin 1000mg BID,
lisinopril 20mg daily, and atorvastatin 40mg nightly.

This summary is far cheaper to feed into an LLM than the underlying 12-resource FHIR Bundle, and for many use cases (clinical question-answering, decision support reasoning), it's sufficient context.
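
A deterministic summarizer of this kind can be quite small. The sketch below assumes the FHIR data has already been flattened into a simple `record` object; that shape and the function names are illustrative assumptions, not a FHIR structure.

```javascript
// Age in whole years on a given date (both as YYYY-MM-DD strings).
function ageOn(birthDate, onDate) {
  const b = new Date(birthDate);
  const d = new Date(onDate);
  const age = d.getUTCFullYear() - b.getUTCFullYear();
  const beforeBirthday = d.getUTCMonth() < b.getUTCMonth() ||
    (d.getUTCMonth() === b.getUTCMonth() && d.getUTCDate() < b.getUTCDate());
  return beforeBirthday ? age - 1 : age;
}

// Join items as "a", "a and b", or "a, b, and c".
function joinList(items) {
  if (items.length <= 1) return items.join('');
  if (items.length === 2) return items[0] + ' and ' + items[1];
  return items.slice(0, -1).join(', ') + ', and ' + items[items.length - 1];
}

// Build the narrative from fixed sentence templates, so the same input
// always yields the same text.
function summarize(record, asOf) {
  const history = record.conditions.length > 0
    ? ' with a history of ' + joinList(record.conditions)
    : '';
  const sentences = [
    'Patient is a ' + ageOn(record.birthDate, asOf) + '-year-old ' + record.gender + history + '.'
  ];
  if (record.labs.length > 0) {
    const labs = record.labs.map(function (lab) {
      return lab.name + ' ' + lab.value + ' ' + lab.unit + ' (' + lab.interpretation + ')';
    });
    sentences.push('Recent labs include ' + joinList(labs) + '.');
  }
  return sentences.join(' ');
}

const summary = summarize({
  birthDate: '1959-02-14',
  gender: 'male',
  conditions: ['Type 2 diabetes mellitus', 'hypertension'],
  labs: [
    { name: 'troponin', value: 0.04, unit: 'ng/mL', interpretation: 'normal' },
    { name: 'BNP', value: 124, unit: 'pg/mL', interpretation: 'elevated' }
  ]
}, '2026-05-10');
console.log(summary);
```

Because the text comes from templates rather than a model, it can be unit-tested like any other transformation before it reaches the LLM.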

7. Common Production Use Cases

These are the LLM-in-healthcare patterns we see in real deployments.

Clinical document summarization. Feed FHIR DocumentReference content (or extracted note text) to an LLM with a summarization prompt. Output goes to a clinician dashboard or EHR sidebar.
Lab result interpretation. Feed Observation resources for new lab results, return clinical significance commentary for clinician review. Generally framed as decision support, not autonomous action.
Ambient documentation. Audio capture during a patient encounter is transcribed, then summarized into a structured clinical note. The note is written back to the EHR as a FHIR DocumentReference. One of the most commercially successful healthcare LLM patterns to date.
Prior authorization automation. LLM reads patient FHIR data plus payer requirements, generates the prior auth submission packet. Human-in-the-loop reviews before submission.
Patient-facing Q&A. Patient asks a question via a portal or chatbot. The LLM has access to the patient's own FHIR data (via SMART on FHIR) and answers in plain language. Requires careful guardrails — patient-facing clinical answers carry liability.
Quality measure abstraction. LLM reads structured and unstructured patient data to determine quality measure compliance. Reduces manual chart abstraction work.
Coding assistance. LLM reads encounter notes and proposes ICD-10 / CPT codes. Human coder reviews and accepts.

8. Architecture Patterns

8.1 Pattern A — Synchronous request-response

Best for interactive use cases (clinical Q&A, real-time decision support).

User action / event
    ↓
Application server gathers context (FHIR API or local store)
    ↓
LLM inference (typically 1-10 seconds)
    ↓
Response rendered to user

Trade-offs: Latency-bound. Token cost per request. Caching strategies help.
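
As one example of a caching strategy, here is a minimal in-memory cache with a time-to-live. The cache key format and the five-minute TTL are assumptions for illustration. A cache like this holds PHI-derived content, so it should live inside the same protected environment as the rest of the pipeline.

```javascript
// Minimal TTL cache. "now" is injectable so expiry can be tested without
// waiting; it defaults to the system clock.
function createTtlCache(ttlMs, now) {
  const clock = now || Date.now;
  const entries = new Map();
  return {
    get: function (key) {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (clock() - hit.storedAt > ttlMs) {
        entries.delete(key); // expired: drop it and report a miss
        return undefined;
      }
      return hit.value;
    },
    set: function (key, value) {
      entries.set(key, { value: value, storedAt: clock() });
    }
  };
}

// Hypothetical key: patient id plus the prompt template version.
const cache = createTtlCache(5 * 60 * 1000);
cache.set('patient-1:summary-v1', 'Patient is a 67-year-old male...');
console.log(cache.get('patient-1:summary-v1'));
```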

8.2 Pattern B — Asynchronous batch processing

Best for high-volume back-office automation (chart abstraction, coding assistance, summarization).

Trigger event (new encounter completed, scheduled batch job)
    ↓
Queue task with patient context reference
    ↓
Worker pool processes tasks
    ↓
Worker fetches FHIR data, calls LLM, writes results
    ↓
Downstream consumers read results

Trade-offs: Higher throughput, lower cost per task, but not interactive.
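
The worker-pool step can be sketched as a bounded-concurrency loop. This version uses an in-memory array in place of a durable queue and a stub handler in place of the FHIR fetch and LLM call; both are assumptions made to keep the example self-contained.

```javascript
// Process tasks with at most "concurrency" handlers running at once.
// A failed task is recorded and does not stop the other workers.
async function processQueue(tasks, handler, concurrency) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const index = next++; // safe: JavaScript runs this without preemption
      try {
        results[index] = { ok: true, value: await handler(tasks[index]) };
      } catch (err) {
        results[index] = { ok: false, error: String(err) };
      }
    }
  }
  const workers = [];
  for (let i = 0; i < Math.min(concurrency, tasks.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Stub handler standing in for "fetch FHIR data, call LLM, write results".
async function stubHandler(patientRef) {
  return 'processed ' + patientRef;
}

processQueue(['Patient/1', 'Patient/2', 'Patient/3'], stubHandler, 2)
  .then(function (results) { console.log(JSON.stringify(results)); });
```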

8.3 Pattern C — Real-time event-driven (HL7 + LLM)

Best for ambient clinical use cases where events trigger LLM reasoning.

HL7 v2 event (ADT, ORU, etc.)
    ↓
Mirth Connect routes event
    ↓
Transform to FHIR + enrich with patient history
    ↓
LLM inference
    ↓
Result published to downstream systems

Trade-offs: Bridges legacy HL7-based systems with modern LLM workflows. We use this pattern frequently.

9. Security and Operational Considerations

Beyond HIPAA compliance, healthcare LLM integrations have specific operational concerns.

Prompt injection. If patient-supplied data (e.g., patient-entered text) reaches an LLM, malicious or accidental prompts can break the system. Validate and sanitize any free-text input. Treat user-supplied content as untrusted.
Hallucination management. LLMs make things up confidently. For any output that informs clinical decisions, design for verification — the LLM is a suggestion engine, not the decision maker. Always log inputs and outputs for retrospective review.
Cost monitoring. Token costs at scale add up fast. Monitor tokens per request, requests per day, and cost per use case. Set alerts on cost anomalies. Healthcare LLM bills can grow from negligible to material in weeks.
Latency budgets. Synchronous LLM calls add 1-10 seconds. Plan for this in user-facing flows. Cache aggressively. Consider Pattern B (async) when sync isn't required.
Model versioning. LLM providers update models. Output behavior changes. Pin model versions where possible (gpt-4-0613, not gpt-4). Regression-test when versions change.
Audit logging. Log every LLM request and response with timestamp, user (if applicable), patient context, and full prompt/completion. This supports HIPAA audit requirements and clinical incident review. Because these logs contain PHI, protect them the same way you protect the source data.
Rate limits. All managed LLM platforms have rate limits. Plan for them. Use queuing and backoff strategies.
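
For the rate-limit item, a common approach is retry with exponential backoff. The sketch below assumes the HTTP client throws an error carrying a numeric `status` property; that convention, the option names, and the delay values are illustrative. Production versions usually add random jitter as well.

```javascript
// Delay doubles with each attempt and is capped at maxMs.
function backoffDelayMs(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Retry only on rate limiting (429) and server errors (5xx).
// options.sleep is injectable so tests do not have to wait.
async function callWithRetry(fn, options) {
  const sleep = options.sleep || function (ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
  };
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err.status === 429 || err.status >= 500;
      if (!retryable || attempt >= options.maxRetries) throw err;
      await sleep(backoffDelayMs(attempt, options.baseMs, options.maxMs));
    }
  }
}

// Usage with a stub that is rate-limited once and then succeeds.
let calls = 0;
callWithRetry(function () {
  calls++;
  if (calls === 1) return Promise.reject({ status: 429 });
  return Promise.resolve('ok');
}, { maxRetries: 3, baseMs: 10, maxMs: 1000 })
  .then(function (result) { console.log(result, 'after', calls, 'calls'); });
```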

10. How Mirth Connect Fits Into LLM Pipelines

Mirth Connect is well-suited as the integration engine in healthcare LLM architectures because it handles the messy parts.

HL7 v2 parsing and transformation
HL7 to FHIR conversion (with custom transformers or off-the-shelf channel templates)
HTTP/HTTPS integration with cloud LLM endpoints
Reliable message queuing and retry
Audit logging
Error handling

A typical Mirth Connect channel for an LLM pipeline:

Source: HL7 v2 MLLP listener, FHIR API poller, or scheduled database query.
Filter: Determine which messages need LLM processing (not all do).
Transformer: Parse HL7, convert to FHIR or structured JSON, enrich with patient context, assemble the prompt.
Destination: HTTP Sender to Azure OpenAI / AWS Bedrock / Google Vertex AI endpoint.
Response handler: Parse LLM response, route to downstream systems (EHR, dashboard, queue).
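
The filter step might look something like the sketch below. It is written as a plain function over an already-parsed message object so it can run outside Mirth; the `type` field (taken from MSH-9), the allow-list, and the "any non-normal flag" rule are all assumptions. In a Mirth filter script, the equivalent logic would return true to accept the message.

```javascript
// Hypothetical allow-list of message types worth sending to the LLM.
const LLM_MESSAGE_TYPES = ['ORU^R01'];

// Accept only lab result messages with at least one result whose abnormal
// flag is present and not 'N' (normal).
function needsLlmProcessing(message) {
  if (LLM_MESSAGE_TYPES.indexOf(message.type) === -1) return false;
  return message.observations.some(function (o) {
    return o.abnormal_flag !== '' && o.abnormal_flag !== 'N';
  });
}

const incoming = {
  type: 'ORU^R01',
  observations: [
    { observation: 'Troponin I', value: '0.04', abnormal_flag: 'N' },
    { observation: 'BNP', value: '124', abnormal_flag: 'H' }
  ]
};
console.log(needsLlmProcessing(incoming));
```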

For setup guidance on Mirth itself, see our Mirth Connect on AWS deployment guide and the Mirth Connect guide pillar.

11. Common Mistakes in Healthcare LLM Integration

Mistake 1 — Sending PHI to non-BAA LLMs during development

Even “just for testing” with a synthetic-looking patient name is a HIPAA violation if the data came from real records. Use synthetic data generators (Synthea, Mirth-bundled synthetic samples) for dev.

Mistake 2 — No de-identification when one is required

If your architecture depends on a non-BAA model, your de-identification pipeline is part of the critical path. It must be auditable and reliably remove all 18 HIPAA identifier types.

Mistake 3 — Putting raw HL7 v2 in prompts

HL7 v2's pipe-delimited positional format is hard for LLMs to reason over. Convert to FHIR or structured JSON first. Output quality jumps dramatically.

Mistake 4 — No human-in-the-loop for clinical outputs

LLMs are wrong sometimes. Outputs that influence clinical decisions must be reviewed by a clinician before action. Architecture should enforce this.
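
One way to enforce this in code is to validate the model's output and attach a review status before anything downstream can act on it. The sketch below checks the JSON structure requested in section 4.4; the `reviewStatus` field and its value are hypothetical names.

```javascript
const CONFIDENCE_LEVELS = ['high', 'medium', 'low'];

// Parse the raw completion, check it against the expected structure, and
// mark valid output as awaiting clinician review rather than actionable.
function parseLlmFindings(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch (err) {
    return { valid: false, reason: 'not valid JSON' };
  }
  if (parsed === null || typeof parsed !== 'object') {
    return { valid: false, reason: 'not a JSON object' };
  }
  if (!Array.isArray(parsed.significant_findings) ||
      !Array.isArray(parsed.recommended_actions)) {
    return { valid: false, reason: 'missing required arrays' };
  }
  if (CONFIDENCE_LEVELS.indexOf(parsed.confidence) === -1) {
    return { valid: false, reason: 'unexpected confidence value' };
  }
  return { valid: true, reviewStatus: 'pending_clinician_review', findings: parsed };
}

const checked = parseLlmFindings(
  '{"significant_findings":["BNP elevated"],"recommended_actions":["Clinician review"],"confidence":"medium"}'
);
console.log(checked.valid, checked.reviewStatus);
```

Structural validation only confirms the response is well-formed. It says nothing about whether the findings are clinically correct, which is why the review status still matters.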

Mistake 5 — Ignoring token cost at scale

A prompt that costs 2 cents per call in development costs $2,000 per day at 100,000 calls per day. Cost-model your use case before deploying.
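
A back-of-the-envelope cost model takes only a few lines. The token counts and per-1,000-token prices below are placeholders, not any vendor's actual pricing; substitute the numbers from your own provider and traffic.

```javascript
// Daily cost = per-request cost (input plus output tokens) times volume.
function dailyCostUsd(usage, pricing) {
  const perRequest =
    (usage.inputTokens / 1000) * pricing.inputPer1kUsd +
    (usage.outputTokens / 1000) * pricing.outputPer1kUsd;
  return perRequest * usage.requestsPerDay;
}

// Placeholder numbers chosen so each request costs about 2 cents.
const estimate = dailyCostUsd(
  { inputTokens: 1500, outputTokens: 500, requestsPerDay: 100000 },
  { inputPer1kUsd: 0.01, outputPer1kUsd: 0.01 }
);
console.log('Estimated daily cost (USD):', estimate.toFixed(2));
```

With these placeholder inputs the estimate comes out at roughly $2,000 per day, which is the scale-up described above.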

Mistake 6 — Pinning to model versions you'll later regret

Or, conversely, never pinning and being surprised when behavior changes. Both are problems. Choose a strategy: pin and test on version upgrades, or run continuously against the latest and accept variability.

Mistake 7 — Using the LLM for things rule-based logic can handle

LLMs are expensive and probabilistic. Some healthcare use cases (e.g., “flag any troponin > X”) should be deterministic rules, not LLM calls.
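
A threshold rule like that needs no model at all. The sketch below filters the observation objects from section 4.3; the threshold is passed in by the caller, and the value used in the example is an arbitrary placeholder, not clinical guidance.

```javascript
// Return observations for the named test whose numeric value exceeds the
// caller-supplied threshold. Non-numeric results are never flagged.
function flagAboveThreshold(observations, testName, threshold) {
  return observations.filter(function (o) {
    const value = Number(o.value);
    return o.observation === testName &&
      String(o.value).trim() !== '' &&
      !isNaN(value) &&
      value > threshold;
  });
}

const flagged = flagAboveThreshold(
  [
    { observation: 'Troponin I', value: '0.02' },
    { observation: 'Troponin I', value: '0.90' },
    { observation: 'BNP', value: '124' }
  ],
  'Troponin I',
  0.5 // placeholder threshold for illustration only
);
console.log(flagged.length);
```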

12. Vendor Selection — Quick Reference

For most healthcare organizations choosing an LLM platform in 2026, the practical decision is between three managed options and self-hosting.

Need	Recommended choice
Already on Azure, want OpenAI models	Azure OpenAI Service
Already on AWS, want Claude or Llama	AWS Bedrock
Already on GCP, want Gemini or Med-PaLM	Google Vertex AI
Want full data sovereignty / no third-party LLM exposure	Self-hosted Llama or Mistral on HIPAA-eligible infrastructure
Need lowest cost at high volume	AWS Bedrock with Llama, or self-hosted
Need highest reasoning quality	GPT-4-class via Azure OpenAI, or Claude via AWS Bedrock
Building patient-facing chat	Whichever has best latency and BAA in your region

Sign the BAA before any PHI flows. Validate the BAA covers the specific service variants you'll use (some BAAs cover only specific Azure regions or AWS account types).

13. What's Next in Healthcare LLMs

Trends worth tracking through 2026 and beyond:

Multimodal models — combining text with medical imaging (radiology, pathology, dermatology). Already in clinical pilots.
Smaller specialized models — fine-tuned medical models with smaller footprints, suitable for self-hosting and lower-cost inference.
Native FHIR understanding — models trained specifically on FHIR semantics rather than treating FHIR as generic JSON.
Real-time ambient AI — passive listening during clinical encounters, real-time documentation, real-time decision support.
Regulatory clarity — the FDA continues to refine its framework for AI/ML in software-as-a-medical-device. Generative AI specifically remains an evolving area.
Cost compression — inference costs continue to drop. Use cases that aren't economical in 2026 will be in 2027.

14. Frequently Asked Questions

How do you integrate an LLM with healthcare data?

Build a pipeline that extracts HL7 v2 or FHIR R4 data from source systems, transforms it into prompt-ready context (typically structured JSON or natural-language summaries), and routes it to a HIPAA-eligible LLM endpoint. Mirth Connect is commonly used as the integration engine. Use HIPAA-eligible LLM services like Azure OpenAI, AWS Bedrock with Claude, or Google Vertex AI with a signed BAA.

Is ChatGPT HIPAA compliant for healthcare data?

Standard ChatGPT (OpenAI's consumer product) is not HIPAA compliant and should not receive PHI. For HIPAA-compliant LLM use, choose Azure OpenAI Service with a signed BAA from Microsoft, AWS Bedrock with a BAA from Amazon, Google Vertex AI with a BAA from Google, or self-hosted open-source models like Llama running within your own HIPAA-compliant infrastructure.

Which LLM is best for healthcare?

It depends on the use case. For clinical reasoning over structured data, GPT-4-class models (via Azure OpenAI) and Claude (via AWS Bedrock) both perform strongly. For high-volume document summarization, models with longer context windows like Claude offer cost and quality advantages. For self-hosted scenarios, Llama-based fine-tuned medical models like Meditron offer HIPAA-compliant alternatives without third-party data exposure.

Can I send PHI to an LLM?

Only if you have a Business Associate Agreement with the LLM provider and the underlying infrastructure is HIPAA-eligible. Azure OpenAI, AWS Bedrock, and Google Vertex AI all offer BAAs. Without a BAA, sending PHI to an LLM is a HIPAA violation. De-identification is an alternative — strip PHI before sending to non-HIPAA models.

How do you convert HL7 to a format an LLM can use?

Three approaches work: (1) parse HL7 to structured JSON using an integration engine like Mirth Connect, then include the JSON in the prompt; (2) transform HL7 to FHIR R4 resources for cleaner semantic structure; (3) generate natural-language summaries from HL7 data and feed the summaries to the LLM. Approach 2 — HL7 to FHIR — is generally cleanest for production use.

What's the difference between Azure OpenAI and the public ChatGPT API?

Azure OpenAI runs OpenAI models inside Microsoft Azure with enterprise controls, regional data residency, and a BAA available for healthcare customers. Consumer ChatGPT does not come with a HIPAA BAA and should not receive PHI. OpenAI's own API is an option only if you have a BAA in place with OpenAI that covers it, so confirm current availability and terms with the vendor first. Without a signed BAA covering the exact service you call, treat the endpoint as not HIPAA-eligible.

Should I use HL7 v2 directly or convert to FHIR before sending to an LLM?

For anything beyond simple prototypes, convert to FHIR first. FHIR resources have semantic names (Patient, Observation) that LLMs reason over far more reliably than HL7 v2's positional pipe-delimited segments. LLMs have also likely seen much more JSON, including FHIR, than raw HL7 in training data, so output quality tends to be meaningfully better.

How do I keep LLM costs under control in a healthcare integration?

Three levers. First, pre-summarize FHIR data into natural-language clinical summaries before sending — fewer tokens per request. Second, cache aggressively for repeated context (patient history). Third, route only events that need LLM reasoning; use deterministic rules for the rest. Monitor tokens per request and cost per use case from day one; healthcare LLM bills grow fast.

Can Mirth Connect call an LLM API directly?

Yes. Use a Mirth HTTP Sender destination pointed at the LLM endpoint (Azure OpenAI, AWS Bedrock, Google Vertex AI). The transformer assembles the prompt and request body; the response handler parses the LLM output and routes it to downstream systems. This pattern keeps the entire pipeline — HL7 ingestion, FHIR transformation, LLM call, downstream routing — inside one channel with built-in retry and audit logging.

What's the biggest mistake teams make integrating LLMs with healthcare data?

Sending PHI to non-BAA LLMs during development. Even with a synthetic-looking patient name, if the underlying data came from a real record, it's a HIPAA violation. Use synthetic data generators like Synthea for dev. The second-biggest mistake is putting raw HL7 v2 directly in prompts — output quality improves dramatically when you convert to FHIR or structured JSON first.

Next Steps

If you're building a healthcare LLM integration and want senior engineering support, we work on these architectures regularly. The HL7-to-FHIR transformation, the Mirth Connect pipeline, the HIPAA architecture review, and the prompt engineering for clinical data are all areas where experience saves significant time.

To estimate the cost of a healthcare LLM integration project, run our pricing calculator. For broader context, see the FHIR integration guide pillar.
