HL7 v2 Message Structure Explained: MSH, Fields, Encoding

HL7 v2 is the most widely used clinical messaging format in US healthcare. Every hospital EHR emits it, every interface engine parses it, and every integration engineer eventually learns to read it the way a sysadmin reads a syslog line — at a glance, byte by byte. This article is the reference guide that teaches you to do that.

We will walk ADT, ORU, ORM, and SIU messages segment by segment, explain the exact role of each delimiter, cover the field separator and the four encoding characters, talk about Z-segments (and why they are both useful and dangerous), break down the escape sequence syntax, explain how character sets work in MSH-18, and show how versioning via MSH-12 plays out in real production integrations.

By the end you should be able to read any HL7 v2 message, spot the common mistakes, and build new messages that pass validation against strict receivers. For the wider ecosystem, see HL7 v2 vs v3 vs FHIR and our healthcare interoperability guide.

1. The Shape of an HL7 v2 Message

An HL7 v2 message is a plain-text record. Each line is a segment, and each segment has a three-character name followed by fields separated by the pipe character (|). Segments end with a single carriage return (<CR>, hex 0x0D) — never LF, never CRLF.

MSH|^~\&|EPIC|UCHEALTH|MIRTH|TACTION|20260421093000||ADT^A01|MSG00001|P|2.5
EVN|A01|20260421093000
PID|1||123456^^^UCHEALTH^MR||DOE^JOHN^Q||19720101|M|||123 MAIN ST^^DENVER^CO^80202
PV1|1|I|2000^2012^01||||0010^ATTEND^ALBERT|||CAR||||ADM|A0

Five things to notice:

The first segment is always MSH.
Fields are separated by |.
Components inside a field are separated by ^.
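Empty fields are legal and common. Consecutive pipes (||) mean the field is not populated; an explicit null, which tells the receiver to clear any stored value, is sent as two double quotes (|""|).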
The message-type (ADT^A01) determines which segments are expected and in what order.

Each message type has a formal structure specified in the HL7 v2 chapter for its domain — ADT in Chapter 3, Orders in Chapter 4, Observations in Chapter 7, and so on. Receivers validate the segment sequence against that spec.
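
The layout described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: it assumes the message arrives as one string with CR terminators, and the function name split_message is hypothetical.

```python
def split_message(raw: str) -> list[list[str]]:
    """Split a raw HL7 v2 message into segments, then each segment into fields."""
    # Segments are terminated by a carriage return; drop the empty tail
    # left behind by the final terminator.
    segments = [s for s in raw.split("\r") if s]
    if not segments or not segments[0].startswith("MSH") or len(segments[0]) < 4:
        raise ValueError("message must start with an MSH segment")
    field_sep = segments[0][3]  # the character right after "MSH"
    return [segment.split(field_sep) for segment in segments]


raw = (
    "MSH|^~\\&|EPIC|UCHEALTH|MIRTH|TACTION|20260421093000||ADT^A01|MSG00001|P|2.5\r"
    "EVN|A01|20260421093000\r"
    "PID|1||123456^^^UCHEALTH^MR||DOE^JOHN^Q||19720101|M\r"
)
for fields in split_message(raw):
    # fields[0] is the three-character segment name
    print(fields[0], fields[1:])
```

Splitting on CR only is deliberate here: a file that has been saved with LF or CRLF line endings will come back as one oversized segment, which makes the problem visible instead of hiding it.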

2. The MSH Segment

MSH is unique in that its first field declares the field separator itself. The byte immediately after MSH is the field separator (|), and the next four characters are the encoding characters.

MSH|^~\&|EPIC|UCHEALTH|MIRTH|TACTION|20260421093000||ADT^A01|MSG00001|P|2.5|||AL|NE|USA|ASCII

Field-by-field (first ~18 fields):

MSH-1 Field Separator — |. Declared by position, not value.
MSH-2 Encoding Characters — ^~\& (component, repetition, escape, subcomponent).
MSH-3 Sending Application — EPIC.
MSH-4 Sending Facility — UCHEALTH.
MSH-5 Receiving Application — MIRTH.
MSH-6 Receiving Facility — TACTION.
MSH-7 Date/Time of Message — 20260421093000.
MSH-8 Security — rarely populated.
MSH-9 Message Type — ADT^A01 (message code ^ trigger event).
MSH-10 Message Control ID — unique per message, echoed in MSA-2 of the ACK.
MSH-11 Processing ID — P (production), T (training), D (debug).
MSH-12 Version ID — 2.5.
MSH-13 Sequence Number — rarely used.
MSH-14 Continuation Pointer — rarely used.
MSH-15 Accept Ack Type — for enhanced mode: AL, NE, ER, SU.
MSH-16 Application Ack Type — for enhanced mode.
MSH-17 Country Code — e.g., USA.
MSH-18 Character Set — e.g., UNICODE UTF-8, 8859/1, ASCII.

MSH-19, MSH-20, and MSH-21 cover principal language of message, alternate character handling, and message profile identifiers — rarely populated outside specific regional or vendor conventions.
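
Because MSH-1 is the separator itself, MSH field numbers are shifted by one compared with a naive split on the pipe. The sketch below shows one way to number MSH fields correctly; the helper name msh_fields is hypothetical and the code assumes a single well-formed MSH segment as input.

```python
def msh_fields(msh: str) -> dict[int, str]:
    """Return MSH fields keyed by their HL7 field number (1 for MSH-1, and so on)."""
    if not msh.startswith("MSH") or len(msh) < 8:
        raise ValueError("not a complete MSH segment")
    field_sep = msh[3]
    parts = msh.split(field_sep)
    # parts[0] is "MSH" and parts[1] is already MSH-2, because MSH-1 is the
    # separator character itself. In every other segment parts[n] is field n.
    fields = {1: field_sep}
    for number, value in enumerate(parts[1:], start=2):
        fields[number] = value
    return fields


header = "MSH|^~\\&|EPIC|UCHEALTH|MIRTH|TACTION|20260421093000||ADT^A01|MSG00001|P|2.5|||AL|NE|USA|ASCII"
fields = msh_fields(header)
print(fields[9], fields[10], fields[12], fields[18])
```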

3. Encoding Characters (MSH-2)

MSH-2 declares the four encoding characters, in a specific order. The canonical (and universally used) value is ^~\&:

^ (caret) — component separator inside a field.
~ (tilde) — repetition separator for a repeating field.
\ (backslash) — escape character.
& (ampersand) — subcomponent separator inside a component.

The field separator | is declared implicitly by position (MSH-1). Together these five characters (| ^ ~ \ &) are the only structural punctuation in HL7 v2.

Almost nobody changes these. If a vendor integration guide specifies different encoding characters, verify it is actually needed; nine times out of ten it is an accident.
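
A parser that wants to honour whatever the sender declared, rather than hard-coding the canonical characters, can read them by position. The following is a sketch under that assumption; the Delimiters class and read_delimiters function are illustrative names, and the distinctness check is a sanity test added here rather than something quoted from the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delimiters:
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(msh: str) -> Delimiters:
    """Read the five delimiter characters from the start of an MSH segment."""
    if not msh.startswith("MSH") or len(msh) < 8:
        raise ValueError("MSH segment too short to declare delimiters")
    field = msh[3]  # MSH-1, declared by position
    component, repetition, escape, subcomponent = msh[4:8]  # MSH-2, in order
    chars = [field, component, repetition, escape, subcomponent]
    if len(set(chars)) != len(chars):
        raise ValueError("delimiter characters must all be distinct")
    return Delimiters(*chars)


print(read_delimiters("MSH|^~\\&|EPIC|UCHEALTH"))
```

Logging a warning whenever the result differs from the canonical set is a cheap way to catch the accidental cases mentioned above.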

4. Delimiters, Fields, Components, Subcomponents

The delimiter hierarchy matters because it tells the parser how to decompose a field into its parts.

Look at PID-5 (Patient Name):

PID|1||123456^^^UCHEALTH^MR||DOE^JOHN^Q^JR^DR^&SMITH&MAIDEN||19720101

Breaking PID-5 down:

PID-5 (field): DOE^JOHN^Q^JR^DR^&SMITH&MAIDEN
PID-5.1 Family Name: DOE
PID-5.2 Given Name: JOHN
PID-5.3 Middle Name: Q
PID-5.4 Suffix: JR
PID-5.5 Prefix: DR
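PID-5.6 (subcomponents): &SMITH&MAIDEN → three subcomponents separated by &, the first of them empty. The value is contrived to show the delimiter mechanics: in the standard XPN data type, component 6 is Degree, a simple coded value, and it is component 1 (Family Name) that is defined with subcomponents.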

The nesting goes exactly three levels deep: field → component → subcomponent. HL7 v2 does not support deeper nesting; if you need it, you are modelling the wrong thing.
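
The decomposition above maps directly onto two nested splits. This is a minimal sketch that assumes the canonical ^ and & characters and a field with no repetitions; decompose and part are hypothetical helper names.

```python
def decompose(field: str, component_sep: str = "^", subcomponent_sep: str = "&") -> list[list[str]]:
    """Split one field into components, and each component into subcomponents."""
    return [component.split(subcomponent_sep) for component in field.split(component_sep)]


def part(parts: list[list[str]], component: int, subcomponent: int = 1) -> str:
    """1-based lookup that returns an empty string for anything not present."""
    if component < 1 or subcomponent < 1:
        raise ValueError("HL7 positions are 1-based")
    try:
        return parts[component - 1][subcomponent - 1]
    except IndexError:
        return ""


name = decompose("DOE^JOHN^Q^JR^DR^&SMITH&MAIDEN")
print(part(name, 1))      # first component
print(part(name, 6, 2))   # second subcomponent of the sixth component
print(name[5])            # all subcomponents of the sixth component
```

Returning an empty string for missing positions mirrors how senders omit trailing components, so callers do not need to special-case short fields.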

5. Field Repetition

Some fields can hold multiple values separated by the repetition character ~. Phone numbers are the classic example:

PID|1||123456||DOE^JOHN||||||123 MAIN ST^^DENVER^CO^80202||(303)555-1234~(720)555-5678~john.doe@example.com

PID-13 (Phone Number — Home) contains three repetitions: a primary phone, a mobile, and an email. Each repetition has its own components (phone type, etc.) — so you have fields, repetitions within a field, components within a repetition, and subcomponents within a component.

Other commonly repeating fields:

PID-3 Patient Identifier List — multiple IDs per patient (MRN, SSN, driver's license, national health ID).
PID-11 Patient Address — home, work, billing addresses.
OBR-33 Assistant Result Interpreter — multiple interpreters (OBR-32, Principal Result Interpreter, holds a single value).
NK1 Next of Kin — multiple contacts are sent as repeated NK1 segments; within one NK1, fields such as NK1-2 Name and NK1-5 Phone Number can also repeat.
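
Repetition adds one more split in front of the component split. The sketch below assumes the canonical ~ and ^ characters; split_repetitions is a hypothetical name, and the second identifier in the PID-3 example is made up for illustration.

```python
def split_repetitions(field: str, repetition_sep: str = "~", component_sep: str = "^") -> list[list[str]]:
    """Return one list of components per repetition; an empty field has no repetitions."""
    if field == "":
        return []
    return [rep.split(component_sep) for rep in field.split(repetition_sep)]


phones = split_repetitions("(303)555-1234~(720)555-5678~john.doe@example.com")
print(len(phones), [rep[0] for rep in phones])

# Hypothetical PID-3 with two identifiers: the MRN from the examples above
# plus an invented second ID of type PI.
ids = split_repetitions("123456^^^UCHEALTH^MR~A-77^^^UCHEALTH^PI")
print([(rep[0], rep[4]) for rep in ids])
```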

6. Segment Structure and Required vs Optional

Each HL7 v2 message-type specifies a sequence of segments, marked as required or optional, and with cardinality (zero-or-one, exactly-one, zero-or-more, one-or-more). The specification uses a bracket notation:

ADT^A01 (v2.5) structure:

MSH          — Message Header (required, 1)
[{SFT}]      — Software Segment (optional, 0..*)
EVN          — Event Type (required, 1)
PID          — Patient Identification (required, 1)
[PD1]        — Patient Additional Demographic (optional, 0..1)
[{ROL}]      — Role (optional, 0..*)
[{NK1}]      — Next of Kin (optional, 0..*)
PV1          — Patient Visit (required, 1)
[PV2]        — Patient Visit Additional (optional, 0..1)
[{ROL}]      — Role (optional, 0..*)
[{DB1}]      — Disability (optional, 0..*)
[{OBX}]      — Observation/Result (optional, 0..*)
[{AL1}]      — Allergy Information (optional, 0..*)
[{DG1}]      — Diagnosis (optional, 0..*)
[DRG]        — Diagnosis Related Group (optional, 0..1)
[{ PR1 [{ROL}] }]  — Procedure group (optional, 0..*)
[{GT1}]      — Guarantor (optional, 0..*)
[{ IN1 [IN2] [{IN3}] [{ROL}] }]  — Insurance group (optional, 0..*)
[ACC]        — Accident (optional, 0..1)
[UB1]        — UB82 Data (optional, 0..1)
[UB2]        — UB92 Data (optional, 0..1)

Bracket conventions:

[X] — X is optional (0 or 1).
{X} — X can repeat (1 or more).
[{X}] — X is optional and can repeat (0 or more).
{ A [B] } — a repeating group where A is mandatory and B is optional per iteration.
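
To make the cardinality idea concrete, here is a deliberately simplified sequence check. It is a sketch only: it handles a flat list of segments with minimum and maximum counts and ignores nested groups, and the ADT_A01_SUBSET table is a hand-picked subset of the structure above rather than the full definition. All names are hypothetical.

```python
from typing import Optional

# (segment name, minimum count, maximum count or None for unbounded)
Spec = list[tuple[str, int, Optional[int]]]

ADT_A01_SUBSET: Spec = [
    ("MSH", 1, 1),
    ("EVN", 1, 1),
    ("PID", 1, 1),
    ("NK1", 0, None),
    ("PV1", 1, 1),
    ("OBX", 0, None),
    ("AL1", 0, None),
    ("DG1", 0, None),
]


def validate_sequence(segment_names: list[str], spec: Spec) -> list[str]:
    """Return a list of human-readable problems; an empty list means the order is valid."""
    errors = []
    pos = 0
    for name, minimum, maximum in spec:
        count = 0
        # Consume consecutive occurrences of the expected segment.
        while (pos < len(segment_names) and segment_names[pos] == name
               and (maximum is None or count < maximum)):
            pos += 1
            count += 1
        if count < minimum:
            errors.append(f"{name}: expected at least {minimum}, found {count}")
    if pos < len(segment_names):
        errors.append(f"unexpected segment {segment_names[pos]} at position {pos + 1}")
    return errors


print(validate_sequence(["MSH", "EVN", "PID", "NK1", "PV1", "AL1", "DG1"], ADT_A01_SUBSET))
print(validate_sequence(["MSH", "PID", "PV1"], ADT_A01_SUBSET))
```

A production validator would need a recursive structure to represent groups such as the procedure and insurance groups; the flat version is enough to show why segment order matters.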

7. Z-Segments

HL7 reserves segment names starting with Z for local use. Any site or vendor can define Z-segments to carry information the standard segments don't cover. They are everywhere in production integrations.

PID|1||123456||DOE^JOHN
PV1|1|I|2000^2012^01
ZIN|1|EMPLOYER001^ACME CORP|PHIL-BLUECROSS|GROUPABC123
ZVI|1|HEMAT|20260421|STAT|NURSE123^SMITH^JANE

The good:

Z-segments let you carry site-specific data without bastardizing standard segments.
They are clearly marked as non-standard, so receivers can ignore them safely.
Vendors use them to expose richer metadata than v2 natively supports.

The bad:

No two implementations agree on what a given Z-segment means.
Strict parsers may reject unknown segments if not configured to tolerate them.
Downstream mappings become fragile if Z-segments carry business-critical information.

Rule of thumb: document every Z-segment in your interface spec. Never rely on Z-segment content across organizations without written agreement. For Mirth Connect tips on handling unknown segments, see our common HL7 integration errors post.
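
One tolerant way to ingest messages that carry Z-segments is to separate them from the standard segments before mapping, so they can be logged or routed on their own. The sketch below assumes the caller supplies the set of standard segment names it understands; partition_segments is a hypothetical helper, not a Mirth Connect API.

```python
def partition_segments(segments: list[str], known: set[str]) -> tuple[list[str], list[str], list[str]]:
    """Split segments into (standard, local Z-segments, unknown)."""
    standard, local, unknown = [], [], []
    for segment in segments:
        name = segment[:3]
        if name.startswith("Z"):
            local.append(segment)      # local-use segment: keep, but never trust blindly
        elif name in known:
            standard.append(segment)
        else:
            unknown.append(segment)    # neither known nor Z: worth flagging
    return standard, local, unknown


segments = [
    "PID|1||123456||DOE^JOHN",
    "PV1|1|I|2000^2012^01",
    "ZIN|1|EMPLOYER001^ACME CORP|PHIL-BLUECROSS|GROUPABC123",
    "ZVI|1|HEMAT|20260421|STAT|NURSE123^SMITH^JANE",
]
standard, local, unknown = partition_segments(segments, known={"MSH", "EVN", "PID", "PV1"})
print([s[:3] for s in standard], [s[:3] for s in local], unknown)
```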

8. Escape Sequences for Special Characters

When a field value naturally contains one of the delimiter characters, you have to escape it. HL7 v2 escapes look like \X\, where X is a code:

\F\ — field separator (|).
\S\ — component separator (^).
\T\ — subcomponent separator (&).
\R\ — repetition separator (~).
\E\ — escape character (\).
\X0D\ — hexadecimal byte insertion (carriage return in this example).
\.br\ — line break inside formatted text (e.g., inside an OBX-5).
\H\, \N\ — highlight start/end.

Example of escaping inside an OBX observation value:

OBX|1|TX|NOTE^Clinical Note||Patient states: pain \T\ nausea since yesterday per chart||||||F
OBX|2|TX|INTERP^Interpretation||Elevated WBC \F\ possible infection||||||F

Receivers un-escape during parsing. If you are reading a file and see \F\ where you expected text, run it through a proper HL7 parser rather than treating the raw string as final.
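
For readers who want to see what un-escaping involves, here is a small sketch covering the delimiter codes, hexadecimal insertion, and the line-break command. It assumes the canonical delimiters, decodes hex bytes as Latin-1 (an assumption; the right codec depends on MSH-18), and leaves formatting codes such as \H\ and \N\ untouched. The function names are hypothetical.

```python
import re

ESCAPE_CODES = {"F": "|", "S": "^", "T": "&", "R": "~", "E": "\\"}
_REVERSE = {char: f"\\{code}\\" for code, char in ESCAPE_CODES.items()}
_ESCAPE_RE = re.compile(r"\\([^\\]*)\\")


def escape(value: str) -> str:
    """Replace delimiter characters in plain text with their escape sequences."""
    return "".join(_REVERSE.get(ch, ch) for ch in value)


def unescape(value: str) -> str:
    """Expand the escape sequences this sketch understands; leave the rest as-is."""
    def replace(match: re.Match) -> str:
        code = match.group(1)
        if code in ESCAPE_CODES:
            return ESCAPE_CODES[code]
        if code.startswith("X"):
            try:
                # Assumption: one byte per character (Latin-1).
                return bytes.fromhex(code[1:]).decode("latin-1")
            except ValueError:
                return match.group(0)
        if code == ".br":
            return "\n"
        return match.group(0)  # e.g. \H\ and \N\ are passed through unchanged
    return _ESCAPE_RE.sub(replace, value)


print(unescape("Elevated WBC \\F\\ possible infection"))
print(escape("Smith & Jones | Partners"))
```

Escaping must happen on each value before the message is assembled, and un-escaping only after the message has been split; doing either on the whole message string corrupts the structure.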

9. Character Set Handling (MSH-18)

Hospitals increasingly handle data in multiple languages — patient names with accented characters, address lines in Spanish or Chinese, free-text clinical notes in whatever language the clinician wrote in. MSH-18 declares which character set is in use.

Common values:

ASCII — 7-bit ASCII. Default if MSH-18 is empty.
8859/1 — ISO-8859-1 (Latin-1). Common historically.
UNICODE UTF-8 — modern recommended value. Use this for new integrations.
ISO IR14, ISO IR87, ISO IR159 — Japanese, among others.
GB 18030-2000 — Chinese.

Character-set mismatch between sender and receiver is one of the most frequent silent-corruption bugs in HL7 integration. Symptoms: patient names display with question marks or mojibake; free-text fields truncate on a non-ASCII byte. Fix by declaring the correct value in MSH-18 on both sides and validating round-trip.

In Mirth Connect, the channel encoding is set per connector. Match it to MSH-18 or the engine will silently re-encode.
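
When decoding raw bytes yourself, one defensive approach is to read MSH-18 first and fail loudly on anything unexpected, rather than guessing. The sketch below is illustrative: the mapping table covers only three of the values listed above, decode_message is a hypothetical name, and treating an empty MSH-18 as strict ASCII is a choice made here so that undeclared non-ASCII data raises an error instead of being silently mangled.

```python
# Partial mapping from MSH-18 values to Python codec names (assumed subset).
HL7_TO_PYTHON_CODEC = {
    "": "ascii",
    "ASCII": "ascii",
    "8859/1": "iso-8859-1",
    "UNICODE UTF-8": "utf-8",
}


def decode_message(raw: bytes) -> str:
    """Decode a raw HL7 v2 message according to the character set declared in MSH-18."""
    # Peek at the header with Latin-1, which maps every byte and never fails.
    header = raw.split(b"\r", 1)[0].decode("latin-1")
    if not header.startswith("MSH") or len(header) < 8:
        raise ValueError("message must start with a complete MSH segment")
    field_sep, repetition_sep = header[3], header[5]
    parts = header.split(field_sep)
    declared = parts[17] if len(parts) > 17 else ""  # parts[17] is MSH-18
    primary = declared.split(repetition_sep)[0].strip()
    try:
        codec = HL7_TO_PYTHON_CODEC[primary]
    except KeyError:
        raise ValueError(f"unsupported MSH-18 value: {primary!r}") from None
    return raw.decode(codec)


message = (
    "MSH|^~\\&|A|B|C|D|20260421093000||ADT^A01|1|P|2.5||||||UNICODE UTF-8\r"
    "PID|1||1||GARC\u00cdA^JOS\u00c9\r"
).encode("utf-8")
print(decode_message(message).split("\r")[1])
```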

10. Versioning (MSH-12)

MSH-12 declares the HL7 version the sender is using. Receivers must decide whether to accept the message; version mismatch is a common reason for AR responses.

Version cheat sheet:

2.1, 2.2 — deprecated; rarely seen in production.
2.3, 2.3.1 — widespread in legacy feeds, especially older labs and radiology.
2.4 — still seen in older interfaces. Enhanced acknowledgment mode (MSH-15 and MSH-16) predates it; those fields are already defined in 2.2.
2.5, 2.5.1 — most widely deployed US hospital EHR version in 2026.
2.6, 2.7, 2.8 — additions for newer workflows; adoption growing.
2.9 — current; small additions.

HL7 v2 is designed to be backward compatible: a 2.5 receiver generally parses a 2.3 message correctly, because later versions mostly append optional fields and segments rather than redefine existing ones. Validation strictness varies; be tolerant on ingestion and strict on emission. If you must deal with v2-to-v2 version mismatches, a canonical-model approach reduces pain.
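
A receiver's accept-or-reject decision on MSH-12 can be as simple as a tuple comparison. The sketch below encodes one possible policy, accepting any v2 message at or below the receiver's own version; that policy, and the names parse_version and accepts, are assumptions for illustration rather than anything mandated by the standard.

```python
def parse_version(msh12: str) -> tuple[int, ...]:
    """Turn an MSH-12 value such as '2.5.1' into a comparable tuple like (2, 5, 1)."""
    # MSH-12 can carry further components; the version ID is the first one.
    version = msh12.split("^")[0]
    return tuple(int(piece) for piece in version.split("."))


def accepts(receiver_version: str, msh12: str) -> bool:
    """Assumed policy: accept v2 messages whose version is not newer than the receiver's."""
    try:
        sent = parse_version(msh12)
    except ValueError:
        return False  # empty or malformed MSH-12
    return sent[:1] == (2,) and sent <= parse_version(receiver_version)


for version in ["2.3", "2.3.1", "2.5", "2.5.1", "2.6", ""]:
    print(repr(version), accepts("2.5", version))
```

Note that under this policy a 2.5 receiver rejects 2.5.1, because (2, 5, 1) sorts after (2, 5). Whether that is desirable depends on the interface agreement; many sites would loosen it.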

11. Full-Message Examples

Worked examples of the four most common message types you will encounter in the first year of HL7 work.

11.1 ADT^A01 — Patient Admission

MSH|^~\&|EPIC|UCHEALTH|MIRTH|TACTION|20260421093000||ADT^A01|MSG00001|P|2.5
EVN|A01|20260421093000|||ADMIT123
PID|1||123456^^^UCHEALTH^MR||DOE^JOHN^Q||19720101|M|||123 MAIN ST^^DENVER^CO^80202||(303)555-1234|||M||ACCT123|123-45-6789
NK1|1|DOE^JANE|SPO|||(303)555-5678
PV1|1|I|2000^2012^01||||0010^ATTEND^ALBERT|||CAR||||ADM|A0|||||||||||||||||||||||||||20260421093000
AL1|1||PENICILLIN|S|HIVES
DG1|1|I10|A09^Infectious gastroenteritis^I10|Infectious gastroenteritis|20260421|A
IN1|1|BCBS|BCBS001|BLUE CROSS BLUE SHIELD

11.2 ORU^R01 — Lab Result

MSH|^~\&|LAB|HOSPITAL|EMR|HOSPITAL|20260421093100||ORU^R01|MSG00002|P|2.5
PID|1||123456^^^HOSP^MR||DOE^JOHN^Q||19720101|M
OBR|1|ORDER123|FILLER456|CBC^Complete Blood Count||20260421090000|20260421092500
OBX|1|NM|WBC^White Blood Cell Count^L||7.2|10*9/L|4.0-11.0|N|||F
OBX|2|NM|RBC^Red Blood Cell Count^L||4.7|10*12/L|4.2-5.9|N|||F
OBX|3|NM|HGB^Hemoglobin^L||14.1|g/dL|13.0-17.0|N|||F
OBX|4|NM|HCT^Hematocrit^L||42.3|%|40.0-52.0|N|||F
OBX|5|NM|PLT^Platelet Count^L||250|10*9/L|150-400|N|||F

11.3 ORM^O01 — Order Entry

MSH|^~\&|EMR|HOSPITAL|LIS|HOSPITAL|20260421093200||ORM^O01|MSG00003|P|2.5
PID|1||123456^^^HOSP^MR||DOE^JOHN^Q||19720101|M
PV1|1|I|2000^2012^01
ORC|NW|ORDER123||||||||20260421093200|||0010^ATTEND^ALBERT
OBR|1|ORDER123||CBC^Complete Blood Count^L|||20260421093500||||N|||||0010^ATTEND^ALBERT

11.4 SIU^S12 — Appointment Scheduled

MSH|^~\&|SCHED|HOSPITAL|EMR|HOSPITAL|20260421093300||SIU^S12|MSG00004|P|2.5
SCH|APPT789|APPT789||||APPT|FOLLOW-UP||30|MIN|^^30^20260425100000^20260425103000
PID|1||123456^^^HOSP^MR||DOE^JOHN^Q||19720101|M
RGS|1|A
AIS|1|A|CBC^Complete Blood Count^L
AIP|1|A|0010^ATTEND^ALBERT|D^Attending Doctor|20260425100000|30|MIN

For the message-type-by-message-type deep dives, see ADT messages reference, ORM/ORU lab workflow, SIU scheduling, MDM documents, DFT billing, and VXU immunization. For testing tools and sample corpora, see HL7 testing tools & sample messages.

12. Frequently Asked Questions

What does MSH stand for in HL7?

MSH stands for Message Header. It is the first segment in every HL7 v2 message and carries the metadata a receiver needs to route, version, and acknowledge the message — sender, receiver, timestamp, message type, control ID, processing ID, and version.

What are the standard HL7 delimiters?

HL7 v2 uses five delimiter characters: the field separator | declared in MSH-1, plus the four encoding characters declared in MSH-2, namely ^ (component), ~ (repetition), \ (escape), and & (subcomponent). The canonical MSH-2 value is ^~\& exactly, in that order.

Can I use different delimiters?

Technically yes — MSH-1 and MSH-2 declare which characters are in use — but no real-world receiver expects non-standard delimiters. Always use | ^ ~ \ & unless a specific vendor integration guide explicitly directs otherwise.

What is a Z-segment?

A Z-segment is a local-use segment whose name starts with Z (ZIN, ZPV, ZMA, etc.). HL7 reserves the Z prefix for site-specific extensions; no two implementations agree on what a given Z-segment means, so they require per-interface documentation.

How do I include a pipe character in a field value?

Use the escape sequence \F\ in place of the pipe. Similarly \S\ for ^, \T\ for &, \R\ for ~, and \E\ for \. The receiver un-escapes during parsing.

What character set does HL7 v2 use?

Declared in MSH-18. Common values are ASCII, 8859/1 (Latin-1), UNICODE UTF-8, and various ISO-2022 variants for Asian languages. If MSH-18 is empty, receivers typically assume ASCII or the engine default. Always set MSH-18 explicitly for interfaces carrying non-ASCII data.

What version of HL7 v2 should I use?

2.5.1 is the most widely deployed today in US hospitals; 2.3.1 is common in legacy feeds; 2.6, 2.7, and 2.8 add fields but change little fundamentally. Match the version your peer system expects — see MSH-12.

Can an HL7 v2 message have multiple instances of a segment?

Yes. Segments like OBX (observation), NK1 (next of kin), DG1 (diagnosis), and IN1 (insurance) frequently repeat. The order matters — sequences of segments form logical groups defined by the message-type specification.

How do I handle line endings inside an HL7 message?

HL7 v2 segments end with a single carriage return (CR, 0x0D). Never use LF or CRLF — strict parsers will reject them. Binary-safe tooling is essential when editing or generating HL7.

What is MSH-10 used for?

MSH-10 is the Message Control ID — a sender-generated unique identifier (UUID, sequential, or timestamp-based). The receiver echoes it in MSA-2 of the ACK. Duplicate MSH-10 values let receivers detect retries and deduplicate.
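
The echo of MSH-10 into MSA-2 can be sketched as follows. This is a minimal illustration of an original-mode acknowledgment, assuming a well-formed inbound MSH; build_ack is a hypothetical helper, the bare "ACK" message type is a simplification, and prefixing the inbound control ID to form the ACK's own MSH-10 is an invented scheme.

```python
def build_ack(msh: str, ack_code: str = "AA", timestamp: str = "") -> str:
    """Build a two-segment ACK that swaps sender and receiver and echoes MSH-10 in MSA-2."""
    sep = msh[3]
    f = msh.split(sep)
    f += [""] * (12 - len(f))  # pad so that f[11] (MSH-12) always exists
    # f[n] holds MSH-(n+1): f[2] is MSH-3, f[9] is MSH-10, and so on.
    ack_msh = sep.join([
        "MSH", f[1],
        f[4], f[5],        # original receiver becomes the sender
        f[2], f[3],        # original sender becomes the receiver
        timestamp, "",
        "ACK",
        "ACK" + f[9],      # control ID of the ACK itself (hypothetical scheme)
        f[10], f[11],      # same processing ID and version as the inbound message
    ])
    msa = sep.join(["MSA", ack_code, f[9]])  # MSA-2 echoes the inbound MSH-10
    return ack_msh + "\r" + msa + "\r"


inbound = "MSH|^~\\&|EPIC|UCHEALTH|MIRTH|TACTION|20260421093000||ADT^A01|MSG00001|P|2.5"
print(build_ack(inbound, timestamp="20260421093001").replace("\r", "\n"))
```

The sender matches the ACK to its outstanding message by comparing MSA-2 with the MSH-10 it sent, which is why that value must be unique per message.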
