US Healthcare ETL/ELT Architecture
The Shift from Batch to Streaming Ingestion
Historically, US healthcare data ingestion has relied heavily on nightly batch processing. Electronic Health Record (EHR) systems such as Epic or Cerner (now Oracle Health) would export large flat files or relational dumps overnight, and traditional ETL (Extract, Transform, Load) tools would process them by morning.
Today, demand for real-time clinical decision support and faster claims processing is driving a shift toward streaming architectures. Batch processing remains a good fit for historical analytics and monthly reporting, but event-driven streaming ingestion is becoming the preferred pattern for operational healthcare data, where a delay of several hours between an event and its availability is no longer acceptable.
HL7 v2 Ingestion Pipelines
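HL7 v2 remains the workhorse of hospital interoperability. An effective HL7 ingestion pipeline must tolerate the inconsistency of legacy HL7 messages: site-specific Z-segments, optional fields that are populated differently by each sender, and mixed version numbers. A modern architecture typically involves:
- An integration engine such as Mirth Connect acting as the TCP listener for MLLP, the framing protocol HL7 v2 uses over TCP. MLLP is unencrypted on its own, so the listener usually sits behind TLS or a VPN.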
- Message queues (e.g., Kafka or AWS SQS) to buffer incoming ADT, ORU, and SIU messages.
- Serverless parsers that convert the pipe-delimited HL7 format into a structured JSON payload.
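The parsing step in the list above can be sketched as follows. This is a minimal illustration, not a production parser: the function name `parse_hl7_v2`, the output shape (one dict per segment, keyed by HL7 field number), and the sample message are all assumptions made for this example. It splits only on the field and component separators and ignores repetitions (`~`), subcomponents (`&`), and escape sequences, which a real pipeline would need to handle.

```python
import json


def parse_hl7_v2(raw: str) -> dict:
    """Convert a pipe-delimited HL7 v2 message into a JSON-friendly dict.

    Hypothetical output shape: {"PID": [{"PID-5": ["DOE", "JANE"], ...}], ...}
    """
    # HL7 v2 segments end with a carriage return; tolerate \n and \r\n from files.
    normalized = raw.replace("\r\n", "\r").replace("\n", "\r")
    segments = [s for s in normalized.split("\r") if s]
    if not segments or not segments[0].startswith("MSH") or len(segments[0]) < 5:
        raise ValueError("message must start with an MSH segment")

    field_sep = segments[0][3]      # MSH-1, normally "|"
    component_sep = segments[0][4]  # first character of MSH-2, normally "^"

    parsed: dict = {}
    for segment in segments:
        fields = segment.split(field_sep)
        seg_id = fields[0]
        if seg_id == "MSH":
            # MSH-1 is the separator itself; re-insert it so numbering matches the spec.
            fields = [seg_id, field_sep] + fields[1:]
        values = {}
        for idx, value in enumerate(fields[1:], start=1):
            if value == "":
                continue  # skip empty fields to keep the payload small
            key = f"{seg_id}-{idx}"
            if seg_id == "MSH" and idx in (1, 2):
                values[key] = value  # delimiter definitions are kept verbatim
            elif component_sep in value:
                values[key] = value.split(component_sep)
            else:
                values[key] = value
        # A segment type can repeat (e.g., several OBX), so collect a list per ID.
        parsed.setdefault(seg_id, []).append(values)
    return parsed


# Made-up sample ADT message, for illustration only.
SAMPLE = "\r".join([
    r"MSH|^~\&|EPIC|HOSP|LAKE|HOSP|20240101120000||ADT^A01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F",
])

if __name__ == "__main__":
    print(json.dumps(parse_hl7_v2(SAMPLE), indent=2))
```

Reading the separators from the MSH segment instead of hard-coding `|` and `^` keeps the sketch tolerant of senders that use non-default delimiters.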
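Instead of transforming the data before it is stored (ETL), the modern approach is ELT (Extract, Load, Transform). Raw HL7 messages are first written unchanged to a data lake (e.g., S3 or Azure Data Lake), and parsing and transformation then run downstream against the stored copy. Keeping the original message makes the pipeline auditable and lets engineers replay history when a parser bug is fixed or a mapping changes.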
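One way to picture the load-first step is sketched below. It assumes a local directory as a stand-in for the data lake so that the example runs without cloud credentials; in a real deployment the write would go to object storage instead (for S3, boto3's `put_object` call). The function names, the `raw/hl7v2/dt=...` key layout, and the `.hl7` suffix are hypothetical choices for this illustration.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def raw_object_key(raw: bytes, received_at: datetime) -> str:
    """Build a date-partitioned, content-addressed key for a raw message."""
    digest = hashlib.sha256(raw).hexdigest()
    return f"raw/hl7v2/dt={received_at:%Y-%m-%d}/{digest}.hl7"


def land_raw_message(raw: bytes, lake_root: Path,
                     received_at: Optional[datetime] = None) -> Path:
    """Write the message bytes untouched; no parsing happens at this stage."""
    received_at = received_at or datetime.now(timezone.utc)
    target = lake_root / raw_object_key(raw, received_at)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(raw)
    return target


if __name__ == "__main__":
    message = b"MSH|^~\\&|EPIC|HOSP|LAKE|HOSP|20240101120000||ADT^A01|MSG0001|P|2.5\r"
    with tempfile.TemporaryDirectory() as tmp:
        path = land_raw_message(message, Path(tmp))
        print(path.relative_to(tmp))
```

Deriving the key from a hash of the content means that a message delivered twice on the same day lands on the same object, so redelivery from the queue does not create duplicates.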
The Transition to FHIR-based Data Lakes
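Fast Healthcare Interoperability Resources (FHIR) is changing how healthcare data lakes are modeled. By storing data natively as FHIR JSON, healthcare organizations get a common resource model (Patient, Observation, Encounter, and so on) that SMART on FHIR applications and third-party APIs can consume directly. The schema is only as uniform as the profiles applied to it, so data from different sources is usually validated against a shared profile set such as US Core before it is treated as interchangeable.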
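To make the FHIR JSON format concrete, the sketch below maps a few fields of an HL7 v2 PID segment onto a FHIR Patient resource. The input shape (a dict keyed by field number, with components as lists), the function name `pid_to_fhir_patient`, and the identifier system `urn:example:hospital:mrn` are assumptions for this example; a real mapping would cover many more fields and follow the organization's chosen profile.

```python
import json

# HL7 v2 administrative sex codes mapped to FHIR administrative-gender codes.
SEX_TO_GENDER = {"F": "female", "M": "male", "O": "other", "U": "unknown"}


def _components(value) -> list:
    """Accept either a list of components or a single string."""
    return value if isinstance(value, list) else [value]


def pid_to_fhir_patient(pid: dict, mrn_system: str = "urn:example:hospital:mrn") -> dict:
    """Build a minimal FHIR Patient from PID-3, PID-5, PID-7 and PID-8."""
    patient: dict = {"resourceType": "Patient"}

    if pid.get("PID-3"):
        # First component of PID-3 is the identifier value itself.
        mrn = _components(pid["PID-3"])[0]
        patient["identifier"] = [{"system": mrn_system, "value": mrn}]

    if pid.get("PID-5"):
        name = _components(pid["PID-5"])
        human_name = {"family": name[0]}
        if len(name) > 1 and name[1]:
            human_name["given"] = [name[1]]
        patient["name"] = [human_name]

    dob = pid.get("PID-7", "")
    if len(dob) >= 8:
        # HL7 v2 dates are YYYYMMDD[HHMM...]; FHIR birthDate is YYYY-MM-DD.
        patient["birthDate"] = f"{dob[:4]}-{dob[4:6]}-{dob[6:8]}"

    # Unrecognized or missing sex codes fall back to "unknown".
    patient["gender"] = SEX_TO_GENDER.get(pid.get("PID-8", "U"), "unknown")
    return patient


if __name__ == "__main__":
    example_pid = {
        "PID-3": ["12345", "", "", "HOSP", "MR"],
        "PID-5": ["DOE", "JANE"],
        "PID-7": "19800101",
        "PID-8": "F",
    }
    print(json.dumps(pid_to_fhir_patient(example_pid), indent=2))
```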
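Building a FHIR-based data lake typically involves a managed service such as AWS HealthLake or the Google Cloud Healthcare API to ingest, validate, and index FHIR resources. SQL analytics run on a query layer over that store (Amazon Athena for HealthLake, a BigQuery export for the Cloud Healthcare API) rather than on the FHIR API itself. Unstructured clinical notes are not directly queryable: they first pass through an NLP step that extracts structured entities such as conditions and medications, which HealthLake offers as an integrated feature. Once the extracted entities sit alongside structured lab results, data engineers can query both in a single SQL statement.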
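The flatten-then-query idea can be illustrated without a cloud account. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for a managed query layer; it does not call HealthLake or the Cloud Healthcare API. The table layout, the function names, and the sample Observation resources are made up for this example. LOINC code 2345-7 identifies a serum or plasma glucose measurement.

```python
import sqlite3


def flatten_observation(obs: dict) -> tuple:
    """Project a FHIR Observation onto the columns used for SQL analytics."""
    return (
        obs["id"],
        obs["subject"]["reference"],
        obs["code"]["coding"][0]["code"],
        obs["valueQuantity"]["value"],
        obs["valueQuantity"]["unit"],
    )


def max_value_by_patient(observations: list, loinc_code: str) -> list:
    """Load flattened Observations into SQLite and return the max value per patient."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE observation "
            "(id TEXT, patient TEXT, loinc_code TEXT, value REAL, unit TEXT)"
        )
        conn.executemany(
            "INSERT INTO observation VALUES (?, ?, ?, ?, ?)",
            [flatten_observation(o) for o in observations],
        )
        return conn.execute(
            "SELECT patient, MAX(value) FROM observation "
            "WHERE loinc_code = ? GROUP BY patient ORDER BY patient",
            (loinc_code,),
        ).fetchall()
    finally:
        conn.close()


def make_observation(obs_id: str, patient: str, code: str, value: float, unit: str) -> dict:
    """Build a minimal Observation resource for the demo data."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "status": "final",
        "subject": {"reference": patient},
        "code": {"coding": [{"system": "http://loinc.org", "code": code}]},
        "valueQuantity": {"value": value, "unit": unit},
    }


# Made-up sample data, for illustration only.
SAMPLE_OBSERVATIONS = [
    make_observation("o1", "Patient/1", "2345-7", 98, "mg/dL"),
    make_observation("o2", "Patient/1", "2345-7", 110, "mg/dL"),
    make_observation("o3", "Patient/2", "2345-7", 95, "mg/dL"),
    make_observation("o4", "Patient/2", "718-7", 13.5, "g/dL"),
]

if __name__ == "__main__":
    for patient, value in max_value_by_patient(SAMPLE_OBSERVATIONS, "2345-7"):
        print(patient, value)
```

Managed services perform a similar projection when they expose FHIR resources as tables, although their actual column layouts are nested and differ from this simplified one.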