Queue-Based Stage Decoupling¶
Architecture for a high-throughput message processing system with decoupled stages, queue-based routing, and real-time push delivery.
Problem¶
Design a system that can:
- Accept messages from any authenticated client.
- Route messages through processing stages without blocking on I/O.
- Deliver messages in real time to subscribed clients.
- Scale horizontally at each stage.
Approach¶
The design breaks the system into five decoupled stages connected by queues. No stage blocks on another — each reads from an input queue, does its work, and writes to an output queue.
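The stage loop can be sketched in a few lines. This is a minimal in-process sketch, using Python's `queue.Queue` as a stand-in for RabbitMQ/Kafka; the stage names and payload fields are illustrative assumptions, not part of the design.

```python
# Minimal sketch of queue-decoupled stages. Each stage only touches its own
# input and output queues; no stage ever calls another stage directly.
import queue
import threading

def stage(inbound: queue.Queue, outbound: queue.Queue, work):
    """Generic stage loop: read from the input queue, process, write onward."""
    while True:
        msg = inbound.get()
        if msg is None:            # sentinel: propagate shutdown downstream
            outbound.put(None)
            break
        outbound.put(work(msg))

# Two stages wired together by a queue (labels are hypothetical).
q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(q_in, q_mid, lambda m: {**m, "routed": True}), daemon=True).start()
threading.Thread(target=stage, args=(q_mid, q_out, lambda m: {**m, "persisted": True}), daemon=True).start()

q_in.put({"body": "hello"})
q_in.put(None)
result = q_out.get()               # passed through both stages, no blocking calls between them
print(result)
```

Because each stage communicates only through queues, adding capacity means starting more copies of a stage loop, not rewiring call graphs.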
1. Client Registration¶
Clients register via an API endpoint with a listen-to payload listing the usernames, groups, or topics they want to subscribe to. Registration creates a dedicated delivery queue for that client in RabbitMQ or Kafka.
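A registration handler might look like the following sketch. The handler name, the `listen-to` shape, and the in-memory dict standing in for broker-managed queues are all assumptions for illustration.

```python
# Hypothetical registration handler. A dict of in-memory queues stands in
# for per-client delivery queues that RabbitMQ/Kafka would manage.
import queue

delivery_queues = {}     # client_id -> dedicated delivery queue
subscriptions = {}       # client_id -> set of usernames/groups/topics

def register(client_id: str, listen_to: list[str]) -> None:
    """Create the client's dedicated delivery queue and record what it listens to."""
    delivery_queues[client_id] = queue.Queue()
    subscriptions[client_id] = set(listen_to)

register("alice", ["#ml-ops", "@bob"])
```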
2. Router (First Hop)¶
A software load balancer sits in front of the message endpoint. It inspects the message type or payload and routes to the appropriate processing queue. This is the first hop — its only job is classification and forwarding.
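The first hop reduces to a classification table plus a forward. A sketch, with assumed queue names and an assumed `type` field on the message:

```python
# First-hop router sketch: inspect the message type, forward to the matching
# processing queue, and do nothing else at this hop.
import queue

processing_queues = {
    "chat": queue.Queue(),
    "metric": queue.Queue(),
    "default": queue.Queue(),    # fallback for unrecognized types
}

def route(message: dict) -> str:
    """Classify and forward; returns the queue name chosen."""
    kind = message.get("type", "default")
    name = kind if kind in processing_queues else "default"
    processing_queues[name].put(message)
    return name

route({"type": "chat", "body": "hi"})
```

Keeping the router this thin is deliberate: classification is cheap, so the first hop never becomes the bottleneck.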
3. Destination Queues¶
Messages land in topic- or group-based queues. These queues carry no consumer affinity: any processing server can pick up work from any queue. This decoupling is what makes the system horizontally scalable.
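The no-affinity property is the competing-consumers pattern. A small sketch, again with `queue.Queue` standing in for a broker queue:

```python
# Competing consumers: several identical workers drain one shared queue.
# Any worker may take any message; none has affinity to a sender or topic.
import queue
import threading

work_q = queue.Queue()
results = queue.Queue()

def worker(worker_id: int):
    while True:
        msg = work_q.get()
        if msg is None:
            work_q.put(None)     # re-post the sentinel so peers also stop
            break
        results.put((worker_id, msg))

for i in range(3):
    threading.Thread(target=worker, args=(i,), daemon=True).start()

for n in range(5):
    work_q.put(n)
work_q.put(None)
```

Scaling out a stage is just starting more workers against the same queue.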
4. Processing Servers¶
Several types of servers consume from queues:
- Routing server — determines which client delivery queues a message should enter.
- Persistence server — writes messages to durable storage.
- Analytics server — computes metrics, feeds data into ML models for anomaly detection or content classification.
- Push server — maintains open connections to clients for real-time delivery.
Each server type scales independently based on its throughput needs.
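The routing server's core job, matching a message against subscriptions and fanning out to client delivery queues, can be sketched as below. The subscription data and topic field are assumed for illustration.

```python
# Routing-server sketch: fan a message out to every client whose
# subscriptions match its topic (in-memory stand-ins for broker queues).
import queue

delivery_queues = {"alice": queue.Queue(), "bob": queue.Queue()}
subscriptions = {"alice": {"#ops"}, "bob": {"#ops", "#dev"}}

def fan_out(message: dict) -> list[str]:
    """Enqueue the message for each matching client; return who received it."""
    topic = message["topic"]
    recipients = [c for c, subs in subscriptions.items() if topic in subs]
    for client in recipients:
        delivery_queues[client].put(message)
    return recipients

fan_out({"topic": "#ops", "body": "deploy done"})   # reaches alice and bob
```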
5. Push Delivery¶
For real-time delivery, the push server maintains a persistent connection per client using one of:
- Long polling — server holds the HTTP request open until data is available, then responds and the client immediately reconnects.
- HTTP persistent connections — reuse an open connection for multiple messages.
- WebSockets — full-duplex channel (best for high-frequency bidirectional communication).
Long polling is the most broadly compatible. WebSockets are preferable when the client needs continuous streaming — for example, a dashboard rendering live inference results.
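The long-polling variant maps directly onto a blocking queue read with a timeout. A sketch, assuming an in-memory delivery queue and an arbitrary timeout value:

```python
# Long-polling handler sketch: block on the client's delivery queue until a
# message arrives or the timeout elapses, then answer so the client reconnects.
import queue

def long_poll(delivery_queue: queue.Queue, timeout_s: float = 30.0) -> dict:
    """Hold the 'request' open; 200 with a message, or 204 on timeout."""
    try:
        msg = delivery_queue.get(timeout=timeout_s)
        return {"status": 200, "message": msg}
    except queue.Empty:
        return {"status": 204, "message": None}   # no content; client retries

q = queue.Queue()
q.put({"body": "ping"})
print(long_poll(q, timeout_s=0.1))   # → {'status': 200, 'message': {'body': 'ping'}}
```

A WebSocket push server replaces the timeout/reconnect loop with a single held connection, which is why it wins for continuous streams.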
Message Format¶
Messages are simple JSON key-value pairs. Schema validation happens at the router stage, before messages enter processing queues.
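Router-stage validation can be as small as a required-fields check. The specific schema below is an assumption; the source only says messages are flat JSON key-value pairs.

```python
# Router-stage validation sketch: reject malformed messages before they
# reach any processing queue (the required fields are assumptions).
REQUIRED = {"type": str, "sender": str, "body": str}

def validate(message: dict) -> bool:
    """Accept only messages carrying every required field with the right type."""
    return all(
        key in message and isinstance(message[key], expected)
        for key, expected in REQUIRED.items()
    )

validate({"type": "chat", "sender": "alice", "body": "hi"})   # True
```

Validating at the first hop means downstream stages never pay for parsing garbage.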
Takeaway¶
Queue-based stage decoupling is the go-to pattern for bursty, heterogeneous workloads. Chat platforms, inference pipelines, log processing: any system where different stages have different throughput needs ends up looking like this.