Queue-Based Stage Decoupling¶
Architecture for a high-throughput message processing system with decoupled stages, queue-based routing, and real-time push delivery.
Problem¶
Design a system that can:
- Accept messages from any authenticated client.
- Route messages through processing stages without blocking on I/O.
- Deliver messages in real time to subscribed clients.
- Scale horizontally at each stage.
Approach¶
The design breaks the system into five decoupled stages connected by queues. No stage blocks on another — each reads from an input queue, does its work, and writes to an output queue.
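The stage loop can be sketched in a few lines. This is a minimal in-process sketch, using Python's `queue.Queue` as a stand-in for RabbitMQ/Kafka; the stage names and payload fields are illustrative assumptions, not part of the design.

```python
# Minimal sketch of queue-decoupled stages. Each stage only touches its own
# input and output queues; no stage ever calls another stage directly.
import queue
import threading

def stage(inbound: queue.Queue, outbound: queue.Queue, work):
    """Generic stage loop: read from the input queue, process, write onward."""
    while True:
        msg = inbound.get()
        if msg is None:            # sentinel: propagate shutdown downstream
            outbound.put(None)
            break
        outbound.put(work(msg))

# Two stages wired together by a queue (labels are hypothetical).
q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(q_in, q_mid, lambda m: {**m, "routed": True}), daemon=True).start()
threading.Thread(target=stage, args=(q_mid, q_out, lambda m: {**m, "persisted": True}), daemon=True).start()

q_in.put({"body": "hello"})
q_in.put(None)
result = q_out.get()               # passed through both stages, no blocking calls between them
print(result)
```

Because each stage communicates only through queues, adding capacity means starting more copies of a stage loop, not rewiring call graphs.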
1. Client Registration¶
Clients register via an API endpoint with a listen-to payload listing the usernames, groups, or topics they want to subscribe to. Registration creates a dedicated delivery queue for that client in RabbitMQ or Kafka.
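A registration handler might look like the following sketch. The handler name, the `listen-to` shape, and the in-memory dict standing in for broker-managed queues are all assumptions for illustration.

```python
# Hypothetical registration handler. A dict of in-memory queues stands in
# for per-client delivery queues that RabbitMQ/Kafka would manage.
import queue

delivery_queues = {}     # client_id -> dedicated delivery queue
subscriptions = {}       # client_id -> set of usernames/groups/topics

def register(client_id: str, listen_to: list[str]) -> None:
    """Create the client's dedicated delivery queue and record what it listens to."""
    delivery_queues[client_id] = queue.Queue()
    subscriptions[client_id] = set(listen_to)

register("alice", ["#ml-ops", "@bob"])
```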
2. Router (First Hop)¶
A software load balancer sits in front of the message endpoint. It inspects the message type or payload and routes to the appropriate processing queue. This is the first hop — its only job is classification and forwarding.
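The first hop reduces to a classification table plus a forward. A sketch, with assumed queue names and an assumed `type` field on the message:

```python
# First-hop router sketch: inspect the message type, forward to the matching
# processing queue, and do nothing else at this hop.
import queue

processing_queues = {
    "chat": queue.Queue(),
    "metric": queue.Queue(),
    "default": queue.Queue(),    # fallback for unrecognized types
}

def route(message: dict) -> str:
    """Classify and forward; returns the queue name chosen."""
    kind = message.get("type", "default")
    name = kind if kind in processing_queues else "default"
    processing_queues[name].put(message)
    return name

route({"type": "chat", "body": "hi"})
```

Keeping the router this thin is deliberate: classification is cheap, so the first hop never becomes the bottleneck.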
3. Destination Queues¶
Messages land in topic- or group-based queues. These queues carry no consumer affinity: any processing server can pick up work from any queue. This decoupling is what makes the system horizontally scalable.
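The no-affinity property is the competing-consumers pattern. A small sketch, again with `queue.Queue` standing in for a broker queue:

```python
# Competing consumers: several identical workers drain one shared queue.
# Any worker may take any message; none has affinity to a sender or topic.
import queue
import threading

work_q = queue.Queue()
results = queue.Queue()

def worker(worker_id: int):
    while True:
        msg = work_q.get()
        if msg is None:
            work_q.put(None)     # re-post the sentinel so peers also stop
            break
        results.put((worker_id, msg))

for i in range(3):
    threading.Thread(target=worker, args=(i,), daemon=True).start()

for n in range(5):
    work_q.put(n)
work_q.put(None)
```

Scaling out a stage is just starting more workers against the same queue.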
4. Processing Servers¶
Several types of servers consume from queues:
- Routing server — determines which client delivery queues a message should enter.
- Persistence server — writes messages to durable storage.
- Analytics server — computes metrics, feeds data into ML models for anomaly detection or content classification.
- Push server — maintains open connections to clients for real-time delivery.
Each server type scales independently based on its throughput needs.
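The routing server's core job, matching a message against subscriptions and fanning out to client delivery queues, can be sketched as below. The subscription data and topic field are assumed for illustration.

```python
# Routing-server sketch: fan a message out to every client whose
# subscriptions match its topic (in-memory stand-ins for broker queues).
import queue

delivery_queues = {"alice": queue.Queue(), "bob": queue.Queue()}
subscriptions = {"alice": {"#ops"}, "bob": {"#ops", "#dev"}}

def fan_out(message: dict) -> list[str]:
    """Enqueue the message for each matching client; return who received it."""
    topic = message["topic"]
    recipients = [c for c, subs in subscriptions.items() if topic in subs]
    for client in recipients:
        delivery_queues[client].put(message)
    return recipients

fan_out({"topic": "#ops", "body": "deploy done"})   # reaches alice and bob
```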
5. Push Delivery¶
For real-time delivery, the push server maintains a persistent connection per client using one of:
- Long polling — server holds the HTTP request open until data is available, then responds and the client immediately reconnects.
- HTTP persistent connections — reuse an open connection for multiple messages.
- WebSockets — full-duplex channel (best for high-frequency bidirectional communication).
Long polling is the most broadly compatible. WebSockets are preferable when the client needs continuous streaming — for example, a dashboard rendering live inference results.
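The long-polling variant maps directly onto a blocking queue read with a timeout. A sketch, assuming an in-memory delivery queue and an arbitrary timeout value:

```python
# Long-polling handler sketch: block on the client's delivery queue until a
# message arrives or the timeout elapses, then answer so the client reconnects.
import queue

def long_poll(delivery_queue: queue.Queue, timeout_s: float = 30.0) -> dict:
    """Hold the 'request' open; 200 with a message, or 204 on timeout."""
    try:
        msg = delivery_queue.get(timeout=timeout_s)
        return {"status": 200, "message": msg}
    except queue.Empty:
        return {"status": 204, "message": None}   # no content; client retries

q = queue.Queue()
q.put({"body": "ping"})
print(long_poll(q, timeout_s=0.1))   # → {'status': 200, 'message': {'body': 'ping'}}
```

A WebSocket push server replaces the timeout/reconnect loop with a single held connection, which is why it wins for continuous streams.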
Message Format¶
Messages are simple JSON key-value pairs. Schema validation happens at the router stage, before messages enter processing queues.
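Router-stage validation can be as small as a required-fields check. The specific schema below is an assumption; the source only says messages are flat JSON key-value pairs.

```python
# Router-stage validation sketch: reject malformed messages before they
# reach any processing queue (the required fields are assumptions).
REQUIRED = {"type": str, "sender": str, "body": str}

def validate(message: dict) -> bool:
    """Accept only messages carrying every required field with the right type."""
    return all(
        key in message and isinstance(message[key], expected)
        for key, expected in REQUIRED.items()
    )

validate({"type": "chat", "sender": "alice", "body": "hi"})   # True
```

Validating at the first hop means downstream stages never pay for parsing garbage.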
Takeaway¶
Queue-based stage decoupling is the go-to pattern for bursty, heterogeneous workloads. Chat platforms, inference pipelines, log processing: any system where different stages have different throughput needs ends up looking like this.