Message Queues & Event-Driven Architecture
Chapter 1: Foundations - Breaking the Tight Coupling
When Service A calls Service B directly (e.g., over HTTP) and waits for the reply, the two are "tightly coupled": if Service B is slow, Service A is slow; if Service B is down, Service A's request fails.
Messaging decouples them. Service A puts a message in a box (the queue) and walks away. Service B picks it up later, whenever it is ready.
📞 Real World: Phone Call vs SMS
Synchronous (Phone Call): "Hey, can you fix this?" You have to stay on the line until they answer. If they are busy, you wait.
Asynchronous (SMS): "Hey, can you fix this?" You send it and go back to work. They read it when they are free.
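To make the decoupling concrete, here is a minimal in-process sketch. It assumes a plain `queue.Queue` standing in for a real broker, and `service_a_place_order` / `service_b_drain` are hypothetical names. The only point is that the producer returns without waiting for the consumer, even while the consumer is "down".

```python
import queue

# Hypothetical stand-in: an in-process queue.Queue plays the role of the broker.
broker = queue.Queue()

def service_a_place_order(order_id):
    """Producer: put a message in the box and return immediately."""
    broker.put(f"order-placed:{order_id}")
    return "accepted"  # A never waits on B; B may not even be running yet

def service_b_drain():
    """Consumer: runs whenever B is ready and handles everything waiting."""
    handled = []
    while True:
        try:
            handled.append(broker.get_nowait())
        except queue.Empty:
            return handled

# Service B is down, but Service A keeps accepting orders.
print(service_a_place_order("42"))  # accepted
print(service_a_place_order("43"))  # accepted
# Later, B comes back and works through the backlog.
print(service_b_drain())            # ['order-placed:42', 'order-placed:43']
```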
Chapter 2: The Two Main Patterns
2.1 Message Queues (Point-to-Point)
Analogy: A Bank Teller Line.
- There is one line (Queue).
- There are 5 tellers (Consumers).
- If 100 people join the line, each person is served by ONLY ONE teller.
- Use Case: "Process this payment", "Resize this image". (Work Distribution)
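A minimal sketch of the teller line, assuming Python threads as the consumers and a `queue.Queue` as the single line (a real system would use a broker, but the competing-consumers idea is the same). The `teller` function and the `None` "bank is closing" sentinel are illustrative choices, not part of any broker API.

```python
import queue
import threading
from collections import Counter

line: queue.Queue = queue.Queue()  # the single shared line (the queue)
served = []                        # (customer, teller) pairs, for checking
served_lock = threading.Lock()

def teller(name: str) -> None:
    """Consumer: keep taking the next customer. Queue.get() hands each item to exactly one caller."""
    while True:
        customer = line.get()
        if customer is None:       # sentinel: the bank is closing
            return
        with served_lock:
            served.append((customer, name))

tellers = [threading.Thread(target=teller, args=(f"teller-{i}",)) for i in range(5)]
for t in tellers:
    t.start()
for n in range(100):               # 100 people join the line
    line.put(f"customer-{n}")
for _ in tellers:                  # one sentinel per teller, queued after all customers
    line.put(None)
for t in tellers:
    t.join()

customers = [c for c, _ in served]
print(len(customers), len(set(customers)))  # 100 100: everyone served, nobody twice
print(Counter(t for _, t in served))        # how work was split (varies run to run)
```

Which teller serves whom changes from run to run; what never changes is that each customer appears exactly once.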
2.2 Pub/Sub (Publish / Subscribe)
Analogy: A Radio Station.
- The DJ speaks (Publisher).
- 10,000 people are listening (Subscribers).
- Every listener receives its own copy of the same message.
- Use Case: "User Signed Up" -> (Email Service sends welcome email) AND (Analytics Service records sign up).
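By contrast, pub/sub gives every subscriber its own copy. The sketch below is a toy in-memory broker (the `PubSub` class and the `user.signed_up` topic name are hypothetical): each subscriber gets a private queue, and publishing copies the event into all of them. Compare with the teller line, where all consumers share one queue.

```python
import queue
from collections import defaultdict

class PubSub:
    """Toy broker: one private inbox per subscriber; publish fans out to every inbox on the topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of subscriber inboxes

    def subscribe(self, topic):
        inbox = queue.Queue()
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, event):
        for inbox in self._subscribers[topic]:
            inbox.put(event)  # fan-out: one publish, N deliveries

bus = PubSub()
email_inbox = bus.subscribe("user.signed_up")      # Email Service
analytics_inbox = bus.subscribe("user.signed_up")  # Analytics Service

bus.publish("user.signed_up", {"user_id": 1, "email": "ada@example.com"})

print("email service got:", email_inbox.get_nowait())
print("analytics service got:", analytics_inbox.get_nowait())
```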
Chapter 3: Delivery Guarantees - The Promises
Networks are unreliable. Messaging systems offer different promises about delivery.
3.1 At-Most-Once ("Fire and Forget")
The system tries to send the message. If it fails, it gives up.
- Pros: Very fast and simple (no acknowledgements, no retries).
- Cons: You might lose data.
- Use Case: IoT sensor data (if you miss one temperature reading, it's fine).
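A common way to end up with at-most-once on the consumer side is to acknowledge (remove) a message before processing it; some clients call this auto-ack. The sketch below simulates that with a `deque` and a made-up `Crash` exception; it is an illustration, not a real broker client.

```python
from collections import deque

class Crash(Exception):
    """Simulates the worker dying mid-job (hypothetical)."""

def consume_at_most_once(pending, handle):
    """Ack FIRST, then process. If processing fails, the message is already gone."""
    while pending:
        msg = pending.popleft()  # ack: the "broker" forgets the message now
        try:
            handle(msg)
        except Crash:
            pass                 # no retry: this message is simply lost

received = []

def record_temperature(reading):
    if reading == 21.7:
        raise Crash("worker died")  # simulated failure on one reading
    received.append(reading)

pending = deque([20.5, 21.7, 22.1])
consume_at_most_once(pending, record_temperature)
print(received)  # [20.5, 22.1]: the 21.7 reading was lost
```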
3.2 At-Least-Once (The Standard)
The system keeps trying until the receiver confirms ("Acks") receipt.
- Pros: Messages are not silently dropped: anything that isn't acknowledged gets redelivered.
- Cons: You might receive the same message twice. (e.g., Worker A finishes the job but crashes before sending the receipt. The Queue thinks it failed and sends it to Worker B).
- Use Case: Payments, Orders, critical data.
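The duplicate scenario in the Cons bullet can be simulated in a few lines. This toy `Broker` is hypothetical and single-threaded: it only removes a message when it is acked. Real brokers typically redeliver after a dropped connection or a timeout; the second `worker` call stands in for that redelivery.

```python
from collections import deque

class Broker:
    """Toy at-least-once broker: a message stays queued until it is acked."""

    def __init__(self, messages):
        self.queue = deque(messages)

    def receive(self):
        return self.queue[0]  # hand out the head message WITHOUT removing it

    def ack(self):
        self.queue.popleft()  # only an ack removes the message

def worker(broker, name, log, crash_before_ack=False):
    msg = broker.receive()
    log.append((name, msg))   # the job's side effect happens here
    if crash_before_ack:
        return                # "crashed" before acking: broker still holds msg
    broker.ack()

broker = Broker(["charge order #17"])
log = []
worker(broker, "A", log, crash_before_ack=True)  # A does the work, then crashes
worker(broker, "B", log)                         # same message goes to B
print(log)           # [('A', 'charge order #17'), ('B', 'charge order #17')]
print(broker.queue)  # deque([]): nothing lost, but the job ran twice
```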
3.3 Exactly-Once (The Holy Grail)
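Each message is processed exactly one time. This is very hard (and expensive) to achieve in distributed systems. Usually we approximate it: at-least-once delivery plus an idempotent consumer (Chapter 5) gives the same end result as exactly-once.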
Chapter 4: Handling Failures - The Poison Message
What happens if a message contains bad data that crashes your consumer code?
- Worker picks up message.
- Worker crashes (Exception).
- Queue sees failure, puts message back at the front.
- Worker picks up message again.
- Worker crashes again. (Infinite Loop!)
The Solution: Dead Letter Queue (DLQ)
Configure your queue: "If a message fails 3 times, move it to a separate queue called failed-messages."
Human engineers can then inspect the DLQ, fix the bug, and re-process the messages.
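A sketch of the retry-then-park logic, assuming an in-memory `deque` as the main queue and a list as the DLQ. `run_consumer` and `MAX_ATTEMPTS` are illustrative names; managed brokers usually track the delivery count for you (Amazon SQS, for example, moves a message to its DLQ after the redrive policy's `maxReceiveCount`), so in practice you configure this rather than write it. Messages are used as dictionary keys here, so the sketch assumes they are hashable and unique.

```python
from collections import deque

MAX_ATTEMPTS = 3  # "If a message fails 3 times, move it to the DLQ"

def run_consumer(main_queue, dlq, handle):
    """Process messages; a message that fails MAX_ATTEMPTS times is moved to dlq."""
    processed = []
    failures = {}                           # message -> failed attempts so far
    while main_queue:
        msg = main_queue.popleft()
        try:
            handle(msg)
            processed.append(msg)
        except Exception:                   # the "crash" from the steps above
            failures[msg] = failures.get(msg, 0) + 1
            if failures[msg] >= MAX_ATTEMPTS:
                dlq.append(msg)             # park it; the pipeline keeps moving
            else:
                main_queue.appendleft(msg)  # back at the front for a retry
    return processed

def parse_amount(msg):
    return int(msg)                         # raises ValueError on bad data

main = deque(["10", "oops", "25"])
dlq = []
print(run_consumer(main, dlq, parse_amount))  # ['10', '25']
print(dlq)                                    # ['oops']
```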
Chapter 5: Idempotency - Safely Handling Duplicates
Since we use "At-Least-Once" delivery, your code MUST be able to handle receiving the same message twice without breaking.
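Unsafe Consumer:
```csharp
// Not idempotent: if the message is delivered twice, the user gets $20 instead of $10.
user.Balance += 10;
```
Idempotent Consumer:
```csharp
// db, HasProcessed and MarkProcessed are illustrative helpers, not a specific library.
// Check, update and "processed" marker share ONE transaction, so a crash between
// them cannot leave the credit applied but the message unmarked.
using var tx = db.BeginTransaction();
if (db.HasProcessed(msg.Id)) {
    return; // Duplicate delivery: already applied, just acknowledge it.
}
user.Balance += 10;
db.MarkProcessed(msg.Id); // a unique key on msg.Id also stops concurrent duplicates
tx.Commit();
```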
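Here is a runnable version of the same idea, assuming SQLite (Python's built-in `sqlite3`) as the database and a hypothetical `handle_credit` handler. Instead of a separate check, it inserts the message ID into a table whose primary key rejects duplicates, inside the same transaction as the balance update: either both happen or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE processed_messages (msg_id TEXT PRIMARY KEY);
    INSERT INTO accounts VALUES (1, 0);
""")

def handle_credit(msg_id: str, user_id: int, amount: int) -> bool:
    """Apply a credit at most once per msg_id. Returns False for a duplicate."""
    try:
        with conn:  # one transaction: commit on success, rollback on exception
            # PRIMARY KEY rejects a second insert of the same msg_id.
            conn.execute("INSERT INTO processed_messages (msg_id) VALUES (?)", (msg_id,))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE user_id = ?",
                (amount, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # already processed: safe to ack and move on

handle_credit("msg-001", 1, 10)
handle_credit("msg-001", 1, 10)  # redelivered duplicate
print(conn.execute("SELECT balance FROM accounts WHERE user_id = 1").fetchone()[0])  # 10
```

Letting the unique key reject duplicates avoids a separate "have I seen this?" read, which two concurrent workers could both pass before either one marks the message.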
Chapter 6: Choosing a Broker
| Technology | Type | Best For |
|---|---|---|
| RabbitMQ | General Purpose | Complex routing, Standard Microservices. |
| Apache Kafka | Stream Processing | Massive scale (millions/sec), keeping history (replayable). |
| AWS SQS | Cloud Managed | Simple, zero maintenance, auto-scaling. |
| Redis Pub/Sub | Lightweight | Real-time chat, ephemeral notifications (fire-and-forget: subscribers that are offline miss the message). |
Chapter 7: Summary Checklist
Messaging Best Practices:
- [ ] Use At-Least-Once for anything important.
- [ ] Make Consumers Idempotent. Assume you will get duplicates.
- [ ] Configure a DLQ. Never let one bad message block the whole pipe.
- [ ] Monitor Queue Depth. If the backlog keeps growing (e.g., 10,000 messages waiting and rising), consumers can't keep up: add more workers.
- [ ] Keep Messages Small. Don't send a 10MB PDF. Send a link to S3.
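For the queue-depth check, a back-of-the-envelope estimate helps decide how many workers to add. This sketch assumes steady arrival and processing rates that you have measured yourself; `workers_needed` and its parameters are hypothetical, not a metric any broker exposes. Capacity has to cover new arrivals plus the backlog spread over the time you allow for draining it.

```python
import math

def workers_needed(queue_depth, arrival_rate, per_worker_rate, drain_seconds):
    """Estimate consumers needed to keep up AND clear the backlog within drain_seconds.

    Rates are messages per second (assumed steady, measured values).
    """
    required_rate = arrival_rate + queue_depth / drain_seconds
    return max(1, math.ceil(required_rate / per_worker_rate))

# 10,000 messages waiting, 50 msg/s arriving, each worker handles 20 msg/s,
# and we want the backlog gone within 5 minutes.
print(workers_needed(10_000, 50, 20, 300))  # 5
```

Bursty traffic and real autoscalers are messier, but the shape of the calculation is the same.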
Quick Review
Message queues and event-driven architecture decouple services by sending work as messages or events, allowing producers and consumers to scale and fail independently.
✅ Two patterns
- Queue (point-to-point): one message goes to one consumer (work distribution).
- Pub/Sub: one event goes to many subscribers (broadcast updates).
✅ Delivery guarantees
- At-most-once: may drop messages, never duplicates.
- At-least-once: may duplicate messages, tries not to lose them (common default).
- Exactly-once: very hard; usually approximated with idempotency + careful processing.
✅ The safety trio
- Idempotent consumer: processing the same message twice is safe.
- DLQ: isolates “poison messages” after retries so the pipeline keeps moving.
- Backpressure: watch queue depth and scale consumers when lag grows.