Real-time Messaging - Slack Engineering

The Nugget

  • Slack delivers real-time messages worldwide through a small set of cooperating server types (Channel Servers, Gateway Servers, Admin Servers, and Presence Servers) that keep clients connected over persistent websockets and deliver messages with low latency.

Make it stick

  • 🌍 Slack sends millions of messages every day across millions of channels in real time.
  • 💡 Each user has a persistent websocket connection for receiving instant updates.
  • ⚙️ Central to Slack's architecture are Channel Servers (CS) and Gateway Servers (GS), which manage message routing and user subscriptions.
  • ⏱️ Message delivery worldwide typically completes within 500ms.

Key insights

Real-time Messaging Architecture

  • Channel Servers (CS):

    • Stateful servers that hold recent channel history.
    • Each server hosts a subset of channels, up to about 16 million per host at peak.
    • Channels are mapped to servers with consistent hashing, so an unhealthy server can be replaced quickly (in under 20 seconds).
  • Gateway Servers (GS):

    • Stateful servers that hold user sessions and websocket channel subscriptions.
    • Deployed across multiple geographic regions so that users connect to a nearby server.
  • Admin Servers (AS):

    • Stateless servers that interface between Slack's backend and Channel Servers.
  • Presence Servers (PS):

    • Track online user statuses, enabling real-time presence notifications.
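
The consistent-hashing idea behind the channel-to-server mapping can be illustrated with a small sketch. This is a toy ring under stated assumptions, not Slack's implementation: the `HashRing` class, the `cs-1`-style server names, the SHA-256 hash, and the 64 virtual nodes per server are all hypothetical choices made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring mapping channel IDs to Channel Servers."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes  # virtual nodes per server (assumed value)
        self._ring = []       # sorted list of (hash, server) points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        # Use the first 8 bytes of SHA-256 as a 64-bit position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [point for point in self._ring if point[1] != server]

    def lookup(self, channel_id):
        # Owner is the first point clockwise from the channel's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (self._hash(channel_id), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cs-1", "cs-2", "cs-3"])
channels = [f"C{n:05d}" for n in range(1000)]

before = {c: ring.lookup(c) for c in channels}
ring.remove("cs-2")  # simulate an unhealthy server leaving the ring
after = {c: ring.lookup(c) for c in channels}

moved = [c for c in channels if before[c] != after[c]]
print(f"{len(moved)} of {len(channels)} channels were remapped")
```

The property to notice is that removing a server only remaps the channels it owned; every other channel stays where it was, which is what makes quick replacement of an unhealthy server practical.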

Message and Event Handling

  • Message Flow:

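    • Clients send a message through an API call to Slack's backend, which passes it to an Admin Server; the Admin Server uses the consistent hash ring to route it to the Channel Server that owns the channel.
    • That Channel Server sends the message to every Gateway Server subscribed to the channel, and each Gateway Server pushes it over websocket to its connected clients in real time.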
  • Event Processing:

    • Different types of events trigger real-time updates (e.g., reactions, member additions).
    • Spikes in event traffic are often linked to scheduled reminders and messages sent at the top of the hour.
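
The fan-out half of this flow, from a Channel Server out to subscribed clients, can be sketched as a two-level publish/subscribe. This is a minimal in-process model under stated assumptions: the class and method names (`GatewayServer.subscribe`, `ChannelServer.publish`, and so on) are hypothetical, and a plain list stands in for websocket writes.

```python
from collections import defaultdict


class ChannelServer:
    """Holds per-channel history and the gateways subscribed to each channel."""

    def __init__(self):
        self.gateways = defaultdict(list)  # channel_id -> subscribed gateways
        self.history = defaultdict(list)   # channel_id -> recent messages

    def register(self, channel_id, gateway):
        if gateway not in self.gateways[channel_id]:
            self.gateways[channel_id].append(gateway)

    def publish(self, channel_id, text):
        self.history[channel_id].append(text)
        # One send per subscribed gateway, not one per user.
        for gateway in self.gateways[channel_id]:
            gateway.deliver(channel_id, text)


class GatewayServer:
    """Holds user connections and their channel subscriptions in memory."""

    def __init__(self, name):
        self.name = name
        self.subscribers = defaultdict(set)  # channel_id -> user_ids
        self.delivered = []  # (user_id, channel_id, text); stands in for websocket writes

    def subscribe(self, user_id, channel_id, channel_server):
        self.subscribers[channel_id].add(user_id)
        channel_server.register(channel_id, self)

    def deliver(self, channel_id, text):
        # The gateway fans the message out to its own connected users.
        for user_id in sorted(self.subscribers[channel_id]):
            self.delivered.append((user_id, channel_id, text))


cs = ChannelServer()
gs_us, gs_eu = GatewayServer("gs-us"), GatewayServer("gs-eu")
gs_us.subscribe("alice", "C1", cs)
gs_us.subscribe("bob", "C1", cs)
gs_eu.subscribe("carol", "C1", cs)
gs_eu.subscribe("dave", "C2", cs)  # different channel, should not receive the message

cs.publish("C1", "hello")
print(gs_us.delivered)
print(gs_eu.delivered)
```

In this model the Channel Server does work proportional to the number of subscribed gateways, while each gateway handles the per-user delivery for its own region.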

Scalability and Future Growth

  • Slack's architecture currently serves tens of millions of channels per host and tens of millions of connected clients.
  • The article describes this architecture as linearly scalable, so it has room to serve many more customers.

Key quotes

  • "If we look at the traffic on a typical work day, it shows that most users are online between 9am and 5pm local time."
  • "Our message stats shows that the multiplicative factor for message broadcast is different across regions."
  • "Slack serves tens of millions of channels per host, tens of millions of connected clients, and delivers messages globally in 500ms."
  • "A single Slack team has all of its channels mapped across all the CSs."
  • "With the linear scalability of our current architecture, we can serve many more customers."