Building Real-Time Operational Dashboards with Go WebSockets and Redis Pub/Sub

Every restaurant group and SaaS operator eventually wants a live dashboard showing what is happening right now. Orders coming in, tables turning, inventory moving, queues building. The architecture decisions you make at the WebSocket layer determine whether that dashboard scales gracefully or collapses the first time fifty users open it simultaneously.

This is the architecture we use for real-time operational dashboards in Go, and what we have learned deploying it for restaurant management systems and SaaS operations platforms across Lebanon and the MENA region.

Why HTTP polling fails for operational dashboards

The first implementation most teams reach for is polling: the browser sends a request every second, the server responds with current state. It works in a demo. It breaks under real usage.

Consider fifty concurrent dashboard users each polling once per second. That is fifty HTTP requests per second against your API, fifty database queries per second reading the same tables, fifty response serializations per second. For a dashboard that might show three counters and a list of recent events, this is enormous waste. The database is reading the same rows over and over. The CPU is serializing the same JSON over and over. The network is carrying responses that are identical to the previous response because nothing changed.

Latency matters too. With one-second polling, an event that occurs just after a poll is not visible to the dashboard user until the next poll, up to a full second later and about half a second on average, plus the time the request itself takes. For a POS system where a manager is watching order flow during a dinner rush, that lag is tolerable. For a payment processing dashboard or a live auction system, it is not.

WebSockets solve both problems. One persistent connection per client. The server pushes events as they happen. Zero unnecessary database reads when nothing changes.

The core architecture: connection hub with Redis pub/sub

A single Go instance can handle thousands of WebSocket connections using goroutines. The pattern is a central hub that manages all active connections and broadcasts messages to them.

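import (
    "sync"

    "github.com/gorilla/websocket" // assumed: the calls used in this article match gorilla/websocket
)

// Hub tracks every live connection on this instance, grouped by tenant.
type Hub struct {
    connections map[string]map[*Connection]bool // tenantID -> connections
    register    chan *Connection
    unregister  chan *Connection
    broadcast   chan Message
    mu          sync.RWMutex // guards connections for readers outside Run
}

// NewHub initializes the map and channels; a zero-value Hub would panic on the first register.
func NewHub() *Hub {
    return &Hub{
        connections: make(map[string]map[*Connection]bool),
        register:    make(chan *Connection),
        unregister:  make(chan *Connection),
        broadcast:   make(chan Message),
    }
}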

type Connection struct {
    ws       *websocket.Conn
    tenantID string
    send     chan []byte
}

type Message struct {
    TenantID string
    Data     []byte
}

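// Run owns the connections map: every mutation happens on this goroutine.
func (h *Hub) Run() {
    for {
        select {
        case conn := <-h.register:
            h.mu.Lock()
            if h.connections[conn.tenantID] == nil {
                h.connections[conn.tenantID] = make(map[*Connection]bool)
            }
            h.connections[conn.tenantID][conn] = true
            h.mu.Unlock()

        case conn := <-h.unregister:
            h.mu.Lock()
            h.remove(conn)
            h.mu.Unlock()

        case msg := <-h.broadcast:
            // Full lock, not RLock: dropping a slow consumer mutates the map.
            h.mu.Lock()
            for conn := range h.connections[msg.TenantID] {
                select {
                case conn.send <- msg.Data:
                default:
                    // Send buffer full: drop the slow consumer instead of blocking the hub.
                    h.remove(conn)
                }
            }
            h.mu.Unlock()
        }
    }
}

// remove deletes conn and closes its send channel exactly once.
// The membership check prevents a double close (and a panic) when a connection
// that was dropped as a slow consumer later unregisters from its read pump.
// Callers must hold h.mu.
func (h *Hub) remove(conn *Connection) {
    conns, ok := h.connections[conn.tenantID]
    if !ok {
        return
    }
    if _, present := conns[conn]; !present {
        return
    }
    delete(conns, conn)
    close(conn.send)
    if len(conns) == 0 {
        delete(h.connections, conn.tenantID)
    }
}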

Each connection gets its own goroutine for writing. The main hub loop is the only place that modifies the connections map, which keeps locking simple and predictable.

The message flow is: application event occurs (order placed, payment processed) -> event published to Redis channel -> Redis subscriber goroutine receives it -> message pushed to hub broadcast channel -> hub delivers to all connected clients for that tenant.

Scaling across multiple Go instances

A single Go instance works fine until you need multiple instances for availability or load. The problem: a WebSocket client connects to Instance A. An application event occurs and is processed on Instance B. Instance B needs to notify the client on Instance A.

Redis pub/sub is the standard solution. Each instance subscribes to the same Redis channels. When any instance publishes an event to Redis, all instances receive it and deliver to their local connections.

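// SubscribeRedis forwards messages from Redis channels named "events:<tenantID>"
// to the local hub until ctx is cancelled or the subscription closes.
// Needs the context and strings packages plus a Redis client (go-redis is assumed here).
func (h *Hub) SubscribeRedis(ctx context.Context, rdb *redis.Client, channels []string) {
    pubsub := rdb.Subscribe(ctx, channels...)
    defer pubsub.Close()

    ch := pubsub.Channel()
    for {
        select {
        case msg, ok := <-ch:
            if !ok {
                return // subscription closed; msg would be nil here
            }
            tenantID := strings.TrimPrefix(msg.Channel, "events:")
            out := Message{TenantID: tenantID, Data: []byte(msg.Payload)}
            select {
            case h.broadcast <- out:
            case <-ctx.Done():
                return // do not let a stalled hub block shutdown
            }
        case <-ctx.Done():
            return
        }
    }
}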

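Each Redis message is delivered to every subscribed instance, and each instance filters down to its local connections for that tenant. Redis pub/sub is fire-and-forget: an instance that is disconnected from Redis when an event is published never receives it. With hundreds of tenants, most deliveries also reach instances that hold no connections for the tenant in question. To avoid that overhead, partition tenants across a fixed set of Redis channels using a consistent hash ring, and have each instance subscribe only to the channels it actually serves.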

Handling reconnection and missed messages

WebSocket connections break. Mobile browsers disconnect when the phone sleeps. Load balancers time out idle connections. Corporate proxies reset long-lived connections.

Client-side: use exponential backoff for reconnection attempts. Start at 500ms, double on each failure, cap at 30 seconds.

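Server-side: maintain a short event replay buffer per tenant. A Redis sorted set per tenant, with each event scored by its sequence number, works well. Members of a sorted set cannot expire individually, so trim old entries on write (for example with ZREMRANGEBYRANK to keep only the most recent N) and set a TTL on the key as a backstop for idle tenants. Size the buffer to hold 60 to 300 seconds of history depending on your data volume. When a client reconnects, it sends its last received sequence number and the server replays the events with a higher sequence number.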
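The replay logic itself is independent of where the buffer lives. The sketch below uses an in-memory slice as a stand-in for the Redis sorted set, purely to show the sequence-number comparison. `ReplayBuffer` and `BufferedEvent` are hypothetical names, and a real deployment would need the shared Redis-backed version so that any instance can serve a reconnecting client.

```go
package main

import "sync"

// BufferedEvent pairs a per-tenant sequence number with the encoded payload.
type BufferedEvent struct {
	Seq  int64
	Data []byte
}

// ReplayBuffer keeps the most recent events for one tenant, oldest first.
type ReplayBuffer struct {
	mu     sync.Mutex
	max    int
	events []BufferedEvent
}

func NewReplayBuffer(max int) *ReplayBuffer {
	return &ReplayBuffer{max: max}
}

// Append stores an event and evicts the oldest once the buffer is full.
// Callers are assumed to append in increasing Seq order.
func (b *ReplayBuffer) Append(ev BufferedEvent) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.events = append(b.events, ev)
	if len(b.events) > b.max {
		b.events = b.events[len(b.events)-b.max:]
	}
}

// Since returns the events with Seq > lastSeq. ok is false when the buffer
// cannot prove it still covers everything after lastSeq, in which case the
// client should receive a full state snapshot instead.
func (b *ReplayBuffer) Since(lastSeq int64) (missed []BufferedEvent, ok bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.events) == 0 || b.events[0].Seq > lastSeq+1 {
		return nil, false // empty buffer or a gap: some events were evicted
	}
	for _, ev := range b.events {
		if ev.Seq > lastSeq {
			missed = append(missed, ev)
		}
	}
	return missed, true
}
```

Treating an empty buffer as "not covered" is a deliberately cautious choice: it costs an occasional unnecessary snapshot but never silently skips events.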

For dashboards that show aggregated state rather than individual events (total orders today, current table occupancy), replay is simpler: send a full state snapshot as the first message after reconnection. This handles the case where the client has been offline long enough that its sequence number has expired.

Authentication on WebSocket connections

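The WebSocket handshake starts as an ordinary HTTP request, so browsers send cookies with it just as they do for any other request. The browser WebSocket API cannot set custom headers such as Authorization, however, so browser clients typically authenticate with a session cookie or pass the token another way, for example as a query parameter. Validate the JWT or session token on the HTTP upgrade request before establishing the connection. If validation fails, return 401 and do not upgrade.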

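// wsHandler authenticates the upgrade request, then hands the connection to the hub.
// Wrap it in a closure to register it with http.HandleFunc.
func wsHandler(hub *Hub, w http.ResponseWriter, r *http.Request) {
    // Authenticate before upgrading so a rejected client gets a plain 401.
    claims, err := validateToken(r)
    if err != nil {
        http.Error(w, "Unauthorized", http.StatusUnauthorized)
        return
    }

    conn, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        return // Upgrade has already written an HTTP error response
    }

    c := &Connection{
        ws:       conn,
        tenantID: claims.TenantID, // taken from the verified token, never from client input
        send:     make(chan []byte, 256),
    }
    hub.register <- c
    go c.writePump()
    go c.readPump(hub)
}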

Token rotation is the harder problem. A JWT issued at connection time might expire while the connection is still open. The simplest approach: close the connection at token expiry and require the client to reconnect. For most operational dashboard use cases, reconnecting once an hour is acceptable.

Tenant isolation is non-negotiable. Never broadcast a message to connections belonging to a different tenant. The hub's map-per-tenant structure makes this structurally impossible rather than just conventionally avoided.

Production operational concerns

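A Go goroutine starts with a small stack, roughly 2 to 8 KB, that grows on demand. One thousand concurrent WebSocket connections means two thousand goroutines (one reader, one writer each) plus send channel buffers and the WebSocket library's per-connection read and write buffers. Planning for 0.5 to 1 MB per connected client when sizing instances is a deliberately conservative budget that leaves headroom for payload bursts.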

Graceful shutdown requires draining connections. On SIGTERM, stop accepting new connections, close the Redis subscription, drain the hub broadcast channel, and then close all remaining WebSocket connections by sending each a close frame (status 1001, "going away", is the conventional code for a server that is shutting down). Give clients 5 to 10 seconds to reconnect to another instance before the process exits.

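Memory leaks come from connections that never cleanly close. Mobile browsers disconnect without sending a close frame. Implement a read deadline and a periodic ping/pong mechanism: the server pings on an interval shorter than the read deadline, each pong extends the deadline, and a client that stops answering is closed server-side when the deadline lapses. The example below uses a 60-second deadline; tighten it toward 10 seconds if you need dead clients detected faster.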

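// readPump keeps the read deadline fresh and detects dead connections.
// The dashboard sends no application messages, so inbound frames are discarded.
func (c *Connection) readPump(hub *Hub) {
    defer func() {
        hub.unregister <- c
        c.ws.Close()
    }()
    // A connection that stays silent for 60 seconds is treated as dead.
    c.ws.SetReadDeadline(time.Now().Add(60 * time.Second))
    // Each pong (the reply to a server ping) extends the deadline.
    c.ws.SetPongHandler(func(string) error {
        c.ws.SetReadDeadline(time.Now().Add(60 * time.Second))
        return nil
    })
    for {
        // ReadMessage must keep running: it is what processes pong frames.
        if _, _, err := c.ws.ReadMessage(); err != nil {
            return // deadline exceeded, close frame received, or network error
        }
    }
}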

Monitoring: track connected client count per tenant, message lag (time from Redis publish to WebSocket delivery), and connection error rates. A sudden drop in connected clients often signals a deployment or infrastructure event. Rising message lag indicates the hub or Redis subscription is backing up.

Key lessons from production

Start with a single-instance hub before adding Redis. Validate that the base architecture works, then introduce the distributed message layer.

The tenant isolation model in the hub is not optional on a multi-tenant platform. A bug that broadcasts messages across tenant boundaries is a serious data exposure incident. Make it structurally impossible in code.

Slow consumers will block your hub if you do not handle them explicitly. The default case in the broadcast select that disconnects slow consumers is not a nice-to-have. Under load, a handful of clients with full send buffers will cause backpressure that cascades to all clients.

For dashboards that show aggregated state and have no need for bidirectional communication, Server-Sent Events is simpler than WebSockets. SSE works through proxies that sometimes struggle with WebSockets, and the implementation is considerably less complex. Use WebSockets when you need client-to-server messages or when your team is already comfortable with the operational overhead.
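To give a sense of how much smaller the SSE path is, here is a minimal sketch using only the standard library. It assumes `events` is a per-client channel that plays the same role as the `send` channel in the WebSocket version; `formatSSE` and `streamSSE` are hypothetical names.

```go
package main

import (
	"bytes"
	"net/http"
)

// formatSSE frames a payload as one Server-Sent Event: every payload line
// gets a "data: " prefix and a blank line terminates the event.
func formatSSE(payload []byte) []byte {
	var out []byte
	for _, line := range bytes.Split(payload, []byte("\n")) {
		out = append(out, "data: "...)
		out = append(out, line...)
		out = append(out, '\n')
	}
	return append(out, '\n')
}

// streamSSE writes events to one client until it disconnects or the
// channel closes. Authentication is assumed to have happened already.
func streamSSE(w http.ResponseWriter, r *http.Request, events <-chan []byte) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	for {
		select {
		case payload, ok := <-events:
			if !ok {
				return
			}
			if _, err := w.Write(formatSSE(payload)); err != nil {
				return
			}
			flusher.Flush() // push the event out instead of waiting for the buffer to fill
		case <-r.Context().Done():
			return // client went away
		}
	}
}
```

There are no read and write pumps and no ping handling to write, and the browser's EventSource reconnects on its own, which removes most of the client-side reconnection code as well.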
