Architecture

Overview

Our observability platform is built on Grafana Cloud, which provides monitoring, logging, and tracing.

Platform Structure

Grafana Cloud Organization Model

Within Grafana Cloud, we operate an organization that enables us to create multiple isolated stacks. Each stack is a complete, self-contained observability environment that includes:

  • Grafana instance for visualization and dashboarding

  • Mimir for metrics storage (Grafana's Prometheus-compatible metrics backend)

  • Loki for log aggregation and storage

  • Tempo for distributed tracing

  • Additional services as part of the Grafana Cloud stack

The complete list of services included in each stack can be found in the Grafana Cloud stack reference.

Stack Isolation

Each stack is completely isolated from the others. Each customer's data is stored in that customer's own dedicated stack, so it cannot mix with another customer's data.
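The organization and stack model described above can be sketched as a small data structure. This is an illustrative model only: the `Stack` and `Organization` classes, the `create_stack` method, and the customer names are hypothetical and are not Grafana Cloud API types.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Stack:
    """One self-contained observability environment (hypothetical model)."""
    customer: str
    # Components named in this document; a real stack includes further services.
    components: Tuple[str, ...] = ("grafana", "mimir", "loki", "tempo")


@dataclass
class Organization:
    """The organization that owns all stacks (hypothetical model)."""
    name: str
    stacks: Dict[str, Stack] = field(default_factory=dict)  # customer -> Stack

    def create_stack(self, customer: str) -> Stack:
        # One dedicated stack per customer: never reuse or share a stack.
        if customer in self.stacks:
            raise ValueError(f"stack already exists for {customer!r}")
        stack = Stack(customer=customer)
        self.stacks[customer] = stack
        return stack


org = Organization(name="example-org")
acme = org.create_stack("acme")
globex = org.create_stack("globex")
```

In this sketch, isolation is expressed by the one-to-one mapping from customer to stack; no object is shared between `acme` and `globex`.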

Data Storage Components

Default Configuration: Managed Services

By default, we use Grafana Cloud's fully managed services for all data storage components:

  • Managed Mimir: Metrics storage and querying with Prometheus compatibility

  • Managed Loki: Log aggregation and storage

  • Managed Tempo: Distributed tracing storage

This managed approach removes the operational overhead of running the storage backends ourselves, and leaves reliability and scaling to Grafana Cloud.

Alternative: Self-Hosted Components

For customers with specific requirements, we can host the data storage components (Mimir, Loki, and Tempo) ourselves. This is not our default configuration but may be appropriate when:

  • Regulatory requirements mandate data storage in specific geographic locations

  • Network isolation requires data to remain within customer-controlled infrastructure

  • Custom data retention policies differ significantly from managed service defaults

When components are self-hosted, the Grafana instance typically remains in Grafana Cloud while Mimir, Loki, and Tempo run either in dedicated, customer-specific infrastructure or in multi-tenant infrastructure. Data isolation is maintained under every deployment option, but the complexity of managing the infrastructure varies.
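As a rough illustration, the choice between the managed default and self-hosting could be captured as a simple check against the three criteria listed above. The `StorageRequirements` fields and the `self_hosting_reasons` function are hypothetical names introduced for this sketch. The real decision is a judgment call made per customer, not an automatic rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageRequirements:
    """Hypothetical flags mirroring the three criteria listed above."""
    data_residency_mandated: bool = False
    must_stay_in_customer_network: bool = False
    custom_retention_needed: bool = False


def self_hosting_reasons(req: StorageRequirements) -> List[str]:
    """Return the criteria that apply; an empty list means the managed default."""
    reasons = []
    if req.data_residency_mandated:
        reasons.append("regulatory data residency")
    if req.must_stay_in_customer_network:
        reasons.append("network isolation")
    if req.custom_retention_needed:
        reasons.append("custom retention policy")
    return reasons


default_customer = StorageRequirements()
regulated_customer = StorageRequirements(data_residency_mandated=True)

default_reasons = self_hosting_reasons(default_customer)
regulated_reasons = self_hosting_reasons(regulated_customer)
```

An empty result corresponds to the default managed configuration. Any non-empty result only indicates that self-hosting may be worth considering.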

Alerting Architecture

The Challenge: Central Management with Data Isolation

While stack isolation ensures customer data security, it creates an operational challenge: how do we manage alerting rules consistently across many customer stacks without duplicating configuration or risking drift?

Solution: Cross Stack Datasource

We use Grafana Cloud's Cross Stack Datasource feature to solve this challenge. This allows us to:

  • Query data from multiple customer stacks from a central alerting stack

  • Manage alert rules centrally for consistency and scale

  • Maintain complete data isolation: each alert evaluation queries only the specific customer's stack

  • Avoid extracting or re-storing customer data, preventing any possibility of cross-contamination
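One way to picture central rule management is a single set of rule templates that is expanded into one rule per customer stack, so that every customer gets the same definition and nothing is edited by hand per stack. The sketch below assumes a hypothetical naming scheme (`cross-stack-<customer>`) for the per-customer datasource and a hypothetical `http_requests_total` metric. It does not reflect how rules are actually provisioned on this platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RuleTemplate:
    """A centrally managed alert rule definition (hypothetical model)."""
    name: str
    expr: str          # PromQL expression, illustrative only
    for_duration: str  # how long the condition must hold before firing


@dataclass(frozen=True)
class BoundRule:
    """A template bound to exactly one customer's datasource."""
    customer: str
    datasource: str
    name: str
    expr: str
    for_duration: str


def bind_rules(templates: List[RuleTemplate], customers: List[str]) -> List[BoundRule]:
    """Expand each central template into one rule per customer stack."""
    return [
        # Assumed naming convention: one cross-stack datasource per customer.
        BoundRule(c, f"cross-stack-{c}", t.name, t.expr, t.for_duration)
        for c in customers
        for t in templates
    ]


templates = [
    RuleTemplate(
        name="HighErrorRate",
        expr='sum(rate(http_requests_total{status=~"5.."}[5m])) > 1',
        for_duration="10m",
    )
]
rules = bind_rules(templates, ["acme", "globex"])
```

Because every bound rule is derived from the same template, changing the template once changes the rule for all customers, which is the property that guards against configuration drift.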

How It Works

The Cross Stack Datasource acts as a federation layer. When an alert rule is evaluated:

  1. The central alerting system queries the specific customer's stack directly

  2. Alert evaluation happens against that customer's isolated data

  3. Results trigger notifications through the configured channels

  4. No customer data is copied, aggregated, or stored centrally

This approach ensures that each customer's metrics and logs remain in their dedicated stack while enabling us to provide consistent, scalable alerting across all customers.
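The four steps above can be sketched in a few lines. This is a simplified simulation, not the Grafana alerting engine: each customer stack is represented by an in-memory function, and the names `make_stack`, `evaluate`, and the `error_rate` query are hypothetical.

```python
from typing import Callable, Dict, List

# A stand-in for one customer's stack: it answers queries for that stack only.
QueryFn = Callable[[str], float]


def make_stack(metrics: Dict[str, float]) -> QueryFn:
    """Build a fake per-customer query endpoint backed by in-memory data."""
    def query(expr: str) -> float:
        return metrics.get(expr, 0.0)
    return query


def evaluate(
    customer: str,
    expr: str,
    threshold: float,
    datasources: Dict[str, QueryFn],
    notify: Callable[[str], None],
) -> bool:
    query = datasources[customer]  # 1. target only this customer's stack
    value = query(expr)            # 2. evaluate against its isolated data
    firing = value > threshold
    if firing:
        notify(f"{customer}: {expr} is firing")  # 3. send a notification
    return firing                  # 4. only the verdict leaves; no data is kept


datasources = {
    "acme": make_stack({"error_rate": 0.2}),
    "globex": make_stack({"error_rate": 0.01}),
}
sent: List[str] = []
results = {
    customer: evaluate(customer, "error_rate", 0.05, datasources, sent.append)
    for customer in datasources
}
```

In this simulation each evaluation reads from exactly one stack, and the central side retains only the firing state and the notification text.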

[Figure: Grafana Organisation Diagram]
