Architecture

Overview

Our observability platform is built on Grafana Cloud, providing comprehensive monitoring, logging, and tracing capabilities.

Platform Structure

Grafana Cloud Organization Model

Within Grafana Cloud, we operate an organization that enables us to create multiple isolated stacks. Each stack is a complete, self-contained observability environment that includes:

Grafana instance for visualization and dashboarding
Mimir for metrics storage (the Prometheus-compatible backend behind Grafana Cloud Metrics)
Loki for log aggregation and storage
Tempo for distributed tracing
Additional services included in each Grafana Cloud stack

The complete list of services included in each stack can be found in the Grafana Cloud stack reference.

Stack Isolation

Each stack is completely siloed from the others: each customer's data is stored in its own dedicated stack, with no possibility of cross-contamination between customers.
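The one-stack-per-customer rule can be expressed as a small invariant. The sketch below is a hypothetical in-memory registry, not part of Grafana Cloud or our tooling; `Stack`, `StackRegistry`, and the stack slugs are illustrative names. It refuses to give a customer a second stack or to share one stack between two customers, so a lookup by customer can only return that customer's own stack.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stack:
    """Hypothetical record for one isolated stack and its components."""
    slug: str
    components: tuple[str, ...] = ("grafana", "mimir", "loki", "tempo")


class StackRegistry:
    """Hypothetical registry enforcing one dedicated stack per customer."""

    def __init__(self) -> None:
        self._stack_by_customer: dict[str, Stack] = {}
        self._customer_by_slug: dict[str, str] = {}

    def register(self, customer: str, stack: Stack) -> None:
        # A customer gets exactly one stack.
        if customer in self._stack_by_customer:
            raise ValueError(f"customer {customer!r} already has a stack")
        # A stack belongs to exactly one customer.
        owner = self._customer_by_slug.get(stack.slug)
        if owner is not None:
            raise ValueError(f"stack {stack.slug!r} already belongs to {owner!r}")
        self._stack_by_customer[customer] = stack
        self._customer_by_slug[stack.slug] = customer

    def stack_for(self, customer: str) -> Stack:
        # Raises KeyError for unknown customers instead of falling back
        # to any shared or default stack.
        return self._stack_by_customer[customer]
```

For example, registering stack `acme-prod` for customer `acme` and then trying to register the same stack for `globex` raises `ValueError`.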

Data Storage Components

Default Configuration: Managed Services

By default, we use Grafana Cloud's fully managed services for all data storage components:

Managed Mimir: Metrics storage and querying with Prometheus compatibility
Managed Loki: Log aggregation and storage
Managed Tempo: Distributed tracing storage

This managed approach removes the operational overhead of running the storage backends ourselves, while Grafana Cloud provides enterprise-grade reliability and automatic scaling.

Alternative: Self-Hosted Components

For customers with specific requirements, we can host the data storage components (Mimir, Loki, and Tempo) ourselves. This is not our default configuration, but it may be appropriate when:

Regulatory requirements mandate data storage in specific geographic locations
Network isolation requires data to remain within customer-controlled infrastructure
Custom data retention policies differ significantly from managed service defaults

When self-hosting components, the Grafana instance typically remains in Grafana Cloud, while Mimir, Loki, and Tempo run either on dedicated, customer-specific infrastructure or on shared multi-tenant infrastructure. Data isolation is maintained in both options, but the effort required to manage the infrastructure differs between them.
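The choice between the default and the alternative can be sketched as a function of the three criteria above. This is a simplification under stated assumptions: `CustomerRequirements`, `plan_deployment`, and the location strings are hypothetical, and the sketch treats any single criterion as enough to self-host, whereas in practice it is a case-by-case decision. In every case Grafana itself stays in Grafana Cloud; only where Mimir, Loki, and Tempo run changes.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Hosting(Enum):
    MANAGED = "managed"          # Grafana Cloud runs Mimir, Loki, Tempo (default)
    SELF_HOSTED = "self_hosted"  # we run Mimir, Loki, Tempo ourselves


@dataclass(frozen=True)
class CustomerRequirements:
    """Hypothetical flags mirroring the criteria listed above."""
    data_residency_required: bool = False
    network_isolation_required: bool = False
    custom_retention_required: bool = False


@dataclass(frozen=True)
class Deployment:
    hosting: Hosting
    grafana: str           # Grafana stays in Grafana Cloud either way
    storage_location: str  # where Mimir, Loki, and Tempo run


def choose_hosting(req: CustomerRequirements) -> Hosting:
    # Managed is the default; any listed requirement tips toward self-hosting.
    if (req.data_residency_required
            or req.network_isolation_required
            or req.custom_retention_required):
        return Hosting.SELF_HOSTED
    return Hosting.MANAGED


def plan_deployment(customer: str, req: CustomerRequirements,
                    dedicated: bool = True) -> Deployment:
    hosting = choose_hosting(req)
    if hosting is Hosting.MANAGED:
        storage = "grafana-cloud"
    elif dedicated:
        storage = f"dedicated:{customer}"  # customer-specific infrastructure
    else:
        # Shared backends; Mimir, Loki, and Tempo keep tenants apart by
        # tenant ID (the X-Scope-OrgID header).
        storage = "multi-tenant"
    return Deployment(hosting=hosting, grafana="grafana-cloud",
                      storage_location=storage)
```

The dedicated and multi-tenant branches both preserve isolation; the `dedicated` flag only reflects the trade-off in management effort described above.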

Alerting Architecture

The Challenge: Central Management with Data Isolation

While stack isolation ensures customer data security, it creates an operational challenge: how do we manage alerting rules consistently across many customer stacks without duplicating configuration or risking drift?

Solution: Cross Stack Datasource

We use Grafana Cloud's Cross Stack Datasource feature to solve this challenge. This allows us to:

Query data from multiple customer stacks from a central alerting stack
Manage alert rules centrally for consistency and scale
Maintain complete data isolation: each alert evaluation queries only the specific customer's stack
Avoid extracting or re-storing customer data, preventing any possibility of cross-contamination
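Central management means each rule is defined once and bound to every customer, rather than copied by hand into each stack. The sketch below assumes, hypothetically, that each customer stack is reachable from the central alerting stack through its own datasource identified by a UID; `RuleTemplate`, `BoundRule`, `expand_rules`, and the example UIDs are illustrative names, not Grafana API objects. Because every bound rule is generated from the same template, rules cannot drift between customers, and each bound rule references exactly one customer's datasource.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RuleTemplate:
    """A centrally managed alert rule definition (hypothetical shape)."""
    name: str
    query: str        # e.g. a PromQL expression
    threshold: float


@dataclass(frozen=True)
class BoundRule:
    """One template bound to exactly one customer's datasource."""
    name: str
    customer: str
    datasource_uid: str
    query: str
    threshold: float


def expand_rules(templates: list[RuleTemplate],
                 datasources: dict[str, str]) -> list[BoundRule]:
    """Create one rule per (template, customer) pair.

    `datasources` maps customer -> UID of the datasource pointing at that
    customer's stack (a hypothetical mapping).
    """
    return [
        BoundRule(
            name=f"{t.name} [{customer}]",
            customer=customer,
            datasource_uid=uid,
            query=t.query,          # identical for every customer: no drift
            threshold=t.threshold,
        )
        for t in templates
        for customer, uid in sorted(datasources.items())
    ]
```

Changing a template and re-running the expansion updates the rule for every customer at once, which is the consistency property the central stack is meant to provide.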

How It Works

The Cross Stack Datasource acts as a federation layer. When an alert rule is evaluated:

The central alerting system queries the specific customer's stack directly
Alert evaluation happens against that customer's isolated data
Results trigger notifications through the configured channels
No customer data is copied, aggregated, or stored centrally

This approach ensures that each customer's metrics and logs remain in their dedicated stack while enabling us to provide consistent, scalable alerting across all customers.
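The evaluation steps above can be sketched as a loop. The `query` and `notify` callables are hypothetical stand-ins for a Cross Stack Datasource query and for the configured notification channels; they are not real Grafana APIs, and a simple threshold check stands in for whatever condition a real rule uses. Each rule queries only the datasource bound to its customer, the result is compared against the threshold, and the queried value is not written anywhere.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BoundRule:
    """A centrally defined rule bound to one customer's datasource (hypothetical)."""
    name: str
    customer: str
    datasource_uid: str
    query: str
    threshold: float


@dataclass(frozen=True)
class Notification:
    """What leaves an evaluation: which rule fired, for which customer."""
    rule: str
    customer: str


# Hypothetical signatures standing in for the datasource query and the
# configured notification channel.
QueryFn = Callable[[str, str], float]
NotifyFn = Callable[[Notification], None]


def evaluate_rules(rules: list[BoundRule], query: QueryFn,
                   notify: NotifyFn) -> None:
    for rule in rules:
        # Query the customer's own stack directly, via its bound datasource.
        value = query(rule.datasource_uid, rule.query)
        # Evaluate against that customer's data only.
        if value > rule.threshold:
            # Notify through the configured channel.
            notify(Notification(rule=rule.name, customer=rule.customer))
        # `value` is never copied, aggregated, or stored; it is simply
        # rebound on the next iteration.
```

Passing a fake `query` that records which datasource it was called with is an easy way to check that each evaluation touches exactly one customer's stack.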

Last updated 3 months ago