Prometheus Scalability: High Cardinality and How to Fix It

Prometheus has become the de facto standard for metrics collection in cloud-native environments. Its pull-based model, powerful query language, and deep Kubernetes integration make it an obvious choice for platform teams. But as organizations scale — more services, more replicas, more labels — Prometheus starts showing cracks. Queries slow down, memory usage balloons, and what was once a reliable monitoring backbone becomes an operational liability. This article examines exactly why that happens and what you can do about it, from quick tactical fixes to full architectural overhauls.

The Cardinality Problem: Why It Kills Prometheus

Cardinality is the single most important concept to understand when troubleshooting Prometheus scalability. In the context of time series databases, cardinality refers to the total number of unique label combinations that exist across all your metrics. Every unique combination creates a distinct time series, and Prometheus must store, index, and query each of them independently.

Consider a simple HTTP request counter: http_requests_total. If you label it with method (GET, POST, PUT, DELETE), status_code (200, 201, 400, 404, 500, 503), and endpoint (50 distinct API paths), you already have 4 × 6 × 50 = 1,200 time series from a single metric. Now add a customer_id label with 10,000 distinct values. You have just created 12 million time series from one counter.

This is the cardinality explosion pattern, and it is the most common cause of Prometheus degradation in production. The problem is compounded by labels that have unbounded or high-entropy values:

  • User IDs or session tokens embedded in labels
  • Request IDs or trace IDs (effectively infinite cardinality)
  • Pod names without proper aggregation, especially in autoscaling environments
  • Free-form error messages or SQL query strings
  • IP addresses, particularly in environments with high churn
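Before removing any of these labels, measure how much each one actually contributes. Two PromQL patterns answer that quickly (http_requests_total and customer_id are illustrative names; substitute your own metric and suspect label):

```promql
# Total number of series for one metric family
count({__name__="http_requests_total"})

# Number of distinct values a suspect label takes on that metric
count(count by (customer_id) (http_requests_total))
```

If the second number is a large fraction of the first, that label is the dominant cardinality driver and the prime candidate for aggregation or removal.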

The relationship between cardinality and resource consumption is roughly linear in the number of series, but each series also carries significant fixed overhead in the in-memory index structures. Prometheus stores its head block (the most recent data) entirely in memory. Each time series in the head block requires approximately 3–4 KB of RAM for the series itself plus index entries. A Prometheus instance with 1 million active time series will typically consume 4–6 GB of RAM just for the head block, before accounting for query processing overhead.

Memory Explosion Patterns and Real Symptoms

Memory issues in Prometheus rarely announce themselves cleanly. Instead, they manifest through a cascade of symptoms that are easy to misdiagnose. Understanding the failure modes helps you identify the root cause faster and apply the right remedy.

The Head Block Growth Pattern

Prometheus keeps a two-hour window of data in memory as the head block before compacting it to disk. If your series count grows continuously — which happens when pod churn creates new series faster than old ones expire — the head block never shrinks. You can monitor this directly with prometheus_tsdb_head_series and prometheus_tsdb_head_chunks. On a healthy instance these gauges plateau; with a cardinality problem they grow monotonically until the process is OOM-killed.
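A simple self-monitoring alert makes this failure mode visible well before the OOM. The growth factor and windows below are illustrative and should be tuned to your instance:

```yaml
groups:
  - name: prometheus-self-monitoring
    rules:
      - alert: PrometheusHeadSeriesGrowing
        # Fires if the head block holds 20% more series than six hours ago;
        # sustained growth like this usually indicates a cardinality leak.
        expr: |
          prometheus_tsdb_head_series
            > 1.2 * (prometheus_tsdb_head_series offset 6h)
        for: 1h
        labels:
          severity: warning
```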

Query Timeout Cascades

As series count grows, even well-written PromQL queries that worked fine at 100k series become unbearably slow at 1M. Grafana dashboards start timing out, alert evaluation lags behind schedule, and Alertmanager begins receiving delayed or duplicated firing alerts. The prometheus_rule_evaluation_duration_seconds metric is a reliable early warning — when p99 evaluation time for your recording rules exceeds your evaluation interval, you have a problem.
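Prometheus also exposes per-rule-group timing that makes this check direct. Assuming the standard rule group metrics, the following expression lists every group that can no longer keep up with its own schedule:

```promql
# Rule groups whose last evaluation took longer than their interval
prometheus_rule_group_last_duration_seconds
  > prometheus_rule_group_interval_seconds
```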

Scrape Failures Under Memory Pressure

When Prometheus is under heavy memory pressure, its Go garbage collector starts spending more time collecting, which introduces latency into the scrape loop. Scrapes begin timing out, causing gaps in your data. This creates a deceptive situation where you have gaps in metrics precisely when your system is under stress, exactly when you need monitoring most. Watch for drops in the up metric and for increases in prometheus_target_scrapes_exceeded_sample_limit_total.
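Both symptoms can be watched with short PromQL expressions; the window sizes here are illustrative:

```promql
# Targets currently failing their scrapes, grouped by job
count by (job) (up == 0)

# Scrapes rejected in the last hour for exceeding sample_limit
increase(prometheus_target_scrapes_exceeded_sample_limit_total[1h]) > 0
```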

Compaction Pressure

High cardinality also stresses the TSDB compaction process. Prometheus compacts head block data into persistent blocks every two hours. With millions of series, compaction can take tens of seconds to minutes, during which write performance degrades. prometheus_tsdb_compaction_duration_seconds rising above 30 seconds is a warning sign. Compaction failures leave orphaned blocks on disk, gradually consuming storage and potentially corrupting the TSDB if left unaddressed.
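Compaction health can be tracked with the TSDB's own metrics. A minimal pair of checks, with illustrative thresholds:

```promql
# Any compaction failures in the last six hours
increase(prometheus_tsdb_compactions_failed_total[6h]) > 0

# p99 compaction duration; sustained values above ~30s are a warning sign
histogram_quantile(0.99,
  rate(prometheus_tsdb_compaction_duration_seconds_bucket[1h]))
```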

Short-Term Fixes: Tactical Remediation

When you are dealing with a Prometheus instance under active stress, you need immediate relief before you can implement architectural changes. These techniques can be applied quickly and provide meaningful headroom while longer-term solutions are planned.

Recording Rules: Pre-Computing Aggregations

Recording rules are the most underutilized tool in the Prometheus toolbox. They allow you to pre-compute expensive PromQL expressions and store the results as new time series. The key benefit for scalability is that you can aggregate away high-cardinality dimensions, dramatically reducing the number of series that dashboards and alerts need to query at runtime.

Consider an example where you have per-pod HTTP request rates with labels for pod, namespace, service, method, and status_code. Your dashboards mostly need service-level aggregations, not per-pod breakdowns. A recording rule can produce that aggregation once per evaluation interval:

groups:
  - name: http_aggregations
    interval: 30s
    rules:
      - record: job:http_requests_total:rate5m
        expr: |
          sum by (job, namespace, method, status_code) (
            rate(http_requests_total[5m])
          )

      - record: job:http_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (job, namespace, le) (
              rate(http_request_duration_seconds_bucket[5m])
            )
          )

      - record: namespace:http_requests_total:rate5m
        expr: |
          sum by (namespace, status_code) (
            rate(http_requests_total[5m])
          )

Notice that the pod label is dropped in all three rules. If you had 500 pods, you have just reduced the cardinality of these series by a factor of 500. Dashboards querying job:http_requests_total:rate5m instead of computing rate(http_requests_total[5m]) on the fly will return results orders of magnitude faster.

The naming convention level:metric:operations is the Prometheus community standard. Following it consistently makes recording rules self-documenting and helps teams understand the aggregation level at a glance.
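Recorded series also make alerts cheaper to evaluate. Using the namespace-level rule defined above, an error-rate alert can be expressed entirely against pre-aggregated series (the 5% threshold and 10-minute window are illustrative):

```yaml
groups:
  - name: http_alerts
    rules:
      - alert: HighHTTPErrorRate
        expr: |
          sum by (namespace) (
            namespace:http_requests_total:rate5m{status_code=~"5.."}
          )
          /
          sum by (namespace) (
            namespace:http_requests_total:rate5m
          ) > 0.05
        for: 10m
        labels:
          severity: critical
```

Because the rule only touches pre-aggregated series, its evaluation cost stays flat even as pod counts grow.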

Metric Dropping via Relabeling

Relabeling gives you surgical control over what metrics Prometheus actually ingests. There are two stages where relabeling applies: relabel_configs (applied before scraping, based on target metadata) and metric_relabel_configs (applied after scraping, based on scraped metric names and labels). For cardinality control, metric_relabel_configs is your primary tool.

Dropping entire metric families that you do not use is the most impactful change you can make. Many exporters emit dozens of metrics that are irrelevant for most use cases:

scrape_configs:
  - job_name: kubernetes-pods
    metric_relabel_configs:
      # Drop metrics we never query
      - source_labels: [__name__]
        regex: 'go_gc_.*|go_memstats_.*|process_.*'
        action: drop

      # Drop a high-cardinality label value while keeping the metric.
      # action defaults to replace; rewriting the label to an empty
      # string removes it from the series entirely.
      - source_labels: [__name__, pod]
        regex: 'http_requests_total;.*'
        target_label: pod
        replacement: ''

      # Thin out histogram buckets by dropping le values you do not need
      # (the values here are illustrative). Avoid action: keep for this:
      # keep is evaluated against every scraped sample, so it would
      # discard all other metrics from this job as well.
      - source_labels: [__name__, le]
        regex: 'http_request_duration_seconds_bucket;(10|25|50)'
        action: drop

      # Replace high-cardinality endpoint paths with normalized versions
      - source_labels: [endpoint]
        regex: '/api/v1/users/[0-9]+'
        target_label: endpoint
        replacement: '/api/v1/users/:id'

Be careful with metric_relabel_configs — they are applied per scraped sample, so computationally expensive regex patterns across high-frequency scrapes can add CPU overhead. Test regex patterns and prefer anchored, non-backtracking expressions.

Cardinality Limits as a Safety Net

Prometheus 2.x introduced per-scrape sample limits as a defensive mechanism. These do not solve cardinality problems but prevent a single misbehaving exporter from taking down your entire Prometheus instance:

global:
  # Default per-scrape sample limit applied to every job (0 = no limit)
  sample_limit: 0

scrape_configs:
  - job_name: application-pods
    # Reject scrapes that return more than 50k samples
    sample_limit: 50000

    # Maximum number of labels allowed on a single scraped sample
    label_limit: 64

    # Limit label name and value lengths
    label_name_length_limit: 256
    label_value_length_limit: 1024

    kubernetes_sd_configs:
      - role: pod

When a scrape exceeds sample_limit, Prometheus rejects the entire scrape and marks the target as having failed. This is a hard circuit breaker, not a graceful degradation — the target’s up metric goes to 0. Set limits conservatively above your expected maximum to avoid false positives, and alert on prometheus_target_scrapes_exceeded_sample_limit_total > 0.
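The alert mentioned above can be written as a standard rule; the window and severity are illustrative:

```yaml
groups:
  - name: scrape-limit-guards
    rules:
      - alert: ScrapeSampleLimitExceeded
        expr: increase(prometheus_target_scrapes_exceeded_sample_limit_total[15m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "A target exceeded its sample_limit and its scrape was rejected"
```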

Architectural Solutions: Federation and Remote Write

Once you have exhausted tactical optimizations or when your scale genuinely exceeds what a single Prometheus instance can handle, architectural changes become necessary. Prometheus offers two built-in mechanisms for scaling horizontally: federation and remote_write.

Federation: Hierarchical Scraping

Prometheus federation allows one Prometheus instance to scrape aggregated metrics from other Prometheus instances via the /federate endpoint. In a typical setup, leaf-level Prometheus instances collect raw metrics from targets, while a global Prometheus instance federates pre-aggregated recording rule results from the leaves.

# Global Prometheus configuration federating from regional instances
scrape_configs:
  - job_name: federate-regions
    scrape_interval: 15s
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        # Only federate pre-aggregated recording rule metrics
        - '{__name__=~"job:.*"}'
        - '{__name__=~"namespace:.*"}'
        - '{__name__=~"cluster:.*"}'
        # Federate key infrastructure alerts
        - 'up{job="kubernetes-apiservers"}'
    static_configs:
      - targets:
          - prometheus-eu-west.monitoring.svc:9090
          - prometheus-us-east.monitoring.svc:9090
          - prometheus-ap-south.monitoring.svc:9090

Federation works well for multi-region global dashboards and cross-cluster alerting on aggregated signals. Its limitations are significant, though: the /federate endpoint is a point-in-time snapshot, so you cannot run range queries against federated data effectively. It also creates a single point of failure at the global layer and does not provide true long-term storage. For those requirements, remote_write is the better path.

Remote Write: Streaming to Durable Storage

Remote write allows Prometheus to stream all ingested samples to an external storage backend in real time. The external backend handles long-term retention, multi-tenancy, and global query federation. Prometheus itself becomes a stateless collection agent that maintains only a short local retention window for resilience against network outages.

remote_write:
  - url: https://thanos-receive.monitoring.svc:19291/api/v1/receive
    # Authentication for the remote endpoint
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/secrets/remote-write-password

    # Tune the write queue for throughput vs. latency
    queue_config:
      # Number of shards (parallel write connections)
      max_shards: 200
      min_shards: 1
      # Samples to batch before flushing
      max_samples_per_send: 500
      # Time to wait before flushing an incomplete batch
      batch_send_deadline: 5s
      # In-memory buffer capacity per shard
      capacity: 2500
      # How long to retry failed writes
      min_backoff: 30ms
      max_backoff: 5s

    # Metadata configuration
    metadata_config:
      send: true
      send_interval: 1m

    # Filter what gets remote-written (reduce egress)
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'go_gc_.*|go_memstats_.*'
        action: drop

The queue_config tuning is critical and frequently misunderstood. Each shard maintains its own connection to the remote endpoint and its own in-memory queue. Increasing max_shards increases parallelism and throughput but also increases memory consumption and load on the remote endpoint. The right values depend heavily on your sample ingestion rate and network latency to the remote endpoint. Monitor prometheus_remote_storage_queue_highest_sent_timestamp_seconds versus prometheus_remote_storage_highest_timestamp_in_seconds — the lag between them tells you how far behind your remote write queue is.
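That lag comparison is worth codifying as an alert; the 60-second threshold and 15-minute window are illustrative:

```yaml
groups:
  - name: remote-write-health
    rules:
      - alert: RemoteWriteFallingBehind
        expr: |
          (
            prometheus_remote_storage_highest_timestamp_in_seconds
              - prometheus_remote_storage_queue_highest_sent_timestamp_seconds
          ) > 60
        for: 15m
        labels:
          severity: warning
```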

Long-Term Solutions: Thanos vs Grafana Mimir vs VictoriaMetrics

For production systems that need long-term storage, global query capability, high availability, and genuine horizontal scalability, purpose-built solutions are the right answer. Three projects dominate this space: Thanos, Grafana Mimir, and VictoriaMetrics. They share similar goals but differ significantly in architecture, operational complexity, and trade-offs.

| Criterion | Thanos | Grafana Mimir | VictoriaMetrics |
|---|---|---|---|
| Architecture | Sidecar + object store; modular components | Fully distributed; Cortex-derived microservices | Single binary or cluster mode |
| Storage backend | Any S3-compatible object store | Any S3-compatible object store | Own TSDB format on local disk or object store |
| PromQL compatibility | Full PromQL; own query engine | Full PromQL; Mimir-specific extensions | MetricsQL (PromQL superset) |
| Operational complexity | Medium: multiple components, each simple | High: many microservices with complex config | Low: minimal components, simple config |
| Ingest scalability | Scales via Thanos Receive fan-out | Horizontally scalable distributors and ingesters | Excellent; millions of samples/sec per node |
| Query performance | Good; Store Gateway caches object store data | Good; query sharding and caching built in | Excellent; highly optimized query engine |
| Multi-tenancy | Limited; tenant isolation via external labels | Native; per-tenant limits and isolation | Basic tenant IDs in cluster mode; advanced features enterprise |
| Deduplication | Built-in; replica dedup at query time | Built-in; ingest-time and query-time dedup | Built-in via -dedup.minScrapeInterval |
| Downsampling | Yes; Thanos Compactor handles it | Yes; configurable per tenant | Enterprise edition only |
| License | Apache 2.0 (fully open source) | AGPL-3.0 (open source) plus enterprise tier | Apache 2.0 (community); proprietary enterprise |
| Best fit | Teams already running Prometheus wanting minimal disruption | Large orgs needing multi-tenant SaaS-grade monitoring | Teams prioritizing simplicity and raw performance |

Thanos: The Incremental Path

Thanos integrates with existing Prometheus deployments through a sidecar process that runs alongside each Prometheus pod. The sidecar uploads completed TSDB blocks to object storage (S3, GCS, Azure Blob) and exposes a gRPC Store API that Thanos Query uses to federate queries across all Prometheus instances plus historical data in the object store. This makes Thanos the lowest-friction path for teams with existing Prometheus infrastructure.

Thanos Receive is an alternative ingest path that accepts remote_write directly, which is useful when you want to decouple Prometheus instances from the query layer or implement active-active HA without relying on Prometheus replication. Thanos Compactor handles block compaction and downsampling on the object store, creating 5-minute and 1-hour resolution downsamples automatically for efficient long-range queries.
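To complete the picture, the query layer is a separate Thanos Query deployment that fans out to every Store API endpoint. A minimal container spec sketch, where the service names and the replica label are assumptions for illustration:

```yaml
containers:
  - name: thanos-query
    image: quay.io/thanos/thanos:v0.32.0
    args:
      - query
      - --http-address=0.0.0.0:9090
      - --grpc-address=0.0.0.0:10901
      # Deduplicate series from HA Prometheus pairs by this external label
      - --query.replica-label=replica
      # Fan out to sidecars and the Store Gateway (assumed service names)
      - --endpoint=thanos-sidecar.monitoring.svc:10901
      - --endpoint=thanos-store.monitoring.svc:10901
```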

Grafana Mimir: Enterprise-Grade Multi-Tenancy

Mimir is a fork of Cortex, rewritten by Grafana Labs to address operational complexity issues in Cortex’s architecture. It follows the same microservices pattern — Distributor, Ingester, Querier, Query Frontend, Store Gateway, Compactor, Ruler — but with significantly improved defaults and a monolithic deployment mode that simplifies small-scale deployments. Mimir’s headline feature is native multi-tenancy with per-tenant cardinality limits, query limits, and ingestion rate limits enforced at the distributor layer.

Mimir is the right choice when you need to run monitoring as an internal platform service for multiple teams or business units, each with independent resource quotas and data isolation. The operational overhead is substantial, but for large organizations it is justified by the isolation and governance capabilities.

VictoriaMetrics: Simplicity and Raw Performance

VictoriaMetrics takes a fundamentally different approach: rather than building on top of Prometheus’s TSDB format, it implements its own highly optimized storage engine. The result is dramatically better compression (often 5–10x better than Prometheus TSDB) and query performance that consistently outperforms Thanos and Mimir in benchmarks, particularly for high-cardinality workloads and large time ranges. The single-node binary handles workloads that would require a full Thanos cluster, and the cluster version adds horizontal scalability with fewer moving parts than Thanos or Mimir.

VictoriaMetrics also supports MetricsQL, a superset of PromQL that adds useful functions like outlierIQR(), limitOffset(), and improved histogram handling. Grafana datasource compatibility is maintained through a PromQL-compatible API, so existing dashboards work without modification.

Practical Guide: Choosing Your Scaling Approach

The right solution depends on your current scale, team capacity, and trajectory. This is not a one-size-fits-all decision. Here is a pragmatic framework for matching the solution to the problem.

Stage 1: Under 1 Million Active Series

A single Prometheus instance with proper tuning should handle this comfortably. Focus on recording rules to eliminate expensive dashboard queries, implement metric_relabel_configs to drop unused metrics, and set sample_limit guards. Increase Prometheus memory limits to give it adequate headroom (at minimum 8 GB, ideally 16 GB for instances approaching 1M series). Set --storage.tsdb.retention.time to the minimum that satisfies your compliance and debugging needs — 15 days is often enough if you have remote_write configured to a longer-term store.

Stage 2: 1–5 Million Active Series

At this scale, a single instance is viable but requires vertical scaling and aggressive optimization. Consider sharding your Prometheus deployment by functional area: one instance for infrastructure metrics, one for application metrics, one for business metrics. This is horizontal scaling via functional decomposition, not true distributed architecture. Add remote_write to object storage for long-term retention. If you are running Kubernetes, the Prometheus Operator with multiple Prometheus custom resources per namespace group is a clean implementation of this pattern.
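With the Prometheus Operator, functional sharding can be expressed as one Prometheus custom resource per shard, selecting ServiceMonitors by label. The shard label scheme below is an assumption; adapt it to your own labeling conventions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-infra
  namespace: monitoring
spec:
  replicas: 2
  # Only scrape ServiceMonitors labelled for this functional shard
  serviceMonitorSelector:
    matchLabels:
      shard: infrastructure
  resources:
    requests:
      memory: 16Gi
```

A second resource with `shard: application` (and so on) completes the decomposition, with each instance sized independently.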

This is also the stage where VictoriaMetrics single-node becomes compelling — it can handle this range comfortably with far less RAM than Prometheus and simpler operations than a full distributed system.

Stage 3: 5 Million+ Active Series or Global Requirements

At this scale, a distributed architecture is necessary. Your choice among Thanos, Mimir, and VictoriaMetrics Cluster depends primarily on:

  • Existing Prometheus investment + incremental migration: Thanos Sidecar is the path of least resistance. Your existing Prometheus instances keep working; you add sidecars and deploy Thanos query components.
  • Multi-tenant platform with governance requirements: Grafana Mimir, accepting the operational complexity in exchange for native tenant isolation and limits.
  • Maximum performance with minimal operational burden: VictoriaMetrics Cluster, replacing Prometheus entirely or alongside it via remote_write, with dramatically simpler operations than Thanos or Mimir.
  • Multi-region, cross-cloud global monitoring: Thanos or Mimir; both have mature multi-region architectures. VictoriaMetrics Enterprise has similar capabilities but is not open source.

Complementary Configuration: Thanos Sidecar Example

For teams adopting Thanos, the sidecar configuration alongside a Prometheus deployment looks like this in a Kubernetes environment:

# Thanos sidecar configuration (as part of Prometheus pod spec)
containers:
  - name: prometheus
    image: prom/prometheus:v2.48.0
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      # Keep 2h locally; Thanos handles long-term
      - --storage.tsdb.retention.time=2h
      # Thanos requires min-block-duration = max-block-duration for sidecar
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle

  - name: thanos-sidecar
    image: quay.io/thanos/thanos:v0.32.0
    args:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://localhost:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      # Object store configuration
      - --objstore.config-file=/etc/thanos/objstore.yml
    volumeMounts:
      - name: prometheus-data
        mountPath: /prometheus
      - name: thanos-objstore-config
        mountPath: /etc/thanos

---
# Object store configuration (s3-compatible)
# /etc/thanos/objstore.yml
type: S3
config:
  bucket: my-thanos-metrics
  endpoint: s3.eu-west-1.amazonaws.com
  region: eu-west-1
  # Use IAM role or provide credentials via environment
  access_key: ""
  secret_key: ""

VictoriaMetrics as Remote Write Target

If you choose VictoriaMetrics as your remote storage backend, the integration with existing Prometheus instances is straightforward. VictoriaMetrics exposes a remote_write compatible endpoint at /api/v1/write:

# prometheus.yml — remote write to VictoriaMetrics
remote_write:
  - url: http://victoriametrics:8428/api/v1/write
    queue_config:
      max_samples_per_send: 10000
      capacity: 20000
      max_shards: 30

# VictoriaMetrics single-node startup (Docker Compose example)
services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:v1.95.1
    command:
      - -storageDataPath=/victoria-metrics-data
      # Retain 1 year of data
      - -retentionPeriod=12
      # Enable deduplication (for HA Prometheus pairs)
      - -dedup.minScrapeInterval=15s
      # Memory limit
      - -memory.allowedPercent=60
    ports:
      - "8428:8428"
    volumes:
      - vm-data:/victoria-metrics-data

volumes:
  vm-data:

VictoriaMetrics also exposes a Prometheus-compatible query API at /api/v1/query and /api/v1/query_range, so Grafana datasources pointing at it need only a URL change — no plugin installation required for basic use. For MetricsQL-specific functions, use the VictoriaMetrics datasource plugin available in Grafana’s plugin catalog.
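In practice that URL change can be captured in Grafana's file-based datasource provisioning; the service name and file path here are assumptions:

```yaml
# /etc/grafana/provisioning/datasources/victoriametrics.yml
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus          # PromQL-compatible API; the standard plugin works
    access: proxy
    url: http://victoriametrics:8428
    isDefault: true
```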

Observing Your Prometheus Health

Before implementing any of these solutions, establish a baseline understanding of your Prometheus instance’s current health. The following PromQL expressions give you immediate visibility into the key indicators:

# Total active time series in the head block
prometheus_tsdb_head_series

# Series created vs. removed (churn indicator)
rate(prometheus_tsdb_head_series_created_total[5m])
rate(prometheus_tsdb_head_series_removed_total[5m])

# Memory usage of the head block chunks
prometheus_tsdb_head_chunks_storage_size_bytes

# Remote write lag (seconds behind)
(
  prometheus_remote_storage_highest_timestamp_in_seconds
  - prometheus_remote_storage_queue_highest_sent_timestamp_seconds
)

# Top 20 metrics by series count (cardinality contributors)
# Run this via the Prometheus query API or expression browser:
topk(20,
  count by (__name__) ({__name__!=""})
)

# Average rule evaluation duration (compare against your evaluation interval)
rate(prometheus_rule_evaluation_duration_seconds_sum[5m])
/ rate(prometheus_rule_evaluation_duration_seconds_count[5m])

Prometheus also exposes a /api/v1/status/tsdb endpoint that returns cardinality statistics including the top 10 metrics by series count and the top 10 label names by cardinality. This is invaluable for identifying which specific metrics or labels are causing problems and should be your first stop when investigating a new cardinality issue.

Frequently Asked Questions

How do I identify which metrics are causing my cardinality explosion?

Start with the /api/v1/status/tsdb endpoint on your Prometheus instance. It returns a JSON response with seriesCountByMetricName, seriesCountByLabelValuePair, and seriesCountByLabelName arrays, each showing the top contributors to your total series count. This points you directly at the offending metrics and labels without any external tooling. Complement this with topk(20, count by (__name__)({__name__!=""})) in PromQL, which gives you the same information in a queryable format you can alert on. Once you know the metric name, query count by (label1, label2) (your_metric_name) replacing label pairs to identify which specific label dimensions are driving the high count.

Can I run Prometheus in HA without external dependencies?

Yes, but with important caveats. The standard HA pattern for standalone Prometheus is to run two identical Prometheus instances scraping the same targets. Both instances collect data independently, and Alertmanager deduplicates alerts from both using its mesh clustering (run multiple Alertmanager instances in a cluster and point both Prometheus instances at all of them). This provides alerting HA — alerts fire even if one Prometheus instance is down. It does not provide query HA in the traditional sense, because each instance has its own independent data and queries against a failed instance simply fail. Dashboards pointing at a specific instance will show gaps during that instance’s downtime. For true query HA with failover and deduplication, you need Thanos Query (which can deduplicate replica series at query time using the replica external label) or a similar solution. Running Prometheus without any external dependencies means accepting these query HA limitations.

What is a safe maximum cardinality for a single Prometheus instance?

There is no universal number — it depends heavily on your scrape interval, available RAM, and query patterns. A practical guideline: allocate 3–4 GB of RAM per million active series for the head block alone, then add 50% headroom for query processing. A Prometheus instance with 16 GB of RAM can comfortably handle 2–3 million active series under typical workloads. Beyond 5 million series, even well-resourced single instances start showing query performance degradation that impacts alert evaluation reliability. The more meaningful limit to enforce operationally is series churn rate: an instance creating more than 100,000 new series per minute will struggle regardless of total series count, because the head block indexing operations become a bottleneck. Monitor rate(prometheus_tsdb_head_series_created_total[5m]) and treat sustained values above 50,000/minute as a warning condition.
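That churn guideline translates directly into an alert rule; the threshold matches the numbers above:

```yaml
groups:
  - name: series-churn
    rules:
      - alert: HighSeriesChurn
        # rate() is per second; multiply by 60 for a per-minute figure
        expr: rate(prometheus_tsdb_head_series_created_total[5m]) * 60 > 50000
        for: 15m
        labels:
          severity: warning
```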

Should I use Thanos Sidecar or Thanos Receive for ingestion?

The choice comes down to whether you want to keep Prometheus as the authoritative ingest layer or move toward a push-based architecture. Thanos Sidecar is the simpler, lower-risk option: Prometheus continues operating normally, the sidecar uploads completed blocks to object storage in the background, and you gain long-term storage and global query capability with minimal disruption. The drawback is that Prometheus must have local storage for at least 2 hours (one block duration), and the sidecar requires that min-block-duration equals max-block-duration, which prevents Prometheus from doing its own compaction. Thanos Receive accepts remote_write from any Prometheus instance, which enables active-active HA setups where multiple Prometheus replicas write to a Receive hashring simultaneously, and Receive handles deduplication. This is more complex to operate but provides better ingest-side redundancy. For most teams starting with Thanos, Sidecar is the right first step. Receive makes sense when you are building a centralized monitoring platform that accepts writes from many Prometheus instances across different clusters or environments.

Is it worth migrating from Thanos to VictoriaMetrics or Mimir once you are already running Thanos?

Migration from Thanos to an alternative should be driven by specific pain points, not by benchmark numbers alone. If your team is spending significant time operating Thanos (debugging query Store Gateway cache issues, managing compactor conflicts, handling block upload failures), and your primary need is simplicity and query performance rather than multi-tenancy, VictoriaMetrics is worth evaluating seriously. The migration path is smooth: run VictoriaMetrics alongside Thanos temporarily, migrate remote_write targets to VictoriaMetrics, and decommission Thanos once you are satisfied. Historical data in your object store can be imported using VictoriaMetrics’s vmctl tool. If your pain point is multi-tenancy and governance — multiple teams with independent data isolation, per-tenant rate limits, chargeback requirements — Mimir is the right destination and the operational complexity is justified. The one scenario where staying with Thanos is usually the right call is when your organization has invested heavily in Thanos tooling, has stable operations, and does not have specific unmet needs. Migration carries real costs in engineering time and operational risk; make sure the benefits are concrete and quantified before committing.