Overcoming Prometheus Scaling Limits: How DSV's Road IT Built a Resilient Monitoring Stack

  • Shipping and Logistics
  • Hedehusene, Denmark

"VictoriaMetrics was chosen due to its rich feature set, streamlined architecture, and performance under real workloads. Its ability to efficiently handle high ingestion rates and large-scale time-series data, combined with horizontal scalability and cost-effective resource usage, made it stand out."

Amir Kheirkhahan, Platform Engineer, DSV

Main Benefits of Using VictoriaMetrics

  • Stability & Reliability

  • Operational Simplicity

  • Massive Scale

Challenge

As DSV's Road IT scaled its Kubernetes environments, its core federated Prometheus stack could no longer handle the high cardinality and sheer volume of incoming data. The team faced severe technical hurdles:

  • Performance: Individual Prometheus instances would hit memory and CPU limits, so DSV needed to enhance resource efficiency.
  • Alerting: Critical alerting and notification pipelines needed to be more dependable.
  • Operations: Maintaining the complex federated architecture created a heavy operational load for the engineering team.

Solution

To eliminate the bottlenecks of their federated stack, DSV transitioned to VictoriaMetrics. By optimizing core components like vminsert, vmselect, and vmstorage, the team built a scalable architecture capable of handling ~800,000 data points per second without degradation. The new setup focused on resilience and efficiency:

  • High availability: The team deployed HA mode to eliminate single points of failure, preventing potential system crashes.
  • Efficient ingestion: DSV implemented vmagent in streaming mode for secure, resource-efficient data collection.
  • Proven scale: This move replaced operational complexity with a stable foundation that now reliably supports ~72 million active time series.
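
The case study does not describe how the HA deployment distributes data across storage nodes. As a rough illustration of the general idea behind horizontal scaling with replication, the sketch below assigns each time series to several storage nodes with rendezvous hashing. It is not VictoriaMetrics' routing code; the node names, the series key format, and the replication factor of 2 are all hypothetical.

```python
import hashlib


def pick_storage_nodes(series_key, nodes, replication_factor=2):
    """Choose which storage nodes hold a series (rendezvous hashing).

    Every node gets a score derived from hashing the node name together
    with the series key; the highest-scoring nodes store the series.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")

    def score(node):
        digest = hashlib.sha256(f"{node}|{series_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(nodes, key=score, reverse=True)[:replication_factor]


if __name__ == "__main__":
    # Hypothetical node names and series key, for illustration only.
    storage_nodes = ["storage-0", "storage-1", "storage-2", "storage-3"]
    series = 'http_requests_total{pod="api-1"}'
    print(pick_storage_nodes(series, storage_nodes))
```

Because each node's score depends only on that node and the series key, removing a node that does not hold a given series leaves that series' placement unchanged, which keeps data movement small when the cluster changes size.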

Why VictoriaMetrics Was Chosen Over Other Solutions

  • DSV selected VictoriaMetrics for its streamlined architecture and rich feature set, which offered a clear alternative to the operational complexity of their previous federated stack.
  • The team prioritized the platform's superior clustering capabilities, which provided the horizontal scalability needed to efficiently handle high ingestion rates and massive data volumes.
  • VictoriaMetrics stood out for its ability to deliver stability and high performance under real workloads while maintaining cost-effective resource usage.

Technical Stats

  • Ingestion Rate: ~800,000 datapoints/second
  • Total Datapoints Stored: ~3.5 Trillion
  • Daily New Time Series: ~85 Million
  • Active Time Series (Peak): ~72 Million
  • Data on Disk: ~1.83 TB
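
The figures above can be related to each other with some back-of-the-envelope arithmetic. The sketch below derives three rough ratios from them. These are not numbers reported by DSV: the calculation assumes 1 TB = 10^12 bytes, a constant ingestion rate, and that peak active series and the quoted ingestion rate describe the same period, none of which the source confirms.

```python
# Approximate figures quoted in the Technical Stats list.
INGESTION_RATE = 800_000           # datapoints per second
TOTAL_DATAPOINTS = 3.5e12          # datapoints stored
ACTIVE_SERIES_PEAK = 72_000_000    # active time series at peak
DATA_ON_DISK_BYTES = 1.83e12       # assumption: 1 TB = 10**12 bytes


def bytes_per_datapoint(disk_bytes, datapoints):
    """Average on-disk footprint of one stored datapoint."""
    return disk_bytes / datapoints


def implied_sample_interval_seconds(active_series, rate):
    """Average gap between samples of one series if every active
    series contributed equally to the ingestion rate."""
    return active_series / rate


def days_to_accumulate(datapoints, rate):
    """Days needed to ingest `datapoints` at a constant `rate`."""
    return datapoints / rate / 86_400


if __name__ == "__main__":
    print(f"{bytes_per_datapoint(DATA_ON_DISK_BYTES, TOTAL_DATAPOINTS):.2f} bytes/datapoint")
    print(f"{implied_sample_interval_seconds(ACTIVE_SERIES_PEAK, INGESTION_RATE):.0f} s between samples")
    print(f"{days_to_accumulate(TOTAL_DATAPOINTS, INGESTION_RATE):.1f} days of ingestion")
```

Under those assumptions the quoted figures work out to roughly 0.52 bytes per stored datapoint, an average of about 90 seconds between samples per series, and about 51 days of ingestion at the stated rate. Treat these as order-of-magnitude checks rather than measured values.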