Overcoming Prometheus Scaling Limits: How DSV's Road IT Built a Resilient Monitoring Stack
- Shipping and Logistics
- Hedehusene, Denmark
"VictoriaMetrics was chosen due to its rich feature set, streamlined architecture, and performance under real workloads. Its ability to efficiently handle high ingestion rates and large-scale time-series data, combined with horizontal scalability and cost-effective resource usage, made it stand out."
Amir Kheirkhahan, Platform Engineer, DSV
Main Benefits of Using VictoriaMetrics
- Stability & Reliability
- Operational Simplicity
- Massive Scale
Challenge
As DSV's Road IT scaled its Kubernetes environments, its core federated Prometheus stack could no longer handle the high cardinality and sheer volume of incoming data. The team faced severe technical hurdles:
- Performance: Individual Prometheus instances would hit memory and CPU limits, so DSV needed to enhance resource efficiency.
- Alerting: Critical alerting and notification pipelines needed to be more dependable.
- Operations: Maintaining the complex federated architecture created a heavy operational load for the engineering team.
Solution
To eliminate the bottlenecks of their federated stack, DSV transitioned to VictoriaMetrics. By optimizing core components like vminsert, vmselect, and vmstorage, the team built a scalable architecture capable of handling ~800,000 data points per second without degradation. The new setup focused on resilience and efficiency:
- High availability: The team deployed HA mode to eliminate single points of failure, so the loss of one component does not take monitoring down.
- Efficient ingestion: DSV implemented vmagent in streaming mode for secure, resource-efficient data collection.
- Proven scale: This move replaced operational complexity with a stable foundation that now reliably supports ~72 million active time series.
Why VictoriaMetrics Was Chosen Over Other Solutions
- DSV selected VictoriaMetrics for its streamlined architecture and rich feature set, which offered a clear alternative to the operational complexity of their previous federated stack.
- The team prioritized the platform's superior clustering capabilities, which provided the horizontal scalability needed to efficiently handle high ingestion rates and massive data volumes.
- VictoriaMetrics stood out for its ability to deliver stability and high performance under real workloads while maintaining cost-effective resource usage.
Technical Stats
- Ingestion Rate: ~800,000 data points/second
- Total Data Points Stored: ~3.5 trillion
- Daily New Time Series: ~85 million
- Active Time Series (Peak): ~72 million
- Data on Disk: ~1.83 TB
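
To put the Technical Stats in perspective, the short Python sketch below derives a few rough ratios from them. These derived numbers are not stated by DSV. They rest on assumptions: that 1 TB means 10^12 bytes, that the quoted ingestion rate is a steady average, and that the peak active series count can be compared directly against that rate. Function and constant names are illustrative only.

```python
# Back-of-the-envelope ratios derived from the published (approximate) stats.
# Assumptions: decimal terabytes, and a steady average ingestion rate.

INGESTION_RATE = 800_000        # data points per second
TOTAL_DATAPOINTS = 3.5e12       # data points stored
ACTIVE_SERIES_PEAK = 72e6       # active time series at peak
DATA_ON_DISK_BYTES = 1.83e12    # 1.83 TB, assuming 1 TB = 10**12 bytes

SECONDS_PER_DAY = 86_400


def datapoints_per_day(rate_per_second):
    """Data points ingested in one day at a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


def bytes_per_datapoint(disk_bytes, total_datapoints):
    """Average on-disk footprint of a single stored data point."""
    return disk_bytes / total_datapoints


def days_of_data(total_datapoints, rate_per_second):
    """Days of ingestion the stored total represents at a constant rate."""
    return total_datapoints / datapoints_per_day(rate_per_second)


def mean_sample_interval(active_series, rate_per_second):
    """Implied average seconds between samples for one active series."""
    return active_series / rate_per_second


if __name__ == "__main__":
    print(f"per day:        {datapoints_per_day(INGESTION_RATE):,}")
    print(f"bytes/point:    {bytes_per_datapoint(DATA_ON_DISK_BYTES, TOTAL_DATAPOINTS):.2f}")
    print(f"days of data:   {days_of_data(TOTAL_DATAPOINTS, INGESTION_RATE):.1f}")
    print(f"interval (s):   {mean_sample_interval(ACTIVE_SERIES_PEAK, INGESTION_RATE):.0f}")
```

Under these assumptions, the figures work out to about 69 billion data points per day, roughly 0.52 bytes per stored data point, about 51 days of data at the quoted rate, and an implied average of 90 seconds between samples per series. Treat these as order-of-magnitude estimates, since the source figures are themselves approximate.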