“VictoriaMetrics Just Works - and Uses Fewer Hardware Resources Compared to Other Tools!”

  • Internet
  • San Francisco, USA
  • 500+ Employees

Grammarly's digital writing assistant helps more than 30 million daily active users and 30,000 teams write more clearly and effectively every day. In building a product that scales across multiple platforms and devices, Grammarly works to empower users whenever and wherever they communicate. Grammarly's values-driven team is growing to support our expanding user base and to continue developing our writing assistant into a truly comprehensive communication partner. With a working model that balances remote work with in-person collaboration at Grammarly's hubs in San Francisco, Kyiv, New York, and Vancouver, the Grammarly team strives to help people around the world connect and be understood. Mission: Improve lives by improving communication.

Main Benefits of Using VictoriaMetrics

  • 10x Cost Savings

  • Ingestion Types Flexibility

  • Performance

  • Easy to Get Started

  • Responsive VM Developers

  • Great Support & Docs

Challenge

Maintaining and scaling our previous on-premises monitoring system was hard and required significant engineering time and resources.

The previous solution was unreliable.

Our previous system struggled with storing frequently changing metrics (the moderate churn rate was a concern).

The overall costs of the previous solution were too high.

Solution

Ingestion type flexibility (support for Graphite, OpenMetrics, etc.) was definitely a winning feature and important benefit for us.
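To illustrate what ingestion type flexibility means in practice, the sketch below renders the same sample in two wire formats: a Graphite plaintext line (with Graphite-style tags) and a Prometheus/OpenMetrics-style exposition line. The metric names, the `env` label, and both helper functions are hypothetical and chosen only for illustration; the case study does not describe which formats or metric names Grammarly actually sends.

```python
def graphite_line(name, value, timestamp, tags=None):
    """Render one sample as a Graphite plaintext line.

    Shape: '<path>[;tag=value...] <value> <unix_seconds>'.
    """
    tag_part = "".join(f";{k}={v}" for k, v in sorted((tags or {}).items()))
    return f"{name}{tag_part} {value} {timestamp}"


def exposition_line(name, value, labels=None):
    """Render one sample as a text exposition line without a timestamp.

    Shape: '<name>{label="value",...} <value>'.
    Assumption: label values contain no quotes, backslashes, or newlines,
    so no escaping is performed here.
    """
    label_part = ",".join(f'{k}="{v}"' for k, v in sorted((labels or {}).items()))
    selector = f"{name}{{{label_part}}}" if label_part else name
    return f"{selector} {value}"


# The same hypothetical sample in both formats.
sample_graphite = graphite_line("app.requests.count", 42, 1700000000, {"env": "prod"})
sample_exposition = exposition_line("app_requests_count", 42, {"env": "prod"})
```

A backend that accepts both formats lets existing Graphite-emitting services and Prometheus-style exporters report to the same storage without rewriting their instrumentation.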

VictoriaMetrics comes with good documentation and is easy to bootstrap.

The high level of responsiveness of the VictoriaMetrics developers and support team, both during our research phase and in production, has made us extremely happy customers.

VictoriaMetrics delivered 10x cost savings compared with our prior monitoring solution.

Why VictoriaMetrics Was Chosen Over Other Solutions

  • Great On-Premises Solution

  • Outperformed Competitive Solutions in Benchmarks

  • Strong PoC Results

  • Direct Access & Great Support by VM Developers

  • After trying out SaaS solutions, we decided to go with an in-house setup. Of the in-house tools we had short-listed, we decided to try VictoriaMetrics first in a PoC, taking into account publicly available benchmarks against competing solutions. The PoC results and the VictoriaMetrics developers' help made it an easy decision to move forward with VictoriaMetrics.

Technical Stats

  • Average memory usage during the last 24h

    sum(avg_over_time(process_resident_memory_bytes[24h]))

    618 GiB

  • The average number of CPU cores used during the last 24h

    sum(rate(process_cpu_seconds_total[24h]))

    ~66 CPU cores

  • The maximum number of active time series during the last 24 hours

    sum(max_over_time(vm_cache_entries{type="storage/hour_metric_ids"}[24h]))

    ~120 Mil

  • Daily time series churn rate

    sum(increase(vm_new_timeseries_created_total[24h]))

    ~74 Mil

  • The average ingestion rate over the last 24h

    sum(rate(vm_rows_inserted_total[24h]))

    3.16 Mil datapoints/sec

  • The total number of datapoints

    sum(vm_rows{type=~"storage/.+"})

    57.7 Tri

  • The total number of entries in inverted index

    sum(vm_rows{type="indexdb"})

    295 Bil

  • Data size on disk

    sum(vm_data_size_bytes{type=~"storage/.+"})

    72.8 TiB

  • Index size on disk

    sum(vm_data_size_bytes{type="indexdb"})

    7.6 TiB

  • The average datapoint size on disk

    sum(vm_data_size_bytes) / sum(vm_rows)

    ~1.5 B

  • The average range query rate over the last 24h

    sum(rate(vm_http_requests_total{path=~".*/api/v1/query_range"}[24h]))

    ~1.4 req/s

  • The average instant query rate over the last 24h

    sum(rate(vm_http_requests_total{path=~".*/api/v1/query"}[24h]))

    ~47 req/s

  • Average range query duration quantiles over the last 24h

    max(avg_over_time(vm_request_duration_seconds{path=~".*/api/v1/query_range"}[24h])) by (quantile)

    quantile  duration
    0.500     0.006s
    0.900     0.3s
    0.970     1.04s
    0.990     2.21s
    1         2.29s

  • Average instant query duration quantiles over the last 24h

    max(avg_over_time(vm_request_duration_seconds{path=~".*/api/v1/query"}[24h])) by (quantile)

    quantile  duration
    0.500     0.007s
    0.900     0.2s
    0.970     1s
    0.990     2.6s
    1         19.8s
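The queries listed above can be evaluated through the Prometheus-compatible querying API that VictoriaMetrics exposes at `/api/v1/query`. The Python sketch below shows one possible way to do that with the standard library only. It assumes a single-node instance reachable at `http://localhost:8428`; cluster setups are usually queried through vmselect under a tenant-specific path prefix instead. `BASE_URL`, `STATS_QUERIES`, and the helper names are illustrative and do not come from the case study.

```python
import json
import urllib.parse
import urllib.request

# Assumption: single-node VictoriaMetrics on its default port.
BASE_URL = "http://localhost:8428"

# A few of the queries from the list above, keyed by hypothetical names.
STATS_QUERIES = {
    "memory_bytes": "sum(avg_over_time(process_resident_memory_bytes[24h]))",
    "cpu_cores": "sum(rate(process_cpu_seconds_total[24h]))",
    "daily_churn": "sum(increase(vm_new_timeseries_created_total[24h]))",
    "ingestion_rate": "sum(rate(vm_rows_inserted_total[24h]))",
}


def build_query_url(base_url, expr):
    """Build an instant-query URL for the given expression."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_scalar(body):
    """Extract the single numeric value from an instant-query JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    result = payload["data"]["result"]
    if len(result) != 1:
        raise ValueError(f"expected exactly one series, got {len(result)}")
    _timestamp, value = result[0]["value"]
    return float(value)  # sample values are encoded as strings


def fetch_stat(base_url, expr, timeout=30):
    """Run one query over HTTP and return its value (needs a live server)."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        return parse_scalar(resp.read().decode("utf-8"))
```

With a reachable instance, `fetch_stat(BASE_URL, STATS_QUERIES["cpu_cores"])` would return the average number of CPU cores as a float. Each of these queries wraps its expression in `sum(...)` without grouping, so a single result series is expected; the quantile queries group `by (quantile)` and would need a parser that handles several series.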