When looking for a highly scalable time series database, there are a number of criteria to investigate and evaluate.
First up, it’s always a good idea to consider open source software. It’s more likely to have gone through comprehensive troubleshooting, it tends to be more reliable thanks to timely and widespread peer review, it better guarantees technology independence, it’s easier to find engineers who are familiar with it, and its security benefits from having many eyes on the code. Organizations can use open source software for as long as they want and in whatever form they choose.
So now that you’ve decided on an open source time series database, what are the key criteria to look at to refine your search further?
It’s also important to look at the following time series-specific functionalities and whether they are supported by the solution you’re evaluating:
From an architectural perspective, we recommend considering the following concepts and questions (a non-exhaustive list, based on our experience):
We always recommend open source over proprietary or otherwise not fully open technology. It’s important to look not only at a solution’s code and features but also at how its licensing is structured.
There are many other licenses on the market, and AGPLv3 is far from the worst. It allows the software to be used for commercial purposes, but requires any modifications made to it to be open sourced. In practice this doesn’t affect small companies or startups, but it does protect against someone building a closed business on top of the software, since that usually requires code changes for better integrations.
These are important questions to investigate and answer up front, so that you don’t find yourself locked in or restricted by licensing requirements you hadn’t anticipated.
In terms of measurement, when it comes to choosing the most scalable time series database it’s important to do your due diligence and fully test for key indicators such as:
Resource usage can be a determining factor both from a system and from a budgetary perspective.
As some of you will know from experience: great scale can quickly lead to great cost!
So some of the questions to always ask are:
And
Knowing the answers to these questions will help you to make not just the most relevant technical decision, but also the most cost-effective one.
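As a rough illustration of what such testing can look like, here is a minimal Go sketch that pushes batches of samples in InfluxDB line protocol to a single-node VictoriaMetrics instance via its default /write endpoint and reports the achieved ingestion rate. The endpoint address, batch size and test duration are assumptions to adjust for your own environment.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// Assumed single-node VictoriaMetrics endpoint accepting InfluxDB line protocol.
	const writeURL = "http://localhost:8428/write"
	const batchSize = 10000             // data points per request (assumption)
	const testDuration = 30 * time.Second // test length (assumption)

	var sent int64
	start := time.Now()
	for time.Since(start) < testDuration {
		var buf bytes.Buffer
		now := time.Now().UnixNano()
		for i := 0; i < batchSize; i++ {
			// One data point per line: measurement,tags field=value timestamp
			fmt.Fprintf(&buf, "bench,host=host%d value=%d %d\n", i%100, i, now)
		}
		resp, err := http.Post(writeURL, "text/plain", &buf)
		if err != nil {
			log.Fatalf("write failed: %v", err)
		}
		if resp.StatusCode/100 != 2 {
			log.Fatalf("unexpected status: %s", resp.Status)
		}
		resp.Body.Close()
		sent += batchSize
	}
	elapsed := time.Since(start).Seconds()
	fmt.Printf("ingested %d data points in %.1fs (%.0f points/sec)\n", sent, elapsed, float64(sent)/elapsed)
}
```

While a load like this runs, keep an eye on CPU, RAM and disk usage on the host (or on the instance’s own /metrics endpoint) to see how resource consumption grows with the ingestion rate.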
More of our thoughts on the cost of scale, and on cost in general, are documented in this blog post.
Object storage in the cloud is the newer approach, while the local file system is the traditional one. Object storage only became a practical option relatively recently, with the rise of AWS, which can make it quite cheap.
However, there are pros and cons to both.
Some object storage pros:
- Practically unlimited capacity that grows without re-provisioning disks.
- Pay-as-you-go pricing, which can be cheap for large volumes of rarely accessed data.
- Durability and replication are handled by the provider.
Some object storage cons:
- Noticeably higher and less predictable access latency than local disks.
- A hard dependency on the network and on the provider’s availability.
- Per-request and data-transfer costs that can add up under heavy read workloads.
Some local file system pros:
- The lowest read and write latency and the most predictable performance.
- Simplicity: no extra services, credentials or SDKs to operate.
- It works the same way on-premises and in any cloud.
Some local file system cons:
- Capacity is limited by the disks attached to the node.
- Durability and replication must be handled by the database or its operator.
- Scaling beyond a single node’s disks means resizing volumes or adding nodes.
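To get a rough feel for the latency gap between the two, a simple comparison like the following can help: it times reading a local file against fetching the same data over HTTP from an object storage endpoint. The file path and object URL are placeholders, and a public or pre-signed URL is assumed so that no SDK or credential handling is needed.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	// Placeholders: point these at a real local file and an object storage URL
	// (public or pre-signed) holding the same data.
	const localPath = "/tmp/sample.bin"
	const objectURL = "https://example-bucket.s3.amazonaws.com/sample.bin"

	// Local file system read.
	start := time.Now()
	data, err := os.ReadFile(localPath)
	if err != nil {
		log.Fatalf("local read failed: %v", err)
	}
	fmt.Printf("local read:  %d bytes in %v\n", len(data), time.Since(start))

	// Object storage read over HTTP.
	start = time.Now()
	resp, err := http.Get(objectURL)
	if err != nil {
		log.Fatalf("object read failed: %v", err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("object read failed: %v", err)
	}
	fmt.Printf("object read: %d bytes in %v\n", len(body), time.Since(start))
}
```

A single run is obviously not a benchmark, but repeating it for different object sizes quickly shows where network round trips start to dominate.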
This blog post isn’t entirely impartial, of course, and we do have a take on the question.
With VictoriaMetrics we focus on performance, simplicity and reliability, which allows us to achieve both high scalability and availability. We believe that a local file system setup provides the best foundation for outstanding read and write performance.
We also always make sure that VictoriaMetrics has few moving components and no extra dependencies in order to make it as easy to operate as possible.
This was one of the key reasons why VictoriaMetrics was developed in the first place: to remove the complexity that exists in other solutions and to make monitoring as accessible as possible to everyone who needs it.
VictoriaMetrics’ cluster version is a great option for ingestion rates over a million data points per second and this is what its architecture looks like:
Cluster performance and capacity can be scaled up in two ways: vertically, by adding more resources (CPU, RAM, disk I/O, disk space, network bandwidth) to the existing nodes, or horizontally, by adding more nodes to the cluster.
General recommendations for cluster scalability:
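Because the cluster separates vminsert (writes), vmselect (reads) and vmstorage (data) into dedicated components, each of them can be scaled independently of the others. As a small illustration of how a client talks to the read path, the sketch below runs an instant PromQL query against vmselect; the port (8481) and the /select/&lt;accountID&gt;/prometheus/... path layout follow the cluster documentation, while the host name, the tenant ID 0 and the `up` query are placeholder assumptions.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
)

func main() {
	// Placeholder host; 8481 is vmselect's default port and 0 is the tenant (accountID).
	base := "http://vmselect.example.internal:8481/select/0/prometheus/api/v1/query"

	// Instant PromQL query via the Prometheus-compatible query API.
	q := url.Values{}
	q.Set("query", "up") // placeholder query

	resp, err := http.Get(base + "?" + q.Encode())
	if err != nil {
		log.Fatalf("query failed: %v", err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("reading response failed: %v", err)
	}
	fmt.Println(string(body))
}
```

Writes go through vminsert in the same spirit, on port 8480 under /insert/&lt;accountID&gt;/..., which is what makes it straightforward to add more vminsert or vmselect replicas behind a load balancer as the workload grows.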
The key to (cost-)efficiency is simplicity and transparency.
The less magic that happens under the hood and the fewer components that are used, the better the efficiency will be.
It’s not only about efficient usage of hardware resources, but also about the amount of effort engineers need to put in to understand and maintain the software, which sometimes costs a lot more than the hardware does.
The VictoriaMetrics documentation contains clear tips for scaling cluster components. The Capacity planning section also provides general recommendations and performance expectations based on the resources provided and the workload volume.
We always pay extra attention to performance reports from our users and do everything we can to make VictoriaMetrics even better. All the performance improvements we publish are based on real-world scenarios, which we learn from and optimize for together with our users.
Finally, we’re always very keen to collaborate with and learn from our customers and users, their setups and their specific use cases.
That is probably the main reason why VictoriaMetrics maintains such a high level of (cost-)efficiency and scalability.
Please do contact us if you’d like to discuss your own monitoring setup with us!