Why we generate & collect logs: About the usability & cost of modern logging systems
Logs and log management have been around far longer than monitoring, and it is easy to forget just how useful and essential they are for modern observability.
Most of you will know us for VictoriaMetrics, our open source time series database and monitoring solution. Metrics are our “thing”; but as engineers, we’ve had our fair share of frustrations in the past caused by modern logging systems that tend to create further complexity, rather than removing it.
This blog post looks at what logs are and why they matter (as a refresher or brief introduction), why logs are generated and collected, and what costs come with that.
It also breaks down why and how we created the log management solution of our dreams!
What are logs & what is log management? #
A log is data, typically in the form of a text line, that is created during the execution of software applications and/or operating systems.
Among other things, logs document what the software did and automatically record errors, messages, file transfers, etc. Logs are generally classified according to the kind of data they handle:
- Audit logs and security logs keep track of security-related activities
- Event logs record the traffic occurring in the network, such as that caused by user management and behavior
- System logs keep track of and update operations and activities performed by operating systems
- Server logs keep a record of the activities and activity-time-periods generated by the server
Log management, on the other hand, is the process of dealing with large amounts of logs. It mainly consists of the following:
- Collecting the logs from various sources
- Enriching, aggregating and transforming the collected logs into a form suitable for storage, querying and analysis
- Storing the collected logs efficiently in centralized storage
- Querying and analyzing the stored logs efficiently
Why we generate and collect logs #
Practically speaking, we generate and collect logs mostly for later analysis and debugging, for example:
- To calculate the number of successful/unsuccessful hacker attempts to SSH into your host.
- To calculate stats over web logs for a particular status code, domain, request path, etc.
- To calculate the frequency of logs with particular substrings.
- To find all the error logs with a particular substring (such as trace_id, user_id, request_id, ip), and then to analyze the matching logs manually.
All these tasks are easy to perform from the command line when logs are stored in plain files.
Just start with cat /path/to/log | grep some-substring. Then iteratively apply the needed commands to the selected logs (wc, awk, grep, less, head, sort, uniq, cut, etc.) until the desired result is obtained.
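As a rough sketch of that iterative workflow, the function below counts failed SSH password attempts per source IP. The log path, the "Failed password" message text and the position of the IP after the word "from" are assumptions based on a typical OpenSSH auth log, and failed_ssh_by_ip is a name made up for this example; adjust all of them to your system.

```shell
# Count failed SSH password attempts per source IP, most frequent first.
# Assumes OpenSSH-style lines such as:
#   "... sshd[123]: Failed password for root from 203.0.113.7 port 22 ssh2"
failed_ssh_by_ip() {
  grep 'Failed password' "$1" \
    | awk '{ for (i = 1; i < NF; i++) if ($i == "from") { print $(i + 1); break } }' \
    | sort | uniq -c | sort -rn
}

# Usage (hypothetical path):
#   failed_ssh_by_ip /var/log/auth.log | head
```

Each stage can be swapped or extended on the fly (for example, adding another grep before awk), which is exactly what makes this style of analysis so ergonomic on a single host.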
This approach serves well for analyzing locally stored logs on a few hosts, but it doesn’t scale for cases where logs need to be analyzed across hundreds of hosts and/or application instances.
Of course, there are command-line tools for running Unix commands in parallel across hundreds of hosts, such as parallel ssh, which can help. But we’ve heard from plenty of users (and our own experience told us the same) that something better is needed.
Limitations of existing solutions #
If you’ve been working with logs already, you’ll know that well-established existing solutions allow collecting and querying logs from hundreds of hosts/applications, but that they can make the analysis of these logs quite difficult.
Some of the limitations of existing centralized log management systems include:
- Awkward-to-use query languages with nonsensical limitations (such as the number of returned log lines per query)
- Inconvenient graphical UIs, which show only a few queried log lines per page, while the rest are available only after navigating to the next pages
- Limited integration with existing command-line tools for log analysis such as less, head, grep, awk, etc. This is especially problematic when the selected logs contain millions or billions of lines
- Non-trivial configuration, index creation, performance tuning and maintenance
- High costs. This includes infrastructure and operational costs for open-source systems, and usage costs for commercial SaaS log management systems
An ideal log management solution? Why we created VictoriaLogs #
A question I asked myself many times over the years, whenever I had to analyze logs with modern logging solutions, was:
Why isn’t there a log management solution that allows collecting logs from hundreds of sources and then analyzing them with good old command-line tools in the usual ergonomic way?
I couldn’t find a solution that matched my needs, so … we at VictoriaMetrics decided to create our own solution, based on our experience with developing VictoriaMetrics!
The result: a user-friendly, open source database for logs - VictoriaLogs.
- It accepts structured and unstructured logs from popular log shippers such as Promtail, Filebeat, Fluent Bit, Logstash, Vector, etc.
- It supports fast full-text search out of the box without any configuration and tuning
- It has perfect integration with good old command-line tools such as head, less, grep, awk, wc, sort, uniq, cut, etc. Read more about the integration here.
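To give a feel for that integration, here is a minimal sketch. It assumes a local single-node VictoriaLogs instance listening on port 9428 and the /select/logsql/query HTTP endpoint; check the querying docs for your version. The VL_ADDR variable and the vl_query helper are hypothetical names introduced only for this example.

```shell
# Assumption: a VictoriaLogs instance is reachable at this address.
VL_ADDR="${VL_ADDR:-http://localhost:9428}"

# Run a LogsQL query passed as the first argument.
# Matching log entries are streamed to stdout, so they can be piped further.
vl_query() {
  curl -s "$VL_ADDR/select/logsql/query" --data-urlencode "query=$1"
}

# Usage (requires a running instance):
#   vl_query 'error' | head -n 10
#   vl_query 'error' | grep -c 'timeout'
```

Because the result arrives as a plain stream on stdout, commands like head can stop reading early instead of waiting for millions of lines to be fetched.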
Why use VictoriaLogs: The cost of scale for logs #
If you’re using existing solutions such as Elasticsearch, Grafana Loki, Amazon CloudWatch Logs, Google Cloud Logging and others, you’re probably asking yourself why you should take a look at VictoriaLogs.
VictoriaLogs is open source and free to use: the only cost to you is for your own infrastructure where VictoriaLogs runs, and these costs are much lower than for competing open source solutions - see below for exact numbers.
Here are the top reasons for trying VictoriaLogs #
- Open source under Apache 2 license & free to use
- Infrastructure and operational costs: VictoriaLogs is up to 30x less expensive than other open-source solutions for logs:
  - It requires up to 30x less RAM and up to 15x less disk space for the same production workload
  - The query performance is comparable or better
- Easy to set up and operate:
  - There is no need to create any indexes or tune config parameters for achieving high performance and low resource usage
  - It works optimally out of the box
- Excellent integration with command-line tools
- Provides an easy-to-use query language with fast full-text search - LogsQL.
Search speed #
The search speed is comparable to or faster than that of similar solutions when the query selects a large number of matching log lines (e.g. >=10K). It is also trivial to optimize by following these docs.
Optimized for high load #
VictoriaLogs is optimized for high load: It can efficiently store and query hundreds of terabytes of logs on a single node.
- Requires less CPU, RAM, disk IO and disk space than similar solutions on the same workload
- Capacity and performance scale linearly with the available CPU and RAM.
The typical compression ratio for ingested logs is 40x-80x. This means that 100TB of ingested logs occupy about 100TB/40 = 2.5TB of disk space at the low end of that range, and about 100TB/80 = 1.25TB at the high end.
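The same arithmetic can be turned into a quick capacity estimate. This is only a back-of-the-envelope sketch: disk_usage_tb is a hypothetical helper, and the real ratio depends on your logs.

```shell
# Estimate on-disk size in TB from the ingested volume (TB) and a compression ratio.
disk_usage_tb() {
  awk -v ingested="$1" -v ratio="$2" 'BEGIN { printf "%.2f\n", ingested / ratio }'
}

disk_usage_tb 100 40   # prints 2.50 (low end of the typical range)
disk_usage_tb 100 80   # prints 1.25 (high end of the typical range)
```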
Summary #
With VictoriaLogs we’ve built the log management solution of our dreams:
A user-friendly and cost-efficient database for logs, which we’re looking to continuously enhance together with the user community the same way we’ve done with VictoriaMetrics.
Give it a go and let me know your thoughts!