ELK, Elasticsearch and the Elastic Stack
Log files are full of information for optimizing and troubleshooting the development environment; they can be used to perform debugging, security analysis, predictive analysis and performance analysis. However, analyzing the logs is a tedious process that requires collecting the data, cleaning it, structuring it, analyzing it and then producing the results.
The black box
While logs are valuable for the information they hold, they can be as difficult to find, recover and parse as a black box from a plane crash. Unless developers have a simple way to rapidly find the information they are looking for, the logs are essentially useless.
To compound the issue, as infrastructure increasingly moves to the cloud, analyzing logs is becoming a more difficult and time-consuming task. The sheer number of components in a cloud deployment (loads, servers, environments and users) makes it more error prone, so log files need to be analyzed more often. At the same time, the added complexity makes isolating an issue in those log files nearly impossible.
Elasticsearch
In February 2010, a solution to the needle-in-haystack problem presented by log files was released: Elasticsearch. Elastic, the company behind the tool, describes it as a “distributed, RESTful, JSON-based search engine.”
Based on Lucene, it was developed in Java and originally released as open source under the Apache License. Its benefits make it indispensable for log analysis:
- Scalable. It is distributed by design: an index is split into shards that can be spread across multiple nodes, so capacity grows by adding machines.
- Full-text search. It searches the entire text of the entries in a database, not just metadata or parts of the original texts.
- Near real-time analysis. Newly indexed entries become searchable within about a second by default, so the full text of all entries can be searched almost as soon as it arrives.
Essentially, Elasticsearch gave developers a way to read all the contents of a black box on demand. Elasticsearch gained popularity for log search and analysis quickly after its release, and it is now one of the most widely used enterprise search engines. Many applications that require complex searches are built on it.
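To make the "RESTful, JSON-based" description concrete, here is a minimal sketch of a full-text search request built with Python's standard library. The node address, the index name "app-logs" and the field name "message" are assumptions made up for illustration; the request is constructed but not sent, so the sketch runs without a cluster.

```python
import json
from urllib import request


def build_search_request(base_url, index, field, text, size=10):
    """Return an unsent HTTP request for a full-text `match` query."""
    body = {
        "size": size,                       # maximum number of hits to return
        "query": {"match": {field: text}},  # analyzed, full-text match on one field
    }
    return request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: a local node and an index called "app-logs".
req = build_search_request(
    "http://localhost:9200", "app-logs", "message", "connection timeout"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To run the search against a live cluster, one could send the request:
# with request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```

The point of the sketch is that a search is just a JSON document posted to an HTTP endpoint, which is why almost any language or tool can query the same index.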
Logstash
As a result of the popularity of the open source project, the community began to develop an ecosystem of tools around Elasticsearch. The first was Logstash, a pipeline that homogenizes data and forwards it to a central store such as Elasticsearch.
Its job is to take information in different formats from disparate sources and unify it in a centralized location.
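The idea of unifying different formats can be illustrated with a small Python sketch. This is not Logstash itself, and both input formats are hypothetical: a plain-text line and a JSON line with the invented keys "ts", "lvl" and "msg". The output field name "@timestamp" follows the convention Logstash uses for event time; everything else is an assumption.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical plain-text format: "2024-05-01 12:00:03 ERROR disk full"
PLAIN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def normalize(line, source):
    """Turn one raw log line into a dict with a common set of fields."""
    line = line.strip()
    if line.startswith("{"):
        # Hypothetical JSON format: {"ts": <epoch seconds>, "lvl": ..., "msg": ...}
        raw = json.loads(line)
        when = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
        level, message = raw["lvl"], raw["msg"]
    else:
        match = PLAIN.match(line)
        if match is None:
            raise ValueError(f"unrecognized log line: {line!r}")
        stamp, level, message = match.groups()
        # Assumes the plain-text timestamps are already in UTC.
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(
            tzinfo=timezone.utc
        )
    return {
        "@timestamp": when.isoformat(),
        "level": level.upper(),
        "message": message,
        "source": source,
    }


for raw_line, origin in [
    ("2024-05-01 12:00:03 error disk full", "web-1"),
    ('{"ts": 1714564803, "lvl": "warn", "msg": "slow query"}', "db-1"),
]:
    print(normalize(raw_line, origin))
```

Once every event has the same shape, a single query can cover all sources, which is what makes the centralized location useful.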
Kibana
The second tool is Kibana, a flexible open source data visualization plugin for Elasticsearch that gives developers a way to view the unified data. It lets them create visualizations and navigate the data more easily, providing a window into why the crash actually happened, how it could have been prevented and how it can be resolved.
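A typical chart of this kind is "errors over time." Independently of Kibana, the numbers behind such a chart can be requested from Elasticsearch with an aggregation. The sketch below only builds the request body; the field names "level" and "@timestamp" are assumptions, and the `term` filter further assumes that "level" is mapped as an exact-value (keyword) field.

```python
import json


def error_histogram_body(interval="1h", level_field="level",
                         time_field="@timestamp"):
    """Build a search body that counts ERROR events per time bucket."""
    return {
        "size": 0,  # return only the aggregation, not individual hits
        "query": {"term": {level_field: "ERROR"}},
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": time_field,
                    "fixed_interval": interval,
                }
            }
        },
    }


# Hypothetical usage: one bucket per five minutes.
print(json.dumps(error_histogram_body("5m"), indent=2))
```

Each bucket in the response carries a timestamp and a document count, which is the data a time-series chart needs.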
The Elastic Stack
Combined, the three components (Elasticsearch, Logstash and Kibana) became colloquially known as ELK. In 2015, Elastic added a fourth component, Beats, to the stack. Beats is a platform of single-purpose data shippers that gather data from machines and send it either to Logstash for further parsing or directly to Elasticsearch.
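As a rough illustration of what shipping data "directly into Elasticsearch" can involve, the sketch below serializes a batch of events in the newline-delimited JSON format accepted by Elasticsearch's `_bulk` endpoint. It is not how Beats is implemented; the function name, the index name "app-logs" and the sample events are all hypothetical.

```python
import json


def to_bulk_body(index, events):
    """Serialize events as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                         # document line
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical batch collected by a shipper on one machine.
batch = [
    {"level": "ERROR", "message": "disk full", "source": "web-1"},
    {"level": "WARN", "message": "slow query", "source": "db-1"},
]
print(to_bulk_body("app-logs", batch), end="")
```

A shipper would POST this body to the cluster's `_bulk` endpoint with the content type `application/x-ndjson`; batching many events into one request keeps the per-event overhead low.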
Unfortunately, no acronym could be formed that carried the same authority as ELK. “BLEK” just would not cut it; as Elastic itself admits, “For a stack so scalable, the acronym really wasn’t.” So now, the most popular platform in the world for analyzing log files is known simply as “The Elastic Stack.”