Logging in Distributed Microservices using EFK

Jun 15, 2022

Why Logging is required for distributed microservices

An application log is an essential component of any application, regardless of whether it is monolithic or microservices-based, but the fundamental architecture of microservices-based applications makes logging a complicated endeavour.

Logging in a monolith is as simple as writing data to a single log file and viewing it later. In a microservices-based application, you have many different components working together. Services may span multiple servers, even spread across geographical boundaries. This creates many log files, since each microservice maintains its own set of data. When one or more services fail, the team needs to know which service experienced an issue and why. It is also difficult to reconstruct the complete request flow in microservices: which services were called, and in what sequence and how often?

Considering these pain points, developers should structure an application's log data to simplify parsing and to log only relevant information. Structured logging helps create simple and flexible microservices logs, and the structure of the logged data is important for tracing, debugging and troubleshooting.
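For example, a structured log line emitted as a single JSON object is trivial to parse downstream. The field names and values here are purely illustrative:

```json
{
  "timestamp": "2022-06-15T10:21:43Z",
  "level": "ERROR",
  "service": "order-service",
  "correlation_id": "9f1c2d3e-7a41-4b2c-8d55-0c6f1e2a9b77",
  "message": "Payment gateway timed out"
}
```

A consistent correlation id field like the one above is what later lets you stitch a single request's journey back together across services.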

Why we need EFK Stack for Logging

If you know Kubernetes, you may be thinking that you can use the kubectl logs command to easily check the logs of any running Kubernetes pod. But what if there are 100 pods or even more? In that case it becomes very difficult. On top of this, the Kibana dashboard UI can be configured to continuously monitor logs at runtime, which makes it easier for engineers with no experience running Linux commands to check logs and monitor the Kubernetes cluster and the applications running on it.

If you are on AWS, you can configure Elasticsearch to archive logs to an S3 bucket (which can also be configured without the EFK stack) so that historical logs are persisted.

If you have a large application with 100 pods running, plus logs coming in from the Kubernetes system, Docker containers and so on, and you do not have a centralised log aggregation and management system, you will sooner or later regret it. The EFK stack is therefore a good choice.

Also, using Fluent Bit we can collect logs from various input sources, filter them to add more information or remove unwanted information, and then store the data in Elasticsearch.

What is EFK

EFK is a suite of tools combining Elasticsearch, Fluentd and Kibana to manage logs. Fluentd collects the logs and sends them to Elasticsearch, which receives the logs and saves them in its database. Kibana fetches the logs from Elasticsearch and displays them in a friendly web app. All three components are available as binaries or as Docker containers.

When running multiple services and applications on a Kubernetes cluster, a centralized, cluster-level logging stack helps you quickly sort through and analyze the large volume of log data produced by your Pods. One popular centralised solution is the Elasticsearch, Fluentd and Kibana (EFK) stack.

Elasticsearch is usually deployed alongside Kibana, a powerful data visualization frontend and dashboard for Elasticsearch.

Info: ELK is an alternative to EFK that replaces Fluentd with Logstash.

How does EFK work

To understand the setup, here is a picture:

Here we have a Kubernetes cluster with 3 nodes; on these nodes, pods are created to run various services, such as your applications and, in this case, the EFK stack.

Fluent Bit runs as a DaemonSet, which means each node in the cluster will have one Fluent Bit pod, and it reads logs from the /var/log/containers directory, where the container runtime creates a log file for each container.

The Elasticsearch service and Kibana each run in their own pod. They can be on the same cluster node, depending on resource availability, but since both usually demand high CPU and memory, their pods often get scheduled on different cluster nodes.

There will be some pods running your applications, which are shown as App1, App2, in the above picture.

The Fluent Bit service reads logs from these apps and pushes the data to Elasticsearch as JSON documents; from there, Kibana streams the data to show in the UI.

How to Start Logging with EFK


Because the EFK components are available as Docker containers, it is easy to install them on Kubernetes. For that, we'll need the following:

  • Kubernetes cluster (Minikube or AKS…)
  • Kubectl CLI
  • Helm CLI

1. Installing Elasticsearch using Helm
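The exact commands depend on the chart repository you use. With the legacy stable charts referenced later in this post (whose elasticsearch chart creates the elasticsearch-client service used below), the install might look like this; treat it as a sketch, not the definitive procedure:

```shell
$ helm repo add stable https://charts.helm.sh/stable
$ helm repo update
$ helm install elasticsearch stable/elasticsearch
$ kubectl get pods -l app=elasticsearch
```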

2. Installing Fluentd as DaemonSet

Fluentd should be installed on each node of the Kubernetes cluster. To achieve that, we use a DaemonSet. The Fluentd development team provides a simple configuration file, available here:

https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-elasticsearch.yaml

Fluentd should be able to send the logs to Elasticsearch, so it needs to know the Elasticsearch service name, port number and schema. These configurations are passed to the Fluentd pods through environment variables in the YAML file.
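In that manifest, the relevant environment variables look roughly like the fragment below; the variable names come from the fluentd-kubernetes-daemonset images, and the host value matches the Elasticsearch client service used in this post:

```yaml
env:
  - name: FLUENT_ELASTICSEARCH_HOST
    value: "elasticsearch-client"
  - name: FLUENT_ELASTICSEARCH_PORT
    value: "9200"
  - name: FLUENT_ELASTICSEARCH_SCHEME
    value: "http"
```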

The value "elasticsearch-client" is the name of the Elasticsearch service that routes traffic into the client pod.

Deploy Fluentd using the command:
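Assuming the manifest above was saved locally (it can also be applied directly from the repository URL), the deployment and a quick check might look like:

```shell
$ kubectl apply -f fluentd-daemonset-elasticsearch.yaml
$ kubectl get pods -n kube-system -l k8s-app=fluentd-logging
```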

3. Installing Kibana using Helm

The last component to install in EFK is Kibana. Kibana is available as a Helm chart that can be found here: https://github.com/helm/charts/tree/master/stable/kibana.

The chart deploys a single Pod, a Service and a ConfigMap. The ConfigMap gets its key values from the values.yaml file, and this config is loaded by the Kibana container running inside the Pod. The configuration is specific to Kibana, for example to set the Elasticsearch host or service name. The default value for the Elasticsearch host is http://elasticsearch:9200, while in our example it should be http://elasticsearch-client:9200, so we need to change that.

The Service that will route traffic to Kibana Pod is using type ClusterIP by default. As we want to access the dashboard easily, we’ll override the type to use LoadBalancer. That will create a public IP address.

In Helm, we can override some of the config in values.yaml using another yaml file. We'll call it kibana-values.yaml. Let's create that file with the following content:

files:
  kibana.yml:
    # assumes Kibana 7.x; older versions use "elasticsearch.url" instead
    elasticsearch.hosts: http://elasticsearch-client:9200
service:
  type: LoadBalancer

Now, we are ready to deploy Kibana using Helm with the overridden configuration:

$ helm install kibana stable/kibana -f kibana-values.yaml

Then we check for the created service with the LoadBalancer type.
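For example (the release name kibana comes from the helm install command above):

```shell
$ kubectl get service kibana
```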

From here, we can copy the external IP address (51.138.9.156 here) and open it in a web browser, not forgetting to add the port number, which is 443.

4. Sending Logs to EFK

  1. First, write the logs to a file inside the application container

service_name.yml
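The contents of service_name.yml depend on your application framework. As a hypothetical sketch, for a Spring Boot service the file-logging config could look like this (path and file name are illustrative):

```yaml
# service_name.yml -- hypothetical Spring Boot logging config
logging:
  file:
    name: /var/log/service_name/application.log
```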

  2. After writing, mount the log file from the container onto the node
deployment.yml
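A hypothetical deployment.yml fragment that exposes the container's log directory to the node via a hostPath volume might look like this; all names and paths here are illustrative:

```yaml
# fragment of deployment.yml -- names are illustrative
spec:
  containers:
    - name: service-name
      image: service-name:latest
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/service_name
  volumes:
    - name: app-logs
      hostPath:
        path: /var/log/service_name
        type: DirectoryOrCreate
```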

  3. In Fluent Bit, add the config to tail the mounted log file
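In Fluent Bit's configuration, a tail input for the mounted file could look like the following; the path and tag are illustrative:

```
[INPUT]
    Name    tail
    Path    /var/log/service_name/*.log
    Parser  json
    Tag     app.service_name
```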

  4. The tailed logs are parsed via a parser and sent to Elasticsearch
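With a JSON parser defined, the parsed records are forwarded to Elasticsearch through Fluent Bit's es output plugin. The host matches the service name used earlier in this post; the index name is illustrative:

```
[PARSER]
    Name    json
    Format  json

[OUTPUT]
    Name    es
    Match   app.*
    Host    elasticsearch-client
    Port    9200
    Index   service-logs
```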

Conclusion

For distributed microservices logging, EFK is one of the most commonly used stacks for log tracing. We can take advantage of its search functionality and define search criteria based on different keywords, which makes log tracing more informative and easier to search. By using a correlation id, we can reconstruct a complete request flow. We can also put an automatic alerting system in place that analyzes logs and sends notifications whenever anything goes wrong with one or more services in our application.

Join us

Scalability, reliability and maintainability are the three pillars that govern what we build at Halodoc Tech. We are actively looking for engineers at all levels, and if solving hard problems with challenging requirements is your forte, please reach out to us with your résumé at careers.india@halodoc.com.

About Halodoc

Halodoc is the number 1 all around Healthcare application in Indonesia. Our mission is to simplify and bring quality healthcare across Indonesia, from Sabang to Merauke. We connect 20,000+ doctors with patients in need through our Tele-consultation service. We partner with 3500+ pharmacies in 100+ cities to bring medicine to your doorstep. We've also partnered with Indonesia's largest lab provider to provide lab home services, and to top it off we have recently launched a premium appointment service that partners with 500+ hospitals that allow patients to book a doctor appointment inside our application. We are extremely fortunate to be trusted by our investors, such as the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek, Astra, Temasek and many more. We recently closed our Series C round and in total have raised around USD$180 million for our mission. Our team works tirelessly to make sure that we create the best healthcare solution personalised for all of our patient's needs, and are continuously on a path to simplify healthcare for Indonesia.