Backend infrastructure is the backbone of every technology business, and for a growing health-tech platform like Halodoc, high availability is vital. Users can interact with Halodoc via:
- Medicine delivery
- Doctor consultations
- Lab tests
- Hospital appointments
All of these interactions place a heavy load on our infrastructure, which must stay responsive to keep the user experience smooth. It is important that these systems are reliable and that any API-level change only improves the user experience.
In this blog, we describe the methodology we use to test the performance of our systems in the CI/CD pipeline using k6. Performance test scripts run on the stage or perf environment in every release cycle, on demand, or both, to evaluate system behaviour under load.
Continuous performance testing helps us in several ways:
- Avoid infrastructure issues in production, such as sudden DB load spikes or high memory/CPU usage under high-throughput conditions.
- Load-testing important business flows regularly via CI/CD helps evaluate infra usage and prompts optimisation at regular intervals within a sprint.
- Helps evaluate the system impact of newly introduced APIs and whether they meet KPI requirements.
- Historical data from automated, regularly run load tests helps evaluate system performance and infra usage over time, and builds confidence ahead of go-to-market (GTM).
- Enables well-planned upscaling of infrastructure for sale days such as Ramadan or marketing events, where a sudden spike in users is expected.
However, performance testing typically comes with its own challenges:
- Infrastructure cost of setting up performance test environments.
- Legacy or paid tools that are expensive to run on distributed systems.
- Complex, manual infrastructure reporting mechanisms.
- Dedicated resources needed to evaluate and analyse perf test results.
To address these, we set up a pipeline that deploys the dependent services with a configuration equal to, or a multiple of, the corresponding production configuration, runs the tests against them, and then reverts the configuration to a smaller stage config once the run completes.
- Tests run for 15 to 60 minutes on average, during which the services use the production configuration; the average cost incurred is less than $10.
- We use k6 instead of Apache JMeter to run tests for the following reasons:
k6 uses less memory and is much more efficient at generating load on servers. Built in
The following is a detailed comparison:
- k6 can be easily integrated with a Jenkins pipeline to run code from a specific branch of the k6 repo (https://k6.io/blog/integrating-load-testing-with-jenkins), whereas JMX scripts are standalone, so we would have to write complex shell or Ansible scripts, or containerise the scripts, to run them on Jenkins.
Sample k6 script:
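A minimal sketch of what such a script could look like, saved as perftest.js. The endpoint URL and the status check are illustrative assumptions, not our actual test:

```javascript
// perftest.js: minimal k6 script (illustrative sketch only)
import http from 'k6/http';
import { check, sleep } from 'k6';

// Hypothetical endpoint; replace with the API or business flow under test.
const BASE_URL = 'https://stage.example.com';

// k6 runs the default function in a loop for every virtual user (VU).
export default function () {
  const res = http.get(`${BASE_URL}/health`);

  // check() records a pass/fail result in the summary without aborting the run.
  check(res, {
    'status is 200': (r) => r.status === 200,
  });

  // Simulated think time between iterations (1 second is an assumed value).
  sleep(1);
}
```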
Test the script by executing:
k6 run perftest.js
This will run a test with 1 virtual user (VU).
Once the above script executes successfully, we can run it with the desired load from the EC2 instance. For example, to run with 100 VUs for 60 seconds, execute this:
k6 run --vus 100 --duration 60s perftest.js --out statsd
However, in a real-life scenario, we give our servers ramp-up time. We can include the load pattern in the script itself, for example, to give 30s of ramp-up time and 30s of cool-down time:
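One way to express this is with the `stages` option. The sketch below assumes a target of 100 VUs and a 60s hold between the ramps; both values, and the endpoint URL, are placeholders chosen for illustration:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

// Load pattern: 30s ramp-up, a hold period, then 30s cool-down.
// The 100 VU target and the 60s hold are assumed values.
export const options = {
  stages: [
    { duration: '30s', target: 100 }, // ramp up from 0 to 100 VUs
    { duration: '60s', target: 100 }, // hold at 100 VUs
    { duration: '30s', target: 0 },   // ramp back down to 0 VUs
  ],
};

export default function () {
  // Hypothetical endpoint; replace with the API under test.
  http.get('https://stage.example.com/health');
  sleep(1);
}
```

With the pattern defined in `options`, the script can be started with a plain `k6 run perftest.js`, without passing `--vus` or `--duration` on the command line.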
Furthermore, we can create more complex load patterns using a non-default executor.
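As a rough sketch, the `scenarios` option lets us pick an executor explicitly. The example below uses the `ramping-arrival-rate` executor, which drives load by iterations per second rather than by a fixed number of VUs. The scenario name, rates, VU limits, and endpoint are all hypothetical values:

```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    // The scenario name is arbitrary; "browse_flow" is a made-up example.
    browse_flow: {
      executor: 'ramping-arrival-rate',
      startRate: 10,       // iterations started per timeUnit at the beginning
      timeUnit: '1s',      // the rate is expressed per second
      preAllocatedVUs: 50, // VUs initialised before the test starts
      maxVUs: 200,         // upper bound k6 may allocate to sustain the rate
      stages: [
        { duration: '30s', target: 50 }, // ramp to 50 iterations/s
        { duration: '1m', target: 50 },  // hold at 50 iterations/s
        { duration: '30s', target: 0 },  // ramp down
      ],
    },
  },
};

export default function () {
  // Hypothetical endpoint; replace with the business flow under test.
  http.get('https://stage.example.com/health');
}
```

An arrival-rate executor keeps the request rate steady even when responses slow down, which can make it a better fit for throughput targets than a fixed VU count.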
- Cover business flows rather than individual APIs to evaluate user experience in terms of latency/throughput.
- Integrate with InfluxDB and set up individual Grafana dashboards for performance analysis:
$ sudo apt install influxdb
Run the k6 tests and send the results to InfluxDB:
$ k6 run --out influxdb=http://localhost:8086/myk6db script.js
- Verify infra-side KPIs such as heap, CPU, and memory usage on New Relic or any other server monitoring tool, either manually or through an in-house integration such as StatsD.
- Furthermore, k6 can control a live test, which is useful when you are not sure how the systems will react under load at run time. For example, you may want to pause the test or increase the number of VUs based on system performance. This can be done using the following commands:
To get the status:
k6 status
To pause or resume VUs:
k6 pause
k6 resume
To scale the number of VUs up or down:
k6 scale --vus 10
Additional support available:
StatsD integration with New Relic
k6 supports StatsD integration, which lets us monitor load in real time on New Relic with custom dashboards.
Integration with Amazon CloudWatch
In this blog, we described how to efficiently design and implement a continuous performance testing architecture with k6 using limited resources, with powerful reporting mechanisms in place to ensure the highest availability of our backend services in every sprint cycle.
Scalability, reliability and maintainability are the three pillars that govern what we build at Halodoc Tech. We are actively looking for engineers/architects and if solving hard problems with challenging requirements is your forte, please reach out to us with your resumé at firstname.lastname@example.org.
Halodoc is the number 1 all-around healthcare application in Indonesia. Our mission is to simplify and bring quality healthcare across Indonesia, from Sabang to Merauke. We connect 20,000+ doctors with patients in need through our tele-consultation service. We partner with 3500+ pharmacies in 100+ cities to bring medicine to your doorstep. We've also partnered with Indonesia's largest lab provider to provide lab home services, and to top it off we have recently launched a premium appointment service that partners with 500+ hospitals and allows patients to book a doctor appointment inside our application. We are extremely fortunate to be trusted by our investors, such as the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek and many more. We recently closed our Series B round and in total have raised USD 100 million for our mission. Our team works tirelessly to make sure that we create the best healthcare solution personalised for all of our patients' needs, and we are continuously on a path to simplify healthcare for Indonesia.