Cost Optimization for Relational Database Service (RDS) in AWS
Halodoc is a health-tech company that aims to simplify Indonesia’s healthcare ecosystem. As such, we use a wide variety of technologies across our services. To provide a quick overview: our frontend is built on the Angular framework and our backend on a Java-based microservices framework. We use MySQL as our persistent store, and our tech stack also includes AWS managed services such as EC2, ELB, Lambda, CloudFront and RDS.
Amazon RDS (Relational Database Service) is a cloud computing solution from Amazon Web Services that aims to facilitate the process of setting up, deploying, and scaling a relational database on the cloud. It provides cost-efficient and resizable capacity all while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups.
The Halodoc backend is built on a domain-driven microservices model, with each service having its own RDS instance. Because the databases are siloed per service, a problem in one of them affects only that service instead of becoming a single point of failure for the whole platform. However, running this many instances means infrastructure cost plays an important role.
One of the biggest challenges that any infrastructure team faces is keeping costs down without impacting the performance of business-critical applications. Like other AWS products, RDS is billed on a pay-for-what-you-use basis. Therefore, it is prudent to revisit the architecture periodically to make sure that the resources are working optimally and that we are making the most of our investment.
In this post, I will be discussing some of the steps that we took at Halodoc to reduce our infrastructure costs. Please note that the steps are based on Halodoc’s environment and may not apply to all scenarios. Also, it is always advised to try things in a test environment before moving to production.
Most importantly, cost optimization is not a one-time activity. Rather, it is a continuous process, and the sooner we adopt it, the more efficient our databases are going to be.
Steps that we undertook to optimize our RDS costs
Determine Instance usage:
We use CloudWatch to check DB connections to our RDS instances. Sometimes instances are created solely for testing and never used afterwards. If an instance has zero connections, it can be terminated after taking a snapshot, stopped, or downsized. RDS makes it quick to restore and upscale an instance, so there is no point in keeping one running for a workload that can be scheduled.
We can also set up scheduled jobs to stop stage and performance-testing RDS instances at night or over the weekend.
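A scheduled job of this kind could look like the sketch below. It assumes a boto3 RDS client is passed in as `rds`; the off-hours window (20:00 to 07:00 on weekdays, plus weekends) and the function names are illustrative assumptions, not Halodoc's actual schedule.

```python
from datetime import datetime

# Assumed off-hours window; adjust to the team's working hours.
NIGHT_START_HOUR = 20
NIGHT_END_HOUR = 7


def in_off_hours(now):
    """Return True on Saturday/Sunday, or on weekdays outside working hours."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return now.hour >= NIGHT_START_HOUR or now.hour < NIGHT_END_HOUR


def stop_off_hours_instances(rds, instance_ids, now):
    """Stop the given instances if `now` falls in the off-hours window.

    `rds` is expected to behave like boto3.client("rds").
    Returns the identifiers for which a stop was requested.
    """
    if not in_off_hours(now):
        return []
    stopped = []
    for instance_id in instance_ids:
        rds.stop_db_instance(DBInstanceIdentifier=instance_id)
        stopped.append(instance_id)
    return stopped
```

Note that `stop_db_instance` only succeeds for instances in the `available` state, and RDS automatically starts a stopped instance again after seven days, so a job like this needs to run regularly and handle errors for instances that are already stopped.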
Scheduled jobs can be run to check open DB connections on all RDS instances. We can set alerts to notify us of those instances that have 5 or fewer connections. This enables us to keep a constant check on unused or underutilised instances and take action accordingly.
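The connection check described above can be sketched as follows. The code assumes a boto3 CloudWatch client is passed in as `cloudwatch` and reads the `DatabaseConnections` metric from the `AWS/RDS` namespace; the 24-hour lookback and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def max_connections(cloudwatch, instance_id, hours=24):
    """Peak DatabaseConnections over the last `hours`, or None if no data."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="DatabaseConnections",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Maximum"],
    )
    points = response.get("Datapoints", [])
    return max((p["Maximum"] for p in points), default=None)


def underused_instances(cloudwatch, instance_ids, threshold=5, hours=24):
    """Map instance id -> peak connections for instances at or below threshold.

    Instances with no datapoints (for example, stopped ones) are reported
    with a peak of None so they can be reviewed separately.
    """
    flagged = {}
    for instance_id in instance_ids:
        peak = max_connections(cloudwatch, instance_id, hours)
        if peak is None or peak <= threshold:
            flagged[instance_id] = peak
    return flagged
```

The returned mapping could feed whatever alerting channel the team already uses.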
Determine Instance Performance:
Instances with less than 50% CPU utilisation can be downsized to a lower instance class, unless they are already running on the smallest suitable instance type. Even if the server does not show much free memory, it can still be a good candidate, provided that CloudWatch monitoring reports low read IOPS. Databases tend to occupy as much memory as possible, so if there is no swap usage or read IOPS are low, there is very little memory pressure on the instance.
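The rule of thumb above can be written as a small check. This is a minimal sketch: the inputs correspond to averages of the CloudWatch metrics `CPUUtilization`, `SwapUsage` and `ReadIOPS`, and the read-IOPS threshold of 100 is an assumed placeholder, since the article does not define what counts as low.

```python
from dataclasses import dataclass


@dataclass
class InstanceMetrics:
    avg_cpu_percent: float    # average CPUUtilization
    swap_usage_bytes: float   # SwapUsage
    avg_read_iops: float      # average ReadIOPS


def is_downsize_candidate(metrics, cpu_limit=50.0, read_iops_limit=100.0):
    """Low CPU plus little memory pressure marks a downsizing candidate.

    Memory pressure is treated as low when there is no swap usage or when
    read IOPS are below `read_iops_limit` (an assumed threshold).
    """
    low_cpu = metrics.avg_cpu_percent < cpu_limit
    low_memory_pressure = (
        metrics.swap_usage_bytes == 0 or metrics.avg_read_iops < read_iops_limit
    )
    return low_cpu and low_memory_pressure
```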
If CPU usage is low but the memory requirement is high, a possible optimisation is to move from an M series instance to an R series instance one size smaller, which should provide the same amount of memory but half the CPU. As CPU is much costlier than memory, there could be substantial savings with this approach.
Changing the instance class results in downtime, but this can be reduced if the instance is in a Multi-AZ configuration. A single-AZ instance can be converted to Multi-AZ on the fly, modified to a lower instance class, and then reverted to single AZ once the change is done.
If no lower instance class is available, check whether multiple small or non-critical databases can be consolidated into a single RDS instance.
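To illustrate the M-to-R swap described above, the sketch below looks up an R class with the same memory and fewer vCPUs. The table holds a handful of db.m5 and db.r5 classes as (vCPUs, memory in GiB); it is an illustrative subset and should be checked against the current AWS instance class documentation before use.

```python
# (vCPUs, memory GiB) for a few instance classes; illustrative subset only.
INSTANCE_SPECS = {
    "db.m5.large": (2, 8),
    "db.m5.xlarge": (4, 16),
    "db.m5.2xlarge": (8, 32),
    "db.m5.4xlarge": (16, 64),
    "db.r5.large": (2, 16),
    "db.r5.xlarge": (4, 32),
    "db.r5.2xlarge": (8, 64),
}


def same_memory_r_class(m_class):
    """Return an R class with equal memory and fewer vCPUs, or None."""
    vcpus, memory = INSTANCE_SPECS[m_class]
    for name, (r_vcpus, r_memory) in INSTANCE_SPECS.items():
        if name.startswith("db.r5.") and r_memory == memory and r_vcpus < vcpus:
            return name
    return None
```

For example, `same_memory_r_class("db.m5.2xlarge")` finds the R class that keeps 32 GiB of memory while halving the vCPU count; whether the swap is actually cheaper depends on current regional pricing.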
Instance tuning to reduce CPU, Memory and IO utilisation:
We periodically make sure that our queries are properly tuned. We use several tools, such as slow query logs, Performance Insights, and the information_schema and performance_schema tables, to find the top queries and tune them.
Major benefits can be achieved by optimising tables, creating the right indexes, removing redundant or unused indexes, rewriting queries to reduce the data set, and optimising joins.
Once this is done, we can downgrade the instance class of the RDS instance if its load has dropped.
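As one possible way to find the top queries and unused indexes, the sketch below runs two queries against MySQL's `performance_schema` and `sys` schemas. It assumes a DB-API cursor (for example from PyMySQL or mysql-connector) connected with sufficient privileges, and that the performance schema is enabled on the instance; the function names are hypothetical.

```python
# Statements ranked by total execution time. SUM_TIMER_WAIT is in picoseconds.
TOP_STATEMENTS_SQL = """
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1000000000000 AS total_seconds,
       SUM_ROWS_EXAMINED
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT {limit}
"""

# Indexes with no recorded reads since the server last started.
UNUSED_INDEXES_SQL = """
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
"""


def top_statements(cursor, limit=10):
    """Return the `limit` statement digests with the highest total time."""
    cursor.execute(TOP_STATEMENTS_SQL.format(limit=int(limit)))
    return cursor.fetchall()


def unused_indexes(cursor):
    """Return (schema, table, index) rows for indexes that were never used."""
    cursor.execute(UNUSED_INDEXES_SQL)
    return cursor.fetchall()
```

Because these counters reset on restart, an index reported as unused shortly after a reboot may still be needed, so the output is a starting point for review rather than a drop list.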
Disable Multi AZ when not required:
Amazon RDS provides high availability and failover support for DB instances through Multi-AZ deployments. However, a Multi-AZ RDS instance costs twice as much as a single-AZ one, so it is worth disabling Multi-AZ on instances that do not need high availability.
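A quick way to list candidates is to filter the output of `describe_db_instances`. The sketch below assumes `db_instances` is the `DBInstances` list from that boto3 call and that `needs_ha` is a team-maintained set of identifiers that must stay Multi-AZ; both names are illustrative.

```python
def multi_az_candidates(db_instances, needs_ha):
    """Identifiers of Multi-AZ instances that are not marked as needing HA.

    `db_instances` is a list of dicts shaped like the "DBInstances" entries
    returned by boto3's rds.describe_db_instances().
    """
    return sorted(
        db["DBInstanceIdentifier"]
        for db in db_instances
        if db.get("MultiAZ") and db["DBInstanceIdentifier"] not in needs_ha
    )
```

Each candidate could then be converted with `rds.modify_db_instance(DBInstanceIdentifier=..., MultiAZ=False, ApplyImmediately=True)` after confirming with the owning team.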
Optimize Storage cost:
RDS offers several EBS-backed storage types. We will focus on the SSD options here, which are General Purpose (GP2) and Provisioned IOPS (PIOPS).
With Provisioned IOPS we can specify how many IOPS we need, but with GP2 volumes the IOPS depend on the volume size: each GB of storage provides 3 IOPS. We have observed that, while there is no marked difference in performance, PIOPS volumes are 2 to 3 times costlier than GP2 volumes for the same IOPS. For example, in the Singapore region a 1000 GB GP2 volume that can provide 3000 IOPS would cost 138 USD per month, but a 100 GB PIOPS volume with 3000 IOPS would cost 343 USD per month.
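The arithmetic behind this comparison is simple enough to script. The sketch below uses only the 3 IOPS per GB rule and the two monthly prices quoted above; it ignores minimums, burst credits and caps, and the function names are illustrative.

```python
import math

GP2_IOPS_PER_GB = 3  # baseline IOPS per GB of GP2 storage


def gp2_iops(size_gb):
    """Baseline IOPS of a GP2 volume of the given size."""
    return GP2_IOPS_PER_GB * size_gb


def gp2_size_for_iops(target_iops):
    """Smallest GP2 size in GB whose baseline meets `target_iops`."""
    return math.ceil(target_iops / GP2_IOPS_PER_GB)


def cost_ratio(piops_monthly_usd, gp2_monthly_usd):
    """How many times costlier the PIOPS volume is than the GP2 one."""
    return piops_monthly_usd / gp2_monthly_usd
```

With the figures from the example, `gp2_size_for_iops(3000)` gives the 1000 GB volume, and `cost_ratio(343, 138)` comes out at roughly 2.5, in line with the 2 to 3 times range.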
However, it may not always be possible to switch to GP2 in the following cases:
- Critical business applications that require sustained IOPS performance
- The required IOPS are more than 16000
- Required throughput is more than 250 MiB per second
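The three conditions in this list can be combined into a single eligibility check. This is a minimal sketch using the limits quoted above; whether a workload needs sustained IOPS is a judgment call that the caller has to supply.

```python
GP2_MAX_IOPS = 16000              # limit quoted in the list above
GP2_MAX_THROUGHPUT_MIBPS = 250    # limit quoted in the list above


def can_use_gp2(required_iops, required_throughput_mibps, needs_sustained_iops):
    """Return True if none of the listed GP2 blockers apply."""
    if needs_sustained_iops:
        return False
    return (
        required_iops <= GP2_MAX_IOPS
        and required_throughput_mibps <= GP2_MAX_THROUGHPUT_MIBPS
    )
```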
Enabling Data Caching:
We identified the frequently read data in our applications and implemented a caching strategy for it. This reduced the load on some of our RDS instances, which in turn allowed us to downgrade their instance types.
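The article does not say which cache is used, so the following is only a generic read-through cache with a time-to-live, written in plain Python to show the idea. `loader` stands in for whatever function reads from the database; the class name and default TTL are assumptions.

```python
import time


class TTLCache:
    """Read-through cache: serve from memory until an entry expires."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # called on a miss, e.g. a DB query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable for testing
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # hit: no database read
        value = self._loader(key)  # miss or expired: go to the database
        self._entries[key] = (now + self._ttl, value)
        return value
```

In a multi-instance service a shared cache would normally take the place of this in-process dictionary, but the read path follows the same pattern.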
Maximise Reserved Instances utilisation:
We make sure all our RIs are used. If any reservations are unused, we change the instance type of an existing RDS instance to match, so that every RI is being utilised.
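A simple way to spot unused reservations is to compare reserved counts with running instances per instance class. The sketch below assumes its inputs are shaped like the `ReservedDBInstances` and `DBInstances` lists returned by boto3's `describe_reserved_db_instances` and `describe_db_instances`. It deliberately ignores engine, Multi-AZ and size-flexibility matching, so treat the result as a first approximation.

```python
from collections import Counter


def unused_reservations(reserved, running):
    """Map instance class -> number of active reservations with no instance.

    Matches on DBInstanceClass only (a simplifying assumption).
    """
    reserved_count = Counter()
    for reservation in reserved:
        if reservation.get("State") == "active":
            reserved_count[reservation["DBInstanceClass"]] += reservation["DBInstanceCount"]

    running_count = Counter(
        db["DBInstanceClass"]
        for db in running
        if db.get("DBInstanceStatus") == "available"
    )

    return {
        cls: count - running_count[cls]
        for cls, count in reserved_count.items()
        if count > running_count[cls]
    }
```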
Opt for Reserved Instances/Savings Plan:
Amazon RDS Reserved Instances give us the option to reserve a DB instance for a one- or three-year term in exchange for a significant discount on a consistently running workload. If you are sure that you are going to run an instance class for a year, purchasing an RI for it is a huge cost saver.
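Whether a reservation pays off depends on how much of the term the instance actually runs. The sketch below computes that break-even point; the prices in the usage note are made-up placeholders, not AWS list prices, and the function names are illustrative.

```python
HOURS_PER_YEAR = 8760


def ri_break_even_fraction(on_demand_hourly, ri_total_cost, term_years=1):
    """Fraction of the term the instance must run for the RI to break even."""
    on_demand_full_term = on_demand_hourly * HOURS_PER_YEAR * term_years
    return ri_total_cost / on_demand_full_term


def ri_is_cheaper(on_demand_hourly, ri_total_cost, expected_running_fraction, term_years=1):
    """True if the expected usage exceeds the break-even fraction."""
    return expected_running_fraction > ri_break_even_fraction(
        on_demand_hourly, ri_total_cost, term_years
    )
```

For instance, with a hypothetical on-demand rate of 0.20 USD per hour and a hypothetical one-year RI costing 1051.20 USD in total, `ri_break_even_fraction(0.20, 1051.20)` shows what share of the year the instance must run before the reservation becomes the cheaper option.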
Maintaining a balance between performance and price is one of the biggest challenges in planning and building a relational database infrastructure. With the steps we have taken, we were able to reduce our infrastructure costs by 10%.
Halodoc is the number 1 all around Healthcare application in Indonesia. Our mission is to simplify and bring quality healthcare across Indonesia, from Sabang to Merauke.
We connect 20,000+ doctors with patients in need through our Tele-consultation service. We partner with 1500+ pharmacies in 50 cities to bring medicine to your doorstep. We've also partnered with Indonesia's largest lab provider to provide lab home services, and to top it off we have recently launched a premium appointment service that partners with 500+ hospitals that allows patients to book a doctor appointment inside our application.
We are extremely fortunate to be trusted by our investors, such as the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, Gojek and many more. We recently closed our Series B round and in total have raised USD 100 million for our mission.
Our team works tirelessly to make sure that we create the best healthcare solution, personalised for all of our patients' needs, and we are continuously on a path to simplify healthcare for Indonesia.