In the dynamic sphere of digital healthcare, web services and APIs are integral conduits connecting patients, healthcare providers, and a myriad of health tech applications. At Halodoc, we aspire to forge a seamless healthcare experience. Therefore, our services and APIs must efficiently manage significant traffic volumes to accommodate our expanding user base and consistently deliver top-tier service quality. One pivotal technique we employ for traffic management is rate limiting, a crucial facet of upholding a resilient and dependable infrastructure.
Rate limiting is paramount in shielding resources from excessive traffic and potential misuse. It also ensures that our services and APIs deliver a seamless, secure, and reliable user experience. In this blog, we will expound on implementing a Sliding Window Rate Limiter using Redis and Lua and provide essential insights on rate limiting.
A brief overview of rate limiting
Rate limiting is a technique used to control the rate at which users, applications, or services can make requests to a system, network, or API. It helps manage resource usage, maintain server stability, and protect against abuse or denial-of-service (DoS) attacks. Rate limiting also ensures fair use and distribution of resources among multiple users or clients. Common methods for implementing rate limiting include token bucket, leaky bucket, fixed window, sliding window, etc. In distributed systems, rate limiting is particularly important to maintain performance, stability, security, and fairness while controlling costs and ensuring compliance with relevant regulations.
Sliding Window Algorithm as a rate-limiting strategy
The Sliding Window Algorithm is a rate-limiting technique that limits the number of requests a user can make within a given time frame while providing a smoother distribution of requests. It does this by continuously tracking requests and maintaining a "sliding window" that moves forward as time progresses, ensuring that request counts are always up-to-date.
Consider a sliding window with a window size of 1 minute and a maximum of 3 requests per window. This window "slides" forward over time as older requests fall out of scope and newer ones arrive.
- When request R3 comes in, it is allowed, since only two requests, R1 and R2, fall within the current window.
- When a new request R4 comes in, the system checks whether it fits within the window. By then, R1 has slid out of scope, so R4 is within the limit and can be handled.
- When a new request R6 comes in, the window is already full with three requests, R3, R4 and R5. Thus, R6 is rejected.
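Under illustrative assumptions (in-memory state and explicit timestamps instead of a real clock), the walkthrough above can be sketched in a few lines of Python:

```python
from collections import deque

class SlidingWindowLog:
    """Toy in-memory sliding window: allow at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # arrival times of requests still in scope

    def allow(self, now: float) -> bool:
        # Slide the window: discard requests older than `window` seconds
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

# Replay the walkthrough: a limit of 3 requests per 60-second window
limiter = SlidingWindowLog(limit=3, window=60)
for name, t in [("R1", 0), ("R2", 20), ("R3", 40), ("R4", 65), ("R5", 85), ("R6", 90)]:
    print(name, "allowed" if limiter.allow(t) else "rejected")
# R1-R5 are allowed; R6 is rejected because R3, R4 and R5 fill the window
```

A production limiter would keep this state in a shared store such as Redis rather than in process memory, as described later in this post.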
The Preeminence of the Sliding Window Approach
The sliding window rate limiter provides better accuracy and fairness than other rate-limiting strategies. Furthermore, it ensures that the request rate is evenly distributed over time, preventing clients from sending a burst of requests at the beginning or end of the window, which could overwhelm the system. Below are the key advantages that set the Sliding Window algorithm apart from other rate-limiting methods.
- The Sliding Window algorithm prevents bursts of traffic at the window boundaries, a problem that the Fixed Window algorithm faces. It also mitigates the risk of an unexpected surge of requests that could overwhelm the system.
- The Sliding Window algorithm enforces a stricter limit over time than the Token Bucket algorithm, which allows for bursts until its token supply is exhausted. It keeps a more consistent flow of traffic, avoiding potential system overloads.
- Unlike the Leaky Bucket algorithm, the Sliding Window algorithm allows for some level of burstiness, making it more flexible for scenarios where short bursts of traffic are acceptable and expected. In contrast, the Leaky Bucket algorithm processes requests at a fixed rate regardless of the network's current capacity, potentially wasting bandwidth.
Addressing Memory Considerations
While the sliding window algorithm confers many advantages, it brings certain memory considerations into play. This method requires sufficient memory space to retain the timestamp of each request within the current window. As a result, the memory demands can escalate with increased request rates, extended window durations, or a large client base, potentially affecting system performance and cost.
However, the Sliding Window Counter, a variation of the sliding window algorithm, effectively addresses these concerns. This variant eschews storing each request's timestamp in favour of tracking the count of requests within the window, thus diminishing memory requirements. It should be noted that this method operates under the assumption of evenly distributed requests within each time window, which can introduce a trade-off in terms of accuracy.
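A minimal sketch of this counter variant, with hypothetical names and the even-distribution assumption made explicit:

```python
class SlidingWindowCounter:
    """Approximate sliding window built from two fixed-window counters.

    Assumes requests in the previous window were evenly distributed,
    trading a little accuracy for O(1) memory per key.
    """

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.window_start = 0.0   # start of the current fixed window
        self.current = 0          # requests counted in the current window
        self.previous = 0         # requests counted in the previous window

    def allow(self, now: float) -> bool:
        start = now - (now % self.window)   # fixed window containing `now`
        if start != self.window_start:
            # Roll forward; the old current count becomes "previous" only if
            # the windows are adjacent, otherwise both counts have expired
            self.previous = self.current if start - self.window_start == self.window else 0
            self.current = 0
            self.window_start = start
        # Weight the previous window by how much of it the sliding window still covers
        overlap = 1.0 - (now - start) / self.window
        if self.previous * overlap + self.current < self.limit:
            self.current += 1
            return True
        return False
```

Instead of one sorted-set member per request, only two integers per key are stored, which is why this variant scales better when request rates or client counts are high.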
Ultimately, the choice of the algorithm should be based on the specific needs and the nature of traffic a system handles.
Rate Limiting with Redis
Redis is an in-memory data structure store used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Therefore, it is an excellent choice for implementing rate limiting, owing to its numerous benefits. Below are some compelling reasons to consider utilizing Redis for rate limiting.
- Performance: Redis is an in-memory data structure store, which means it can handle many operations per second with very low latency. This makes it ideal for implementing rate limiting, where fast response times are crucial.
- Scalability: Redis can be easily scaled horizontally using techniques like sharding or clustering. This allows the distribution of rate limiting across multiple instances, ensuring that the system remains responsive and performant even as the number of clients increases.
- Flexibility: Redis supports various data structures, such as strings, lists, sets, and sorted sets, which can be used to implement different rate-limiting algorithms, like a sliding window, fixed window, or token bucket. This flexibility enables the selection of the most suitable rate-limiting strategy for different use cases.
- Expiration: Redis has built-in support for setting expiration times on keys, which can be used to clean up old rate-limiting data automatically. This feature simplifies the management of rate-limiting counters and reduces memory usage.
- Persistence: Although Redis is an in-memory data store, it can be configured to persist data to disk. This can be particularly useful for rate-limiting scenarios where maintaining historical data is essential.
- Community and Support: Redis is a popular open-source project with a large and active community. It is widely adopted and supported by various programming languages and frameworks, making integrating rate-limiting functionality into existing applications easy.
Leveraging Lua to enhance Redis-based Rate Limiting
Lua is a lightweight, high-level, multi-paradigm programming language designed primarily for embedded use in applications. It is an open-source language, easy to learn, and widely used in various fields such as gaming, web development, and system scripting. In addition, Lua is often praised for its simplicity, flexibility, and fast performance.
When Lua scripting is used with Redis, its capabilities can be further enhanced, and several advantages can be enjoyed, such as,
- Atomicity: Redis commands are atomic, which means they are executed in one step without any other operations coming in between. This guarantees data consistency, even in a multi-client scenario. Using Lua scripts, a block of commands can be executed atomically, which is very important for rate limiting where the current count needs to be read, incremented, and sometimes reset.
- Performance: Lua scripts run on the Redis server, minimizing network latency and reducing the need for multiple round-trips between the client and server. This leads to faster execution of complex operations and efficient enforcement of rate limits.
- Simplicity: Lua script can simplify the code required for rate limiting. For example, a Lua script can combine multiple Redis commands into a single script, reducing network traffic and improving performance.
- Extensibility: Lua enables the customization of Redis functionality by implementing specific commands or logic, such as those required for sophisticated rate-limiting algorithms.
- Code Reusability: Lua scripts can be easily reused across different Redis instances and clients, promoting modularity and simplifying maintenance.
- Simplification of Client Code: Using Lua scripts in Redis can offload some processing from the client application to the Redis server, leading to simpler client code and improved performance.
Implementing the Sliding Window Rate Limiter using Redis and Lua
Some important things to consider when implementing a Sliding Window Rate Limiter:
- Time Window: The time window is one of the most critical parameters in a rate limiter. It defines how long the sliding window lasts. The choice of window size will depend on the specific use case.
- Rate Limit: The rate limit is the maximum number of requests allowed within the time window. This should be set according to the system's capacity and the desired level of service.
- Key: When creating a rate limiter using Redis, the key should ideally be a unique identifier associated with each user or client. Here are a few examples of what the key could be:
i. IP Address: If the rate limiter is intended to limit requests per IP address, the IP address of the client could be used as the key.
ii. User ID: If the rate limiter is designed to limit actions per user, the unique user ID could be used as the key.
iii. Session ID: If the rate limiter is intended to limit actions per session, the session ID could be used as the key.
iv. API Key: If the API uses API keys, requests per API key could be limited with API Key as the key.
v. Endpoint: The endpoint can be used as the key to limit the number of requests to a specific endpoint.
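Whichever identifier is chosen, it is common to namespace it so that different limiters don't collide; the `rate:` prefix and helper below are purely an illustrative convention, not a requirement of Redis:

```python
def rate_limit_key(scope: str, identifier: str) -> str:
    """Build a namespaced Redis key for a rate-limit counter (illustrative convention)."""
    return f"rate:{scope}:{identifier}"

# One key per IP for an OTP endpoint, one per user for profile updates, etc.
print(rate_limit_key("ip", "203.0.113.7"))   # rate:ip:203.0.113.7
print(rate_limit_key("user", "42"))          # rate:user:42
```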
Here's a step-by-step guide for implementing a Sliding Window Rate Limiter in Lua using Redis commands:
- Receive the key from the client side and map it to a sorted set in Redis, which the server will use for validation.
- Check the current time and eliminate any requests in the sorted set that are outside the defined time window.
- Determine the cardinality (number of elements) of the sorted set.
- If the cardinality is less than the predetermined limit,
i. Add the current time in microseconds as a new member to the sorted set with the current time in seconds as the score.
ii. Set the expiration time of the sorted set to the length of the defined time window.
iii. Return a value of 0, indicating success.
- If the cardinality equals or exceeds the limit, return a value of 1, indicating that the limit has been reached.
The following Lua script demonstrates how to put the above steps into practice:
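A minimal sketch of such a script is shown below. It assumes the rate-limit key arrives as KEYS[1] and the maximum requests and window length (in seconds) as ARGV[1] and ARGV[2], matching the redis-cli invocation in the next section. It also relies on the TIME command being callable from scripts, which requires Redis 5.0+ with effect-based script replication:

```lua
-- Sliding window rate limiter (sketch).
-- KEYS[1] = rate-limit key, ARGV[1] = max requests, ARGV[2] = window (seconds)
local key          = KEYS[1]
local max_requests = tonumber(ARGV[1])
local window       = tonumber(ARGV[2])

-- TIME returns { seconds, microseconds } of the server clock
local now         = redis.call('TIME')
local now_seconds = tonumber(now[1])
local now_micros  = now_seconds * 1000000 + tonumber(now[2])

-- Evict members whose score (seconds) has slid out of the window
redis.call('ZREMRANGEBYSCORE', key, 0, now_seconds - window)

-- Count the requests still inside the window
local count = redis.call('ZCARD', key)

if count < max_requests then
  -- Record this request (microsecond member, second score) and refresh the TTL
  redis.call('ZADD', key, now_seconds, now_micros)
  redis.call('EXPIRE', key, window)
  return 0 -- request allowed
end

return 1 -- limit reached
```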
However, the code can be customized and adapted according to a system's specific requirements and constraints.
Invoking the Lua Script
- Lua scripts can be executed on the server side using the EVAL command. However, EVAL has some drawbacks: the script's source code is sent along with every request, so repeatedly calling EVAL to execute the same parameterized scripts wastes network bandwidth and adds overhead in Redis.
- To alleviate these issues, Redis provides a caching mechanism for scripts to save network and computing resources. A script is loaded to the server's cache by calling the SCRIPT LOAD command and providing its source code. The server doesn't execute the script but instead compiles and loads it to the server's cache. Once loaded, the cached script can be executed with the SHA1 digest returned from the server using the EVALSHA command.
Here's an illustration of the process of loading and then executing a cached script with mykey as the key, max_requests as 2 and window as 10s:
$ redis-cli SCRIPT LOAD "$(cat path/sliding_window.lua)"
$ redis-cli EVALSHA 38f1a2fcc2c3e81f596f7ebbada97c4f8d89b223 1 mykey 2 10
The script returns 0, which indicates that the request was allowed.
Error handling while executing the Lua Script
The Redis script cache is always volatile. It isn't considered a part of the database or persisted. The cache may be cleared when the server restarts, during fail-over when a replica assumes the master role, or explicitly by SCRIPT FLUSH. That means that cached scripts are ephemeral, and the cache's contents can be lost anytime.
$ redis-cli EVALSHA 38f1a2fcc2c3e81f596f7ebbada97c4f8d89b223 1 mykey 2 10
(error) NOSCRIPT No matching script. Please use EVAL.
When EVALSHA is called, the server returns an error if the script's SHA1 digest is not in the cache. In this case, the application should load the script with SCRIPT LOAD and then call EVALSHA again to run it by its SHA1 sum. Some Redis clients provide utility APIs for doing this automatically.
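The load-and-retry pattern can be sketched as a small helper. The `evalsha`/`script_load` method names mirror what clients such as redis-py expose, and `NoScriptError` stands in for whatever error type a given client raises on a NOSCRIPT reply:

```python
import hashlib

class NoScriptError(Exception):
    """Stand-in for the error a Redis client raises on a NOSCRIPT reply."""

def eval_cached(client, script: str, numkeys: int, *keys_and_args):
    """EVALSHA by the script's SHA1 digest, reloading the script on a cache miss.

    `client` is assumed to expose `evalsha` and `script_load` methods,
    as redis-py does; this helper itself is illustrative.
    """
    sha = hashlib.sha1(script.encode()).hexdigest()
    try:
        return client.evalsha(sha, numkeys, *keys_and_args)
    except NoScriptError:
        # Script cache was flushed or lost on fail-over: reload, then retry once
        client.script_load(script)
        return client.evalsha(sha, numkeys, *keys_and_args)
```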
Beyond the Limit
The previous sections explored the sliding window rate limiting concept, its benefits, and its implementation. However, an important question remains - What happens when the rate limit is exceeded? This section probes into the various strategies that rate limiters can employ when the rate limit is breached.
- Throttling: In some implementations, once the limit is reached, further requests are delayed until some of the previous requests have aged out of the time window. This effectively slows down the rate of incoming requests.
- Queueing: Some implementations might put excess requests into a queue to be processed later. Once the window slides far enough for earlier requests to fall out, queued requests can be processed in their place. This can help smooth out spikes in the request rate, but the queue could grow indefinitely if there are consistently more requests than the limit allows.
- Dropping: If the rate limit has been exceeded, the rate limiter might drop incoming requests, effectively ignoring them. This is a simple and effective way to ensure the limit is not exceeded, but it means some requests will not be processed at all.
- Return an Error: If a request cannot be processed due to rate limiting, the system will often respond with an error message. For HTTP-based systems, this is typically a 429 (Too Many Requests) status code. This informs the client that they have exceeded their rate limit and should slow down their requests.
While each of these strategies can be effective on its own, in many cases, a combination of these techniques is employed to handle excess requests. By blending different methods, rate limiters can more effectively balance system load, user fairness, and overall performance.
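As a tiny illustration of the error-returning strategy, a limiter's verdict can be mapped to an HTTP-style response with a Retry-After hint; the function shape is hypothetical and not tied to any particular framework:

```python
def limiter_response(allowed: bool, retry_after_seconds: int):
    """Translate the limiter's verdict into an HTTP-style (status, headers) pair.

    The Retry-After header tells well-behaved clients how long to back off
    before trying again (illustrative sketch).
    """
    if allowed:
        return 200, {}
    return 429, {"Retry-After": str(retry_after_seconds)}
```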
Even the most carefully designed rate limiter is only as good as its implementation, and thorough testing is crucial to ensure its functionality. Several factors should be considered when evaluating its performance, and various methods can be employed:
- Benchmarking and Stress Testing: Run a series of tests that simulate various scenarios, such as normal load, peak load, and extreme load. Measure how the rate limiter performs under these conditions. Tools like Apache JMeter, Gatling, or Locust can be used for these tests.
- Throughput: Measure the maximum number of requests the system can handle per unit of time when the rate limiter is enabled versus when it's disabled. A well-designed rate limiter should introduce minimal overhead to the system.
- Latency: Measure the time it takes for a request to be processed under different load conditions. The latency should remain relatively stable even as the load increases.
- Scalability: Test how well the rate limiter scales as the number of clients or requests increases. A scalable rate limiter should maintain its performance as the system grows.
Ensuring the performance and functionality of a sliding window rate limiter is critical for maintaining a robust, responsive, and secure service. Key areas to consider when setting up a monitoring system include,
- Rate Limit Utilization: Each user or client's requests should be tracked. Consistent hitting of the rate limit by a user might signal the need for limit adjustment or indicate abusive or unintended usage. Monitoring the overall rate limit utilization across all users can provide insights into system usage patterns and help identify potential bottlenecks.
- Rate Limit Exceedances: The number of times the rate limit is exceeded should be monitored. High numbers here could indicate a problem, such as a user or a bot trying to abuse the system or a misconfiguration in the rate limit settings.
- Redis Performance Metrics: Given the crucial role of Redis in this setup, its performance needs to be closely watched. Metrics like memory usage, CPU usage, network bandwidth, and latency should be monitored. A sudden change in these metrics could indicate a problem requiring immediate attention.
- Lua Script Performance: The execution time of the Lua scripts should be measured. If script execution begins to take longer, it may impact service performance, especially under high load.
- Error Rates: The rate of errors related to the rate limiter should be monitored. This could include errors in the Lua script, communication with Redis, or any other related parts of the infrastructure.
- Availability: The rate-limiting service should be highly available. Any downtime can have severe consequences, especially if used in critical parts of the system. The uptime of the Redis server and the service that runs the Lua scripts should be monitored.
- Alerting: An alerting system should be set up based on these metrics. For instance, if Redis' memory usage goes beyond a particular threshold, it could trigger an alert. Similarly, if the rate limit exceedances go beyond a specific number, an alert should be initiated.
- Dashboard: A comprehensive dashboard that provides an overview of all the crucial metrics related to the rate limiter should be created. This allows for a quick understanding of the system's health and performance.
Effective monitoring of the rate limiter ensures its continued function as expected, even as the service scales, contributing to a healthy, secure, and user-friendly service. For comprehensive sliding window rate limiter monitoring, consider using tools such as Redis Monitor for Redis-specific metrics, Prometheus or New Relic for general monitoring and alerting, and Grafana for data visualization and dashboarding.
Halodoc's Use Cases
While numerous methods exist to implement rate limiting, the sliding window algorithm, especially when used in tandem with Redis and Lua, strikes an outstanding balance between ease of use, performance, and precision. Let's examine the challenges we faced at Halodoc and how this rate limiter provided practical solutions.
Challenge 1: Unwanted Traffic Spikes and Resource Exhaustion
In Halodoc's digital ecosystem, we faced a significant challenge with our Android and iOS applications and Web Services. Automated digital entities, or 'bots', generated excessive traffic to our APIs, specifically targeting the OTP APIs. This led to a surge in OTP requests, rapid depletion of SMS balance, escalating costs, and increased vulnerability to attackers.
To counter this, we implemented a Sliding Window Rate Limiter using Redisson (Redis Java client) and Lua for IP-Based OTP Rate Limiting. This approach restricts the volume of OTPs solicited from a single IP address within a defined timeframe. If this limit is transgressed, requests from the offending IP address are blocked for a set duration. Through the strategic deployment of this system, we have substantially reduced the misuse of the OTP system, effectively curtailed the surge in requests, fortified our applications against potential cyber threats, and also gained cost efficiency.
Challenge 2: Exceeding Rate Limits of Third-Party APIs
Our applications rely on various third-party APIs for pivotal functionalities. However, these APIs impose stringent rate caps on requests within a defined timeframe. Breaching these thresholds could instigate temporary or, in severe cases, permanent service disruptions, impacting the user experience and undermining our applications' reliability.
We've used the same Sliding Window Rate Limiter to tackle this challenge. This mechanism regulates the frequency of requests our applications dispatch to third-party APIs, ensuring compliance with the established rate constraints. Consequently, we circumvent potential disruptions and sustain the unimpeded operation of our services.
That being said, it's imperative to note that, like any system, this rate limiter requires fine-tuning and continuous performance monitoring to ensure it aligns with an application's unique demands. At present, we are using New Relic for vigilant monitoring and alert generation about the performance of the rate limiter, integrated seamlessly with Slack for optimal communication. Additionally, the EFK Stack, comprising Elasticsearch, Fluentd, and Kibana, plays a significant role in managing and analyzing our request data, reinforcing the robustness of our system.
As we progress, we'll be using this as one of the rate-limiting strategies for additional applications, with necessary adjustments as needed.
In wrapping up, rate limiting functions as an indispensable instrument in ensuring systems' steadfastness, safeguarding, and uninterrupted availability, particularly in the domain of Web Services and APIs. Integrating a Sliding Window Rate Limiter using Redis and Lua provides a highly robust and efficient approach to regulating the request traffic a system processes over a specific period. This technique offers granular control over request rates and the capability to distribute request handling evenly, negating the risk of system overload due to abrupt traffic surges. Tapping into the power of Redis and Lua for this use case amalgamates the unique strengths of these technologies, paving the way for resilient, scalable, and efficient systems.
We trust that this blog has shed light on the nuances of implementing a bespoke Rate Limiter and has empowered you with the knowledge to create one yourself. We urge you to dig deeper, tinker, and tailor the code to meet your requirements. As you embark on this journey, remember that the best learning comes from doing. Happy coding!
Scalability, reliability and maintainability are the three pillars that govern what we build at Halodoc Tech. We are actively looking for engineers at all levels, and if solving complex problems with challenging requirements is your forte, please reach out to us with your resumé at email@example.com.
Halodoc is the number 1 Healthcare application in Indonesia. Our mission is to simplify and bring quality healthcare across Indonesia, from Sabang to Merauke. We connect 20,000+ doctors with patients in need through our Tele-consultation service. We partner with 3500+ pharmacies in 100+ cities to bring medicine to your doorstep. We've also partnered with Indonesia's largest lab provider to provide lab home services. To top it off, we have recently launched a premium appointment service that partners with 500+ hospitals that allow patients to book a doctor appointment with our application. We are extremely fortunate to be trusted by our investors, such as the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek, Astra, Temasek and many more. We recently closed our Series C round and, in total, have raised around USD 180 million for our mission. Our team works tirelessly to ensure we create the best healthcare solution personalized for all our patients' needs. We are continuously on a path to simplify healthcare in Indonesia.