Enhancing Kubernetes Stability on AWS EKS by Leveraging the Node Swap Feature

At Halodoc, our commitment to delivering seamless healthcare experiences drives us to continuously enhance the reliability and performance of our infrastructure. As we scaled our Kubernetes workloads, we observed challenges related to memory management, particularly Out of Memory (OOM) pod terminations during traffic surges or under memory-intensive workloads.

To address this, we explored a new capability in Kubernetes: Node Swap, available from Amazon EKS 1.33 onward. This initiative, implemented alongside Karpenter v1.5.2, allowed us to experiment with controlled swap configurations on Amazon Linux 2023 (AL2023) nodes, significantly reducing OOMKills and improving workload resilience.

In this post, we’ll walk through our journey of enabling Node Swap on EKS at Halodoc—covering the concept, challenges, implementation, and results.

Understanding Node Swap in Kubernetes

Node Swap is a Linux kernel feature integrated with Kubernetes, allowing nodes to use disk-based swap space when physical memory is fully utilized.

Traditionally, Kubernetes enforces strict memory limits—if a container exceeds its limit, the OOM Killer terminates it immediately. With Node Swap, the kernel can move less frequently accessed memory pages to disk, preventing abrupt pod terminations and improving stability during transient memory spikes.

Key benefits of Node Swap:

  • Memory flexibility: Extends available memory using disk-based swap.
  • Improved workload resilience: Absorbs temporary spikes without terminating the pods.
  • Configurable behavior: Fine-grained control via node-level configuration.
[Diagram: Kubernetes node with swap] The diagram illustrates how a Kubernetes node uses swap memory to prevent Pods from crashing when RAM is exhausted, ensuring workloads continue to run.
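Before and after enabling swap, the node-level state can be checked with standard Linux tools. The commands below are illustrative and read-only:

```shell
# Read-only inspection of a Linux node's memory and swap state.
grep -E 'MemTotal|SwapTotal|SwapFree' /proc/meminfo   # raw kernel counters (kB)
cat /proc/sys/vm/swappiness                           # how eagerly the kernel swaps (0-100)
swapon --show || true                                 # active swap devices/files, if any
```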

Challenges We Faced Before Node Swap

Before enabling swap, our production workloads were under growing memory pressure, particularly during sudden traffic spikes at peak consultation hours, promotional events, and heavy ingestion processes. These memory surges impacted several memory-critical applications, leading to operational and user-facing issues:

  • Frequent OOMKills on critical applications: Memory-intensive workloads were often terminated by the Linux OOM killer when node memory was exhausted, causing temporary disruptions in workflows essential for patients, doctors, and internal operations.
  • Cascading pod instability impacting latency and reliability: Each OOMKill triggered pod restarts, reducing processing capacity and causing latency spikes or temporary failures in dependent services. Memory-critical workloads were particularly vulnerable, resulting in delays in transaction processing and consultation responses.
  • Challenges in memory provisioning and cost management: To prevent OOM events, we over-provisioned nodes and pod memory, increasing infrastructure costs. Under-provisioning, on the other hand, caused instability during unpredictable traffic surges or ingestion peaks.

Production metrics over the last six months revealed that 80 OOMKills occurred during high-traffic periods, accounting for roughly 77% of all observed OOMKills, accompanied by noticeable spikes in API latency during peak times. These challenges highlighted the need for a resilient, cost-efficient mechanism to absorb transient memory spikes without compromising performance or reliability.

Pre-Implementation Checklist for Node Swap

Before enabling Node Swap, verify the following:

  • Disk Capacity: Nodes must have enough free disk space for swap.
  • Workload Assessment: Identify latency-sensitive or high-throughput workloads.
  • Memory Profiling: Review current memory usage, pod limits, and OOM events.
  • Swap Size Planning: Define a reasonable swap size with min/max caps.
  • Kubelet Configuration: Ensure --fail-swap-on=false and --memory-swap-limit=-1 are applied.
  • Monitoring Setup: Collect metrics for swap usage, pod restarts, memory consumption, and disk I/O.
  • Gradual Rollout: Start with a small subset of nodes.
  • Backup & Recovery: Confirm node and workload backup procedures.
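The first checklist item, disk capacity, can be sanity-checked up front. Below is a minimal sketch; the variable names and the 2x headroom factor are our own assumptions, not values from this post:

```shell
#!/usr/bin/env bash
# Pre-flight check (sketch): verify a node has enough free disk for the planned swap file.
# SWAP_MIB and MOUNT_POINT are illustrative defaults, not values from the post.
set -euo pipefail

SWAP_MIB="${SWAP_MIB:-4096}"      # planned swap size in MiB
MOUNT_POINT="${MOUNT_POINT:-/}"   # filesystem that would hold the swap file

# Free space on the target filesystem, in MiB.
free_mib=$(df -Pm "$MOUNT_POINT" | awk 'NR==2 {print $4}')

if [ "$free_mib" -ge "$((SWAP_MIB * 2))" ]; then
  echo "OK: ${free_mib} MiB free, enough headroom for a ${SWAP_MIB} MiB swap file"
else
  echo "WARN: only ${free_mib} MiB free for a ${SWAP_MIB} MiB swap file" >&2
fi
```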

Implementing Node Swap on EKS

UserData in Kubernetes/EKS: userData is a script that runs automatically when a new node (EC2 instance) is launched, allowing you to configure the node—such as installing packages, setting up swap, or mounting storage—before it joins the cluster.

Dynamic Swap Creation in UserData

We designed a dynamic swap setup in the userData script so that every new node automatically comes with a swap partition based on its memory size:

Swap Script
  • Swap size is 30% of node memory, capped between 1–16 GiB.
  • Swap is persisted in /etc/fstab for node reboots.
  • vm.swappiness=30 biases the kernel toward keeping pages in RAM, so swap is used mainly under memory pressure.
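A minimal sketch of such a userData script is shown below. The sizing logic follows the rules above, while the file path and the APPLY_SWAP guard are our own illustrative choices (the guard keeps the destructive steps from running outside a real node bootstrap):

```shell
#!/usr/bin/env bash
# Sketch of a userData-style dynamic swap setup, per the approach described above:
# swap = 30% of node RAM, clamped to 1-16 GiB, persisted in /etc/fstab, swappiness=30.
set -euo pipefail

# Clamp 30% of total memory (MiB) between 1 GiB and 16 GiB.
calc_swap_mib() {
  local mem_mib=$1
  local swap_mib=$(( mem_mib * 30 / 100 ))
  (( swap_mib < 1024 ))  && swap_mib=1024
  (( swap_mib > 16384 )) && swap_mib=16384
  echo "$swap_mib"
}

mem_mib=$(awk '/MemTotal/ {print int($2 / 1024)}' /proc/meminfo)
swap_mib=$(calc_swap_mib "$mem_mib")
echo "Node memory: ${mem_mib} MiB -> swap file: ${swap_mib} MiB"

if [ "${APPLY_SWAP:-0}" = "1" ]; then   # set APPLY_SWAP=1 in real userData
  fallocate -l "${swap_mib}M" /swapfile
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile
  echo '/swapfile none swap sw 0 0' >> /etc/fstab   # persist across reboots
  sysctl -w vm.swappiness=30                        # prefer RAM; swap under pressure
fi
```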

Configuring Kubelet to Use Swap

Since Kubernetes disables swap by default, we added flags to the kubelet:

Kubelet Configurations
  • --fail-swap-on=false allows kubelet to run on swap-enabled nodes.
  • --memory-swap-limit=-1 allows containers to use swap space without restriction, letting the Linux kernel dynamically manage memory and swap usage instead of enforcing strict limits.
  • --register-with-taints ensures new nodes are correctly labeled for Karpenter scheduling.
[Image: Kubernetes node using memory swapping] The image illustrates how a Kubernetes node uses memory swapping to offload a Pod's inactive pages from RAM to swap storage, so the Pod continues to run and avoids an OOM kill.
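On AL2023, kubelet settings like these are typically injected through nodeadm's NodeConfig in the node's userData (for example, via a Karpenter EC2NodeClass). The sketch below mirrors the flags listed above; the taint key/value is a placeholder, and the exact schema depends on your nodeadm/EKS version:

```yaml
# Sketch only: flags mirror the ones described in this post; the taint is a placeholder.
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  kubelet:
    flags:
      - --fail-swap-on=false
      - --memory-swap-limit=-1
      - --register-with-taints=workload-type/swap-enabled=true:NoSchedule
```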

Why 30% Swap Size with Min/Max Limits?

Choosing an appropriate swap size involves balancing memory buffering and disk I/O efficiency:

  • 30% of RAM: Allocating swap equal to 30% of the node's RAM provides a sufficient buffer for transient memory spikes without significantly impacting performance, and avoids the excessive swapping that can degrade performance on modern systems.
  • Minimum 1 GB: ensures small nodes have usable swap.
  • Maximum 16 GB: prevents excessive swap on large nodes, which can cause high disk I/O and slow latency-sensitive workloads.
  Node Size | Calculated Swap (30%) | Applied Swap
  4 GB      | 1.2 GB                | 1 GB (min)
  16 GB     | 4.8 GB                | 4.8 GB
  64 GB     | 19.2 GB               | 16 GB (max)

This approach ensures predictable performance while providing a safety net for OOM prevention.

Observations

We monitored production workloads under real traffic and observed significant improvements after implementing Node Swap:

  Metric                   | Before Swap (Average) | After Swap (Average) | Improvement
  OOM Kills / Pod Restarts | 24                    | 6                    | 75% reduction
  Memory Utilization       | 75–80%                | 65%                  | 12–15% decrease
  Response Time            | 2,021 ms              | 1,971 ms             | ~2% faster
  Average Memory Usage     | 581 MiB               | 541 MiB              | ~7% lower

Key takeaways:

  • Node Swap acted as a buffer, absorbing transient memory spikes and helping maintain pod stability.
  • OOM Kills dropped significantly, indicating that far fewer pods were terminated due to memory pressure.
  • Pod restarts decreased: Over the past three months, restarts fell from 24 to 6, a 75% reduction, improving overall service stability and reliability.
  • Improved memory utilization: Nodes handled higher memory workloads safely without impacting performance.

Lessons Learned & Considerations

Enabling Node Swap provides a safety net for memory spikes, but it comes with trade-offs. Based on our implementation at Halodoc, here’s what to keep in mind:

  • Swap is a buffer, not a fix: Node Swap mitigates transient memory spikes but does not resolve memory leaks or inefficient workloads. Proper memory requests, limits, and workload optimization are still essential.
  • Disk space requirements: Ensure nodes have sufficient free disk volume for swap. Insufficient space can lead to performance degradation or node instability.
  • Performance impact: Swap is significantly slower than RAM. Latency-sensitive workloads (e.g., databases, real-time processing) may experience slower response times under heavy swap usage.
  • Gradual rollout is key: Start with a subset of nodes before enabling swap cluster-wide to monitor effects on performance and stability.
  • Monitoring is critical: Track swap usage, pod restarts, memory utilization, and disk I/O to detect hidden issues before they impact services.
  • Trade-offs with swap size: Too little swap may not prevent OOMKills; too much can increase disk I/O and slow down workloads. Our 30% of RAM (1–16 GB) approach balanced these considerations effectively.
  • Workload-specific considerations: Stateful and high-throughput applications may behave differently under swap; test workloads individually to assess impact.
  • Observability matters: Swap can mask underlying memory issues if not properly monitored, leading to delayed detection of problematic pods or memory leaks.
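For the monitoring points above, one useful node-level signal is swap utilization derived from /proc/meminfo. The sketch below computes it; the metric name and output format are our own:

```shell
#!/usr/bin/env bash
# Sketch: compute node swap utilization from /proc/meminfo, the kind of signal
# worth feeding into monitoring alongside pod restarts and disk I/O.
set -euo pipefail

read -r swap_total swap_free < <(
  awk '/SwapTotal/ {t=$2} /SwapFree/ {f=$2} END {print t, f}' /proc/meminfo
)

if [ "$swap_total" -eq 0 ]; then
  echo "swap_used_pct=0 (no swap configured)"
else
  used_pct=$(( (swap_total - swap_free) * 100 / swap_total ))
  echo "swap_used_pct=${used_pct}"
fi
```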

Conclusion

The implementation of Node Swap at Halodoc marks a significant milestone in enhancing Kubernetes workload stability and resilience. By configuring Node Swap at 30% of node RAM with a cap of 1–16 GB, we have successfully reduced OOMKills over the past three months, minimized pod restarts from 24 to 6 (75% reduction), and maintained predictable performance across diverse node types—all while efficiently handling transient memory spikes and optimizing overall memory utilization.

These improvements help Halodoc continue delivering seamless healthcare experiences while efficiently managing cluster resources. Looking forward, we plan to further optimize node memory utilization, enhance observability, and leverage upcoming EKS and Kubernetes features to elevate the performance and reliability of our infrastructure.

References

  1. Swap memory management
  2. Kubernetes node swap feature

Join us

Scalability, reliability and maintainability are the three pillars that govern what we build at Halodoc Tech. We are actively looking for engineers at all levels and if solving hard problems with challenging requirements is your forte, please reach out to us with your resume at careers.india@halodoc.com.

About Halodoc

Halodoc is the number one all-around healthcare application in Indonesia. Our mission is to simplify and deliver quality healthcare across Indonesia, from Sabang to Merauke. Since 2016, Halodoc has been improving health literacy in Indonesia by providing user-friendly healthcare communication, education, and information (KIE). In parallel, our ecosystem has expanded to offer a range of services that facilitate convenient access to healthcare, starting with Homecare by Halodoc as a preventive care feature that allows users to conduct health tests privately and securely from the comfort of their homes; My Insurance, which allows users to access the benefits of cashless outpatient services in a more seamless way; Chat with Doctor, which allows users to consult with over 20,000 licensed physicians via chat, video or voice call; and Health Store features that allow users to purchase medicines, supplements and various health products from our network of over 4,900 trusted partner pharmacies. To deliver holistic health solutions in a fully digital way, Halodoc offers Digital Clinic services including Haloskin, a trusted dermatology care platform guided by experienced dermatologists. We are proud to be trusted by global and regional investors, including the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek, Astra, Temasek, and many more. With over USD 100 million raised to date, including our recent Series D, our team is committed to building the best personalized healthcare solutions — and we remain steadfast in our journey to simplify healthcare for all Indonesians.