Building a Cost-Efficient, Multi-Environment On-Demand Service Architecture

Service Architecture Dec 6, 2024

In today's fast-paced digital world, businesses constantly seek ways to optimize resources and reduce operational costs without compromising service availability and performance. As a software engineer, it's crucial to identify areas where infrastructure and resource utilization can be streamlined, especially in cloud-based environments where costs can quickly escalate with over-provisioning.

In the past year, we encountered a situation where some of our services required high resource consumption only for a brief window each day, while for the rest of the time they could run on minimal resources. However, because these services exposed a critical API to the front end, the team was forced to maintain larger infrastructure around the clock to handle sporadic spikes, leading to unnecessary cost overruns.

This presented a perfect opportunity to rethink our approach. Instead of maintaining over-provisioned resources for 24-hour availability, we introduced an on-demand environment service architecture. We started utilizing a lightweight, always-available environment to handle the day-to-day API requirements and dynamically spawned a better-provisioned environment only for more resource-intensive use cases. This approach ensures that performance needs are met during peak times and significantly reduces costs during off-peak hours.

In this blog, we’ll dive deeper into the problem, our thought process behind the solution, and how implementing an on-demand service model can drive both operational efficiency and cost savings.

Problem Statement

In many applications, it’s common to have services with varying resource requirements throughout the day. In our case, we identified a service with a workload that caused significant spikes in resource consumption—reaching up to 90% CPU and memory utilization during peak periods. These spikes left little room for smaller processes, and any additional processes could fully exhaust the available resources. The high resource utilization was driven by resource-intensive operations such as validating large datasets and performing complex computations. Outside of these periods, the service operated with minimal resource usage, primarily handling lightweight API requests or remaining idle for most of the day.

Despite the low demand during off-peak hours, this service had to be available around the clock to support a critical API that interacted with the front end. Given the necessity of 24-hour availability, we initially allocated resources at a level sufficient to handle the peak demand. This approach led to several issues:

  • Over-provisioned Infrastructure: Resources remained underutilized for most of the day, creating inefficiency.
  • Increased Operational Costs: Running large infrastructure continuously resulted in unnecessary costs.
  • Lack of Process Isolation: In the previous setup, all tasks, whether low or high in resource demand, were handled within the same environment. This lack of isolation meant that resource-heavy processes would often interfere with lighter, continuous tasks, leading to degraded performance during peak periods. Without proper separation, the high-demand tasks would exhaust system resources, causing slower response times for low-demand API requests.

This situation presented a clear opportunity for architectural improvement. By decoupling the environment based on specific use cases, we could design a solution that maintains 24/7 availability while efficiently allocating resources. This approach would enable the service to run at a reduced baseline capacity during low-demand periods and scale up only when high-demand operations are triggered. This would optimize resource utilization, reduce costs, and ensure better isolation and performance for different types of processes.

Solution: Multi-Environment On-Demand Service Architecture

To address our need for efficient resource utilization, we developed an on-demand environment service architecture inspired by the blue-green deployment model. In this setup, rather than just alternating between two environments (blue and green), we extended the concept to support multiple dynamic environments, each with specific resources optimized for various use cases. This gave us the flexibility to deploy any environment “color” based on the service demand and workload requirements, allowing us to allocate resources dynamically as needed.

Here’s how our solution was structured:

  1. Primary (Always-On) Environment: We created a lightweight environment designed for continuous operation, managing basic, low-resource API interactions while maintaining 24/7 availability. Optimized for minimal resource usage, this environment efficiently supports daily operations at a much lower cost. In our implementation, the primary environment was configured with approximately 34% less memory allocation and 50% fewer pods than the on-demand environments, achieving an ideal balance of cost efficiency and reliable functionality.
  2. Dynamic High-Resource Environment (On-Demand): For the brief window of time that required intense processing, we set up a separate environment with high-resource allocation. Unlike traditional blue-green models, we could instantiate additional “colored” environments—e.g., red, yellow, etc.—to handle specific workloads dynamically. These high-resource environments would be spun up on demand, handle their tasks, and then be torn down once the workload was complete, ensuring resources were only utilized as needed.
  3. Multi-Color Environment Management: Instead of just blue and green, we created a pool of environments with different configurations and resource specifications. Each environment was optimized for particular workloads, giving us the flexibility to allocate resources that matched the demand precisely. For example:
    - Red Environment: For CPU-intensive processing
    - Yellow Environment: For memory-intensive tasks
    - Black Environment: For network-heavy processes
  4. Automated Environment Triggering and Teardown: Each environment could be triggered based on predefined schedules or specific API requests, ensuring resources were provisioned on demand. Once the high-resource task was complete, the environment would automatically shut down, freeing up resources and minimizing costs.
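
For the triggering in item 4, the always-on environment can fire activations on a schedule. Below is a minimal sketch using java.util.concurrent; the scheduler class and the activation callback are hypothetical stand-ins for whatever mechanism actually enables an environment:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the main environment schedules the daily activation of a
// high-resource environment; the Runnable wraps the actual enable call.
public class OnDemandScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void scheduleDailyActivation(Runnable activateEnvironment, long initialDelayHours) {
        // Fires once every 24 hours; the initial delay is chosen so the task
        // runs at the start of the expected peak window.
        scheduler.scheduleAtFixedRate(activateEnvironment, initialDelayHours, 24, TimeUnit.HOURS);
    }
}
On-Demand Activation Scheduler (Illustrative Sketch)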

Before settling on this design, we considered AWS services such as Lambda, Batch, Step Functions, and On-Demand EC2. While these solutions offer reliability and managed infrastructure, they come with certain limitations. AWS Lambda is constrained by short execution times, making it unsuitable for long-running processes. AWS Batch handles resource-intensive jobs effectively but is primarily geared toward batch processing and lacks real-time flexibility. AWS Step Functions are powerful for orchestrating workflows but require integration with other services for execution, adding complexity. On-Demand EC2 provides control over resources but involves higher management overhead and less efficient scaling. In contrast, our on-demand architecture excels with dynamic resource allocation, zero-pod mode for cost savings, and seamless scalability, making it particularly advantageous for long-running, resource-heavy operations.

Architectural Overview

The diagram illustrates how clients, API services, Kafka, ArgoCD, Jenkins, and different environments interact to fulfill the on-demand service requirements. Here’s a step-by-step breakdown of how this architecture operates:

1. Client Interaction (Steps 1-2)

  • Client Requests: Clients make requests to the main API Service housed in the Main Environment (Blue Environment), which is available 24/7 to handle lightweight API calls.
  • On-Demand Environment Manager: The API Service communicates with the On-Demand Environment Manager to enable or disable the on-demand environments.
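
For illustration, the contract between the API Service and the On-Demand Environment Manager could be as simple as the sketch below; the enum, interface, and method names are hypothetical, not taken from our actual codebase:

// Hypothetical sketch: the environment "colors" and the enable/disable contract
// the main API service uses to request or release an on-demand environment.
public enum EnvironmentColor { YELLOW, RED, BLACK }

public interface OnDemandEnvironmentManager {
    // Invoked by the main API service when a high-resource task is requested.
    void enableEnvironment(EnvironmentColor color);

    // Invoked once a process-completion message arrives for the environment.
    void disableEnvironment(EnvironmentColor color);
}
On-Demand Environment Manager Contract (Illustrative Sketch)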

2. Kafka Integration for Task Separation (Steps 4-5)

  • Environment-Specific Producers: Based on the type of high-resource task required, the On-Demand Environment Manager routes messages to specific Kafka topics (e.g., Yellow, Red, or Black) through environment-specific Producers. Each producer sends messages to the corresponding Kafka Topic (Yellow, Red, or Black), which allows tasks to be decoupled and directed to the correct environment.
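
To make the routing concrete, here is a minimal sketch built on the plain Apache Kafka producer API; the wrapper class is an assumption, while the topic names mirror the kafkaConfigurationMap shown in the Implementation section:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical sketch: routes a task payload to the Kafka topic owned by the
// requested on-demand environment.
public class EnvironmentTaskProducer {
    private final KafkaProducer<String, String> producer;

    public EnvironmentTaskProducer(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    public void publish(EnvironmentColor color, String taskPayload) {
        String topic;
        switch (color) {
            case RED:    topic = "red_topic";    break;
            case YELLOW: topic = "yellow_topic"; break;
            case BLACK:  topic = "black_topic";  break;
            default:     throw new IllegalArgumentException("Unknown environment: " + color);
        }
        producer.send(new ProducerRecord<>(topic, taskPayload));
    }
}
Environment-Specific Producer (Illustrative Sketch)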

3. Environment Activation (Steps 6-7)

  • Kafka Consumers in Each Environment: Each environment has a Consumer listening to its specific Kafka topic. When an on-demand environment (Yellow, Red, or Black) is activated, its consumer starts up with it and processes the incoming messages.
  • Service Execution: Once an environment is activated, its Service (e.g., Yellow Environment Service, Red Environment Service) processes the task as directed by the Kafka message.
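
On the consuming side, each on-demand environment runs a consumer bound to its own topic. Here is a minimal sketch against the plain Apache Kafka client API; the worker class and processTask hook are hypothetical stand-ins, not our actual implementation:

import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hypothetical sketch: the Red Environment's consumer loop, started only when
// that environment is activated.
public class RedEnvironmentWorker {
    private final KafkaConsumer<String, String> consumer;

    public RedEnvironmentWorker(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
        consumer.subscribe(List.of("red_topic"));
    }

    public void run() {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                processTask(record.value()); // the CPU-intensive work this environment owns
            }
        }
    }

    private void processTask(String payload) {
        // Environment-specific processing goes here.
    }
}
On-Demand Environment Consumer (Illustrative Sketch)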

4. Task Completion and Process Finalization (Steps 8-10)

  • Process Completion Producer: After an environment completes its task, it generates a Process Completion Message and sends it to the Process Completion Topic in Kafka, signaling that the task has been processed successfully and that the environment can be deactivated. Each environment (Yellow, Red, Black) has its own Process Completion Producer, so completion signals are specific to the environment that performed the task (see the sketch after this list).
  • Process Completion Consumer: In the Main Environment, a Process Completion Consumer listens to the Process Completion Topic. When it receives a completion message, it informs the On-Demand Environment Manager that the task is finished.
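
As an illustration, the completion signal could be published like this; the wrapper class and JSON payload shape are assumptions, while the topic name follows the configuration shown in the Implementation section:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical sketch: each on-demand environment reports task completion on
// the shared completion topic so the main environment can tear it down.
public class ProcessCompletionProducer {
    private final KafkaProducer<String, String> producer;
    private final String environmentName;

    public ProcessCompletionProducer(KafkaProducer<String, String> producer, String environmentName) {
        this.producer = producer;
        this.environmentName = environmentName;
    }

    public void reportCompletion(String taskId) {
        // Payload shape is assumed for illustration, not our actual schema.
        String payload = String.format(
                "{\"environment\":\"%s\",\"taskId\":\"%s\"}", environmentName, taskId);
        producer.send(new ProducerRecord<>("process_completion_topic", payload));
    }
}
Process Completion Producer (Illustrative Sketch)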

5. Automated Environment Deactivation (Steps 11-12)

  • Environment Teardown: Once notified, the On-Demand Environment Manager triggers Jenkins to execute deactivation scripts or pipelines.
  • ArgoCD Integration: Jenkins works with ArgoCD to deactivate the specific environment that completed the task, ensuring resources are released immediately after processing.
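
To illustrate the hand-off to Jenkins, the manager could call Jenkins' remote build-trigger endpoint (POST /job/<name>/build?token=<token>, a standard Jenkins feature); the job name and token below are purely illustrative:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: triggers the Jenkins job that deactivates an environment.
public class EnvironmentTeardownTrigger {
    private final HttpClient httpClient = HttpClient.newHttpClient();

    public void triggerTeardown(String jenkinsBaseUrl, EnvironmentColor color) throws Exception {
        // e.g., https://jenkins.example.com/job/deactivate-red-environment/build?token=...
        String url = jenkinsBaseUrl + "/job/deactivate-" + color.name().toLowerCase()
                + "-environment/build?token=TEARDOWN_TOKEN";
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        httpClient.send(request, HttpResponse.BodyHandlers.discarding());
    }
}
Jenkins Teardown Trigger (Illustrative Sketch)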

This process allows each environment to be deployed only when needed, thus optimizing resource usage and minimizing costs.

Implementation

Implementing the on-demand service architecture required careful orchestration of multiple components to ensure seamless operation, scalability, and resource efficiency. The process involved creating an infrastructure where different environments could be deployed, managed, and terminated based on demand, with each environment aware of its purpose and capable of executing its designated workload. Below is a detailed breakdown of how the solution was implemented:

1. Implementing ArgoCD's App-of-Apps Template for Multi-Environment Deployments

Implementing an ArgoCD App-of-Apps template provides an efficient way for teams to manage and deploy multiple environments through a single build deployment pipeline. This strategy is particularly useful for ensuring consistency across various environments (e.g., development, staging, production) while simplifying deployment management by avoiding separate pipelines for each environment.

To implement this, we create individual Helm chart files for each environment, detailing specific resource configurations and unique environment variables. All environments share a base set of environment variables defined in a common YAML file. Each environment can override these shared variables using its specific Helm chart file, allowing for customized resource specifications and settings while maintaining a unified and consistent deployment framework. Below are examples of this multi-environment Helm chart setup:

ServiceName: blue-environment

image:
  repository: link/to/repository
  fluentRepository: link/to/repository
  pullPolicy: Always

overrideEnvConfig: |-
  export ENVIRONMENT_NAME='blue-environment'
  export IS_ONDEMAND_ENVIRONMENT=false
  
autoscale:
  enabled: false
  minReplicas: 1
  maxReplicas: 1
  targetCPUUtilizationPercentage: 90

resources:
  limits:
    cpu: 700m
    memory: 760Mi
  requests:
    cpu: 500m
    memory: 560Mi
Blue Environment (Lightweight and Always-On) Helm Chart Configuration
ServiceName: red-environment

image:
  repository: link/to/repository
  fluentRepository: link/to/repository
  pullPolicy: Always

overrideEnvConfig: |-
  export ENVIRONMENT_NAME='red-environment'
  export IS_ONDEMAND_ENVIRONMENT=true
  
autoscale:
  enabled: false
  minReplicas: 1
  maxReplicas: 4
  targetCPUUtilizationPercentage: 90

resources:
  limits:
    cpu: 3000m
    memory: 760Mi
  requests:
    cpu: 2000m
    memory: 560Mi
Red Environment (CPU-Oriented) Helm Chart Configuration

2. Kafka Configuration Across Environments

Kafka plays a crucial role in connecting all the environments within our architecture. Since we use the same codebase across different environments, we distinguish Kafka configurations by environment (e.g., Red Environment, Black Environment) to ensure that only the relevant consumers are active where needed. At the application level, we specify which consumers should be enabled or disabled based on the environment, ensuring efficient resource usage and process isolation.

kafkaConfigurationMap:
  BLUE_ENVIRONMENT:
    producer:
      red_event: "red_topic"
      yellow_event: "yellow_topic"
      black_event: "black_topic"
    consumer:
      - eventName: "process_completion_event"
        topics:
          - "process_completion_topic"
        group: "process_completion_group"
  RED_ENVIRONMENT:
    producer:
      process_completion_event: "process_completion_topic"
    consumer:
      - eventName: "red_event"
        topics:
          - "red_topic"
        group: "red_group"
  YELLOW_ENVIRONMENT:
    producer:
      process_completion_event: "process_completion_topic"
    consumer:
      - eventName: "yellow_event"
        topics:
          - "yellow_topic"
        group: "yellow_group"
  BLACK_ENVIRONMENT:
    producer:
      process_completion_event: "process_completion_topic"
    consumer:
      - eventName: "black_event"
        topics:
          - "black_topic"
        group: "black_group"
Kafka Configuration
private void registerKafka(final ConfigClient configClient) {
    // Resolve the environment this instance is running in
    // (e.g., BLUE_ENVIRONMENT, RED_ENVIRONMENT).
    Environment currentEnvironment = Environment.getEnvironment();

    // Load only the Kafka producer/consumer settings mapped to this environment.
    KafkaConfiguration kafkaConfiguration = configClient.getKafkaConfigurationMap(currentEnvironment);

    // Create the producer client and register only this environment's consumers,
    // so each environment stays isolated to its own topics.
    KafkaProducerClient kafkaProducerClient = new KafkaProducerClient(kafkaConfiguration);
    KafkaConsumerRegistry.registerConsumers(kafkaConfiguration.getConsumers());
}
Kafka Init
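
The Environment.getEnvironment() call above can be driven by the ENVIRONMENT_NAME variable exported in each Helm chart's overrideEnvConfig. Here is a minimal sketch of that mapping; the enum and the name-mapping convention are assumptions for illustration, not our exact implementation:

// Hypothetical sketch: maps the ENVIRONMENT_NAME value set by the Helm chart
// (e.g., 'red-environment') to the keys used in kafkaConfigurationMap.
public enum Environment {
    BLUE_ENVIRONMENT, RED_ENVIRONMENT, YELLOW_ENVIRONMENT, BLACK_ENVIRONMENT;

    public static Environment getEnvironment() {
        String name = System.getenv("ENVIRONMENT_NAME"); // e.g., "red-environment"
        return Environment.valueOf(name.toUpperCase().replace('-', '_'));
    }
}
Environment Resolution (Illustrative Sketch)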

Use Cases for the On-Demand Environment Architecture

The on-demand service architecture was developed to address specific workloads that presented challenges in our original infrastructure. By isolating environments based on their workload demands, we aimed to optimize resource use, reduce costs, and maintain consistent performance. Here are the main use cases that benefited from this approach:

1. Long-Running Processes (More Than 30 Minutes)

These tasks require extended runtimes and significant resources, making on-demand environments ideal for their execution. This prevents resource contention and ensures system stability. Key examples include:

  • Batch Data Processing: At Halodoc, we use on-demand environments for reconciling order data. This process involves validating large datasets and transforming them into different formats for downstream operations. Running this task in a dedicated environment ensures it is completed efficiently without affecting other ongoing services.
  • Large-Scale Data Aggregation: After reconciling orders, we aggregate the validated data to create detailed reports for the Finance Team. These reports are used to prepare disbursal data for payments to pharmacies and doctors. The process requires significant computational power due to the data volume and complexity. Executing this task in an on-demand environment allows us to allocate resources only as needed, avoiding interference with other services and controlling costs.

Using on-demand environments for long-running processes ensures that these tasks are completed without straining system resources or impacting lighter, continuous tasks.

2. Less Frequently Running Operations

These tasks are scheduled or triggered at specific intervals, rather than running continuously. Examples include:

  • Scheduled Financial Reports: We run weekly data aggregation tasks to compile financial reports before exporting the data to our accounting journal systems. These processes are resource-intensive but infrequent, so running them in a dedicated on-demand environment minimizes the need for permanent high-capacity infrastructure.
  • Periodic Database Maintenance: Tasks such as purging outdated records, reindexing, and archiving old data are resource-heavy but typically performed during off-peak times to avoid impacting other operations.
  • Weekly Data Backups: Creating full backups of large databases or file systems is essential for disaster recovery but demands significant I/O and processing power. Using an on-demand environment ensures that these operations do not disrupt continuous services.

Running these infrequent operations in temporary environments prevents wasteful resource allocation during idle times and ensures resources are used efficiently only when needed.

3. Resource-Intensive Workloads

Tasks that require substantial computational resources are best managed with on-demand environments that can scale dynamically. At Halodoc, this approach is applied to:

  • Reconciliation Service: This service processes large volumes of order data for validation and reconciliation. Due to the complexity and size of the data, significant CPU and memory resources are required. Using an on-demand environment ensures the necessary resources are available when the task runs, maintaining efficient processing and preventing impact on other critical services.
  • Disbursal Service: After order reconciliation, we generate reports that help the Finance Team prepare disbursal data for payments to pharmacies and doctors. This process, being resource-intensive, benefits from execution in an on-demand environment, which allows for flexible scaling and prevents delays or resource conflicts.

Deploying resource-intensive workloads in on-demand environments enables these critical tasks to be completed efficiently without tying up resources that could be used for other continuous operations. This approach enhances system performance and reduces the need for permanent high-capacity infrastructure.

Benefits and Results

Implementing this on-demand, multi-environment architecture provided several key advantages:

  • Cost Savings: By spinning up environments only as needed, we minimized the costs associated with idle resources. This resulted in an overall cost reduction of 32.22% for one of our reconciliation services.
  • Reduced Resource Consumption in the Main Environment: With the new architecture, we reduced the pod count in the main environment by 50% and decreased memory usage by 34.22%. This reduction in resource consumption significantly lowered operational costs and allowed for better allocation of resources to critical tasks.
  • Optimized Performance through Process Isolation: Separating high-demand tasks from the main environment allowed us to prevent resource-intensive processes from interfering with low-demand API requests. This isolation resulted in smoother and more consistent API performance, eliminating slowdowns during peak processing periods. Since deploying the new architecture, we have not encountered any pod restarts due to resource constraints or alerts for high resource utilization, even with reduced overall resource allocation. In contrast, under the previous setup, it was common to experience at least one pod restart during high-resource processes due to resource exhaustion.
  • Scalability: Using Kafka, we could scale each environment based on workload, creating an efficient, scalable system that adjusts to demand. Each environment operates independently, allowing us to handle spikes in demand without affecting continuous API services.
  • Automation: Integrating Jenkins and ArgoCD allowed us to automate the entire environment lifecycle, minimizing operational overhead and ensuring environments are only active when necessary.
  • Zero Pod Mode: A key advantage of this on-demand approach is its application at the service level, enabling services to run in a "zero pod" mode. In this state, the service is completely shut down, consuming no resources until a specific trigger, such as a scheduler or incoming request, initiates it. This capability allows services to be spawned dynamically on demand, ensuring they only consume resources when active. This feature enhances resource efficiency, reduces operational costs, and allows the system to respond flexibly to workload fluctuations while maintaining performance.

Implementing this architecture allowed us to not only achieve our initial goals of cost efficiency and improved resource management but also enhance the overall performance and scalability of our system. The combination of reduced pod count, lower memory usage, and effective process isolation created a streamlined, cost-effective, and high-performance solution for our workloads.

Conclusion

By adopting a flexible, on-demand multi-environment architecture, we effectively balanced continuous service availability with efficient resource management. This architecture allows us to maintain a lean, always-on environment for standard API usage while dynamically spinning up and tearing down high-resource environments for intensive workloads. The Kafka-driven message routing system provided a streamlined way to segregate and handle different types of processing tasks, allowing us to manage each environment independently and activate it only when needed.

This approach resulted in significant cost savings, enhanced scalability, and optimized performance. We minimized idle resource costs while still meeting peak processing demands. Additionally, the automation provided by Jenkins and ArgoCD minimized operational overhead, enabling our team to focus on further optimizations rather than manual scaling efforts.

Reference

  • What is blue green deployment? Blue green deployment is an application release model that gradually transfers user traffic from a previous version of an app or microservice to a nearly identical new release, both of which are running in production.
  • Efficient Batch Computing – AWS Batch: AWS Batch allows developers, scientists, and engineers to efficiently process hundreds of thousands of batch and machine learning computing jobs on AWS.
  • Serverless Function, FaaS Serverless – AWS Lambda: AWS Lambda is a serverless compute service for running code without having to provision or manage servers. You pay only for the compute time you consume.
  • Workflow Orchestration – AWS Step Functions: AWS Step Functions lets you orchestrate multiple AWS services into serverless workflows so that you can build and update applications quickly.
  • On-Demand Instances vs Reserved Instances – Instance Types Comparison – AWS: What’s the difference between On-Demand Instances and Reserved Instances? How to use On-Demand Instances and Reserved Instances with AWS.



Samuel Manalu

A guy who cooks many things for Halodoc👨‍🍳🖥️🇮🇩