Harnessing the Power of JDBC Batch Processing

JDBC Batch Processing Jun 21, 2024

As Halodoc's healthcare technology continuously evolves, efficiency becomes indispensable. Tasks such as processing payments promptly, managing claims, and accurately reconciling complex financial data all require utmost precision and speed.

Enter Java Database Connectivity (JDBC) Batch Processing, a crucial component in streamlining Halodoc's database operations. Incorporating JDBC Batch Processing into Halodoc's healthcare technology has proven to be a valuable tool for enhancing efficiency and performance. One notable achievement is a 15-fold speed-up in the reconciliation of complex financial data. This improvement underscores the effectiveness of JDBC Batch Processing in streamlining database operations and driving operational excellence.

We will dissect its integral role in optimising database operations and improving performance. From managing large data volumes to ensuring seamless transactions, JDBC batch processing emerges as a valuable asset in the quest for operational excellence.

A Brief Overview of JDBC

First, let's understand JDBC. Java Database Connectivity (JDBC) is a Java API that provides a standardised method for engaging with relational databases. With its help, Java applications can execute SQL queries and manage data. JDBC acts as a bridge between the Java programming language and diverse Database Management Systems (DBMS) such as MySQL, PostgreSQL, Oracle and SQL Server.

JDBC provides a set of classes and interfaces that facilitate database operations, including establishing connections, creating statements, executing queries, and processing result sets. It follows a simple and intuitive approach, making it widely adopted in enterprise applications for database connectivity.

Here are some important JDBC concepts:

  1. Establishing Connection: Here at Halodoc, we prefer using HikariCP, a solid, high-performance JDBC connection pool, to create a DataSource and then obtain connections from it.
  2. Creating Statement: Once the connection is established, create a statement object using the prepareStatement() method of the Connection interface. This statement object will be used to execute SQL queries against the database.
  3. Executing Queries: You can execute SQL queries using the executeQuery() method of the Statement interface for retrieving data, or the executeUpdate() method for executing insert, update, delete, or DDL (Data Definition Language) statements.
  4. Processing Results: For SELECT queries that return results, use the ResultSet interface to iterate over the returned rows and retrieve data; for INSERT and UPDATE queries, the result is the number of affected rows.
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import lombok.Builder;
import lombok.Data;

public class JDBCDemo {
  @Data
  @Builder
  private static class Employee {
    private String name;

    private Double salary;

    private String designation;
  }


  public static void main(String[] args) {
    // Create a list of employees
    final List<Employee> employees = new ArrayList<>();

    // Create the pool, connection and prepared statement once,
    // then execute one SQL statement per employee
    try (HikariDataSource dataSource = getHikariDatasource();
         Connection connection = dataSource.getConnection();
         PreparedStatement preparedStatement = connection.prepareStatement(
             "INSERT INTO employees (name, salary, designation) VALUES (?, ?, ?)")) {

      for (final Employee employee : employees) {
        preparedStatement.setString(1, employee.getName());
        preparedStatement.setDouble(2, employee.getSalary());
        preparedStatement.setString(3, employee.getDesignation());
        int numberOfAffectedRows = preparedStatement.executeUpdate();
      }

    } catch (SQLException e) {
      e.printStackTrace();
    }
  }


  private static HikariDataSource getHikariDatasource() {
    final HikariDataSource dataSource = new HikariDataSource();
    dataSource.setPoolName("your_datasource");
    dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
    dataSource.setJdbcUrl("jdbc:mysql://localhost:3306/your_database");
    dataSource.setUsername("your_username");
    dataSource.setPassword("your_password");
    dataSource.setConnectionTestQuery("SELECT 1");
    dataSource.setConnectionInitSql("SELECT 1");
    dataSource.setMinimumIdle(10);
    dataSource.setMaximumPoolSize(50);
    dataSource.setIdleTimeout(300_000); // milliseconds
    return dataSource;
  }
}

The Need for Speed

The provided code executes a single SQL query within a database transaction. However, consider scenarios where you must handle the creation and/or update of a large volume of records. Using the aforementioned code snippet, you would need to execute each SQL query individually.

Based on our observations, executing a single insert/update SQL query via JDBC typically takes around 50 to 100 milliseconds; a more complex query may take longer. Now, if you were to execute 200,000 SQL queries one at a time, the scale of the time required becomes evident upon calculation: at those latencies, it would take several hours to handle this workload, which isn't particularly efficient.

Fortunately, JDBC offers a solution for this challenge in the form of JDBC Batch Processing. JDBC Batch Processing allows us to group multiple SQL statements into a batch and execute them together in a single network trip to the database. So, instead of executing each SQL statement individually, JDBC Batch Processing enables us to bundle the queries in a batch, resulting in reduced network overhead and improved performance.

Implementing Batch Processing

We will modify the aforementioned code to implement batch processing.

import com.google.common.collect.Lists;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import lombok.Builder;
import lombok.Data;

public class JDBCDemo {
  @Data
  @Builder
  private static class Employee {
    private String name;

    private Double salary;

    private String designation;
  }


  public static void main(String[] args) {
    // Create a list of employees
    final List<Employee> employees = new ArrayList<>();
    final int batchSize = 50;

    final List<List<Employee>> employeeBatches = Lists.partition(employees, batchSize);

    // Create the pool once and reuse it for every batch
    try (HikariDataSource dataSource = getHikariDatasource()) {
      employeeBatches.forEach(employeeBatch -> insertBatchOfEmployees(dataSource, employeeBatch));
    }
  }


  private static void insertBatchOfEmployees(final HikariDataSource dataSource,
                                             final List<Employee> employeesBatch) {
    try (Connection connection = dataSource.getConnection();
         PreparedStatement preparedStatement = connection.prepareStatement(
             "INSERT INTO employees (name, salary, designation) VALUES (?, ?, ?)")) {

      for (final Employee employee : employeesBatch) {
        preparedStatement.setString(1, employee.getName());
        preparedStatement.setDouble(2, employee.getSalary());
        preparedStatement.setString(3, employee.getDesignation());
        preparedStatement.addBatch(); // queue the statement instead of executing it
      }

      int[] rowsAffected = preparedStatement.executeBatch(); // one round trip for the whole batch

    } catch (SQLException e) {
      e.printStackTrace();
    }
  }


  private static HikariDataSource getHikariDatasource() {
    final HikariDataSource dataSource = new HikariDataSource();
    dataSource.setPoolName("your_datasource");
    dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
    dataSource.setJdbcUrl("jdbc:mysql://localhost:3306/your_database");
    dataSource.setUsername("your_username");
    dataSource.setPassword("your_password");
    dataSource.setConnectionTestQuery("SELECT 1");
    dataSource.setConnectionInitSql("SELECT 1");
    dataSource.setMinimumIdle(10);
    dataSource.setMaximumPoolSize(50);
    dataSource.setIdleTimeout(300_000); // milliseconds
    return dataSource;
  }
}

Creating batches
When you have a large list, partition it into batches. To decide on the optimal batch size, start with a moderate size, then run performance tests with different batch sizes, measuring the execution time and resource utilisation for each to find the optimum.
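To make the partitioning step concrete, here is a small, self-contained sketch; the BatchSizeTuner class name and the candidate sizes are illustrative. It reproduces the behaviour of Guava's Lists.partition used in the listings above, and prints how many batches, and therefore how many round trips, each candidate batch size produces:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSizeTuner {
  // Partition a list into consecutive sub-lists of at most batchSize elements
  // (equivalent to Guava's Lists.partition).
  static <T> List<List<T>> partition(List<T> source, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < source.size(); start += batchSize) {
      batches.add(source.subList(start, Math.min(start + batchSize, source.size())));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Integer> records = new ArrayList<>();
    for (int i = 0; i < 1000; i++) records.add(i);

    // Each batch costs one network round trip, so the batch count is the
    // first number to compare across candidate batch sizes.
    for (int batchSize : new int[] {25, 50, 100, 200}) {
      List<List<Integer>> batches = partition(records, batchSize);
      System.out.println("batchSize=" + batchSize + " -> " + batches.size() + " round trips");
    }
  }
}
```

The batch count is only a first-order proxy for network overhead; the actual timing runs against your own database and dataset are what pick the winner.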

Once you have your batches, perform the following steps for each batch:

  1. Iterating through Employee Objects:
    • Iterate through each object in the employeesBatch.
  2. Setting Placeholder Values:
    • For each Employee object, set the values of the prepared statement's placeholders using the setString() and setDouble() methods.
  3. Adding to Batch:
    • After setting the values, add the prepared statement to a batch using addBatch() for each employee object. This allows creating a batch of all the SQL statements before executing them all at once.
  4. Executing Batch:
    • Once all SQL statements are added to the batch, after the end of the for loop, call executeBatch() on the PreparedStatement to execute all the statements in the batch at once.

Behind The Scenes

• When the executeBatch() is called, the JDBC driver sends the entire batch of SQL statements to the database server in one go.

• The database server processes the batch and executes each SQL statement. After all statements are executed, the results are returned to the JDBC driver, which then returns them to the application.

• In batch processing, multiple statements are executed in a single network round trip between the application and the database server. Therefore, the network communication overhead between the application and the database server is reduced significantly.
Number of network round trips in batch processing = Number of batches

• Whereas, in single statement execution, each statement requires a network round trip between the application and the database server, causing an increase in the number of network round trips between the application and the database server.
Number of network round trips in single statement execution = Number of statements

• Executing 100,000 statements as single queries would require 100,000 round trips, whereas if we partition 100,000 statements into 1000 batches (batch size of 100), the number of round trips will reduce to 1,000.
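That arithmetic generalises to a one-line ceiling division; the RoundTrips class here is just an illustrative helper:

```java
public class RoundTrips {
  // Number of network round trips when executing `statements` queries
  // in batches of `batchSize` (the last batch may be smaller).
  static long roundTrips(long statements, long batchSize) {
    return (statements + batchSize - 1) / batchSize; // ceiling division
  }

  public static void main(String[] args) {
    System.out.println(roundTrips(100_000, 1));   // single-statement execution
    System.out.println(roundTrips(100_000, 100)); // batch size 100
  }
}
```

One caveat worth noting: with MySQL Connector/J specifically, sending a batch as a single network packet additionally requires `rewriteBatchedStatements=true` in the JDBC URL; without it, the driver still transmits the queued statements individually, although batching still saves per-statement overhead.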

Halodoc's Use Case

The Challenge

During Halodoc's financial reconciliation of over 100,000 pharmacy delivery payments, a challenge arose where we needed to:
1. Fetch over 100,000 payment records from Table 1.
2. Insert 2 records in Table 2 for each record coming from Table 1, resulting in more than 200,000 inserts in Table 2.
3. Then, for each record inserted in Table 2, create a record in Table 3 and map it to the Table 2 record with a foreign key constraint (Table 2's primary key).
So, over 200,000 inserts in Table 3.
4. Update Table 4 for each record fetched from Table 1, resulting in more than 100,000 updates.

Initially, we attempted a single-query execution approach, but the entire process took more than 5 hours to complete, because over 500,000 network round trips were happening between the application and the database server. We had to overcome this subpar performance.

The Solution

Here's where JDBC Batch Processing, the saviour, comes into the picture.
For each of the 4 steps, fetching payment records, inserts in Table 2, inserts in Table 3 and updates in Table 4, we used the batching approach:

We first retrieved all the payment IDs of the 100,000 payment records and then partitioned them into 500 batches (batch size = 200).

Then iterated over the 500 batches and for each batch of payment IDs:
1. Fetched the respective payment records from the database, so 200 payment records were fetched in one go.

2. For the 200 payment records, we created the 400 objects that needed to be inserted into Table 2 and, using JDBC Batch Insert, inserted them in a single database round trip.

3. Similarly, we created another 400 objects that needed to be inserted into Table 3.
Here we faced a problem. As mentioned earlier, we needed to map each Table 3 record to a Table 2 record using Table 2's primary key as a foreign key in Table 3, but the executeBatch() call from step 2 returned an int[] (integer array) of the numbers of rows inserted, not the primary keys generated for the inserted records.

With a little exploration of Batch Processing, we discovered that JDBC provides a method preparedStatement.getGeneratedKeys() to fetch all the auto-incremented primary keys generated in the Batch Insert.

So, we used the getGeneratedKeys() to fetch all the 400 primary keys created for Table 2 records and used them as foreign keys in the 400 objects we created for Table 3 and then using JDBC Batch Insert again in a single database round trip, we inserted the 400 objects with foreign keys mapped to Table 2 records.

4. Then finally we used JDBC Batch Update to update 200 records in Table 4 in another database roundtrip.

Also, to avoid establishing and closing connections at every step and to make the implementation more efficient, we made sure all 4 steps shared a single database connection for each batch: the connection was established at the start of the batch, then committed and closed at the end of the batch.
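A minimal sketch of that connection-sharing pattern follows; the class name, the commented-out step helpers and the table references are illustrative placeholders, not Halodoc's actual code:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

public class BatchReconciliation {
  // One connection and one transaction per batch: all four steps share the
  // connection, and the work is committed (or rolled back) as a unit.
  static void processBatch(DataSource dataSource, List<Long> paymentIds) throws SQLException {
    try (Connection connection = dataSource.getConnection()) {
      connection.setAutoCommit(false); // take manual control of the transaction
      try {
        // Placeholders for the fetch/insert/insert/update steps described
        // above; each would reuse the same `connection`:
        // fetchPayments(connection, paymentIds);
        // insertIntoTable2(connection, ...);
        // insertIntoTable3(connection, ...);
        // updateTable4(connection, ...);
        connection.commit();    // one commit at the end of the batch
      } catch (SQLException e) {
        connection.rollback();  // undo the whole batch on failure
        throw e;
      }
    } // connection is returned to the pool here
  }
}
```

Disabling auto-commit also means the four steps succeed or fail together, which keeps the Table 2/Table 3 foreign-key mappings consistent if a batch fails midway.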

Here's a modified version of the batch processing code that fetches the generated keys (note that the statement must be prepared with the Statement.RETURN_GENERATED_KEYS flag for getGeneratedKeys() to return them):

import com.google.common.collect.Lists;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import lombok.Builder;
import lombok.Data;

public class JDBCDemo {
  @Data
  @Builder
  private static class Employee {
    private String name;

    private Double salary;

    private String designation;
  }


  public static void main(String[] args) {
    // Create a list of employees
    final List<Employee> employees = new ArrayList<>();
    final int batchSize = 50;

    final List<List<Employee>> employeeBatches = Lists.partition(employees, batchSize);

    // Create the pool once and reuse it for every batch
    try (HikariDataSource dataSource = getHikariDatasource()) {
      employeeBatches.forEach(employeeBatch -> insertBatchOfEmployees(dataSource, employeeBatch));
    }
  }


  private static List<Long> insertBatchOfEmployees(final HikariDataSource dataSource,
                                                   final List<Employee> employeesBatch) {
    final List<Long> ids = new ArrayList<>();
    // Statement.RETURN_GENERATED_KEYS tells the driver to make the
    // auto-generated primary keys available via getGeneratedKeys()
    try (Connection connection = dataSource.getConnection();
         PreparedStatement preparedStatement = connection.prepareStatement(
             "INSERT INTO employees (name, salary, designation) VALUES (?, ?, ?)",
             Statement.RETURN_GENERATED_KEYS)) {

      for (final Employee employee : employeesBatch) {
        preparedStatement.setString(1, employee.getName());
        preparedStatement.setDouble(2, employee.getSalary());
        preparedStatement.setString(3, employee.getDesignation());
        preparedStatement.addBatch();
      }

      int[] rowsAffected = preparedStatement.executeBatch();

      try (ResultSet generatedKeys = preparedStatement.getGeneratedKeys()) {
        while (generatedKeys.next()) {
          ids.add(generatedKeys.getLong(1));
        }
      }

    } catch (SQLException e) {
      e.printStackTrace();
    }
    return ids;
  }


  private static HikariDataSource getHikariDatasource() {
    final HikariDataSource dataSource = new HikariDataSource();
    dataSource.setPoolName("your_datasource");
    dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
    dataSource.setJdbcUrl("jdbc:mysql://localhost:3306/your_database");
    dataSource.setUsername("your_username");
    dataSource.setPassword("your_password");
    dataSource.setConnectionTestQuery("SELECT 1");
    dataSource.setConnectionInitSql("SELECT 1");
    dataSource.setMinimumIdle(10);
    dataSource.setMaximumPoolSize(50);
    dataSource.setIdleTimeout(300_000); // milliseconds
    return dataSource;
  }
}

Retrieving Generated Keys:
• After the execution of the batch is completed, if necessary, retrieve all the generated keys by calling getGeneratedKeys() on the PreparedStatement.
• Iterate over the ResultSet of generated keys using a while loop.
• Inside the loop, generatedKeys.getLong(1) retrieves the value of the first generated key in the current row of the result set and adds it to a list of Long, called ids.

What Happens Internally

  1. The JDBC driver uses database-specific SQL commands or protocol features to fetch the generated keys.
    • In MySQL, the driver obtains the last auto-generated ID from the server's response to the INSERT, which is equivalent to running:
    a)  INSERT INTO employees (name, salary, designation) VALUES (?, ?, ?);
    b)  SELECT LAST_INSERT_ID();

    • In PostgreSQL, it internally uses the RETURNING clause in the INSERT statement itself to fetch the auto-generated IDs:
    a)  INSERT INTO employees (name, salary, designation) VALUES (?, ?, ?) RETURNING id;
  2. When you call getGeneratedKeys(), the driver returns a ResultSet containing all the generated keys.

The Final Result

This entire implementation reduced the total number of network round trips from 500,000 to merely 2,000, which brought down the time taken to complete the task from 5 hours to just 20 minutes. A 15-fold increase in performance!

Conclusion

JDBC Batch Processing stands out as a powerful tool for boosting the efficiency and performance of database tasks within Java applications. By grouping multiple SQL statements into batches, developers can minimise round trips to the database, reduce network overhead, and elevate the overall throughput.

This approach not only streamlines database interactions but also enhances the scalability and responsiveness of Java applications, especially when handling large datasets.

When incorporating JDBC batch processing into your projects, it's crucial to meticulously design your batches, consider transaction management and monitor performance metrics to refine your application's behaviour. With proper utilisation, JDBC Batch Processing emerges as an invaluable asset for achieving top-tier performance in database operations within Java development.

Join Us

Scalability, reliability and maintainability are the three pillars that govern what we build at Halodoc Tech. We are actively looking for engineers at all levels, and if solving complex problems with challenging requirements is your forte, please reach out to us with your resumé at careers.india@halodoc.com.

About Halodoc

Halodoc is the number one all-around healthcare application in Indonesia. Our mission is to simplify and bring quality healthcare across Indonesia, from Sabang to Merauke. We connect 20,000+ doctors with patients in need through our Tele-consultation service. We partner with 3500+ pharmacies in 100+ cities to bring medicine to your doorstep. We've also partnered with Indonesia's largest lab provider to provide lab home services, and to top it off we have recently launched a premium appointment service that partners with 500+ hospitals that allows patients to book a doctor appointment inside our application. We are extremely fortunate to be trusted by our investors, such as the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek, Astra, Temasek, and many more. We recently closed our Series D round and in total have raised more than USD 100 million for our mission. Our team works tirelessly to make sure that we create the best healthcare solution personalised for all of our patients' needs, and we are continuously on a path to simplify healthcare for Indonesia.

Nihar Patil

Software Engineer