Building a Reliable Reprocessing Messaging System Using SQS

Backend Apr 1, 2021

At Halodoc, one of our major offerings is instant consultation with a doctor of the customer's choice (subject to the doctor's availability). A user selects a doctor from the list of available doctors; once the doctor accepts the consultation, the user can chat with the doctor (text, audio and video) and get an instant prescription for the consultation.

Flow for Booking Online Consultation

The problem we were facing

In the flow above, when the user pays for the consultation, the doctor receives a push notification on their application asking them to accept the consultation request. With the ever-increasing number of daily consultations on our platform, we faced challenges in delivering these notifications to doctors reliably and with minimal latency.

We also have a built-in algorithm that favours doctors with a good acceptance rate by moving them to the top of the list, and penalises those who miss many consultations, to ensure a good user experience. When a doctor never receives the notification for a consultation, the system degrades their score unnecessarily.

We were using push notifications (FCM on Android and APNs on Apple devices) to send consultation requests to doctors. Neither FCM nor APNs provides a synchronous delivery acknowledgement, so even when the API responds with success, one can never be sure whether the push notification was actually delivered. At times, doctors cited not receiving the notification as the reason for their low acceptance rate.

How we solved the problem

We have our own home-grown live messaging system, Live Connect. Since we already use it for many of our critical use cases, we decided to use it on top of APNs and FCM for the following reasons:

  1. Live Connect supports webhooks for the different states of a message (delivered, offline-sent, etc.)
  2. Live Connect persists the connection status of the client: connected/disconnected (here for more details)

For these reasons, we decided to use Live Connect as the first channel for sending push notifications: with the webhook integration for delivery, we know reliably whether each notification was actually delivered.

Still, how do we handle the undelivered push notifications?

Here, we evaluated multiple options: Kafka, time-based scripts and SQS. With its built-in support for delayed message processing, SQS was the unanimous choice. Now, for every message processed for push notification, we also add a message to SQS with a configured delay.
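
As a rough illustration of this step (a minimal sketch, not our production code), the producer side could look like the following in Python. The function name, the payload fields `message_id` and `doctor_id`, and the 30-second default delay are assumptions made for the example; `sqs_client` is expected to behave like a boto3 SQS client, whose `send_message` call accepts `QueueUrl`, `MessageBody` and `DelaySeconds`.

```python
import json

MAX_SQS_DELAY_SECONDS = 900  # SQS caps DelaySeconds at 15 minutes


def enqueue_delivery_check(sqs_client, queue_url, message_id, doctor_id,
                           delay_seconds=30):
    """Schedule a delayed delivery check for a notification that was just sent.

    `sqs_client` is assumed to behave like boto3.client("sqs").
    The payload fields are hypothetical and only serve the example.
    """
    if not 0 <= delay_seconds <= MAX_SQS_DELAY_SECONDS:
        raise ValueError("delay_seconds must be between 0 and 900")

    body = json.dumps({"message_id": message_id, "doctor_id": doctor_id})

    # The message stays invisible to consumers until the delay has elapsed.
    return sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=body,
        DelaySeconds=delay_seconds,
    )
```

Passing the client in as an argument keeps the function easy to exercise with a stub in unit tests. Note that a per-message `DelaySeconds` applies to standard queues; FIFO queues only support a delay configured on the queue itself.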

Code Snippet to push message to the delay queue

What is SQS?

SQS is a fully managed, highly scalable, distributed message queuing service offered by AWS that applications can use to publish and consume messages at scale. For us, the key distinguishing feature of SQS over other messaging services such as Kafka and RabbitMQ is its built-in delay feature, which is very useful for critical systems where every message must be processed.

Visibility Timeout and Delay Timeout in SQS

Delay Timeout

Using a delay timeout, we can postpone the delivery of new messages to consumers for a number of seconds, for example when the consumer application needs additional time before it can process messages. If you create a delay queue, any message that you send to the queue remains invisible to consumers for the duration of the delay period. The default (and minimum) delay for a queue is 0 seconds; the maximum is 15 minutes.
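
To make the semantics concrete, here is a toy in-memory model of a delay queue in Python. It is only a sketch of the behaviour described above, not how SQS is implemented; the class name `ToyDelayQueue` and the explicit `now` argument (time in seconds) are assumptions chosen to keep the example deterministic.

```python
import heapq


class ToyDelayQueue:
    """In-memory model of delay-queue semantics (not SQS itself)."""

    def __init__(self, default_delay_seconds=0):
        self.default_delay_seconds = default_delay_seconds
        self._heap = []      # entries: (visible_at, sequence, body)
        self._sequence = 0   # tie-breaker for messages with equal visible_at

    def send(self, body, now, delay_seconds=None):
        """Add a message that becomes visible `delay` seconds after `now`."""
        delay = self.default_delay_seconds if delay_seconds is None else delay_seconds
        if not 0 <= delay <= 900:
            raise ValueError("delay must be between 0 and 900 seconds")
        heapq.heappush(self._heap, (now + delay, self._sequence, body))
        self._sequence += 1

    def receive(self, now):
        """Return (and remove) every message whose delay has elapsed at `now`."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


queue = ToyDelayQueue(default_delay_seconds=60)
queue.send("check m-1", now=0)
print(queue.receive(now=59))  # [] : still inside the delay period
print(queue.receive(now=60))  # ['check m-1'] : delay has elapsed
```

The queue-level default mirrors the delay configured on a delay queue, while the optional `delay_seconds` argument mirrors a per-message override.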

How SQS solves the Use-Case

While processing the messages, every message is also pushed to the delay queue with the configured timeout in seconds. Each message in this delay queue is processed after the delay has elapsed, by which time its delivery status is known. If the ondeliver webhook has not received the delivery callback, the message is considered undelivered and is reprocessed by the consumer of this delay queue, which in this case is an AWS Lambda function.
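
The consumer side can be sketched along these lines. This is a simplified illustration under stated assumptions, not our actual Lambda function: `is_delivered` and `reprocess` are hypothetical hooks standing in for the delivery-status lookup (updated by the ondeliver webhook) and for whatever reprocessing is configured, and the `message_id` payload field is assumed. The event shape, a `Records` list whose items carry the message in `body`, is what Lambda passes for an SQS trigger.

```python
import json


def make_handler(is_delivered, reprocess):
    """Build a Lambda-style handler for the delay queue.

    is_delivered(message_id) -> bool and reprocess(payload) are hypothetical
    hooks supplied by the caller; they are not part of any AWS API.
    """

    def handler(event, context=None):
        reprocessed = []
        for record in event.get("Records", []):
            payload = json.loads(record["body"])
            if is_delivered(payload["message_id"]):
                continue  # delivery callback arrived in time; nothing to do
            reprocess(payload)  # no callback seen: treat as undelivered
            reprocessed.append(payload["message_id"])
        return {"reprocessed": reprocessed}

    return handler
```

Injecting the two hooks keeps the decision logic (delivered or not) separate from storage and from the resend channel, so each can change independently.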

Flow for the message in delay queue and its processing

Conclusion

After the successful implementation of this use case, our PN delivery rates have improved and we seldom receive complaints from doctors about not receiving notifications.

We plan to use SQS to create a generic pipeline in our communication microservice, with multiple consumers of defined priority and delay queues to ensure that one of the consumers is able to process each message. For example, with multiple SMS providers, if one provider fails to send a message, the delay queue processor can send it using another provider.

Further Readings

Getting started with Amazon SQS - Amazon Simple Queue Service
Explains the basic workflows of working with Amazon SQS using the Amazon SQS console.
Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka
The Uber Insurance Engineering team extended Kafka’s role in our existing event-driven architecture by using non-blocking request reprocessing and dead letter queues (DLQ) to achieve decoupled, observable error-handling without disrupting real-time traffic.

Join Us

We are always looking out for top engineering talent across all roles for our tech team. If challenging problems that drive big impact enthral you, do reach out to us at careers.india@halodoc.com

About Halodoc

Halodoc is the number 1 all-around healthcare application in Indonesia. Our mission is to simplify and bring quality healthcare across Indonesia, from Sabang to Merauke. We connect 20,000+ doctors with patients in need through our tele-consultation service. We partner with 3500+ pharmacies in 100+ cities to bring medicine to your doorstep. We've also partnered with Indonesia's largest lab provider to provide lab home services, and to top it off we have recently launched a premium appointment service that partners with 500+ hospitals and allows patients to book a doctor appointment inside our application. We are extremely fortunate to be trusted by our investors, such as the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek and many more. We recently closed our Series B round and in total have raised USD 100 million for our mission. Our team works tirelessly to make sure that we create the best healthcare solution personalised for all of our patients' needs, and we are continuously on a path to simplify healthcare for Indonesia.

Neeraj Gupta

Engineering Backend @Halodoc