Automating iOS Memory Leak Detection: Designing a Runtime Observability Pipeline

At Halodoc, we continuously invest in strengthening our mobile engineering foundations to ensure our applications remain reliable, scalable, and easy to evolve. As our iOS app has grown in size and complexity, we’ve increasingly focused on operational excellence—not just building features, but ensuring they behave correctly over time.

One area that required deeper attention was memory management, especially detecting and tracking leaks as the app and teams scaled. While iOS provides powerful tools for debugging memory issues, relying primarily on manual investigation does not scale well as applications grow. We wanted to move beyond reactive debugging and toward a system that could proactively surface memory issues, provide shared visibility across the team, and enable faster remediation.

Introduction

Traditionally, memory leaks in iOS apps are identified using tools like Instruments during development or after performance issues are reported. While effective, this approach is largely manual, reactive, and developer-driven.

Our goal was to design an automated, production-safe observability pipeline that could detect memory leaks during real app usage in non-production environments (development builds, QA cycles, and UI automation suites), capture meaningful context, and surface these issues as actionable engineering signals.

Rather than treating memory leaks as isolated debugging tasks, we wanted to treat them as reliability concerns—visible, trackable, prioritised, and owned by the team.

The Problem

Before this initiative, memory-leak detection was already part of our iOS workflow, but it was not enforced end-to-end at a system level or connected to ownership and follow-through.

We had multiple safeguards in place:

  • Static analysis via SwiftLint, enforced as a build-phase check locally and in CI/CD. Builds would fail when lint rules flagged common retain-cycle patterns.
  • Runtime detection via MLeaksFinder, which surfaced memory leaks as in-app alerts during development and QA testing.

While this provided good coverage, the workflow still depended on manual follow-up:

  • Runtime leaks were visible only as transient alerts during app usage
  • Tracking, monitoring, and ensuring closure required explicit human intervention
  • There was no centralized or persistent view of leaks across builds and test cycles

As a result, memory-leak handling lacked a strict, automated feedback loop that guaranteed visibility, ownership, and follow-through for every detected leak across builds and releases.

This was the gap we set out to address.

Design Goals & Principles

Before designing the solution, we defined a set of guiding principles to address the gaps in our existing workflow:

  • Production-safe: The system must never impact real users or production stability (leak detection runs only in non-production builds).
  • Automation over awareness: Runtime memory leaks should not rely on manual observation or follow-up to be tracked and resolved.
  • Event-driven and persistent: Memory leaks should be captured as structured, durable signals rather than ephemeral UI alerts.
  • High signal-to-noise ratio: The system should minimise alert fatigue through intelligent grouping and de-duplication.
  • Actionable by default: Every detected leak should result in clear visibility, ownership, and traceability.

These principles shaped every architectural decision that followed.

High-Level Solution Overview

At a high level, the system functions as an observability pipeline:

  • Memory leaks are detected at runtime in non-production builds, across both manual and automated test executions
  • Each detected leak is emitted as a structured, background event
  • Events are sent to a centralized observability platform
  • Monitoring rules aggregate and evaluate leak patterns over time
  • Alerts are generated automatically based on defined conditions
  • Engineering tickets are created without manual intervention

The key shift is conceptual:

Memory leaks are treated as operational signals, not local debugging artifacts—similar to errors, crashes, and latency regressions.

High-level architecture of automated iOS memory-leak observability pipeline

Technology Choices

  • MLeaksFinder: Originally adopted for its zero-setup, automatic runtime detection, MLeaksFinder was already part of our iOS stack. This allowed us to focus this initiative on automating detection, observability, and alerting rather than introducing new tooling.
  • Dynatrace: Used as the observability platform to ingest structured leak events, monitor trends, and trigger automated alerts and ticket creation.

The pipeline is intentionally detector-agnostic, allowing the underlying leak detection mechanism to be replaced in the future without impacting the overall workflow.

Why We Did Not Use Xcode Instruments

We explored using Xcode Instruments for memory-leak detection, including running it alongside automated test flows. However, Instruments is fundamentally trace-based and requires parallel execution, large trace collection, and additional parsing infrastructure—making it impractical for continuous use during normal development and QA workflows.

For this initiative, we needed lightweight, always-on runtime signals that could be captured instantly, enriched with context, and correlated with user journeys. That made the existing MLeaksFinder setup a more practical foundation for building an automated observability and ownership pipeline.

Key Architectural Decisions

Event-Driven Runtime Detection

A key architectural decision was to adopt an event-driven model for memory-leak detection.

When a leak is detected at runtime, it is immediately emitted as a structured event and observed by the app, which then reports it directly to the observability backend. This ensures leak signals are captured in real time, without depending on app lifecycle timing or deferred processing.

An event-driven approach provides stronger guarantees around reliability and timeliness, and allows leak detection to integrate seamlessly with both manual testing and automated execution paths.
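
To make this flow concrete, here is a minimal sketch of the observation side, assuming the detector is patched to post a notification instead of presenting a debug alert (the notification name and types below are illustrative and not part of MLeaksFinder's public API):

```swift
import Foundation

// Illustrative notification name: the patched detector posts this instead of
// presenting a debug alert. Not part of MLeaksFinder's public API.
extension Notification.Name {
    static let memoryLeakDetected = Notification.Name("MemoryLeakDetected")
}

/// Observes leak events as they are emitted and hands them off immediately,
/// without depending on app lifecycle timing or deferred processing.
final class LeakEventObserver {
    private let queue = OperationQueue()              // keep reporting off the main thread
    private var token: NSObjectProtocol?
    private let report: ([AnyHashable: Any]) -> Void  // forwards to the observability backend

    init(report: @escaping ([AnyHashable: Any]) -> Void) {
        self.report = report
    }

    func start() {
        token = NotificationCenter.default.addObserver(
            forName: .memoryLeakDetected, object: nil, queue: queue
        ) { [weak self] notification in
            guard let payload = notification.userInfo else { return }
            self?.report(payload)
        }
    }

    deinit {
        if let token = token {
            NotificationCenter.default.removeObserver(token)
        }
    }
}
```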

Using View Stack as the Primary Identifier

Class name alone is insufficient to meaningfully identify a memory leak, especially in large apps with repeated components and shared views.

In addition to the leaked object’s class, MLeaksFinder provides the runtime view hierarchy (retain path) that leads to the leaked instance. This hierarchy preserves the structural UI context in which the leak occurred.

For example, a leak may surface as:

SecureWebViewController → UIView → UIView → WKWebView → UIView

Even when the leaked instance is a nested UI component, the hierarchy captures the owning screen and containment path. Using this view hierarchy path as the primary grouping identifier significantly improved debugging clarity and reduced time to resolution.
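
As a simple illustration, the grouping key can be derived by joining the retain path reported by the detector (function and variable names here are illustrative):

```swift
/// Derives a stable grouping identifier from the retain path. Leaks of the same
/// class on different screens map to different keys, which keeps triage precise.
func groupingKey(for viewHierarchy: [String]) -> String {
    viewHierarchy.joined(separator: " → ")
}

// The example above would be grouped under:
let key = groupingKey(for: ["SecureWebViewController", "UIView", "UIView", "WKWebView", "UIView"])
// "SecureWebViewController → UIView → UIView → WKWebView → UIView"
```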

Attaching User Journey Context for Hard-to-Reproduce Leaks

In some cases, even with detailed view-hierarchy information, memory leaks were still difficult to reproduce reliably. To address this, we attached a user-level identifier to leak events when available, allowing us to correlate leaks with the user journey around the time they occurred.

This made it easier to understand the interaction sequences that triggered a leak and reproduce it reliably in non-production environments—especially for issues caused by specific flows rather than a single screen.
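
A minimal sketch of this enrichment step, assuming an opaque session identifier is available at reporting time (field and function names are illustrative):

```swift
/// Attaches an opaque user or session identifier to a leak payload when one is
/// available, so the event can be correlated with the surrounding user journey.
func enriched(_ payload: [String: Any], withSessionID sessionID: String?) -> [String: Any] {
    guard let sessionID = sessionID, !sessionID.isEmpty else { return payload }
    var enriched = payload
    enriched["sessionIdentifier"] = sessionID
    return enriched
}
```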

Intelligent De-duplication with Time Windows

Without safeguards, the same memory leak can generate repeated alerts, leading to noise and alert fatigue.

To address this, we introduced time-based de-duplication aligned with release cycles. Leak events are grouped by view hierarchy, and repeated occurrences within a defined time window are aggregated into a single signal rather than triggering multiple alerts.

If the same leak continues to appear beyond that window, a new alert is generated—indicating either a regression or that the original issue has not been fully resolved.
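
The following sketch illustrates the windowing logic. Whether it runs on-device or inside the monitoring rules is an implementation detail, and the window length and names here are illustrative:

```swift
import Foundation

/// Time-window de-duplication keyed by the view-hierarchy grouping key.
/// The first occurrence in a window opens a signal; repeats within the same
/// window are aggregated instead of raising new alerts.
final class LeakDeduplicator {
    private let window: TimeInterval
    private var windowStart: [String: Date] = [:]
    private(set) var occurrences: [String: Int] = [:]

    init(window: TimeInterval = 14 * 24 * 60 * 60) {   // roughly one release cycle
        self.window = window
    }

    /// Returns true when this occurrence should produce a new alert-worthy signal.
    func shouldAlert(groupingKey: String, at now: Date = Date()) -> Bool {
        occurrences[groupingKey, default: 0] += 1
        if let start = windowStart[groupingKey], now.timeIntervalSince(start) < window {
            return false        // aggregate into the existing signal
        }
        // New window: either a regression or the original leak was never fixed.
        windowStart[groupingKey] = now
        return true
    }
}
```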

Implementation Highlights

Some notable implementation aspects include:

On-device reporting

  • Extending an existing leak-detection library to emit structured events instead of displaying debug alerts
  • A reporter abstraction layer to support future integrations with different observability backends
  • Rich contextual payloads for each leak event, including:
    • Leaked class
    • View hierarchy (retain path)
    • App version and build
    • Device and OS information
    • Timestamp
    • Optional user or session identifier (to correlate with user journeys)

Safety & Quality

  • Strict build-time isolation, ensuring the system runs only in non-production builds (a sketch follows this list)
  • Comprehensive unit tests covering payload creation, event delivery, and failure scenarios, ensuring the observability pipeline remains reliable and safe across refactors
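
Below is a sketch of how that isolation can be enforced at compile time, assuming a custom QA_BUILD compilation condition alongside the standard DEBUG flag (both illustrative here):

```swift
enum LeakObservability {
    /// Entry point called at app start. In production configurations the body
    /// compiles to a no-op, so no detection or reporting code paths exist there.
    static func startIfAllowed() {
        #if DEBUG || QA_BUILD        // QA_BUILD: illustrative custom compilation condition
        startDetectionAndReporting()
        #endif
    }

    private static func startDetectionAndReporting() {
        // Wire up the leak detector, the event observer, and the reporter here.
    }
}
```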

Throughout the implementation, the focus remained on reliability and maintainability rather than maximum coverage at any cost.

Event Contract (Simplified)

To keep the observability pipeline flexible and detector-agnostic, leak reporting is built around a small, well-defined contract. The following snippets illustrate the simplified interface and event structure used to emit memory-leak signals. This highlights the core responsibilities and data flow.

Leak Reporter Interface & Event Structure
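
A minimal Swift sketch of what such a contract can look like (type and property names are illustrative rather than our exact production definitions):

```swift
import Foundation

/// Structured payload emitted for every detected leak.
/// Property names are illustrative; the production contract may differ.
struct LeakEvent: Codable {
    let leakedClass: String            // e.g. "WKWebView"
    let viewHierarchy: [String]        // retain path, used as the primary identifier
    let appVersion: String
    let buildNumber: String
    let deviceModel: String
    let osVersion: String
    let timestamp: Date
    let sessionIdentifier: String?     // optional, correlates the leak with a user journey

    /// Stable grouping key derived from the view hierarchy.
    var groupingKey: String { viewHierarchy.joined(separator: " → ") }
}

/// Detector-agnostic reporting abstraction. Concrete implementations forward
/// events to an observability backend (Dynatrace in our case) and can be
/// swapped without touching the detection side.
protocol LeakReporter {
    func report(_ event: LeakEvent)
}
```

Keeping the protocol this small is what makes the pipeline detector-agnostic: the detection side only needs to build a LeakEvent and hand it to whichever LeakReporter implementation is wired in.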

Monitoring, Alerting, and Incident Creation

Once leak events reach the observability platform:

  • They are grouped and visualized over time
  • Monitoring rules evaluate frequency and impact
  • Alerts are triggered when defined conditions are met

When an alert fires:

  • A tracking ticket is created automatically
  • A notification is posted to the engineering communication channel with the ticket link

This closes the loop from detection to ownership without manual intervention.

Dynatrace dashboard showing grouped iOS memory leak events

Impact & Results

Although the system is still evolving, it has already delivered tangible benefits:

  • Earlier detection of memory leaks during development and testing cycles
  • Reduced reliance on manual debugging
  • Improved visibility into leak patterns across app versions
  • Clear accountability through automated ticket creation
  • Higher confidence during releases

Most importantly, memory leaks are no longer silent or incidental—they are visible, measurable, and actionable.

Memory Leak Alert

Silent Leak Detection in Automation

By moving from UI alerts to silent, event-based leak reporting, memory-leak detection now integrates seamlessly with existing UI automation suites. Automated tests can run without interruption while leaks are detected and reported in the background.

This allows daily sanity, regression, and in-sprint automation runs to act as a continuous source of memory-leak signals, surfacing issues that may be missed during manual testing. As a result, memory leaks are detected earlier—often during routine automation—without adding extra steps or slowing down developers or QA teams.

Learnings & Trade-offs

A few key learnings emerged from this effort:

  • Grouping strategy matters more than raw detection for effective debugging and triage
  • Observability without action is incomplete—signals must lead to ownership
  • Alerting systems require careful tuning to avoid noise and fatigue
  • Treating leaks as operational signals fundamentally changes how teams respond to them

These learnings shaped our next iterations and will guide future feature gates such as CI/CD leak thresholds.

Reaching a stable and effective design required multiple iterations, reinforcing the importance of incremental refinement over one-time solutions.

Future Enhancements

There are a few key directions in which this system can evolve:

  • AI-assisted remediation, where detected leak patterns are analyzed to suggest fixes and generate merge requests for review.
  • Feature and journey tagging to associate leaks with specific functionality under test
  • Memory-leak gates in CI/CD to prevent regressions during feature development
  • Severity classification based on frequency and impact to guide prioritization
  • Deeper retain-cycle diagnostics to further reduce time to root cause

The system was designed to support these enhancements without fundamental rework.

Conclusion

Memory leaks are inevitable in complex applications. What matters is how quickly and reliably they are detected and addressed.

By shifting from manual debugging to an automated observability pipeline, we transformed memory-leak handling from a reactive practice into a proactive engineering capability—making leaks visible, actionable, and tied to ownership by default.

  • Automated detection in non-production builds and UI automation
  • Structured leak events with rich context (view stack, device, version)
  • Event-driven observability with dashboards and alerts
  • Automatic ownership through ticket creation and notifications


Join us

Scalability, reliability and maintainability are the three pillars that govern what we build at Halodoc Tech. We are actively looking for engineers at all levels; if solving hard problems with challenging requirements is your forte, please reach out to us with your resumé at careers.india@halodoc.com.

About Halodoc

Halodoc is the number one all-around healthcare application in Indonesia. Our mission is to simplify and deliver quality healthcare across Indonesia, from Sabang to Merauke.
Since 2016, Halodoc has been improving health literacy in Indonesia by providing user-friendly healthcare communication, education, and information (KIE). In parallel, our ecosystem has expanded to offer a range of services that facilitate convenient access to healthcare, starting with Homecare by Halodoc as a preventive care feature that allows users to conduct health tests privately and securely from the comfort of their homes; My Insurance, which allows users to access the benefits of cashless outpatient services in a more seamless way; Chat with Doctor, which allows users to consult with over 20,000 licensed physicians via chat, video or voice call; and Health Store features that allow users to purchase medicines, supplements and various health products from our network of over 4,900 trusted partner pharmacies. To deliver holistic health solutions in a fully digital way, Halodoc offers Digital Clinic services including Haloskin, a trusted dermatology care platform guided by experienced dermatologists.
We are proud to be trusted by global and regional investors, including the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek, Astra, Temasek, and many more. With over USD 100 million raised to date, including our recent Series D, our team is committed to building the best personalized healthcare solutions — and we remain steadfast in our journey to simplify healthcare for all Indonesians.