Automating Web Performance Testing in CI/CD Using Lighthouse and Core Web Vitals
At Halodoc, web performance directly affects how users discover and access healthcare services. Slow loads and layout shifts hurt user trust and search rankings, yet until recently they were caught only after deployment, through production-only monitoring.
Problem: Regressions caught late through production-only signals.
Solution: Automated Lighthouse checks in MR CI/CD.
Scope: High-traffic URLs, Core Web Vitals-based thresholds.
Outcome: MRs blocked on regressions, faster feedback, less manual perf testing.
Why We Built This
Web performance issues are difficult to catch early when validation is inconsistently enforced, non-blocking, and reactive. In our case, performance testing happened only for larger feature deployments, and it ran on shared staging environments that contained multiple changes, which made attribution difficult and increased turnaround time because of repeated manual checks.
As a result, performance regressions were typically discovered post-deployment through production monitoring tools, at a point where the code was already merged and live. Addressing these issues often required reactive fixes, additional verification cycles, or rollbacks, increasing both recovery time and operational overhead. Performance validation lacked a consistent, automated signal before code reached production, so regressions surfaced as production alerts or metrics rather than during the merge-review process.
This led to three recurring problems:
- Inconsistent enforcement: Performance checks were selective and non-blocking.
- Staging noise: Shared staging environments contained multiple changes, making attribution unreliable.
- Late discovery: Regressions surfaced only after deployment through production monitoring.
To address this, we introduced web performance as a mandatory pre-deployment check by:
- Automatically validating performance before deployment
- Blocking merges that exceed defined performance metric thresholds
- Reducing manual performance testing efforts for routine changes
- Isolating the performance impact of the specific merge request being deployed
- Persisting reports as build artifacts so developers can inspect detailed Lighthouse findings without rerunning tests locally.
Choosing the Right Tool
We evaluated three broad approaches and optimised for pre-deployment, speed, and control.
- Production-only signals, such as PageSpeed Insights and Chrome UX Report, which rely on live traffic and are unsuitable for validating merge-request code before deployment
- External performance platforms like WebPageTest and SaaS monitoring tools, which introduce external dependencies, queue latency, and limitations when testing private containerized builds
- CI-oriented frameworks, such as Lighthouse CI, which simplify assertions but execute sequentially and become a bottleneck when multiple URLs and repeated runs are required within tight MR pipeline SLAs.
We ultimately chose Lighthouse’s open-source Node.js library and built a custom test framework around it. This gave us control over environment isolation, parallelism, retries, URL selection, and reporting, while still relying on a well-supported engine maintained by the Chrome team at Google.
This combination allowed us to enforce performance deterministically before merge, without introducing external dependencies or slowing down developer feedback loops.
Solution Overview
At a high level, the automated workflow in the MR CI/CD pipeline consists of:
- Trigger the test execution framework when any merge request targets the master branch
- Build and run the merge-request code in an isolated container, ensuring the test reflects only the changes being proposed
- Execute Lighthouse performance tests against a curated set of critical, high-traffic URLs
- Evaluate Core Web Vitals metrics against predefined URL-specific thresholds
- Generate a consolidated report with per-URL results and links to detailed Lighthouse reports
- Block the merge when performance thresholds are exceeded
This approach ensures that performance validation happens automatically, consistently, and early—before code is merged and deployed to production.
Architecture

Architecture Summary
- Merge request triggers a Jenkins MR pipeline
- The application is built from the feature branch and runs in an isolated container with production assets
- Lighthouse tests execute against critical URLs
- Results are aggregated and evaluated against predefined thresholds
- Merge is blocked or allowed based on the outcome
Core System Components
The custom test execution framework built around Lighthouse is organized into four logical components, each with a clear responsibility:
Orchestrator - coordinates execution, configuration, and parallelism
- Determines the website under test and parses environment variables
- Loads environment-specific threshold values and website-specific endpoints to be tested
- Loads custom configurable Lighthouse settings and flags
- Coordinates and spawns parallel Lighthouse executions
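To make this concrete, here is a minimal sketch of what such an orchestrator could look like, assuming a hypothetical per-site JSON config of URLs and thresholds, environment variables such as SITE_UNDER_TEST and RUNS_PER_URL, and a runAudit worker function (sketched under Worker Processes below). The names and structure are illustrative, not the exact implementation.

```typescript
// orchestrator.ts - illustrative sketch, not the exact Halodoc implementation.
// Parses environment variables, loads per-URL thresholds for the website
// under test, and fans out Lighthouse runs through a bounded worker pool.
import { readFileSync } from 'node:fs';
import { runAudit, AuditResult } from './worker'; // hypothetical worker module, sketched below

interface UrlConfig {
  url: string;
  thresholds: Record<string, number>; // e.g. { lcp: 2500, cls: 0.1, tbt: 300 }
}

// Hypothetical config file listing critical URLs and their thresholds per site/environment.
function loadConfig(site: string, env: string): UrlConfig[] {
  return JSON.parse(readFileSync(`config/${site}.${env}.json`, 'utf8')).urls as UrlConfig[];
}

// Bounded-concurrency pool so several Lighthouse audits run in parallel
// without overwhelming the CI agent.
async function runPool<T>(tasks: Array<() => Promise<T>>, concurrency: number): Promise<T[]> {
  const results: T[] = [];
  let next = 0;
  const loop = async () => {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  };
  await Promise.all(Array.from({ length: concurrency }, loop));
  return results;
}

export async function orchestrate(): Promise<{ config: UrlConfig[]; results: AuditResult[] }> {
  const site = process.env.SITE_UNDER_TEST ?? 'www';         // which website to test
  const env = process.env.TEST_ENV ?? 'mr';                  // environment-specific thresholds
  const runsPerUrl = Number(process.env.RUNS_PER_URL ?? 5);  // default: 5 runs per URL
  const concurrency = Number(process.env.CONCURRENCY ?? 4);

  const config = loadConfig(site, env);
  // One task per (URL, iteration) pair; each task is an isolated Lighthouse audit.
  const tasks = config.flatMap((entry) =>
    Array.from({ length: runsPerUrl }, () => () => runAudit(entry.url)),
  );
  const results = await runPool(tasks, concurrency);
  return { config, results };
}
```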
Worker Processes - run isolated Lighthouse audits per URL
- Launch isolated headless Chrome instances per execution to avoid cross-test interference and ensure reproducibility.
- Run Lighthouse audits independently for each target URL
- Persist raw Lighthouse reports for further analysis
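As an illustration, a single isolated audit could be implemented with the Lighthouse Node library and chrome-launcher roughly as below; the Chrome flags, metric selection, and report paths are assumptions rather than the exact settings used in our framework.

```typescript
// worker.ts - illustrative sketch of one isolated Lighthouse run per URL.
import { mkdirSync, writeFileSync } from 'node:fs';
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

export interface AuditResult {
  url: string;
  metrics: { lcp: number; cls: number; tbt: number }; // ms, unitless, ms
  reportPath: string;
}

export async function runAudit(url: string): Promise<AuditResult> {
  // A dedicated headless Chrome instance per run avoids cross-test interference.
  const chrome = await chromeLauncher.launch({
    chromeFlags: ['--headless', '--no-sandbox', '--disable-gpu'],
  });
  try {
    const result = await lighthouse(url, {
      port: chrome.port,
      output: 'html',
      onlyCategories: ['performance'],
    });
    if (!result) throw new Error(`Lighthouse returned no result for ${url}`);

    const audits = result.lhr.audits;
    const metrics = {
      lcp: audits['largest-contentful-paint'].numericValue ?? NaN,
      cls: audits['cumulative-layout-shift'].numericValue ?? NaN,
      tbt: audits['total-blocking-time'].numericValue ?? NaN,
    };

    // Persist the raw HTML report so it can be attached as a build artifact.
    mkdirSync('reports', { recursive: true });
    const reportPath = `reports/${encodeURIComponent(url)}-${Date.now()}.html`;
    writeFileSync(reportPath, Array.isArray(result.report) ? result.report[0] : result.report);

    return { url, metrics, reportPath };
  } finally {
    await chrome.kill();
  }
}
```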
Retry Manager - retries only failed or unstable runs instead of the entire suite
- Identifies incomplete or unstable runs
- Schedules targeted retries wherever required
- Avoids unnecessary re-execution to keep pipeline time bounded
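A minimal sketch of targeted retries, assuming the runAudit function above and a hypothetical notion of an "unusable" run (a thrown error or non-numeric metrics); the retry budget is illustrative.

```typescript
// retry.ts - illustrative sketch: re-run only failed or unstable audits.
import { runAudit, AuditResult } from './worker'; // hypothetical worker module from above

// A run is unusable if it threw or produced non-numeric metrics
// (for example, the page failed to load inside the container).
function isUsable(result: AuditResult): boolean {
  return Object.values(result.metrics).every((v) => Number.isFinite(v));
}

export async function runWithRetry(url: string, maxAttempts = 3): Promise<AuditResult> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await runAudit(url);
      if (isUsable(result)) return result;      // healthy run, nothing to retry
      lastError = new Error(`Unstable metrics for ${url} on attempt ${attempt}`);
    } catch (err) {
      lastError = err;                          // e.g. Chrome crash or navigation timeout
    }
  }
  // A bounded retry budget keeps total pipeline time predictable even for flaky URLs.
  throw lastError instanceof Error ? lastError : new Error(String(lastError));
}
```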
Result Aggregator - aggregates runs, prevents noisy single-run failures from blocking merges
- Groups Lighthouse runs per URL, computing the median metric values
- Aggregates all the individual metrics and Lighthouse reports into a single custom HTML report
- Enforces performance thresholds and determines pipeline outcome

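As a stand-in for the aggregation code, the sketch below shows how per-URL medians could be computed and compared against thresholds to produce a pass/fail decision; the metric names, the threshold shape, and the omission of the consolidated HTML report rendering are assumptions.

```typescript
// aggregate.ts - illustrative sketch: turn raw runs into a pass/fail merge decision.
import { AuditResult } from './worker'; // hypothetical worker module from above

type MetricName = 'lcp' | 'cls' | 'tbt';

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

export function evaluate(
  runs: AuditResult[],
  thresholds: Record<string, Partial<Record<MetricName, number>>>, // per-URL limits
): { passed: boolean; violations: string[] } {
  // Group individual runs by URL so the median is computed per page.
  const byUrl = new Map<string, AuditResult[]>();
  for (const run of runs) {
    byUrl.set(run.url, [...(byUrl.get(run.url) ?? []), run]);
  }

  const violations: string[] = [];
  for (const [url, urlRuns] of byUrl) {
    for (const metric of ['lcp', 'cls', 'tbt'] as MetricName[]) {
      const observed = median(urlRuns.map((r) => r.metrics[metric]));
      const limit = thresholds[url]?.[metric];
      if (limit !== undefined && observed > limit) {
        violations.push(`${url}: ${metric} ${observed.toFixed(2)} exceeds threshold ${limit}`);
      }
    }
  }
  // A failed evaluation exits the CI step non-zero, which blocks the merge.
  return { passed: violations.length === 0, violations };
}
```

In the pipeline, a non-empty violations list fails the Jenkins stage, which is what blocks the merge.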
In a nutshell, the aggregation step above is what turns raw Lighthouse results into a merge decision.
Key Design Decisions & Trade-offs
This system deliberately trades a bit of orchestration complexity for faster, more reliable signals at MR time.
Execution Model: Serial Lighthouse CI vs Parallel Programmatic Lighthouse
- Lighthouse CI was initially considered due to its simplicity, but its sequential execution model made it unsuitable for merge-request pipelines involving multiple URLs and repeated runs. Execution time grew linearly with the number of URLs and iterations and quickly exceeded acceptable CI deployment time limits.
- Using the Lighthouse library programmatically inside a custom test execution framework, by contrast, enabled parallel execution with isolated worker processes, allowing us to keep runtimes predictable while still running multiple iterations per URL. The additional orchestration complexity was an acceptable trade-off to make performance validation viable as a blocking pre-deployment check.
Test Environment: Local Container vs Stage and Production
- Testing against production or staging environments was intentionally avoided for merge-request validation. Production pipeline testing surfaces issues only after deployment, often requiring rollbacks, while shared staging environments make performance attribution unreliable due to the presence of unrelated changes.
- Running Lighthouse against containerised merge-request builds ensured isolation and enabled environment-scoped, delta-based evaluation, where relative change mattered more than absolute values. Although this setup does not exactly mirror production conditions, it reliably surfaced regressions introduced by the merge request.
- The test execution framework remains generic and can also be used to run performance tests against staging or production via a standalone Jenkins job, without coupling those workflows to the deployment pipeline.
Trigger and Sampling Strategy
- Performance validation runs only on merge requests targeting the master branch, ensuring checks apply to production-ready code without the overhead of testing every commit deployed to every branch.
- Multiple Lighthouse runs are aggregated per URL to reduce variance. When 10+ runs are configured, the 75th percentile is used to guard against occasional outliers while still being strict on consistently slow runs. However, the default remains the median of 5 runs, which provides a stable signal without significantly increasing CI execution time.
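For illustration, the switch between the default median-of-5 and the 75th percentile for 10+ runs could look like the small helper below; the nearest-rank percentile method and the exact cut-off are assumptions.

```typescript
// Illustrative sketch: pick the aggregation statistic based on run count.
// Few runs -> median (stable, cheap); 10+ runs -> 75th percentile, which
// tolerates a single outlier but stays strict on consistently slow runs.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);   // ascending
  const rank = Math.ceil((p / 100) * sorted.length);  // nearest-rank method (assumption)
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

function aggregateMetric(runs: number[]): number {
  return runs.length >= 10 ? percentile(runs, 75) : percentile(runs, 50);
}

// Example with CLS samples: median of 5 runs vs p75 of 12 runs.
console.log(aggregateMetric([0.08, 0.09, 0.1, 0.1, 0.12]));          // 0.1 (median)
console.log(aggregateMetric([0.1, 0.1, 0.11, 0.12, 0.12, 0.13,
                             0.13, 0.14, 0.15, 0.2, 0.3, 0.38]));    // 0.15 (p75)
```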
Observed Outcome
Post rollout, a merge request introduced a CLS regression on a high-traffic article page. The regression was automatically detected in the MR pipeline, and the merge was blocked before the deployment pipeline could run.

In this case, CLS jumped from our baseline of 0.1 to 0.38, which is above our MR threshold and would hurt user experience.

The consolidated performance report, while providing the overall results, also highlights the exact URL and metric that violated the configured threshold.

The individual Lighthouse report provides actionable diagnostics and recommendations to help identify the root cause of the CLS regression from the previous baseline of 0.1. In this case, Lighthouse flagged late-loading web fonts and image lazy-loading behaviour as contributors to the CLS spike.

After the developer applies the fix and pushes the updated changes, the Lighthouse test is automatically triggered again as part of the MR pipeline and passes, since the CLS value now falls below the configured threshold.

Once the MR pipeline passes, the merge action is enabled, and the deployment pipeline can be triggered after the merge is accepted.
Future Improvements
- Smarter URL selection: Today, the set of high-traffic URLs is static and curated manually based on traffic and feature importance at a point in time. Going forward, this can be made dynamic by automatically selecting URLs based on real usage and traffic trends.
- Adaptive performance thresholds: Current thresholds are static and environment-based. Future iterations can compare results against historical baselines and recent performance trends, allowing thresholds to adjust over time.
- Historical visibility: Each merge request is evaluated in isolation today. Persisting results over time would enable trend analysis, regression tracking, and long-term performance insights.
Conclusion
By integrating automated web performance validation directly into the merge-request pipeline, we made performance a deployment requirement rather than a post-release concern, similar to how we already treat test coverage and security scanning. This reduced reliance on manual testing, improved attribution to individual code changes, and ensured performance regressions are caught early—before they impact users.
Over time, this extensible framework allows performance signals, baselines, and coverage to evolve without changing how teams work day to day. As performance expectations grow, it provides a foundation to continuously raise the bar while keeping developer feedback fast, actionable, and reliable.
Collectively, this enables:
- Automated Lighthouse checks on critical URLs per MR
- Threshold-based Core Web Vitals gating merges
- Parallel & retried execution to keep CI fast and stable
- Sanitised, shareable reports as build artifacts for devs and reviewers.
Join us
Scalability, reliability, and maintainability are the three pillars that govern what we build at Halodoc Tech. We are actively looking for engineers at all levels, and if solving hard problems with challenging requirements is your forte, please reach out to us with your resume at careers.india@halodoc.com.
About Halodoc
Halodoc is the number one all-around healthcare application in Indonesia. Our mission is to simplify and deliver quality healthcare across Indonesia, from Sabang to Merauke.
Since 2016, Halodoc has been improving health literacy in Indonesia by providing user-friendly healthcare communication, education, and information (KIE). In parallel, our ecosystem has expanded to offer a range of services that facilitate convenient access to healthcare, starting with Homecare by Halodoc as a preventive care feature that allows users to conduct health tests privately and securely from the comfort of their homes; My Insurance, which allows users to access the benefits of cashless outpatient services in a more seamless way; Chat with Doctor, which allows users to consult with over 20,000 licensed physicians via chat, video or voice call; and Health Store features that allow users to purchase medicines, supplements and various health products from our network of over 4,900 trusted partner pharmacies. To deliver holistic health solutions in a fully digital way, Halodoc offers Digital Clinic services, including Haloskin, a trusted dermatology care platform guided by experienced dermatologists.
We are proud to be trusted by global and regional investors, including the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek, Astra, Temasek, and many more. With over USD 100 million raised to date, including our recent Series D, our team is committed to building the best personalized healthcare solutions — and we remain steadfast in our journey to simplify healthcare for all Indonesians.
