Case Study · Architecture · Loyalty

Building for Scale: Lessons from the AirAsia BIG Loyalty Platform

When you're handling millions of active members and loyalty transactions, architecture decisions become business-critical. Here's what we learned.

ME-Tech Editorial·20 March 2026

The Scale of the BIG Programme

At its peak, AirAsia's BIG loyalty programme had over 40 million registered members across Southeast Asia, processing millions of points transactions daily. Earn events triggered on every flight booking, hotel purchase, and partner transaction. Redemption events hit during seat selection and checkout. The programme ran on a microservices architecture with ME-Tech responsible for the member-facing platform and its Navitaire integrations.

This was not a greenfield project — we inherited a system with existing production traffic and real members. The work was additive: new features, new partner integrations, and progressive infrastructure improvements, all while keeping the existing functionality stable.

Read/Write Split as a Foundation

The first architectural decision that paid lasting dividends was a strict read/write split at the API gateway level. Loyalty balance reads (which happen on every checkout page load) are served from read replicas and aggressively cached. Write operations — points earn, redemption, tier upgrades — go directly to the primary database with explicit idempotency keys.

This separation allowed us to scale read capacity independently and absorb traffic spikes (post-flash-sale checkout rushes, partner batch earn events) without affecting write throughput. It sounds obvious in retrospect, but many loyalty platforms conflate reads and writes until a traffic spike forces the issue in production.
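As a rough illustration of the split described above, the sketch below routes operations to a primary or a replica and puts a short-lived cache in front of balance reads. It is a minimal sketch, not the platform's actual gateway code: the operation names, the `BalanceReader` class, and the 5-second cache lifetime are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical operation names; the article lists earn, redemption, and tier upgrades as writes.
WRITE_OPERATIONS = {"earn", "redeem", "tier_upgrade"}


def route(operation: str) -> str:
    """Return which database tier should serve the operation."""
    return "primary" if operation in WRITE_OPERATIONS else "replica"


@dataclass
class BalanceReader:
    """Serves balance reads from a replica behind a short-lived cache."""

    replica: Dict[str, int]   # stands in for a read replica: member_id -> points
    ttl_seconds: float = 5.0  # assumed cache lifetime, not a figure from the article
    _cache: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def get_balance(self, member_id: str, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        hit = self._cache.get(member_id)
        if hit is not None and now - hit[1] < self.ttl_seconds:
            return hit[0]  # fresh cache entry: no replica round trip
        balance = self.replica[member_id]
        self._cache[member_id] = (balance, now)
        return balance
```

The trade-off is visible in the cache lifetime: a checkout page may show a balance that is a few seconds stale, which is usually acceptable for display as long as the redemption itself is validated against the primary.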

Idempotency Is Non-Negotiable

In a loyalty context, double-crediting points is a financial liability. A network retry, a browser back button press during checkout, or a mobile client timeout-and-retry can all trigger duplicate transaction attempts. Without idempotency, some of these result in double earn.

We implemented idempotency keys at every earn and redemption endpoint, keyed on a composite of transaction source, booking reference, earn trigger type, and a client-generated nonce. The idempotency store uses Redis with a 24-hour TTL. Duplicate requests within the window return the original response without re-processing. This eliminated double-earn incidents, which had been occurring at a rate of approximately 0.003% of transactions. That sounds negligible, but it works out to roughly 30 duplicate credits for every million transactions, and the platform processed millions per day.
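The mechanism can be sketched in a few lines. The example below is an illustration under stated assumptions, not the production code: it uses an in-memory dictionary in place of Redis, and the function and class names (`idempotency_key`, `IdempotencyStore`, `run_once`) are hypothetical.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def idempotency_key(source: str, booking_ref: str, trigger: str, nonce: str) -> str:
    """Hash the four components named in the article into one fixed-length key."""
    # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
    parts = (source, booking_ref, trigger, nonce)
    material = "".join(f"{len(p)}:{p}" for p in parts)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60):
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[float, dict]] = {}  # key -> (expires_at, response)

    def run_once(self, key: str, handler: Callable[[], dict],
                 now: Optional[float] = None) -> dict:
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # duplicate inside the window: replay the stored response
        response = handler()  # first sighting (or expired): process for real
        self._entries[key] = (now + self.ttl_seconds, response)
        return response
```

This single-process version checks and stores in two separate steps, so it is not safe against two concurrent requests carrying the same key. A shared store would need an atomic set-if-absent, which Redis provides through `SET` with the `NX` and `EX` options.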

Partner Integration Patterns

BIG's value proposition depends on earn opportunities beyond flights — hotels, ground transport, retail, dining. Each partner integration is a data contract: the partner sends an earn trigger, BIG credits the member. The failure modes are asymmetric: if the partner's system goes down, we shouldn't lose earn events, but we also shouldn't block the partner's checkout flow waiting for BIG to respond.

We solved this with an async earn queue. Partner earn events are accepted synchronously with an immediate 201 Created response and queued for processing; the member balance is then updated asynchronously. Partners get a fast response; members see their balance update within seconds. The queue retries transient failures with exponential backoff and dead-letters events that fail repeatedly for manual review.
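The retry and dead-letter behaviour can be sketched as follows. This is a simplified, single-process illustration rather than the real queue: the class names, the retry limit of three attempts, and the backoff base and cap are assumptions chosen for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


class TransientError(Exception):
    """Raised by the handler for failures worth retrying."""


@dataclass
class EarnEvent:
    member_id: str
    points: int
    attempts: int = 0
    not_before: float = 0.0  # earliest time the event may be processed again


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay after failed attempt number `attempt` (1-based): base * 2^(attempt-1), capped."""
    return min(cap, base * 2 ** (attempt - 1))


class EarnQueue:
    def __init__(self, handler: Callable[[EarnEvent], None], max_attempts: int = 3):
        self.handler = handler
        self.max_attempts = max_attempts  # assumed limit, not a figure from the article
        self.pending: Deque[EarnEvent] = deque()
        self.dead_letter: List[EarnEvent] = []

    def accept(self, event: EarnEvent) -> int:
        """Enqueue and acknowledge at once; 201 is the status the article cites."""
        self.pending.append(event)
        return 201

    def process_due(self, now: float) -> None:
        """Visit each pending event once, processing those whose retry time has arrived."""
        for _ in range(len(self.pending)):
            event = self.pending.popleft()
            if event.not_before > now:
                self.pending.append(event)  # still backing off: leave it queued
                continue
            try:
                self.handler(event)
            except TransientError:
                event.attempts += 1
                if event.attempts >= self.max_attempts:
                    self.dead_letter.append(event)  # park for manual review
                else:
                    event.not_before = now + backoff_delay(event.attempts)
                    self.pending.append(event)
```

Because a retried event may already have been partially applied, the handler should itself be idempotent, which is where the idempotency keys described earlier come in.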

Database Schema Evolution at Scale

Altering a schema on a table with 40 million rows without downtime requires careful sequencing. The pattern we used for every significant schema change: add the column as nullable, deploy the application code that writes to both old and new columns, backfill the existing data in batches during off-peak hours, then make the column non-nullable and drop the old column in a subsequent release.

This four-step migration pattern added complexity to the release process but meant zero downtime for members. In a loyalty platform, maintenance windows are politically difficult — the programme runs 24/7 and members in different time zones are always active.
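The first and third steps of that pattern (add a nullable column, then backfill in batches) can be sketched with SQLite from the Python standard library. The table and column names (`members`, `tier_name`, `tier_code`), the tier mapping, and the batch size are hypothetical, and the production database engine is not named in this article, so treat this as an illustration of the sequencing only.

```python
import sqlite3
from typing import Optional

TIER_CODES = {"red": 1, "gold": 2, "platinum": 3}  # hypothetical mapping


def add_nullable_column(conn: sqlite3.Connection) -> None:
    # Step 1: the new column is nullable, so existing rows stay valid as they are.
    conn.execute("ALTER TABLE members ADD COLUMN tier_code INTEGER")


def backfill_batch(conn: sqlite3.Connection, after_id: int,
                   batch_size: int) -> Optional[int]:
    """Backfill rows with id > after_id; return the last id handled, or None when done."""
    rows = conn.execute(
        "SELECT id, tier_name FROM members WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, batch_size),
    ).fetchall()
    if not rows:
        return None
    with conn:  # one short transaction per batch keeps lock time small
        conn.executemany(
            # The IS NULL guard avoids overwriting rows the dual-writing app already filled.
            "UPDATE members SET tier_code = ? WHERE id = ? AND tier_code IS NULL",
            [(TIER_CODES[name], row_id) for row_id, name in rows],
        )
    return rows[-1][0]


def backfill_all(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Run batches until the table is exhausted; return the number of batches applied."""
    batches, last_id = 0, 0
    while True:
        next_id = backfill_batch(conn, last_id, batch_size)
        if next_id is None:
            return batches
        last_id = next_id
        batches += 1
```

Paging by primary key rather than by `OFFSET` keeps each batch query cheap on a large table, and a real backfill would typically pause between batches to limit load. The final step, enforcing the non-null constraint and dropping the old column, is engine-specific and is left out here.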

What We'd Do Differently

The lesson isn't "build it right from the start." The lesson is "instrument it right from the start, so you know when it's wrong."

Our biggest gap in the early phases was observability. We had application logs but no structured tracing across service boundaries. When an earn event failed, we could see the failure in the BIG service logs, but tying it back to the originating partner transaction meant matching log entries across services by hand. We retrofitted distributed tracing with OpenTelemetry mid-project. It was the correct decision, but one that would have been much cheaper to make at the outset.
