From Feature Flags to Progressive Delivery: How to Reduce the Risk of Releasing Complex Features

Feature flags got you through the first stage: decoupling deployment from release. But here's the gap most teams miss: a flag controls which code path runs, not whether the new version is healthy under real production conditions. You can wrap a feature in the cleanest toggle in the world, flip it, and still take down checkout because the new build exhausts the database connection pool ten minutes in, when real concurrency kicks in, and your staging tests never reached that load.

This guide walks through how to move from feature flags as a mechanism to progressive delivery as a system, with canary releases, blue-green deployments, observability gates, and automated rollbacks in Kubernetes. The idea is straightforward: treat every release as a controlled experiment instead of a binary event.

Feature flag best practices: how to avoid flag debt before it kills your codebase

Feature flags are deceptively simple. You add an if statement, wrap a feature, and ship it behind a toggle. The problem starts about six months later, when your codebase has 200 flags, half of them are stale, and nobody knows which ones are safe to remove.

We've seen this firsthand. On a project with ~40 microservices, the team had accumulated over 300 feature flags across two years. Some flags were release toggles that should have been removed within a sprint. Others were ops flags controlling circuit breakers. A few were permanent experiment flags. They were all mixed with no naming convention, no ownership, and no expiration policy. One developer accidentally toggled a flag that had been "temporary" for 14 months. It took down the checkout flow for 20 minutes.

Flag debt is technical debt with a multiplier. Every stale flag doubles the number of possible code paths your tests need to cover. With n flags, you theoretically have 2^n states. At 10 flags in a single service, that's 1,024 combinations. At 20, it's over a million. Nobody is testing a million code paths.

Here's what actually works to keep flags under control:

Classify flags at creation time. Not all flags are the same. Release flags are short-lived (days to weeks). Ops flags control infrastructure behavior and may be long-lived. Experiment flags have a defined end date. Permission flags are permanent. If you don't classify them upfront, you'll never know which ones to clean up.

Enforce naming conventions. A flag called new_checkout tells you nothing after three months. A flag called release.checkout-v2.2026-q1 tells you everything: it's a release flag, it's for checkout v2, and it should have been removed by the end of Q1. We use the pattern {type}.{feature}.{expiry} and enforce it in CI.

Set expiration dates and enforce them. The single most effective practice we adopted was adding a created_at and expires_at to every flag's metadata, and then failing CI builds when a flag exceeds its TTL without being either renewed or removed. Some teams call this a "time bomb." It sounds aggressive, but it works. Without it, flags accumulate silently. And create the cleanup ticket at the same time as the implementation ticket. If removal work is not scheduled when the flag is created, it never happens.

Assign ownership. Every flag should have an owner, ideally the team that created it. When that team finishes the rollout, they own the cleanup. In our setup, flag metadata includes a team field, and our weekly Slack digest pings teams about flags approaching expiration.
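
To make those rules enforceable rather than aspirational, the check can live in CI. Here's a minimal sketch of such a check in Go; the flags.yaml registry format, its field names, and the exact naming regex are illustrative assumptions, not any specific tool's schema:

// flagcheck validates a flag registry file in CI. A minimal sketch: the
// flags.yaml format, field names, and naming pattern are assumptions for
// illustration, not part of any specific flag management tool.
package main

import (
    "fmt"
    "os"
    "regexp"
    "time"

    "gopkg.in/yaml.v3"
)

type Flag struct {
    Name      string `yaml:"name"`       // e.g. release.checkout-v2.2026-q1
    Type      string `yaml:"type"`       // release | ops | experiment | permission
    Owner     string `yaml:"owner"`      // owning team, pinged before expiry
    ExpiresAt string `yaml:"expires_at"` // "2026-03-31"; empty for permanent flags
}

// {type}.{feature}.{expiry} -- the convention described above; the expiry
// segment is optional for permanent flag types.
var namePattern = regexp.MustCompile(`^(release|ops|experiment|permission)\.[a-z0-9-]+(\.\d{4}-q[1-4])?$`)

func main() {
    data, err := os.ReadFile("flags.yaml")
    if err != nil {
        fmt.Fprintln(os.Stderr, "cannot read flag registry:", err)
        os.Exit(1)
    }

    var flags []Flag
    if err := yaml.Unmarshal(data, &flags); err != nil {
        fmt.Fprintln(os.Stderr, "cannot parse flag registry:", err)
        os.Exit(1)
    }

    failed := false
    for _, f := range flags {
        if !namePattern.MatchString(f.Name) {
            fmt.Printf("FAIL %s: name does not match {type}.{feature}.{expiry}\n", f.Name)
            failed = true
        }
        if f.Owner == "" {
            fmt.Printf("FAIL %s: no owning team\n", f.Name)
            failed = true
        }
        if f.ExpiresAt != "" {
            expiry, err := time.Parse("2006-01-02", f.ExpiresAt)
            switch {
            case err != nil:
                fmt.Printf("FAIL %s: unparseable expires_at %q\n", f.Name, f.ExpiresAt)
                failed = true
            case time.Now().After(expiry):
                // The "time bomb": an expired flag blocks the build until renewed or removed.
                fmt.Printf("FAIL %s: expired on %s, renew or remove it\n", f.Name, f.ExpiresAt)
                failed = true
            }
        }
    }
    if failed {
        os.Exit(1)
    }
}

Run it as an early CI step; an expired, badly named, or ownerless flag then blocks the merge until someone renews or removes it.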

Use OpenFeature as your abstraction layer. OpenFeature is a CNCF incubating project that provides a vendor-agnostic API for feature flagging. Instead of locking yourself into LaunchDarkly's SDK, Statsig's SDK, or your homegrown solution, you code against the OpenFeature interface and swap providers underneath. When your flag management tool becomes the bottleneck (and it will, eventually), migration without OpenFeature means rewriting every flag evaluation call in your codebase.

Here's a production-ready OpenFeature setup in Go:

package flags

import (
    "context"
    "log/slog"

    flagd "github.com/open-feature/go-sdk-contrib/providers/flagd/pkg"
    "github.com/open-feature/go-sdk/openfeature"
)

var client = openfeature.NewClient("checkout-service")

func Init() error {
    provider, err := flagd.NewProvider()
    if err != nil {
        return err
    }
    // SetProviderAndWait blocks until the provider is ready.
    // Without this, flag evaluations during startup return defaults silently.
    return openfeature.SetProviderAndWait(provider)
}

func Shutdown() {
    openfeature.Shutdown()
}

func CheckoutV2Enabled(ctx context.Context, userID, plan string) bool {
    evalCtx := openfeature.NewEvaluationContext(userID, map[string]any{
        "plan": plan,
    })

    enabled, err := client.BooleanValue(
        ctx,
        "release.checkout-v2.2026-q1",
        false, // safe default: old behavior
        evalCtx,
    )
    if err != nil {
        slog.Warn("flag evaluation failed, falling back to default",
            "flag", "release.checkout-v2.2026-q1",
            "user_id", userID,
            "err", err,
        )
        return false
    }

    return enabled
}

Three things matter here for production: SetProviderAndWait blocks until the provider is actually ready (without it, every flag evaluation during startup silently returns the default, which means your new feature randomly appears or disappears depending on boot timing). The error from BooleanValue is logged, not swallowed. And Shutdown() exists because provider connections need to be cleaned up properly. A flag system that is unavailable or half-initialized during startup should degrade predictably, not randomly.

The point of going through the OpenFeature client instead of calling flagd directly is that flagd.NewProvider() can be swapped for any compliant provider: LaunchDarkly, Statsig, DevCycle, or a file-based config for local development, all without touching business logic.
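
As a sketch of what that swap looks like in practice, Init can choose the provider at startup while every call site keeps using the OpenFeature client. The FLAG_PROVIDER variable and the no-op fallback for local development are assumptions for illustration (the snippet also assumes "os" is added to the imports):

// A minimal sketch: pick the provider at startup, keep every flag evaluation
// coded against the OpenFeature client. FLAG_PROVIDER and the no-op fallback
// are illustrative assumptions, not an OpenFeature requirement.
func Init() error {
    switch os.Getenv("FLAG_PROVIDER") {
    case "flagd":
        provider, err := flagd.NewProvider()
        if err != nil {
            return err
        }
        return openfeature.SetProviderAndWait(provider)
    default:
        // Local dev: no flag backend at all. The no-op provider returns the
        // default value passed at each call site, so code paths still run.
        return openfeature.SetProviderAndWait(openfeature.NoopProvider{})
    }
}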

But disciplined flag hygiene only solves half the problem. Flag debt is a code problem. The bigger question is the deployment problem that flags can't solve at all: once the new version is running in production, how do you know it's safe before it reaches every user?

Progressive delivery vs continuous delivery: why the shift matters in 2026

Continuous delivery answers the question, "Can we deploy to production at any time?" Progressive delivery answers a different question: "Can we deploy to production safely, with guardrails that continuously verify the release while it is happening?"

The distinction matters because feature flags alone don't answer the questions that actually kill you in production: Does the new build overload the database connection pool? Does the new version leak memory under real concurrency? Does the schema change remain backward-compatible while old and new versions coexist? Those are deployment and runtime questions, not flag questions.

Progressive delivery treats every release as an experiment. Instead of routing 100% of traffic to the new version immediately, you start with 5%, observe the metrics, increase to 20%, observe again, and gradually ramp to 100%, or automatically roll back if error rates spike. The keyword is "automatically." A human staring at a Grafana dashboard at 2 AM is not a release strategy.

In 2026, the tooling is mature enough that there's no good excuse to skip this. Argo Rollouts and Flagger (both CNCF projects) provide Kubernetes-native controllers for canary and blue-green deployments with metric-driven promotion. Gateway API has reached GA, replacing the archived SMI spec as the standard networking abstraction. Prometheus, Datadog, and New Relic all integrate directly with these controllers for automated analysis. The pieces are there; the question is how you wire them together.

The important architectural point is this: feature flags and progressive delivery are not alternatives. They operate at different layers. Progressive delivery controls which version gets traffic. Feature flags control which behavior is exposed inside that version. The strongest rollout pattern uses both.

How feature flags help with progressive delivery: canary, blue-green, and rollout strategies

Canary releases: exact traffic control requires a real router

A canary release routes a small percentage of real production traffic to the new version while the old version continues serving the majority. This sounds straightforward, but there's a catch that many guides gloss over: in Argo Rollouts, setWeight without a traffic router is only a best-effort approximation based on pod replica counts. If you have 10 replicas and set the weight to 5%, you can't get 0.5 pods. The rollout controller rounds, and your "5% canary" might actually be 10% or 0%.

For real percentage-based traffic control, you need to wire the rollout to stable and canary services and to a traffic router: Istio, NGINX Ingress, or another supported integration.

Here's a production-grade canary setup with Argo Rollouts and Istio:

Rollout spec:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: checkout-service
spec:
  replicas: 10
  revisionHistoryLimit: 2
  progressDeadlineSeconds: 600   # abort if no progress for 10 min
  progressDeadlineAbort: true     # auto-abort, don't just mark degraded
  selector:
    matchLabels:
      app: checkout-service
  template:
    metadata:
      labels:
        app: checkout-service
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:v2.3.1
          ports:
            - containerPort: 8080
  strategy:
    canary:
      stableService: checkout-stable
      canaryService: checkout-canary
      trafficRouting:
        istio:
          virtualService:
            name: checkout-vsvc
            routes:
              - primary
      analysis:
        startingStep: 1
        templates:
          - templateName: checkout-canary-analysis
        args:
          - name: canary-service
            value: checkout-canary
          - name: stable-service
            value: checkout-stable
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - setWeight: 20
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 10m }

Istio VirtualService (initial state):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout-vsvc
spec:
  hosts:
    - checkout.example.com
  gateways:
    - checkout-gateway
  http:
    - name: primary
      route:
        - destination:
            host: checkout-stable
          weight: 100
        - destination:
            host: checkout-canary
          weight: 0

Now the weight steps are real. The rollout controller updates the traffic split in the VirtualService, not just the replica ratio. The stable and canary services keep traffic separated enough for meaningful per-version analysis. progressDeadlineAbort: true means a stuck rollout doesn't just sit there marked "degraded"; it actually aborts and rolls back.

Starting from step 1, the controller runs the AnalysisTemplate to decide whether to promote or abort. More on that template below.

Blue-green: fast traffic rollback, but not magic rollback

Blue-green deployments run two full environments simultaneously: "active" (current) and "preview" (new). Traffic switches entirely from active to preview once the preview environment passes health checks. Rollback means switching traffic back.

Blue-green works well when the feature is hard to evaluate incrementally, when traffic splitting is awkward for the service, or when you want a full preview stack before promotion. With Argo Rollouts:

strategy:
  blueGreen:
    activeService: checkout-active
    previewService: checkout-preview
    autoPromotionEnabled: false
    prePromotionAnalysis:
      templates:
        - templateName: preview-smoke-tests
      args:
        - name: service-name
          value: checkout-preview.default.svc.cluster.local
    postPromotionAnalysis:
      templates:
        - templateName: checkout-post-promotion
      args:
        - name: service-name
          value: checkout-active.default.svc.cluster.local
    scaleDownDelaySeconds: 600

prePromotionAnalysis runs smoke tests against the preview stack before switching live traffic. postPromotionAnalysis keeps checking the newly promoted version after the switch, so if the metrics degrade under real traffic, the rollout can still abort. scaleDownDelaySeconds: 600 keeps the old ReplicaSet alive for 10 minutes, giving you a fast rollback window. And the service names are FQDNs (checkout-preview.default.svc.cluster.local) because analysis templates running in a different namespace need to resolve them unambiguously.
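
The preview-smoke-tests template referenced above doesn't have to be a metric query at all; a pre-promotion gate is often just a test job. Here's a sketch using the Argo Rollouts job metric provider, where the smoke-test image and its arguments are placeholders:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: preview-smoke-tests
spec:
  args:
    - name: service-name
  metrics:
    - name: smoke-tests
      count: 1
      failureLimit: 0
      provider:
        job:
          spec:
            backoffLimit: 0
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: smoke
                    # Placeholder image: any test runner that exits non-zero on failure works.
                    image: registry.example.com/checkout-smoke-tests:latest
                    args: ["--target", "http://{{args.service-name}}:8080"]

If the job fails, the analysis fails and traffic never switches to the preview stack.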

But blue-green is not a magical undo button. It rolls back traffic, not data. If the new version performs a destructive migration, emits duplicate events, or warms incompatible cache entries, switching traffic back restores the old binary while leaving the data layer in a partially broken state.

Where feature flags add value on top of infrastructure rollouts

Imagine you're deploying a new recommendation engine. The canary handles the infrastructure rollout: new pods, traffic shifting, metric gates. But within the new version, the recommendation engine is behind a feature flag. Even after the canary promotes to 100%, the new engine starts serving only 10% of users (controlled by the flag), then 25%, then 50%.

This double-gating gives you two rollback levers for two different failure modes:

If infrastructure metrics (latency, error rate, CPU) are bad, the canary aborts and returns traffic. If infrastructure is fine, but business metrics (conversion rate, engagement) drop, you toggle the feature flag. Different failure modes, different response mechanisms. Neither one alone covers both cases.
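
In code, the second lever is just another flag evaluation inside the already-promoted version. A minimal sketch, reusing the client from the earlier flags package; the flag name, Engine interface, and wiring are hypothetical, and the 10% → 25% → 50% audience split lives in the flag backend's targeting rules, not in the code:

// Engine is a hypothetical interface implemented by both recommendation engines.
type Engine interface {
    Recommend(ctx context.Context, userID string) ([]string, error)
}

var legacyEngine, newEngine Engine // wired at startup

func Recommendations(ctx context.Context, userID, plan string) ([]string, error) {
    evalCtx := openfeature.NewEvaluationContext(userID, map[string]any{"plan": plan})

    // Even after the canary promotes the v2 build to 100% of pods, the new
    // engine serves only the flag-targeted share of users. Flipping the flag
    // off reverts behavior instantly: no deployment, no traffic shift.
    useV2, err := client.BooleanValue(ctx, "release.reco-engine-v2.2026-q2", false, evalCtx)
    if err != nil || !useV2 {
        return legacyEngine.Recommend(ctx, userID) // old behavior, the safe default
    }
    return newEngine.Recommend(ctx, userID)
}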

Progressive delivery in Kubernetes: tools, pipelines, and observability gates

Choosing a controller: Argo Rollouts vs Flagger

The two serious options for progressive delivery controllers in Kubernetes are Argo Rollouts and Flagger. Both are mature CNCF projects. Both support canary and blue-green. Both integrate with Prometheus, Istio, Linkerd, NGINX, and the Gateway API. The choice comes down to your existing ecosystem.

Argo Rollouts replaces the standard Kubernetes Deployment with a custom Rollout resource. It comes with its own UI and a kubectl plugin. If you're already using ArgoCD for GitOps, Rollouts is the natural fit.

Flagger works differently: it watches standard Kubernetes Deployments and creates the canary infrastructure automatically. You don't change your Deployment spec. If you're using Flux for GitOps, Flagger is part of the same family. Flagger's webhook system is also more flexible for custom pre/post-rollout hooks.
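
For comparison, here's roughly what the checkout canary looks like as a Flagger Canary resource, pointed at an unmodified Deployment and using Flagger's built-in request-success-rate and request-duration checks (the thresholds and intervals here are illustrative, not recommendations):

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: checkout-service
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service   # the unmodified Deployment Flagger watches
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5              # failed checks before automatic rollback
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99             # percent
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500            # milliseconds
        interval: 1m

Flagger generates the primary and canary workloads and services itself, which is exactly the "don't change your Deployment spec" trade-off.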

For a team starting from scratch, I'd lean toward Argo Rollouts for the UI alone. When a canary is in progress at 3 AM, and someone needs to understand the current state, a visual dashboard beats kubectl describe rollout every time.

Observability gates: the analysis template that actually catches problems

This is where heroism stops scaling. A rollout that depends on a human noticing a dashboard spike and hitting the abort button works exactly once before the team burns out. The whole point of progressive delivery is that the promotion decision is made by code, not by whoever happens to be awake. And that only works if the promotion logic is tied to trustworthy signals.

Your analysis should do more than check one success-rate query. At minimum, a serious rollout should validate enough traffic volume to make the sample meaningful, error rate, latency, and ideally a business KPI.

Here's an analysis template that addresses the most common pitfalls we've seen in production:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: checkout-canary-analysis
spec:
  args:
    - name: canary-service
    - name: stable-service
  metrics:
    # Gate 1: do we have enough traffic to trust the other metrics?
    - name: canary-request-volume
      interval: 60s
      count: 5
      successCondition: "len(result) > 0 && !isNaN(result[0]) && result[0] >= 200"
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(increase(
              http_requests_total{service="{{args.canary-service}}"}[5m]
            ))

    # Gate 2: error rate stays below 2%
    - name: canary-error-rate
      interval: 60s
      count: 5
      successCondition: "len(result) > 0 && !isNaN(result[0]) && result[0] <= 0.02"
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(increase(
              http_requests_total{service="{{args.canary-service}}", status=~"5.."}[5m]
            ))
            /
            clamp_min(
              sum(increase(
                http_requests_total{service="{{args.canary-service}}"}[5m]
              )), 1
            )

    # Gate 3: p99 latency stays under 500ms
    - name: canary-latency-p99
      interval: 60s
      count: 5
      successCondition: "len(result) > 0 && !isNaN(result[0]) && result[0] <= 0.5"
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            histogram_quantile(
              0.99,
              sum(rate(
                http_request_duration_seconds_bucket{
                  service="{{args.canary-service}}"
                }[5m]
              )) by (le)
            )

    # Gate 4: business metric -- canary conversion is >= 97% of stable
    - name: conversion-vs-stable
      interval: 120s
      count: 4
      successCondition: "len(result) > 0 && !isNaN(result[0]) && result[0] >= 0.97"
      failureLimit: 1
      consecutiveSuccessLimit: 2
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            (
              sum(increase(checkout_completed_total{service="{{args.canary-service}}"}[5m]))
              /
              clamp_min(
                sum(increase(checkout_started_total{service="{{args.canary-service}}"}[5m])),
                1
              )
            )
            /
            clamp_min(
              (
                sum(increase(checkout_completed_total{service="{{args.stable-service}}"}[5m]))
                /
                clamp_min(
                  sum(increase(checkout_started_total{service="{{args.stable-service}}"}[5m])),
                  1
                )
              ),
              0.0001
            )

There are a few design choices worth calling out.

First, the request volume gate runs before anything else matters. A 0% error rate on 3 requests tells you nothing. We set the minimum at 200 requests per 5-minute window; adjust this based on your traffic volume. For a B2B service with 50 enterprise clients, you might need longer windows or bigger initial canary weights to get any signal at all.

Second, the error rate query uses clamp_min(..., 1) in the denominator. Without it, if the canary receives zero requests in a window, you get a division by zero, Prometheus returns NaN, and depending on your successCondition, the analysis either silently passes or silently fails. Both are wrong. clamp_min ensures the denominator is always at least 1.

Third, the conversion metric compares canary against a stable baseline, not against a fixed absolute number. Raw conversion rates shift based on time of day, marketing campaigns, and traffic composition. A static threshold of "conversion >= 3.5%" might pass at 2 PM and fail at 2 AM, even with identical code. Comparing canary/stable as a ratio neutralizes those external factors.

Fourth, every successCondition includes len(result) > 0 && !isNaN(result[0]). Argo Rollouts metric providers can return empty arrays, NaN, or infinity. If you don't explicitly guard for those cases, your "automated safety" quietly becomes automated self-deception. If business metrics live in Datadog rather than Prometheus, use the default(result, 0) pattern that Argo Rollouts documents for handling empty Datadog windows.

GitOps and rollout controllers must not fight each other

This one trap deserves its own section because it causes real confusion in production.

If Argo Rollouts is changing the traffic weights in an Istio VirtualService during a canary, while ArgoCD keeps reapplying the version stored in Git, the two controllers fight over the same field. The rollout sets the weight to 20%, ArgoCD syncs it back to 100%/0%, the rollout sets it to 20% again, and so on. You get weight flapping, noisy sync status, and very confusing incident timelines.

Argo Rollouts documents this explicitly and recommends using ignoreDifferences:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: checkout
spec:
  ignoreDifferences:
    - group: networking.istio.io
      kind: VirtualService
      jsonPointers:
        - /spec/http/0
  syncPolicy:
    syncOptions:
      - ApplyOutOfSyncOnly=true

ignoreDifferences tells ArgoCD not to consider the managed route section as drift. ApplyOutOfSyncOnly=true prevents ArgoCD from constantly reapplying resources that are already synced. Without this, the rollout may technically work but produce behavior that's extremely hard to debug during an incident.

The same general lesson applies beyond ArgoCD: once progressive delivery is mutating live routing objects, GitOps must be deliberate about which fields are source-of-truth-at-rest and which fields are source-of-truth-during-rollout.

Where progressive delivery still fails if you ignore the rest of the system

Progressive delivery reduces rollout risk. It does not replace systems thinking. We've seen teams invest heavily in canary automation and still have outages because the failure mode was in a layer that the canary didn't cover.

Database migrations. A canary that routes 5% of HTTP traffic to the new version doesn't help if the new version runs a destructive schema change on startup. For anything non-trivial, use the expand → migrate → contract pattern: add the new column/table first (backward-compatible), deploy code that reads and writes both shapes, migrate data gradually, and remove the old contract only after the old version is gone. If your rollback requires restoring a database backup, the application rollback path was never truly safe.
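
A minimal sketch of that sequence for a hypothetical column change; table and column names are made up, each phase ships as its own release, and the statements run through whatever migration tool you already use:

// Package migrations sketches expand → migrate → contract for a hypothetical
// column change. Names are illustrative; each phase is a separate release.
package migrations

// Phase 1 -- expand: purely additive, so the old version keeps working while
// old and new pods coexist during the canary.
const Expand = `ALTER TABLE orders ADD COLUMN shipping_address_v2 jsonb`

// Phase 2 -- migrate: runs after deploying code that writes both shapes;
// backfills old rows in small batches so the table is never locked for long.
const Backfill = `
UPDATE orders
SET shipping_address_v2 = to_jsonb(shipping_address)
WHERE id IN (SELECT id FROM orders WHERE shipping_address_v2 IS NULL LIMIT 1000)`

// Phase 3 -- contract: runs only after no deployed version reads the old column.
const Contract = `ALTER TABLE orders DROP COLUMN shipping_address`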

Async workers and consumers. Canarying HTTP traffic does not automatically canary background consumers. If the same release changes Kafka consumers, cron jobs, or queue workers, you need separate rollout controls for those. Otherwise, you carefully protect synchronous traffic while the background plane creates irreversible side effects.

Cache, sessions, and warmup behavior. Some failures only appear after a few minutes of real load: caches fill, JIT compilation settles, connection pools saturate, hot partitions emerge. That's why the first canary pause should not be decorative. If you pause for 30 seconds and then promote, you might miss the failure mode that only shows up at minute 5.

Low-traffic services. A 1% rollout of a low-volume B2B service can easily mean "zero real requests hit the canary." In those cases, use a bigger initial weight (10-20%), longer pauses, internal header routing, or synthetic preview traffic. A mathematically elegant percentage is useless if it doesn't produce statistically useful observations.
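
With Argo Rollouts and Istio, internal header routing can be expressed directly in the canary steps. A sketch under assumptions (the header and route names are arbitrary, and managedRoutes must list every route the controller is allowed to create):

strategy:
  canary:
    trafficRouting:
      managedRoutes:
        - name: internal-testers    # routes Argo Rollouts may create and delete
      istio:
        virtualService:
          name: checkout-vsvc
          routes:
            - primary
    steps:
      - setCanaryScale:
          replicas: 1
      # Only requests carrying this header reach the canary; customers stay on stable.
      - setHeaderRoute:
          name: internal-testers
          match:
            - headerName: X-Canary
              headerValue:
                exact: internal
      - pause: {}                   # wait for a manual promote after internal testing
      - setHeaderRoute:
          name: internal-testers    # no match block removes the header route
      - setWeight: 20
      - pause: { duration: 15m }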

Rollback reverses traffic, not external side effects. A rollback stops new requests from hitting the canary. It cannot undo emails already sent, payments already attempted, duplicate events already published, or corrupted state already written. For those paths, idempotency and compensating workflows matter more than the deployment strategy itself.

A proven checklist for scalable, confident releases with progressive delivery

After running progressive delivery across multiple projects, here's the checklist we follow before every complex release.

Before writing the feature flag: decide whether it's a release flag, experiment flag, ops flag, or permission flag. Record owner, TTL, and removal conditions. Define what "flag off" really means for data, side effects, and user state. If toggling the flag off requires a database migration or cache invalidation, that's not a feature flag problem; that's a deployment dependency that needs its own rollback plan.

Before deploying: verify that canary traffic is truly separated by the router, not approximated by pod ratios. Dry-run the analysis queries in staging and confirm that empty results fail the right way (not silently pass). Set thresholds from baseline data, not intuition. Trigger a deliberate failed rollout in staging and prove that rollback works within the expected time window. Confirm that schema changes are backward-compatible for the coexistence period. If you've never tested rollback, you don't have rollback.

During rollout: watch the first canary step manually, even if promotion is automated. Confirm request volume is high enough before trusting the metrics. Track business KPIs separately from infrastructure metrics: a canary can pass all infrastructure checks while destroying conversion rates. Keep the user-facing behavior behind the flag until the version itself is proven healthy.
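
For that first manual watch, the standard Argo Rollouts kubectl plugin commands are enough:

# Watch the rollout live during the first canary step
kubectl argo rollouts get rollout checkout-service --watch

# Promote to the next step manually, or abort and return all traffic to stable
kubectl argo rollouts promote checkout-service
kubectl argo rollouts abort checkout-service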

After full promotion: expand the feature audience gradually through the flag. Remove the temporary release flag while context is still fresh, not next quarter, not "when we have time." Update the runbook with new metrics, dependencies, and rollback notes. Review what actually triggered or almost triggered the rollback logic and tighten the next rollout accordingly.

Conclusion

Feature flags are still essential. They decouple deployment from release, enable testing in production, and provide a kill switch when things go wrong. But they are a mechanism, not a strategy. A strategy means canary releases that limit blast radius, observability gates that automate the "should we proceed?" decision, analysis templates that catch NaN traps and low-volume false confidence, rollback that triggers without a human in the loop, and GitOps configuration that doesn't fight the rollout controller.

Progressive delivery is that strategy. Not from feature flags to no feature flags. From feature flags as a mechanism to progressive delivery as a system.

Minimum setup to get started:

  1. Install the controller: kubectl create namespace argo-rollouts && kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
  2. Convert one Deployment to a Rollout with a two-step canary (5% → 50% → 100%).
  3. Wire it to your traffic router (Istio VirtualService, NGINX, or Gateway API).
  4. Add one AnalysisTemplate that checks HTTP error rate with clamp_min and isNaN guards.
  5. Deploy, watch it work, iterate from there.

You don't need to boil the ocean. Start with one service, one metric, and one canary step. Once the team sees an automated rollback save them from an outage, progressive delivery sells itself.