What is the difference between head-based and tail-based sampling in OpenTelemetry?

Head-based sampling makes the keep/drop decision at the start of a trace, before the outcome is known. A flat 10% head-based rate keeps 10% of errors alongside 10% of healthy traces — you lose signal and keep noise in equal proportion. Tail-based sampling waits until the trace is complete, then decides: keep all errors, keep all traces above a latency threshold, and sample everything else at a low rate. This lets you preserve 100% of the traces that matter for incident investigation while dramatically reducing storage costs. Tail sampling requires a stateful gateway that buffers all spans for a trace before deciding, which is why the gateway must be a StatefulSet and the agent must route by trace ID.

Why do distributed traces break at service boundaries in Kubernetes?

Traces break at service boundaries for three main reasons. First, propagation format mismatch: a Python service using W3C TraceContext calling a Java service configured for B3 produces two disconnected root spans with no error — just a silent gap. Fix by setting propagators explicitly in every service before the first cross-service call. Second, async jobs and background workers: Celery tasks, Kafka consumers, and cron jobs receive no trace context by default and start new root spans. Fix by explicitly injecting context into the task payload at enqueue time and extracting it at consume time. Third, header loss at ingress: NGINX Ingress and service meshes like Istio may strip or overwrite traceparent headers. Fix by configuring proxy_set_header traceparent for NGINX and aligning mesh tracing configuration with the application propagation format.

Why does tail sampling in OpenTelemetry require a StatefulSet and a loadbalancing exporter?

Tail-based sampling must see all spans for a trace before making a keep/drop decision. A Kubernetes Service in front of a multi-replica gateway spreads connections across replicas without regard to trace ID, so different spans for the same trace land on different replicas, each seeing a partial trace and making an incorrect sampling decision. The fix is two-part. First, the agent uses the loadbalancing exporter, which hashes the trace ID and consistently routes all spans for a given trace to the same gateway pod. Second, the gateway runs as a StatefulSet with a headless Service so individual pod DNS addresses are resolvable. Without the headless Service, a DNS lookup returns a single ClusterIP, the exporter sees only one backend, and kube-proxy spreads connections across replicas with no regard for trace ID.

What is the k8sattributes processor and why is it needed in production OpenTelemetry?

The k8sattributes processor runs in the OTel Collector agent and enriches spans with Kubernetes metadata that application pods cannot see themselves: pod name, pod UID, deployment name, namespace, node name, and custom labels such as app version and team. Without this enrichment, a high-latency alert tells you a service is slow. With it, the same alert can tell you the service is slow on a specific node after a specific rollout, turning a vague signal into an actionable one. The processor uses the Kubernetes API via a ServiceAccount, so the DaemonSet needs appropriate RBAC permissions. Spans emitted in the first seconds of pod startup may arrive before the processor has received that pod's metadata from the Kubernetes API; they are then only partially enriched or not enriched at all, which is expected behavior.

What are the main production failure modes of an OpenTelemetry tail sampling pipeline?

Three failure modes account for most production incidents. First, num_traces buffer exhaustion: when a traffic spike pushes more simultaneous open traces than num_traces allows, the oldest traces are evicted before a sampling decision is made, and nothing alerts you unless you monitor the Collector's own metrics. Size for burst traffic with 2× headroom and alert on memory pressure before the buffer fills. Second, late spans: spans arriving after the decision_wait timeout are processed as new root traces. Their parent may already have been dropped, which produces orphaned single-span traces in the backend, so decision_wait must exceed your longest async operation. Third, tail sampling state lives in memory: a rolling restart of the gateway loses all spans currently in the tail sampling buffer. Document this in your runbook so operators know what to expect during planned maintenance.

Dmitrii Khalezin

DevOps Engineer


DevOps

Created: May 27, 2026

OpenTelemetry in Kubernetes: From Installed to Actually Working


The gap between "OTel installed" and "OTel working" is where most teams get stuck.

The official documentation is strong for first contact: start the SDK, point it at a Collector, and see spans in Jaeger. What the documentation does not prepare you for is a real Kubernetes cluster with a dozen services, a mix of Python and Go, a trace that silently stops at the boundary between two microservices, and a finance team asking why trace storage costs tripled after rollout.

This guide targets DevOps and SRE engineers who already understand the concepts but have not landed a production OTel implementation that the team actually trusts. The goal is specific: a minimally reliable pipeline for Kubernetes distributed tracing, with correct context propagation and tail-based sampling that keeps storage costs predictable. Running OpenTelemetry in Kubernetes reliably requires decisions that hello-world examples do not force you to make — this guide surfaces them before they become incidents.

One framing that helped us think about this: distributed traces are to microservices what session traces are to AI agents, a way to reconstruct what happened across a sequence of operations that no single participant has full visibility into. The context propagation mechanics map directly to the pattern explored in our earlier piece on session traces.

What production OTel needs beyond hello-world

Hello-world OTel has three components: SDK in the application, Collector running somewhere, and backend receiving the data. This works for demos. Four requirements change the architecture when you move to production.

Collector as a control point. Applications should not talk directly to the backend, and they should not carry the cost of retries, batching, or data transformation. The Collector is where you control the telemetry pipeline: sampling decisions, attribute enrichment, redaction, and fan-out to multiple backends, all without touching application code.

Kubernetes metadata enrichment. Your pods do not know which node they run on, which deployment version they belong to, or what team label is on their namespace. The Collector does, through the k8sattributes processor. Enriching spans with this context (pod name, deployment name, namespace, and team) is what makes "high latency in service X" turn into "high latency in service X on node Y after the 14:30 rollout of version Z."

Propagation consistency. A trace that crosses a service boundary works only if both services speak the same propagation format. Mixing W3C TraceContext with B3 or Jaeger's format silently breaks traces. This needs to be decided and enforced before the first service goes to production, not discovered six months later when traces look short.

Sampling and cost control. Full trace data at production load can quickly become a storage and retention problem. The answer is not to disable tracing — it is tail-based sampling at a centralized point, which lets you keep 100% of errors and slow traces while sampling healthy fast traces at a low rate. This requires architecture decisions that hello-world does not need.

Reference architecture: agent + gateway + backend

Production OTel on Kubernetes runs two tiers of Collectors.

┌──────────────────────────────────────────────────────────┐
│  Kubernetes Cluster                                      │
│                                                          │
│  ┌───────────┐   OTLP/gRPC   ┌─────────────────────────┐ │
│  │ Service A │──────────────▶│  OTel Agent             │ │
│  └───────────┘               │  (DaemonSet)            │ │
│                              │                         │ │
│  ┌───────────┐   OTLP/gRPC   │  Per-node collection    │ │
│  │ Service B │──────────────▶│  k8s metadata attach    │ │
│  └───────────┘               │  Batch + forward        │ │
│                              └────────────┬────────────┘ │
│                                           │ OTLP/gRPC    │
│                              ┌────────────▼────────────┐ │
│                              │  OTel Gateway           │ │
│                              │  (StatefulSet, 2+ pods) │ │
│                              │                         │ │
│                              │  Tail-based sampling    │ │
│                              │  Attribute redaction    │ │
│                              │  Fan-out to backends    │ │
│                              └────────────┬────────────┘ │
└───────────────────────────────────────────┼──────────────┘
                                            │
                        ┌───────────────────┼───────────────────┐
                ┌───────▼───────┐   ┌───────▼───────┐   ┌───────▼───────┐
                │  Tempo /      │   │  Prometheus   │   │  Loki /       │
                │  Jaeger       │   │  / Mimir      │   │  Elasticsearch│
                └───────────────┘   └───────────────┘   └───────────────┘

Agent tier (DaemonSet). One Collector pod per node. Applications send telemetry to the agent over OTLP gRPC. The agent adds Kubernetes metadata through the k8sattributes processor, then batches and forwards to the gateway. Resource consumption is node-bounded: the agent only handles traffic from pods on its node.

Note on DaemonSet and localhost: application pods cannot reach a DaemonSet agent via localhost, because the agent runs in a separate pod with its own network namespace. Applications should send to the node's internal IP (with the agent exposed through hostPort or hostNetwork), or to a Service configured with internalTrafficPolicy: Local so traffic stays on the node. The simplest production pattern is an environment variable injected by the Downward API from status.hostIP, then referenced in the exporter endpoint: OTEL_EXPORTER_OTLP_ENDPOINT=http://$(NODE_IP):4317.
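A quick way to catch a miswired endpoint is to resolve it from inside the application at startup. The sketch below is a minimal illustration, assuming the pod spec injects NODE_IP from status.hostIP; the helper name otlp_grpc_endpoint is hypothetical. It also guards against a detail of Kubernetes env expansion: $(NODE_IP) inside another variable is expanded only if NODE_IP is defined earlier in the same env list, otherwise the literal text is passed through unchanged.

```python
import os
from typing import Mapping


def otlp_grpc_endpoint(env: Mapping[str, str] = os.environ, port: int = 4317) -> str:
    """Resolve the node-local agent endpoint (hypothetical startup helper).

    An explicit OTEL_EXPORTER_OTLP_ENDPOINT wins; otherwise the endpoint is
    built from NODE_IP, assumed to come from the Downward API (status.hostIP).
    """
    explicit = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    if "$(" in explicit:
        # Kubernetes leaves $(VAR) untouched when VAR is not defined earlier
        # in the container's env list.
        raise RuntimeError(f"unexpanded env reference in endpoint: {explicit}")
    if explicit:
        return explicit
    node_ip = env.get("NODE_IP")
    if not node_ip:
        raise RuntimeError("NODE_IP is not set; check the Downward API env entry")
    return f"http://{node_ip}:{port}"
```

Calling it once at startup and logging the result makes a broken endpoint show up in pod logs instead of as silently missing traces.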

Gateway tier (StatefulSet). Two or more replicas. The gateway is where tail-based sampling lives — it needs to see all spans for a trace before making a keep/drop decision, which requires stable addressing (more on this in the sampling section). The gateway also handles attribute redaction and fan-out to multiple backends.

Why not skip the agent and send directly to the gateway? You can. But you lose node-level Kubernetes metadata enrichment, you push retry and batching logic into the application, and every service needs the gateway's address hardcoded. The agent is a node-local or cluster-local collection point, depending on how you expose it: applications send to a fixed local address, and the Collector handles everything else.

Deployment model options. The right model depends on your cluster size and requirements.

MODEL	USE WHEN	TRADE-OFF
DaemonSet agent + gateway	Production default for multi-service clusters	More moving parts; correct and scalable
Sidecar per pod	True localhost endpoint required; strict isolation	High resource overhead at scale
Gateway only	Small cluster, pilot, or early evaluation	No node-level enrichment; single point of congestion
Direct to backend	Local development only	No sampling, no enrichment, no control

For the agent to be reachable at the node IP, expose it with hostPort or hostNetwork on the DaemonSet, and restrict access with NetworkPolicy. If you expose the agent through a Kubernetes Service instead, do not assume node-local routing unless your Service configuration explicitly enforces it.

Application instrumentation and Collector configuration that survive production

Instrumentation minimum

Before configuring the Collector, the services need to send data. The practical minimum:

Deploy the Collector via the OTel Collector Helm chart or the OpenTelemetry Operator. Both support DaemonSet and Deployment modes and handle upgrades without manual manifest management.
Set OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES (at minimum deployment.environment) in pod environment variables — do not hardcode them in application code.
Use auto-instrumentation for HTTP, gRPC, and database calls. The OTel Operator supports zero-code injection for Python, Java, and Node.js via pod annotations, which means no SDK changes for existing services.
Add manual spans only for business-critical operations with no HTTP or DB analog: payment steps, batch processing stages, cache decision points.
Set propagators explicitly and consistently across all languages. W3C TraceContext (traceparent) is the standard; commit to it before the first cross-service call.
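Because these values live in pod environment variables, a startup check can catch a missing service name or environment before any span is exported. A minimal sketch, assuming the comma-separated key=value format of OTEL_RESOURCE_ATTRIBUTES with percent-encoded values; parse_resource_attributes and check_telemetry_env are hypothetical helpers, not SDK functions.

```python
from typing import Mapping
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse an OTEL_RESOURCE_ATTRIBUTES value such as "k1=v1,k2=v2"."""
    attrs: dict[str, str] = {}
    for entry in raw.split(","):
        key, sep, value = entry.partition("=")
        if not sep or not key.strip():
            continue  # skip empty or malformed entries
        attrs[key.strip()] = unquote(value.strip())  # values may be percent-encoded
    return attrs


def check_telemetry_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the minimum is in place."""
    problems = []
    if not env.get("OTEL_SERVICE_NAME"):
        problems.append("OTEL_SERVICE_NAME is not set")
    attrs = parse_resource_attributes(env.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    if "deployment.environment" not in attrs:
        problems.append("deployment.environment missing from OTEL_RESOURCE_ATTRIBUTES")
    return problems
```

Calling check_telemetry_env(os.environ) at startup and logging (or failing on) a non-empty result turns a silent misconfiguration into a visible one.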

Agent pipeline: OpenTelemetry Kubernetes example

# collector-agent-config.yaml

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

  k8sattributes:
    auth_type: serviceAccount
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name
      labels:
        - tag_name: app.version
          key: app.kubernetes.io/version
          from: pod
        - tag_name: team
          key: team
          from: pod
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: connection

  filter/drop_health_checks:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'
        - 'attributes["http.route"] == "/readyz"'
        - 'attributes["http.route"] == "/metrics"'

  batch:
    send_batch_size: 512
    timeout: 5s

exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      k8s:
        service: otel-gateway-collector-headless
        ports: [4317]

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, filter/drop_health_checks, batch]
      exporters: [loadbalancing]

The examples use tls.insecure: true for readability inside a trusted cluster. In regulated or multi-tenant environments, use TLS/mTLS between agents, gateways, and backends, and restrict OTLP endpoints with NetworkPolicy.

Processor order is not optional. memory_limiter must be first — it is the circuit breaker that prevents OOM under load. k8sattributes runs before filter so that filtering can reference Kubernetes attributes. batch runs last, immediately before export, so it accumulates enriched and filtered spans.

Health check filtering belongs in the agent, not the gateway. Filtering /healthz and /readyz here prevents those spans from entering the tail sampling buffer in the gateway — wasted memory if they are never kept.
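Before changing the filter, it can help to check which spans it would remove. The following is a plain-Python mirror of the three OTTL conditions, a hypothetical test helper rather than the Collector's OTTL engine; it assumes your frameworks record the health endpoints in http.route exactly as written. If a framework sets only url.path for these requests, the conditions will not match and the spans will still reach the gateway.

```python
from typing import Any, Mapping

# Routes matched by filter/drop_health_checks in the agent config.
HEALTH_CHECK_ROUTES = frozenset({"/healthz", "/readyz", "/metrics"})


def is_health_check(span_attributes: Mapping[str, Any]) -> bool:
    """True if any of the filter's conditions match, i.e. the span would be dropped.

    A span without http.route matches none of the conditions and is kept.
    """
    return span_attributes.get("http.route") in HEALTH_CHECK_ROUTES


spans = [
    {"name": "GET /healthz", "attributes": {"http.route": "/healthz"}},
    {"name": "GET /orders/{id}", "attributes": {"http.route": "/orders/{id}"}},
    {"name": "SELECT orders", "attributes": {"db.system": "postgresql"}},
]
kept = [s["name"] for s in spans if not is_health_check(s["attributes"])]
```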

loadbalancing exporter, not plain otlp. Tail sampling requires that all spans in a trace reach the same gateway replica. The loadbalancing exporter handles this by routing spans by trace ID — see the tail sampling section for the full explanation and StatefulSet requirement.

Gateway pipeline configuration

# collector-gateway-config.yaml

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2048
    spike_limit_mib: 512

  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 1000
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 2000
      - name: keep-debug-flagged
        type: string_attribute
        string_attribute:
          key: sampling.priority
          values: ["always_on"]
      - name: sample-normal
        type: probabilistic
        probabilistic:
          sampling_percentage: 5

  attributes/redact:
    actions:
      - key: db.statement
        action: update
        value: "[redacted]"
      - key: http.request.header.authorization
        action: delete
      - key: http.request.header.cookie
        action: delete

  batch:
    send_batch_size: 1024
    timeout: 10s

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777

service:
  extensions: [health_check, pprof]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, attributes/redact, batch]
      exporters: [otlp/tempo]

Make the Collector itself observable. The health_check extension serves an HTTP health endpoint on port 13133 (path / by default) for Kubernetes readiness and liveness probes. The pprof extension is invaluable when debugging Collector memory or CPU spikes. Enable both; their overhead is negligible under normal operation.

Redaction at the gateway, not the agent. The agent enriches; the gateway sanitizes. This separation means you can add new redaction rules without touching per-node config.
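Redaction rules are easy to get subtly wrong, so it can be worth encoding the expected behavior in a test. A minimal sketch mirroring the attributes/redact actions above; redact and REDACT_ACTIONS are hypothetical names. It reflects one semantic of the attributes processor worth knowing: update changes a key only if it already exists (insert and upsert are the actions that add keys), so spans without db.statement do not gain a placeholder.

```python
from typing import Any

# Mirrors the attributes/redact actions in the gateway config (hypothetical helper).
REDACT_ACTIONS = [
    {"key": "db.statement", "action": "update", "value": "[redacted]"},
    {"key": "http.request.header.authorization", "action": "delete"},
    {"key": "http.request.header.cookie", "action": "delete"},
]


def redact(attributes: dict[str, Any]) -> dict[str, Any]:
    """Return a redacted copy of a span's attributes; the input is not modified."""
    out = dict(attributes)
    for rule in REDACT_ACTIONS:
        key = rule["key"]
        if rule["action"] == "update" and key in out:
            out[key] = rule["value"]  # update only touches keys that already exist
        elif rule["action"] == "delete":
            out.pop(key, None)
    return out
```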

Why traces break at service boundaries

Context propagation is the mechanism by which a trace follows a request across service calls. Service A creates a span, injects a traceparent header into the outbound request, and Service B extracts it and creates a child span under the same trace ID. When this breaks, you see two disconnected root spans instead of one trace. There is no error — just a silent gap.
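The inject/extract handshake is small enough to sketch with the standard library. This is a simplified illustration of the W3C traceparent format (version 00 only, no tracestate, header keys assumed already lowercased), not the OpenTelemetry SDK's propagator; SpanContext, inject, extract, and start_span are hypothetical names.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C traceparent, version 00: version-trace_id-parent_id-trace_flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters
    sampled: bool


def inject(ctx: SpanContext, headers: dict[str, str]) -> None:
    """Service A: write the current span's context into the outbound request."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: dict[str, str]) -> Optional[SpanContext]:
    """Service B: read the remote parent, or None if the header is absent or invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", "").strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return SpanContext(trace_id, parent_id, sampled=bool(int(flags, 16) & 0x01))


def start_span(parent: Optional[SpanContext]) -> SpanContext:
    """Create a child under the remote parent, or a new root if there is none."""
    span_id = secrets.token_hex(8)
    if parent is None:
        return SpanContext(secrets.token_hex(16), span_id, sampled=True)
    return SpanContext(parent.trace_id, span_id, parent.sampled)


upstream = start_span(None)
outbound: dict[str, str] = {}
inject(upstream, outbound)
child = start_span(extract(outbound))  # continues upstream's trace
orphan = start_span(extract({"x-b3-traceid": upstream.trace_id}))  # nothing to extract
```

The last line shows the silent gap: the receiving side finds no traceparent, raises no error, and simply starts a new trace.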

Three categories cover most production failures.

1. Propagation format mismatch

W3C TraceContext (traceparent) is the standard. B3 (Zipkin's format, used by older Java and Spring frameworks) and Jaeger's uber-trace-id are still common. A Python service using W3C calling a Java service using B3 produces two root spans with no connection.

Fix: set propagators explicitly at startup across all services, and choose one format before the first cross-service call.

# Python -- set globally before any HTTP clients initialize
# (equivalent without code changes: OTEL_PROPAGATORS=tracecontext,b3multi)
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.propagators.b3 import B3MultiFormat  # package: opentelemetry-propagator-b3
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# During migration: support both; drop B3 once all services are aligned
set_global_textmap(CompositePropagator([
    TraceContextTextMapPropagator(),
    B3MultiFormat(),
]))


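// Go -- set at TracerProvider initialization
// imports: go.opentelemetry.io/otel, go.opentelemetry.io/otel/propagation,
//          go.opentelemetry.io/contrib/propagators/b3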
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
    propagation.TraceContext{},
    b3.New(),
))
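Why the mismatch is silent, and why the composite fixes it during migration, can be shown with a stdlib-only sketch. extract_w3c, extract_b3_multi, and composite_extract are hypothetical helpers (header keys assumed lowercased, sampling flags ignored). Like the SDK's CompositePropagator, the sketch runs every extractor in order, so when both header sets are present a later extractor's result wins.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Optional


class SpanContext(NamedTuple):
    trace_id: str  # normalized to 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters


Extractor = Callable[[Dict[str, str]], Optional[SpanContext]]


def extract_w3c(headers: Dict[str, str]) -> Optional[SpanContext]:
    pattern = r"00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}"
    m = re.fullmatch(pattern, headers.get("traceparent", ""))
    return SpanContext(m.group(1), m.group(2)) if m else None


def extract_b3_multi(headers: Dict[str, str]) -> Optional[SpanContext]:
    trace_id = headers.get("x-b3-traceid", "")
    span_id = headers.get("x-b3-spanid", "")
    if len(trace_id) not in (16, 32) or not re.fullmatch(r"[0-9a-f]+", trace_id):
        return None
    if not re.fullmatch(r"[0-9a-f]{16}", span_id):
        return None
    return SpanContext(trace_id.rjust(32, "0"), span_id)  # left-pad 64-bit B3 trace IDs


def composite_extract(headers: Dict[str, str], extractors: List[Extractor]) -> Optional[SpanContext]:
    """Run every extractor in order; a later successful extraction replaces an earlier one."""
    result = None
    for extract in extractors:
        found = extract(headers)
        if found is not None:
            result = found
    return result


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
b3_only = extract_b3_multi(incoming)  # None: a B3-only service starts a new root here
either = composite_extract(incoming, [extract_w3c, extract_b3_multi])  # finds the parent
```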

2. Async jobs and background workers

HTTP auto-instrumentation injects and extracts context automatically. Celery tasks, Kafka consumers, cron jobs, and queue workers receive no context by default — they start new root spans because there is nothing to extract.

Fix: explicitly capture context when enqueueing and restore it when consuming.

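from opentelemetry import trace, propagate
# Assumes an existing Celery app named celery_app; do_processing is a placeholder for your own logic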

# When enqueueing: capture current trace context into the task payload
def enqueue_order_processing(order_id: str):
    carrier = {}
    propagate.inject(carrier)  # e.g. {"traceparent": "00-<trace-id>-<span-id>-01"}
    process_order.delay(order_id, carrier)

# In the Celery task: restore context before doing any work
@celery_app.task
def process_order(order_id: str, carrier: dict):
    ctx = propagate.extract(carrier)
    with trace.get_tracer(__name__).start_as_current_span(
        "celery.process_order",
        context=ctx,
        kind=trace.SpanKind.CONSUMER,
    ) as span:
        span.set_attribute("order.id", order_id)
        do_processing(order_id)

The same pattern applies to Kafka (store the carrier in message headers), SQS (store in message attributes), and any other queue-based handoff.
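For Kafka, the carrier has to be converted to the client's header representation. A minimal sketch, assuming a client that represents headers as a list of (str, bytes) tuples, which kafka-python and confluent-kafka both accept; the helper names are hypothetical, and the OpenTelemetry calls appear only in docstrings so the block runs without the SDK installed.

```python
from typing import Dict, Iterable, List, Optional, Tuple

KafkaHeaders = List[Tuple[str, bytes]]


def carrier_to_kafka_headers(carrier: Dict[str, str]) -> KafkaHeaders:
    """Producer side: call after propagate.inject(carrier); send the result as message headers."""
    return [(key, value.encode("utf-8")) for key, value in carrier.items()]


def kafka_headers_to_carrier(
    headers: Optional[Iterable[Tuple[str, Optional[bytes]]]],
) -> Dict[str, str]:
    """Consumer side: rebuild a carrier to pass to propagate.extract(carrier)."""
    carrier: Dict[str, str] = {}
    for key, value in headers or ():
        if value is not None:  # Kafka header values may be null
            carrier[key.lower()] = value.decode("utf-8")
    return carrier
```

Lowercasing keys on the consumer side keeps lookups working if a producer in another language wrote Traceparent with different casing.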

3. Header loss at ingress and service mesh

Ingress controllers and API gateways may strip unknown headers or overwrite traceparent with their own trace ID. Istio and Linkerd handle mTLS at the proxy layer, but their sidecars cannot link an inbound request to the outbound calls your application makes, so the application itself must forward the tracing headers.

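Fix for NGINX Ingress, as an annotation on the Ingress resource. Recent ingress-nginx releases disable snippet annotations by default, so confirm that the controller allows them (allow-snippet-annotations) before relying on this: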

nginx.ingress.kubernetes.io/configuration-snippet: |
  proxy_set_header traceparent $http_traceparent;
  proxy_set_header tracestate $http_tracestate;

Fix for Istio: align the mesh tracing configuration with the propagation format used by your applications, and verify that proxies do not replace or drop traceparent / tracestate headers on the path between services. For services that construct HTTP clients manually (not using auto-instrumented libraries), always inject context explicitly:

from opentelemetry import propagate
import httpx

async def call_downstream(ctx, payload):
    headers = {}
    # Always inject, even if you think the library does it; ctx=None uses the current context
    propagate.inject(headers, context=ctx)
    async with httpx.AsyncClient() as client:
        return await client.post("http://downstream/api", json=payload, headers=headers)

Propagation diagnosis checklist

When a trace looks short or a service appears as a root span unexpectedly:

[ ] Log incoming request headers in the downstream service
    → Is traceparent present? Does the trace-id segment match upstream?
[ ] Check propagator configuration in both services
    → Are they using the same format?
[ ] Check the ingress and any API gateway in between
    → Are traceparent and tracestate in the allow-list of forwarded headers?
[ ] If using a service mesh, check proxy tracing configuration
    → Is the mesh configured to forward (not replace) the application trace header?
[ ] For async jobs: is the carrier captured at enqueue time and restored at consume time?
[ ] Check if the Collector agent's filter is dropping spans from the service in question

OpenTelemetry tail-based sampling: keeping costs under control without losing signal

Head-based sampling, deciding at trace start whether to keep a trace, is simple but blind: it cannot factor in whether the trace will end in an error. A flat 10% head-based sample keeps 10% of errors, because the error has not happened yet when the decision is made. Forcing sampling on known-critical paths (for example, an always-on sampler for specific routes) requires SDK-level changes in every service and still misses unexpected failures. You keep noise and lose signal in equal proportion. Kubernetes cost optimization with OpenTelemetry starts here: tail-based sampling is the lever that lets you cut storage spend without cutting visibility into failures.

Tail-based sampling waits until the trace is complete, then decides. Keep all errors. Keep all traces above a latency threshold. Sample everything else at a low rate. This is the strategy that gives you the best chance of preserving the traces that matter — errors, slow requests, and explicitly flagged flows — provided routing, buffering, and decision_wait are configured correctly.
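A toy simulation makes the asymmetry concrete. The trace population (2% errors, about 1% slow), the helper names, and the sha256-based ratio function are hypothetical; the real probabilistic policy uses its own hashing. The tail decision ORs the three policies, as described for the processor below.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Trace:
    trace_id: str
    is_error: bool
    duration_ms: int


def ratio_keep(trace_id: str, percent: float) -> bool:
    """Deterministic coin flip derived from the trace ID (stable for a given ID)."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < percent * 100


def head_sample(trace: Trace, percent: float = 10) -> bool:
    # Decided at trace start: outcome fields are ignored because they are not known yet.
    return ratio_keep(trace.trace_id, percent)


def tail_sample(trace: Trace, latency_ms: int = 2000, percent: float = 5) -> bool:
    # Decided after completion: errors and slow traces are always kept.
    return trace.is_error or trace.duration_ms > latency_ms or ratio_keep(trace.trace_id, percent)


traces = [
    Trace(f"{i:032x}", is_error=(i % 50 == 0), duration_ms=3000 if i % 97 == 0 else 120)
    for i in range(1, 10_001)
]
errors = [t for t in traces if t.is_error]
head_errors = sum(head_sample(t) for t in errors)
tail_errors = sum(tail_sample(t) for t in errors)
print(f"errors: {len(errors)}, kept by head: {head_errors}, kept by tail: {tail_errors}")
```

Head sampling keeps roughly one error in ten; tail sampling keeps every error and every slow trace while still keeping only about 5% of healthy, fast traces.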

Tail sampling configuration

The full gateway pipeline configuration appears in the Gateway pipeline configuration section above. Key parameters:

tail_sampling:
  decision_wait: 10s      # How long to buffer spans before deciding
                          # Must exceed your slowest async operation's duration
  num_traces: 50000       # Max traces buffered simultaneously -- set this carefully
  policies:
    - name: keep-errors   # 1. Always keep error traces
      type: status_code
      status_code:
        status_codes: [ERROR]
    - name: keep-slow     # 2. Always keep traces above the latency threshold
      type: latency
      latency:
        threshold_ms: 2000
    - name: sample-normal # 3. Sample the rest
      type: probabilistic
      probabilistic:
        sampling_percentage: 5

The processor evaluates all policies independently. A trace is kept if at least one policy votes to sample it and no policy votes to drop it — so keep-errors will always win over the probabilistic fallback.

decision_wait must exceed your longest async path. If a Celery task takes up to thirty seconds, set decision_wait: 35s. A trace that has not fully arrived when the timeout fires gets a decision made on partial information, usually a drop: the spans seen so far understate the real latency and may not include the error yet.
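The effect is easy to see in a toy model of a single trace. This is a simplified sketch, not the processor's implementation: the spans, timings, and the decide helper are hypothetical, and the probabilistic policy is omitted so the result is deterministic.

```python
from dataclasses import dataclass


@dataclass
class ArrivedSpan:
    arrival_s: float  # seconds after the gateway received the trace's first span
    end_s: float      # span end time, seconds after the trace started
    is_error: bool


def decide(spans: list[ArrivedSpan], decision_wait_s: float, latency_threshold_s: float = 2.0) -> bool:
    """Keep (True) or drop (False) using only spans that arrived before decision_wait.

    Simplified: trace duration is approximated as the latest end time seen.
    """
    seen = [s for s in spans if s.arrival_s <= decision_wait_s]
    if any(s.is_error for s in seen):
        return True
    return max((s.end_s for s in seen), default=0.0) > latency_threshold_s


# The HTTP handler span arrives first; the Celery task fails after about 30 s,
# and its span reaches the gateway a little later because of agent batching.
trace = [
    ArrivedSpan(arrival_s=0.0, end_s=0.3, is_error=False),
    ArrivedSpan(arrival_s=31.0, end_s=30.0, is_error=True),
]
```

With decision_wait at 10 s, the gateway sees only a fast, successful HTTP span and drops the trace; at 35 s it sees the failed task and keeps it.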

The multi-gateway problem

Tail sampling requires all spans for a trace to reach the same Collector instance. A Kubernetes Service in front of a multi-replica Deployment spreads connections across replicas without regard to trace ID, so different spans for the same trace land on different replicas, each seeing a partial trace and making a wrong sampling decision.

Fix: route by trace ID using the loadbalancing exporter in the agent (as shown in the agent config above). The agent hashes the trace ID and consistently routes all spans of a trace to the same gateway pod. This requires the gateway to be a StatefulSet with a headless Service so individual pod addresses are DNS-resolvable.

# Gateway as StatefulSet -- required for loadbalancing exporter to resolve pod addresses
apiVersion: opentelemetry.io/v1beta1 # verify the version supported by your installed Operator
kind: OpenTelemetryCollector
metadata:
  name: otel-gateway
spec:
  mode: statefulset
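  replicas: 2
  # config: place the gateway pipeline configuration here (receivers, processors,
  # exporters, service) as a YAML object under spec.config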
# The Operator may or may not create a headless Service automatically depending
# on the installed version. Check with:
#   kubectl get svc -l app.kubernetes.io/component=opentelemetry-collector
# If no headless Service exists (ClusterIP: None), define one explicitly.
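The routing idea in the paragraph above can be sketched as a consistent-hash ring. This is a simplified illustration, not the loadbalancing exporter's code: HashRing, the virtual-node count, the sha256-based hash, and the pod DNS names are all assumptions. It shows two properties that matter operationally: every agent that resolves the same list of gateway pods sends a given trace to the same pod, and removing a pod only remaps the traces that pod owned.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring: trace ID -> gateway endpoint."""

    def __init__(self, endpoints: list[str], vnodes: int = 100) -> None:
        # Each endpoint gets many points on the ring to even out the load.
        points = sorted(
            (self._hash(f"{endpoint}#{i}"), endpoint)
            for endpoint in endpoints
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._owners = [e for _, e in points]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def route(self, trace_id: str) -> str:
        # The first ring point after the trace ID's hash owns the trace (wrapping around).
        idx = bisect.bisect_right(self._hashes, self._hash(trace_id)) % len(self._hashes)
        return self._owners[idx]


# Hypothetical pod DNS names behind the headless Service.
pods = [f"otel-gateway-collector-{i}.otel-gateway-collector-headless:4317" for i in range(3)]
agent_rings = [HashRing(pods) for _node in range(3)]  # one ring per agent, same endpoint list
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
owners = {ring.route(trace_id) for ring in agent_rings}  # a single pod
```

The second property has a flip side: while the set of gateway pods is changing (scale-out, rolling restart), traces whose owner changes mid-flight can be split across two pods and decided on partial data.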

Memory sizing rule

The num_traces limit determines peak memory. Rough calculation:

num_traces × avg_spans_per_trace × avg_span_size_bytes = peak memory

Example:
50,000 × 20 spans × 1 KB = ~1 GB

Set memory_limiter to 80% of pod memory limit.
Set num_traces to fit within 70% of that.
Add 2× headroom for traffic spikes.

At 1,000 traces/sec with a 10s decision_wait, you need at least 10,000 traces in the buffer at any moment. 50,000 gives 5× headroom for bursts. If your burst factor is higher, increase num_traces first, then scale pod memory.
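The sizing steps above can be written as a small calculator so the assumptions stay explicit when someone revisits them. A sketch using the rules of thumb from this section; the function name, the 1 KB = 1,024-byte span size, and the 2,560 MiB pod limit (chosen so that 80% matches the gateway's limit_mib: 2048) are assumptions.

```python
def size_tail_buffer(
    new_traces_per_sec: float,
    decision_wait_s: float,
    num_traces: int,
    avg_spans_per_trace: int,
    avg_span_bytes: int,
    pod_limit_mib: float,
) -> dict:
    min_traces = new_traces_per_sec * decision_wait_s  # traces opened per decision window
    peak_mib = num_traces * avg_spans_per_trace * avg_span_bytes / 1024**2
    limiter_mib = 0.8 * pod_limit_mib  # memory_limiter at 80% of the pod limit
    budget_mib = 0.7 * limiter_mib     # tail-sampling buffer within 70% of that
    return {
        "min_num_traces": min_traces,
        "headroom": num_traces / min_traces,
        "peak_buffer_mib": peak_mib,
        "memory_limiter_mib": limiter_mib,
        "buffer_budget_mib": budget_mib,
        # Fits the budget and keeps at least 2x burst headroom.
        "ok": num_traces >= 2 * min_traces and peak_mib <= budget_mib,
    }


print(size_tail_buffer(1000, 10, 50_000, 20, 1024, 2560))
```

Rerunning it with a higher burst rate shows when num_traces, and therefore pod memory, has to grow.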

Operational checklist before rollout

This is not a validation checklist for the technology; it is a readiness checklist for the team running it. The specific failure modes behind each item are covered in the Limitations section below.

Collector self-observability

[ ] health_check extension enabled on agent and gateway
[ ] Kubernetes readiness probe points to the health_check endpoint (port 13133, path / by default)
[ ] Collector's own metrics scraped by Prometheus
[ ] Alert on Collector pod memory approaching its limit
    → memory_limiter drops spans before OOMKill; alert before it gets there
[ ] Alert on exporter send failures and queue saturation
    → e.g. otelcol_exporter_send_failed_spans (some Collector versions append a _total suffix)
[ ] Alert on refused or dropped spans at the receiver
[ ] Alert on tail-sampling processor eviction, late-span, and decision metrics
    → metric names vary by Collector version; check your installed version's telemetry docs
    → https://opentelemetry.io/docs/collector/internal-telemetry/

Sampling validation

[ ] Generate a test trace that results in an ERROR status
    → Verify it appears in the backend despite low probabilistic sampling rate
[ ] Generate a test trace above the latency threshold
    → Verify it appears
[ ] Run for 24h and check backend storage growth rate
    → Does it match the expected rate given your sampling config?
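To judge whether 24 hours of growth matches the expected rate, it helps to write the expectation down first. A back-of-envelope sketch assuming OR-combined policies as described in the tail sampling section; the input figures (1,000 traces/s measured after the agent's health-check filter, 2% errors or slow, 20 spans of about 1 KB) are hypothetical, and the debug-flagged policy and backend compression are ignored.

```python
def expected_kept_per_sec(traces_per_sec: float, always_keep_fraction: float, fallback_percent: float) -> float:
    """Traces per second the backend should receive under OR-combined tail policies.

    always_keep_fraction is the share of traces that are errors or slow (count a
    trace once even if it is both); everything else is kept at fallback_percent.
    """
    rest = 1.0 - always_keep_fraction
    return traces_per_sec * (always_keep_fraction + rest * fallback_percent / 100)


def expected_daily_bytes(kept_per_sec: float, avg_spans_per_trace: int, avg_span_bytes: int) -> float:
    """Rough ingest per day before backend compression."""
    return kept_per_sec * 86_400 * avg_spans_per_trace * avg_span_bytes


kept = expected_kept_per_sec(1000, 0.02, 5)
print(f"{kept:.1f} traces/s, {expected_daily_bytes(kept, 20, 1024) / 1e9:.0f} GB/day before compression")
```

If observed growth is far above this, the usual suspects are a higher real error or latency share than assumed, or spans the agent filter was expected to drop.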

Propagation tests

[ ] Trigger a request that crosses at least two service boundaries
    → Verify a single trace ID appears in all three services' spans
[ ] Trigger an async job enqueued from an HTTP handler
    → Verify the job's spans appear as children of the HTTP span
[ ] Check ingress logs: is traceparent forwarded to the first backend service?

Backend and cost limits

[ ] Backend retention policy set and tested (data volume × retention = storage cost)
[ ] Backend ingestion rate limits configured to prevent runaway cost
[ ] Collector gateway resource limits set with memory_limiter aligned to pod limits

Runbook ownership

[ ] Who is on call for Collector issues?
[ ] Is the Collector restart procedure documented?
[ ] What happens to traces in flight during a gateway rolling restart?
    (Answer: spans in the tail sampling buffer are lost -- document this)
[ ] Is there a procedure for temporarily raising sampling rates during an incident?

Limitations and trade-offs

Before committing to this architecture, the team should be clear on what it costs and what it cannot do.

Tail sampling memory is your primary operational risk. When a traffic spike pushes more simultaneous open traces than num_traces allows, the oldest traces are evicted before a sampling decision is made — and you lose data with no alerting unless you instrument the Collector's own metrics. This is not a warning you can configure away; it is a consequence of stateful buffering at scale.

Late spans corrupt tail sampling decisions. If a span arrives after the decision_wait timeout, it is processed as a new root trace. Its parent trace may have already been dropped. These orphaned late spans are visible in the backend as short single-span traces with no apparent parent — a diagnostic signal that your decision_wait is too short or a service has unusually high processing latency.

Context propagation is not automatic everywhere. Auto-instrumentation handles HTTP and gRPC. Everything else — message queues, cron jobs, batch processing, custom protocols — requires explicit propagation code. This is not a limitation you can configure around; it requires application changes in each service that runs async work.

Kubernetes metadata enrichment has a startup race. Spans emitted in the first few seconds of pod startup may arrive at the agent before the Kubernetes API has propagated the pod metadata. These spans are enriched partially or not at all. This is expected behavior; do not alert on missing k8s.pod.name without filtering out spans from pods younger than thirty seconds.

What to take from this

The architecture in this guide is not the simplest possible OTel setup. It is the minimum that works reliably at production scale in a multi-service Kubernetes cluster. The three decisions that matter most, and that hello-world examples do not force you to make:

Where context propagation breaks. It will break at the first async boundary, the first ingress that strips headers, and the first service written in a different language with a different propagator configured. Finding those breaks before your users do requires explicit tests — not assuming auto-instrumentation handled everything.

How much buffer tail sampling needs. The num_traces limit is not a suggestion. When it is exceeded, the oldest traces are evicted silently. Size it for your burst traffic, not your average, and alert on memory pressure before the buffer fills.

That the Collector is now your infrastructure. Once services depend on it for telemetry, it needs the same operational treatment as your ingress or DNS: resource limits, readiness probes, a runbook, and someone on call who knows how to restart it without losing in-flight traces.

The gap between "OTel installed" and "OTel working" is mostly these three things. The YAML gets you there. The operational discipline keeps you there.