Stop Random Pod Scheduling: Master Kubernetes Affinity & Anti-Affinity with NGINX (Practical Guide for DevOps & SRE)

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1


    When you don’t control where pods land in Kubernetes, you give up performance and reliability, and you often pay more. The default scheduler is good, but it’s not telepathic. It has no idea that your frontend wants to be near Redis or that your three replicas shouldn’t be sitting on the same node waiting to be killed together.

    That’s exactly why Pod Affinity and Pod Anti-Affinity exist.


    In this article, we’ll break down both concepts in a way that makes sense for real production clusters and then prove scheduling behavior with NGINX deployments.


    What Is Pod Affinity? (Keep Pods Close)


    Pod Affinity instructs Kubernetes to place pods close to other pods that match certain labels.


    Think about:

    App → Cache (low latency)

    API → DB (chatty traffic)

    Microservice → Microservice (tight coupling)


    When two components constantly talk over the network, co-locating them eliminates cross-node hops and sometimes cross-zone cloud charges.

    Common use cases:


    Reduce inter-node network latency


    Co-locate with Redis / Memcached / Kafka controllers

    Keep components in the same failure domain


    What Is Pod Anti-Affinity? (Keep Pods Apart)


    Pod Anti-Affinity does the opposite. It tells Kubernetes:

    “Don’t crowd pods from this group on the same node.”


    This is critical for:


    high availability (HA)

    fault isolation

    failure domain control

    If you run three replicas of your NGINX ingress and they all land on a single node, that node becomes a single point of failure. Anti-Affinity fixes that.


    Hard vs Soft Rules


    There are two modes:

    Hard


    requiredDuringSchedulingIgnoredDuringExecution

    Must satisfy the rule. If no node qualifies, the scheduler will not place the pod and it stays Pending.


    Soft


    preferredDuringSchedulingIgnoredDuringExecution


    Best effort. Scheduler tries, but won’t block placement.

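    In production, HA components often use hard Anti-Affinity, though a hard rule leaves replicas Pending when there are fewer eligible nodes than replicas. In both modes, IgnoredDuringExecution means the rule is checked only at scheduling time; pods that are already running are not evicted if labels change later.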
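To make the soft mode concrete, here is a minimal sketch of a Deployment that prefers, but does not require, one replica per node. The name and label nginx-soft are hypothetical placeholders chosen for this illustration, and the weight of 100 is just one reasonable choice from the allowed 1-100 range.

```yaml
# Hypothetical Deployment: the name and label "nginx-soft" are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-soft
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-soft
  template:
    metadata:
      labels:
        app: nginx-soft
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: the scheduler prefers nodes without a matching pod,
          # but still places the pod if every node already has one.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100            # 1-100; higher means a stronger preference
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: nginx-soft
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: nginx
```

With this rule, a three-replica rollout on a two-node cluster should still complete, with two replicas sharing a node, where the hard form would leave one replica Pending.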


    Deploying Pod Affinity with NGINX (Proof: Pods Stay Close)


    For demonstration, we’ll co-locate NGINX with Redis.


    Step 1: Deploy Redis


    Yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          containers:
          - name: redis
            image: redis


    Step 2: Deploy NGINX with Pod Affinity


    Yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-affinity
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx-affinity
      template:
        metadata:
          labels:
            app: nginx-affinity
        spec:
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: redis
                topologyKey: kubernetes.io/hostname
          containers:
          - name: nginx
            image: nginx


    Verify Placement


    kubectl get pods -o wide


    Expected output style:


    redis-7dfc6 Running node1

    nginx-affinity-fc77 Running node1

    nginx-affinity-d6b8 Running node1


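    Proof achieved: the scheduler co-located the NGINX pods on the same node as Redis to reduce latency. Because this is a hard rule, the NGINX pods would stay Pending if no Redis pod were running yet.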


    Deploying Pod Anti-Affinity with NGINX (Proof: Pods Stay Apart)


    Now let’s use 3 replicas and force them to spread for HA.


    Deployment YAML


    Yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-anti
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: nginx-anti
      template:
        metadata:
          labels:
            app: nginx-anti
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: nginx-anti
                topologyKey: kubernetes.io/hostname
          containers:
          - name: nginx
            image: nginx

    Verify Placement


    kubectl get pods -o wide


    Expected output:


    nginx-anti-1 Running node1

    nginx-anti-2 Running node2

    nginx-anti-3 Running node3


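    Proof achieved: the replicas are spread across nodes, so no single node failure can kill the service completely. This assumes at least three schedulable nodes; with fewer, the extra replica stays Pending under the hard rule.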


    Why DevOps & SRE Actually Use This


    This isn’t theoretical. You will use Affinity/Anti-Affinity frequently if you operate any non-toy cluster.

    Real industry scenarios include:


    For performance

    API close to Redis or Memcached

    Kafka brokers near Zookeeper

    Elasticsearch hot nodes near collectors


    For reliability


    NGINX ingress replicas spread across nodes

    Prometheus and Alertmanager separated

    Kafka brokers in separate availability zones


    For cost

    Avoid cross-zone data transfer charges (AWS/GCP)


    For compliance

    Keep workloads within geo or legal boundaries

    This is what separates “can deploy YAML” engineers from real systems engineers.


    Topology and Failure Domain Awareness


    Affinity only becomes truly valuable when you incorporate topology. A topologyKey is simply a node label key, so a topology domain can be any level your nodes are labeled with, such as:

    node (hostname)

    rack

    datacenter

    zone

    region


    Well-known labels, typically set automatically by cloud providers, include:

    topology.kubernetes.io/zone

    topology.kubernetes.io/region

    Real systems use these to ensure that replicas of quorum-based systems like Kafka or MongoDB are spread across zones, because losing a zone must not kill quorum.
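As a rough illustration of zone-level separation, the fragment below swaps the hostname key for the zone label. It is a sketch of a pod template fragment only, and the label app: kafka is a hypothetical placeholder rather than something a Kafka installation sets for you.

```yaml
# Pod template fragment (goes under spec.template.spec of a Deployment or StatefulSet).
# The label "app: kafka" is a hypothetical placeholder.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: kafka
      # At most one matching pod per zone. In a three-zone cluster,
      # a fourth replica would stay Pending under this hard rule.
      topologyKey: topology.kubernetes.io/zone
```

If you run more replicas than zones, the preferred (soft) form or a topology spread constraint is usually the safer choice, since this hard form caps the replica count at the number of zones.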


    Topology Spread Constraints (The Modern Alternative)


    Kubernetes also provides a more generalized mechanism: Topology Spread Constraints.

    These constraints let you declare how evenly pods matching a label selector must be distributed across topology domains. maxSkew sets the largest allowed difference in matching pod count between any domain and the least-populated one.


    Example:


    Yaml

    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: nginx


    This makes Kubernetes behave like a distributed systems scheduler instead of a simple bin-packer.
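To show where the constraint sits and how hard and soft variants can be combined, here is a sketch of a pod spec with two constraints. The second constraint, spreading across hostnames with ScheduleAnyway, is an addition for illustration and not part of the original example.

```yaml
# Pod spec fragment (spec.template.spec of a Deployment).
# "app: nginx" matches the label used in the example above.
spec:
  topologySpreadConstraints:
  # Hard: zones may differ by at most one matching pod.
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: nginx
  # Soft: also try to balance across nodes, but never block scheduling.
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: nginx
  containers:
  - name: nginx
    image: nginx
```

As a worked case: with matching pods spread 2, 2, 1 across three zones, the next pod can only land in the third zone, because placing it in either of the others would produce a skew of 2.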


    Scheduler Strategy for Distributed Systems


    Affinity and anti-affinity patterns become essential in stateful or quorum-based systems like:


    Kafka brokers

    Zookeeper

    Etcd

    MongoDB

    Cassandra

    Elasticsearch masters

    Consul

    TiDB

    Vitess


    These systems are built around failure domains and must survive node or zone loss. Anti-affinity helps preserve quorum by keeping replicas in separate failure domains. Affinity often co-locates controllers near storage engines.


    What Real DevOps & SRE Teams Actually Do


    Once you enter real production, affinity is used alongside:


    multi-AZ deployments

    cluster autoscalers

    pod disruption budgets

    node taints and tolerations

    resource quotas

    priority classes


    Affinity does not exist in isolation. It is part of expressing system intent to schedulers.
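As one example of a companion object from the list above, a pod disruption budget can protect the spread replicas during voluntary disruptions such as node drains. This is a minimal sketch that assumes the nginx-anti Deployment from earlier in this guide; the object name nginx-anti-pdb and the minAvailable value of 2 are hypothetical choices.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-anti-pdb      # hypothetical name
spec:
  # Voluntary evictions are refused if they would leave
  # fewer than two matching pods available.
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx-anti
```

Anti-affinity decides where replicas start; the budget limits how many can be evicted at once, so the two cover different failure modes.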


    Understanding the Scheduler Internals


    Kubernetes performs two phases during scheduling:


    Filtering: Nodes that violate hard rules are excluded.

    Scoring: Remaining nodes are ranked based on:

    availability
    resource pressure
    topology
    affinity/anti-affinity weights

    The node with the highest score is chosen for the pod. Hard rules gate placement; soft rules modulate scoring.


    This makes the Kubernetes scheduler strict about constraints and flexible about preferences.
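The two phases map directly onto the two rule types, which the following pod template fragment tries to illustrate. It is a sketch under assumptions: app: web is a hypothetical label for the workload itself, app: redis refers to the Redis Deployment from earlier in this guide, and the weights 80 and 20 are arbitrary values chosen to show relative preference.

```yaml
# Pod template fragment (goes under spec.template.spec).
# "app: web" is a hypothetical label for the pods being scheduled.
affinity:
  podAntiAffinity:
    # Filtering: nodes already running an "app: web" pod are excluded.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: web
      topologyKey: kubernetes.io/hostname
  podAffinity:
    # Scoring: each surviving node gains the weight of every term it matches.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 80               # strong preference: same node as Redis
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: redis
        topologyKey: kubernetes.io/hostname
    - weight: 20               # weaker preference: at least the same zone
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: redis
        topologyKey: topology.kubernetes.io/zone
```

A node that hosts Redis matches both terms and gains 100 from these rules, while a different node in the same zone gains 20. Those sums are then combined with the scheduler's other scoring inputs.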


    Key Takeaways


    Pod Affinity = keep pods together for performance


    Pod Anti-Affinity = keep pods apart for HA


    Hard rules block scheduling

    Soft rules influence but don’t block


    Verified behavior using NGINX deployments

    Useful for DevOps, SRE, platform and infra engineers running real workloads



