Horizontal Scaling PostgreSQL with Citus: A Practical Deep Dive

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1

    Horizontal Scaling PostgreSQL with Citus: A Practical Deep Dive

    Introduction

    As databases grow beyond the capabilities of a single server, teams face a critical decision: scale vertically by adding more resources to one machine, or scale horizontally by distributing data across multiple servers. While vertical scaling eventually hits physical and economic limits, horizontal scaling offers virtually unlimited growth potential.


    In this hands-on guide, we'll explore Citus—an open-source extension that transforms PostgreSQL into a distributed database. We'll build a real Citus cluster from scratch, demonstrate data distribution, implement foreign key relationships across distributed tables, and understand the principles that make horizontal scaling work.


    By the end, you'll have practical experience with distributed databases and understand both their power and their limitations.





    What is Horizontal Scaling?

    Imagine a pizza restaurant with one chef making all the pizzas. During rush hour, orders pile up because one person can only work so fast. This is vertical scaling—you could train the chef to work faster (add more CPU), give them better tools (add more RAM), or expand their workspace (add more storage). But eventually, you hit limits.


    Now imagine hiring three chefs, each handling different orders. This is horizontal scaling—you distribute the workload across multiple workers. The more customers you have, the more chefs you can add.


    In database terms:
    • Vertical scaling: Bigger server (more CPU, RAM, storage)
    • Horizontal scaling: More servers working together


    Citus brings horizontal scaling to PostgreSQL by:

    1. Splitting your data across multiple worker nodes (sharding)
    2. Keeping related data together on the same node (co-location)
    3. Maintaining PostgreSQL's ACID guarantees and SQL compatibility





    Architecture: Understanding Citus Components

    A Citus cluster consists of at least three components:


    1. Coordinator Node

    The coordinator is like a project manager—it doesn't do the heavy lifting but knows where everything is. When you connect to a Citus cluster, you connect to the coordinator. It:
    • Routes queries to the appropriate workers
    • Combines results from multiple workers
    • Manages distributed transactions
    • Stores metadata about data distribution


    2. Worker Nodes

    Workers are the muscle of your cluster—they store the actual data and execute queries. Each worker:
    • Holds a subset of your data (shards)
    • Executes queries locally on its data
    • Communicates with other workers when needed
    • Functions as a full PostgreSQL instance


    3. Shards

    Shards are pieces of your distributed tables. Think of them as mini-tables that together form your complete dataset. When you distribute a table:
    • Citus creates 32 shards by default (configurable)
    • Shards are distributed across workers
    • Each row goes to exactly one shard based on a hash function
    • Related data can be co-located on the same worker





    Setting Up a Citus Cluster on Digital Ocean

    For this demonstration, we'll create a three-node cluster:
    • citus-01: Coordinator node
    • citus-02: Worker 1
    • citus-03: Worker 2


    Step 1: Install PostgreSQL and Citus Extension

    On all three droplets:






    # Add PostgreSQL repository
    sudo apt-get update
    sudo apt-get install -y postgresql-common
    sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh

    # Install PostgreSQL 16
    sudo apt-get install -y postgresql-16

    # Install Citus extension
    curl https://install.citusdata.com/community/deb.sh | sudo bash
    sudo apt-get install -y postgresql-16-citus-12.1

    # Configure PostgreSQL to load Citus
    echo "shared_preload_libraries = 'citus'" | sudo tee -a /etc/postgresql/16/main/postgresql.conf

    # Restart PostgreSQL
    sudo systemctl restart postgresql







    Step 2: Configure Network Access

    Edit /etc/postgresql/16/main/postgresql.conf on all nodes:






    listen_addresses = '*'







    Edit /etc/postgresql/16/main/pg_hba.conf on all nodes to allow connections:






    host all all 0.0.0.0/0 md5







    Restart PostgreSQL:






    sudo systemctl restart postgresql







    Step 3: Initialize the Coordinator

    On the coordinator node (citus-01):






    -- Connect as postgres user
    sudo -u postgres psql

    -- Create the Citus extension
    CREATE EXTENSION citus;

    -- Add worker nodes to the cluster
    SELECT citus_add_node('10.114.0.11', 5432);
    SELECT citus_add_node('10.114.0.12', 5432);

    -- Verify worker registration
    SELECT * FROM citus_get_active_worker_nodes();







    You should see both worker nodes listed:






    node_name | node_port
    -------------+-----------
    10.114.0.11 | 5432
    10.114.0.12 | 5432







    Step 4: Initialize Workers

    On each worker node (citus-02 and citus-03):






    sudo -u postgres psql
    CREATE EXTENSION citus;







    Your Citus cluster is now operational!





    Demonstration 1: Basic Data Distribution

    Let's create a simple distributed table and observe how data spreads across workers.


    Creating a Distributed Table





    -- On the coordinator
    CREATE TABLE citus_demo (
    id SERIAL,
    user_name TEXT,
    city TEXT,
    age INT,
    created_at TIMESTAMP DEFAULT NOW()
    );

    -- Distribute the table by 'id' column
    SELECT create_distributed_table('citus_demo', 'id');







    The create_distributed_table function tells Citus to:

    1. Create 32 shards (mini-tables) by default
    2. Distribute these shards across available workers
    3. Route future queries based on the distribution column (id)


    Inserting Data





    INSERT INTO citus_demo (user_name, city, age) VALUES
    ('Alice', 'New York', 25),
    ('Bob', 'Chicago', 30),
    ('Charlie', 'Seattle', 27),
    ('David', 'Boston', 35),
    ('Emma', 'Denver', 22),
    ('Frank', 'Portland', 29),
    ('Grace', 'Miami', 28),
    ('Henry', 'Atlanta', 31),
    ('Iris', 'Phoenix', 26),
    ('Jack', 'Dallas', 33);







    Examining Distribution

    Check how shards are distributed:






    SELECT * FROM citus_shards WHERE table_name::text = 'citus_demo';







    Result (abbreviated):






    table_name | shardid | shard_name | nodename | nodeport | shard_size
    ------------+---------+-------------------+-------------+----------+------------
    citus_demo | 102488 | citus_demo_102488 | 10.114.0.11 | 5432 | 16384
    citus_demo | 102489 | citus_demo_102489 | 10.114.0.12 | 5432 | 16384
    citus_demo | 102490 | citus_demo_102490 | 10.114.0.11 | 5432 | 8192
    ... (32 shards total)







    We see 32 shards split between two workers. Some shards have data (16 KB), others are empty (8 KB).


    Understanding Shard Assignment

    See which shard each record belongs to:






    SELECT id, user_name, city,
    get_shard_id_for_distribution_column('citus_demo', id) as shard_id
    FROM citus_demo
    ORDER BY id;







    Result:






    id | user_name | city | shard_id
    ----+-----------+----------+----------
    1 | Alice | New York | 102489
    2 | Bob | Chicago | 102512
    3 | Charlie | Seattle | 102503
    4 | David | Boston | 102496
    5 | Emma | Denver | 102494
    6 | Frank | Portland | 102508
    7 | Grace | Miami | 102496
    8 | Henry | Atlanta | 102488
    9 | Iris | Phoenix | 102516
    10 | Jack | Dallas | 102492







    Notice that David (id=4) and Grace (id=7) share shard 102496—this happens when their IDs hash to the same shard. With only 10 records across 32 shards, collisions and empty shards are normal.


    Viewing Data on Individual Workers

    Connect directly to Worker 1 (10.114.0.11):






    -- Query a specific shard
    SELECT * FROM citus_demo_102488;







    Result:






    id | user_name | city | age | created_at
    ----+-----------+---------+-----+----------------------------
    8 | Henry | Atlanta | 31 | 2026-01-23 22:42:50.523619







    Worker 1 only stores the data in its shards. Query other shards on this worker:






    SELECT * FROM citus_demo_102512; -- Bob
    SELECT * FROM citus_demo_102496; -- David and Grace







    Connect to Worker 2 (10.114.0.12) and query:






    SELECT * FROM citus_demo_102489; -- Alice
    SELECT * FROM citus_demo_102503; -- Charlie







    Key insight: Each worker only stores a subset of the data. When you query the main table through the coordinator, Citus automatically fetches data from all workers and combines the results.





    Demonstration 2: Foreign Keys and Co-location

    One of the biggest challenges in distributed databases is maintaining relationships between tables. Let's explore how Citus handles foreign keys through co-location.


    The Challenge

    Imagine you have users and their orders:
    • Users table: Alice, Bob, Charlie
    • Orders table: Order 1 (Alice's pizza), Order 2 (Bob's burger), Order 3 (Alice's soda)


    In a distributed setup, what if:
    • Alice's user record is on Worker 1
    • Alice's orders are on Worker 2


    When inserting an order for Alice, Worker 2 must verify Alice exists—but Alice is on Worker 1! This requires expensive cross-node communication for every foreign key check.


    The Solution: Co-location

    Co-location ensures related data lives on the same worker. Citus achieves this by distributing both tables using the same column.


    Creating Co-located Tables





    -- Create users table
    CREATE TABLE users (
    user_id INT PRIMARY KEY,
    name TEXT
    );

    -- Create orders table with composite primary key
    CREATE TABLE orders (
    order_id SERIAL,
    user_id INT,
    item TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (order_id, user_id) -- Must include distribution column
    );

    -- Distribute both tables by user_id
    SELECT create_distributed_table('users', 'user_id');
    SELECT create_distributed_table('orders', 'user_id');

    -- Add foreign key constraint
    ALTER TABLE orders
    ADD CONSTRAINT fk_user
    FOREIGN KEY (user_id) REFERENCES users(user_id);







    Important: The orders primary key includes user_id because Citus requires the distribution column in unique constraints. This ensures uniqueness can be verified locally on each worker.


    Inserting Related Data





    INSERT INTO users (user_id, name) VALUES
    (1, 'Alice'),
    (2, 'Bob'),
    (3, 'Charlie'),
    (4, 'David'),
    (5, 'Emma');

    INSERT INTO orders (user_id, item) VALUES
    (1, 'Pizza'),
    (1, 'Soda'),
    (1, 'Ice Cream'),
    (2, 'Burger'),
    (2, 'Fries'),
    (3, 'Taco'),
    (3, 'Burrito'),
    (3, 'Nachos'),
    (4, 'Pasta'),
    (5, 'Salad');







    Verifying Co-location

    Check which shards contain user and order data:






    SELECT
    u.user_id,
    u.name,
    o.item,
    get_shard_id_for_distribution_column('users', u.user_id) as user_shard,
    get_shard_id_for_distribution_column('orders', o.user_id) as order_shard
    FROM users u
    JOIN orders o ON u.user_id = o.user_id
    ORDER BY u.user_id, o.order_id;







    Result:






    user_id | name | item | user_shard | order_shard
    ---------+---------+-----------+------------+-------------
    1 | Alice | Pizza | 102521 | 102553
    1 | Alice | Soda | 102521 | 102553
    1 | Alice | Ice Cream | 102521 | 102553
    2 | Bob | Burger | 102544 | 102576
    2 | Bob | Fries | 102544 | 102576
    3 | Charlie | Taco | 102535 | 102567
    3 | Charlie | Burrito | 102535 | 102567
    3 | Charlie | Nachos | 102535 | 102567







    Wait, the shard IDs are different! This might seem wrong, but it's actually correct.


    Understanding Different Shard IDs

    The users table has its own set of shards (102521, 102544, 102535...), and the orders table has different shard IDs (102553, 102576, 102567...). However, what matters is that they're on the same physical worker node.


    Verify this:






    SELECT
    table_name,
    shardid,
    nodename
    FROM citus_shards
    WHERE shardid IN (102521, 102553) -- Alice's user and order shards
    ORDER BY nodename, table_name;







    Result:






    table_name | shardid | nodename
    ------------+---------+-------------
    orders | 102553 | 10.114.0.11
    users | 102521 | 10.114.0.11







    Both shards are on Worker 1 (10.114.0.11)! This is co-location—different shard tables, same physical location.


    Think of it like two filing cabinets (users and orders) in the same office. Even though they're separate cabinets with different drawer labels, they're in the same room, so finding related information is instant.


    Testing Foreign Key Constraints

    Try inserting an order for a non-existent user:






    INSERT INTO orders (user_id, item) VALUES (999, 'Ghost Pizza');







    Error:






    ERROR: insert or update on table "orders" violates foreign key constraint "orders_user_id_fkey"
    DETAIL: Key (user_id)=(999) is not present in table "users".







    Perfect! The foreign key constraint works. Because user_id=999 doesn't exist, and thanks to co-location, the worker can check this locally without network calls to other workers.


    Performing JOINs

    Count orders per user:






    SELECT
    u.user_id,
    u.name,
    COUNT(o.order_id) as num_orders
    FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id
    GROUP BY u.user_id, u.name
    ORDER BY u.user_id;







    Result:






    user_id | name | num_orders
    ---------+---------+------------
    1 | Alice | 3
    2 | Bob | 2
    3 | Charlie | 3
    4 | David | 1
    5 | Emma | 1







    This JOIN is fast because each worker can join its local user and order data without coordinating with other workers. This is the power of co-location.





    Key Concepts and Best Practices

    1. Distribution Column Selection

    Choose your distribution column carefully:


    Good choices:
    • user_id for multi-tenant applications (each tenant's data together)
    • account_id for SaaS applications
    • customer_id for e-commerce


    Bad choices:
    • created_at (causes temporal hotspots)
    • Low-cardinality columns (poor distribution)
    • Columns that change frequently


    Rule of thumb: Pick a column that:
    • Appears in most queries (for query efficiency)
    • Enables co-location of related tables
    • Has high cardinality (many unique values)
    • Remains stable (doesn't change)


    2. Co-location Requirements

    For foreign keys to work across distributed tables:

    1. Same distribution column: Both parent and child tables must be distributed by the same column
    2. Include in constraints: The distribution column must be part of PRIMARY KEY and UNIQUE constraints
    3. Foreign key on distribution column: The foreign key must reference the distribution column


    3. Shard Count

    Default: 32 shards per table
    • Too few shards: Limits parallelism and future scalability
    • Too many shards: Increases metadata overhead and query planning time


    For most applications, 32-128 shards works well. Adjust based on:
    • Expected data size
    • Number of workers (more workers = more shards beneficial)
    • Query patterns


    4. When to Use Citus

    Good use cases:
    • Multi-tenant SaaS applications
    • Real-time analytics on time-series data
    • High-throughput transactional workloads
    • Applications that exceed single-server capacity
    • Workloads with natural sharding keys


    Not ideal for:
    • Small databases (< 100GB)
    • Workloads requiring many cross-shard JOINs
    • Applications with no clear distribution key
    • Use cases where a single PostgreSQL instance suffices





    Performance Considerations

    Query Routing

    Citus routes queries differently based on their scope:


    1. Single-shard queries (fastest):






    -- Routed to one worker because user_id=1 uniquely determines shard
    SELECT * FROM orders WHERE user_id = 1;







    2. Co-located JOINs (fast):






    -- Each worker joins its local data, then results are combined
    SELECT u.name, COUNT(o.*)
    FROM users u
    JOIN orders o ON u.user_id = o.user_id
    GROUP BY u.name;







    3. Broadcast queries (slower):






    -- Must query all shards and combine results
    SELECT * FROM orders WHERE item = 'Pizza';







    4. Cross-shard queries (slowest):






    -- Requires moving data between workers
    SELECT o.item, p.product_name
    FROM orders o
    JOIN products p ON o.product_id = p.id
    WHERE o.user_id = 1;
    -- Only works if products is a reference table or co-located







    Reference Tables

    For small lookup tables (countries, products, categories), use reference tables:






    CREATE TABLE countries (
    code CHAR(2) PRIMARY KEY,
    name TEXT
    );

    -- Replicate to all workers
    SELECT create_reference_table('countries');







    Reference tables are fully replicated to every worker, enabling efficient JOINs with distributed tables.





    Monitoring and Maintenance

    Checking Cluster Health





    -- View active workers
    SELECT * FROM citus_get_active_worker_nodes();

    -- Check shard distribution
    SELECT nodename, COUNT(*) as shard_count
    FROM citus_shards
    GROUP BY nodename;

    -- Monitor distributed queries
    SELECT * FROM citus_stat_statements;

    -- Check replication lag (if using HA setup)
    SELECT * FROM citus_replication_status();







    Rebalancing Shards

    As you add workers, rebalance shards:






    -- Move shards to balance load
    SELECT citus_rebalance_start();

    -- Check rebalance progress
    SELECT * FROM citus_rebalance_status();













    Conclusion

    We've successfully demonstrated horizontal scaling with Citus by:

    1. Building a distributed cluster with one coordinator and two workers
    2. Creating distributed tables and observing how data spreads across shards
    3. Implementing foreign key relationships through co-location
    4. Verifying co-location by checking that related data resides on the same workers
    5. Testing foreign key constraints to ensure data integrity
    6. Performing distributed JOINs that execute efficiently due to co-location


    Citus transforms PostgreSQL into a horizontally scalable database while maintaining SQL compatibility and ACID guarantees. By distributing data intelligently and co-locating related tables, it achieves impressive performance even as datasets grow beyond single-server capacity.


    The Elephant in the Room: Single Points of Failure

    However, our current setup has a critical limitation: every node is a single point of failure.


    Consider these scenarios:
    • Coordinator failure: If citus-01 goes down, your entire application stops. No queries can be routed, no data can be accessed, even though workers are healthy.
    • Worker failure: If citus-02 crashes, all data on that worker becomes unavailable. Half your users suddenly can't access their orders.
    • Maintenance downtime: Need to apply a security patch? You'll have to take the coordinator offline, causing application downtime.


    In production environments, single points of failure are unacceptable. Your database must survive:
    • Hardware failures
    • Network issues
    • Planned maintenance
    • Software crashes
    • Data center outages


    What's Next: High Availability with Patroni

    In our next article, we'll solve these availability challenges by integrating Patroni—an open-source high availability solution for PostgreSQL. We'll transform our vulnerable single-node cluster into a highly available system:


    Architecture we'll build:
    • Coordinator cluster: Primary + Standby (automatic failover)
    • Worker 1 cluster: Primary + Standby
    • Worker 2 cluster: Primary + Standby
    • etcd cluster: Distributed consensus for leader election


    What you'll learn:
    • Setting up Patroni with etcd for distributed consensus
    • Configuring automatic failover for coordinators and workers
    • Testing failure scenarios (simulating crashes)
    • Monitoring cluster health and replication lag
    • Performing switchovers for maintenance
    • Understanding trade-offs between availability and consistency


    By combining Citus's horizontal scalability with Patroni's high availability, you'll have a production-ready distributed database that can:
    • Scale to billions of rows
    • Handle millions of queries per second
    • Survive node failures automatically
    • Support zero-downtime maintenance
    • Provide strong consistency guarantees


    Stay tuned for "High Availability for Citus: Implementing Automatic Failover with Patroni" where we'll make this scalable database truly resilient.





    Resources






    Special thanks to the Citus team for building such an elegant distributed database solution.




    More...
Working...