Optimizing PostgreSQL Queries: From 820 ms to 120 ms with Indexing

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1

    Optimizing PostgreSQL Queries: From 820 ms to 120 ms with Indexing

    Introduction:


    When I first built HabitHero, my gamified habit-tracking app, I ran into an unexpected bottleneck. slow queries. As more test users added tasks and streak data, my dashboards and API endpoints started lagging.


    At first, I thought the problem was my code. But when I dug deeper, I realized the real culprit was the database. PostgreSQL was scanning far more rows than necessary.


    By carefully applying indexes, I cut query times by nearly 85% — a change that instantly made the app feel smoother. In this post, I’ll walk through exactly how indexing turned HabitHero’s queries from sluggish to snappy.


    The Dataset (Simplified Example)


    HabitHero tracks tasks, streaks, and categories, but to make this reproducible, I’ll use a simple orders dataset with 1 million rows. The idea is the same: filter queries on frequently accessed fields (like customer_id in this example, or user_id in HabitHero).






    CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date TIMESTAMP NOT NULL,
    total_amount NUMERIC(10,2) NOT NULL
    );

    -- Insert 1M rows for simulation
    INSERT INTO orders (customer_id, order_date, total_amount)
    SELECT (random()*10000)::INT,
    NOW() - (random() * interval '365 days'),
    (random()*500)::NUMERIC(10,2)
    FROM generate_series(1, 1000000);







    The Query (Baseline)


    Let’s say we need all orders for a given customer in the last year (similar to “get all tasks completed by a user in the last 30 days” in HabitHero):






    SELECT *
    FROM orders
    WHERE customer_id = 1234
    AND order_date >= NOW() - INTERVAL '1 year';







    Baseline performance (no indexes)






    EXPLAIN ANALYZE
    SELECT *
    FROM orders
    WHERE customer_id = 1234
    AND order_date >= NOW() - INTERVAL '1 year';







    Result was:






    Seq Scan on orders (cost=0.00..19345.00 rows=5000 width=48)
    Filter: ((customer_id = 1234) AND (order_date >= (now() - '1 year'::interval)))
    Execution Time: 820 ms







    Now lets Add Indexes:


    Step 1: Index on customer_id






    CREATE INDEX idx_customer_id ON orders(customer_id);







    and re-run query:






    Index Scan using idx_customer_id on orders
    Execution Time: 220 ms







    Step 2: Composite Index (customer_id, order_date)






    CREATE INDEX idx_customer_date ON orders(customer_id, order_date);







    Re-run query:






    Index Scan using idx_customer_date on orders
    Execution Time: 120 ms







    Testing More Queries


    Query 1: Range filter (amount > 400)






    SELECT *
    FROM orders
    WHERE total_amount > 400;
    • Without index: 620 ms
    • With index on total_amount: 130 ms


    In HabitHero, I noticed the same effect when filtering tasks by XP points earned.


    Query 2: Sorting by date






    SELECT *
    FROM orders
    WHERE customer_id = 1234
    ORDER BY order_date DESC
    LIMIT 10;
    • Without index: 710 ms
    • With composite index: 140 ms


    For HabitHero, this helped in showing “last 10 completed tasks” instantly.


    customer + date Filter 820ms 220ms 120ms
    Amount > 400 620ms 130ms N/A
    Sort by Date 710ms N/A 140ms


    Lessons Learned
    • Index what you query often, user_id and created_at solved 80% of my slow queries in HabitHero.
    • Don’t over index, I tried indexing everything once and inserts slowed badly.
    • Measure before & after, Always use EXPLAIN ANALYZE.
    • Think ahead, Design indexes for future dashboard/reporting needs too.


    Conclusion


    Indexes turned HabitHero’s slow queries into snappy responses — the same way they transformed this simulated dataset.


    On paper, the difference between 820 ms and 120 ms seems small. In practice, those milliseconds compound into massive time and cost savings at scale.


    If you’re building an app like HabitHero (or any system with user-driven queries), proper indexing is one of the highest ROI optimizations you can make.




    More...
Working...