Part 2: Process Managers - Keeping Your App Alive with PM2

    Posted by Myrin
    Series: From "Just Put It on a Server" to Production DevOps


    Reading time: 10 minutes


    Level: Beginner-friendly



    Quick Recap

    In Part 1, we deployed our Sales Signal Processing Platform to a Linode server the manual way. It worked... until:
    • We closed our SSH session (app died)
    • The app crashed (stayed dead)
    • The server rebooted (app didn't restart)


    Today's mission: Keep the app alive without babysitting it.



    The Problem: Processes Are Fragile

    Let's simulate what happens in production.


    SSH into your server and start the API:






    cd /opt/sspp/services/api
    npm start &







    Now kill it on purpose:






    # Find the process ID
    ps aux | grep node

    # Kill it (replace <PID> with the process ID from the output above)
    kill -9 <PID>








    Test the API:






    curl http://localhost:3000/api/v1/health







    Dead. And it's not coming back.


    In production, processes die for many reasons:
    • Unhandled exceptions
    • Memory leaks (OOM killer strikes)
    • Dependency failures (database connection lost)
    • Random cosmic rays (yes, really)


    You need something that automatically restarts your app.
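
    To make the idea concrete, here is a minimal sketch of what a process manager does at its core: start a child process, and start it again when it exits abnormally. This is an illustration only, not how PM2 is implemented; the supervise function and its options are hypothetical names.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper: runs a command and restarts it when it exits abnormally.
// Resolves once the child exits cleanly or the restart budget is used up.
function supervise(command, args, { maxRestarts = 5, restartDelayMs = 1000 } = {}) {
  return new Promise((resolve, reject) => {
    let restarts = 0;

    const launch = () => {
      const child = spawn(command, args, { stdio: 'inherit' });

      // Emitted when the process could not be spawned at all (e.g. command not found).
      child.on('error', reject);

      child.on('exit', (code, signal) => {
        const crashed = code !== 0; // code is null when the child was killed by a signal
        if (!crashed || restarts >= maxRestarts) {
          resolve({ restarts, code, signal });
          return;
        }
        restarts += 1;
        setTimeout(launch, restartDelayMs);
      });
    };

    launch();
  });
}

// Example (not run here): supervise('npm', ['start'], { maxRestarts: 10 });
```

    Even this toy version leaves out everything that makes a real process manager useful: logs, boot integration, and monitoring. That is the gap PM2 fills.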





    Enter PM2: Process Manager 2

    PM2 is a production-grade process manager for Node.js applications. Think of it as a babysitter that:

    1. Keeps your app running - Restarts on crash
    2. Survives reboots - Starts on system boot
    3. Manages logs - Aggregates stdout/stderr
    4. Monitors resources - CPU, memory usage
    5. Reloads without downtime - Update without dropping connections (in cluster mode)


    Why PM2? It's battle-tested, actively maintained, and widely used in production.





    Installation

    SSH into your server:






    # Install PM2 globally
    npm install -g pm2

    # Verify
    pm2 --version







    Simple. Now let's use it.





    Running Your App with PM2

    Basic Usage

    Instead of npm start, use PM2:






    cd /opt/sspp/services/api

    # Start the app
    pm2 start npm --name "sspp-api" -- start

    # Check status
    pm2 status







    Output:






    ┌────┬─────────────┬──────┬───┬────────┬──────┬────────┐
    │ id │ name        │ mode │ ↺ │ status │ cpu  │ memory │
    ├────┼─────────────┼──────┼───┼────────┼──────┼────────┤
    │ 0  │ sspp-api    │ fork │ 0 │ online │ 0%   │ 45.2mb │
    └────┴─────────────┴──────┴───┴────────┴──────┴────────┘







    Your app is now:
    • Named (no more anonymous PIDs)
    • Monitored (PM2 watches it)
    • Managed (can be controlled by name)


    Better: Use an Ecosystem File

    Create a PM2 configuration file:






    cd /opt/sspp
    cat > ecosystem.config.js <<'EOF'
    module.exports = {
      apps: [
        {
          name: 'sspp-api',
          cwd: '/opt/sspp/services/api',
          script: 'npm',
          args: 'start',
          instances: 1,
          autorestart: true,
          watch: false,
          max_memory_restart: '500M',
          env: {
            NODE_ENV: 'production',
            PORT: 3000,
            DB_HOST: 'localhost',
            DB_PORT: 5432,
            DB_NAME: 'sales_signals',
            DB_USER: 'sspp_user',
            DB_PASSWORD: 'sspp_password',
            REDIS_HOST: 'localhost',
            REDIS_PORT: 6379,
            ELASTICSEARCH_URL: 'http://localhost:9200',
          },
          error_file: '/var/log/sspp/api-error.log',
          out_file: '/var/log/sspp/api-out.log',
          time: true,
        },
        {
          name: 'sspp-worker',
          cwd: '/opt/sspp/services/worker',
          script: 'npm',
          args: 'start',
          instances: 2,
          autorestart: true,
          watch: false,
          max_memory_restart: '500M',
          env: {
            NODE_ENV: 'production',
            DB_HOST: 'localhost',
            DB_PORT: 5432,
            DB_NAME: 'sales_signals',
            DB_USER: 'sspp_user',
            DB_PASSWORD: 'sspp_password',
            REDIS_HOST: 'localhost',
            REDIS_PORT: 6379,
            ELASTICSEARCH_URL: 'http://localhost:9200',
            QUEUE_NAME: 'sales-events',
          },
          error_file: '/var/log/sspp/worker-error.log',
          out_file: '/var/log/sspp/worker-out.log',
          time: true,
        },
      ],
    };
    EOF







    What this does:
    • Defines both services (API + Worker) in one place
    • Sets environment variables (no more .env files to manage)
    • Configures resources (max memory before restart)
    • Organizes logs (separate files for each service)
    • Runs multiple workers (2 worker instances for parallel processing)


    Create log directory:






    mkdir -p /var/log/sspp







    Start everything:






    pm2 start ecosystem.config.js

    # Check status
    pm2 status







    Output:






    ┌────┬─────────────┬──────┬───┬────────┬──────┬────────┐
    │ id │ name        │ mode │ ↺ │ status │ cpu  │ memory │
    ├────┼─────────────┼──────┼───┼────────┼──────┼────────┤
    │ 0  │ sspp-api    │ fork │ 0 │ online │ 1.2% │ 48.3mb │
    │ 1  │ sspp-worker │ fork │ 0 │ online │ 0.8% │ 42.1mb │
    │ 2  │ sspp-worker │ fork │ 0 │ online │ 0.7% │ 41.8mb │
    └────┴─────────────┴──────┴───┴────────┴──────┴────────┘







    Now you have:
    • 1 API instance
    • 2 Worker instances (for parallel event processing)
    • All managed by PM2
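
    pm2 status is meant for humans. For scripts, pm2 jlist prints the process list as JSON. The sketch below condenses that JSON into one small record per process. The summarize helper is a hypothetical name, and the field names (name, pm2_env.status, pm2_env.restart_time, monit.memory) reflect my understanding of the jlist output, so check them against your PM2 version.

```javascript
// Hypothetical helper: condense parsed `pm2 jlist` output into short records.
function summarize(processes) {
  return processes.map((p) => ({
    name: p.name,
    status: p.pm2_env.status,
    restarts: p.pm2_env.restart_time,
    memoryMb: Math.round(p.monit.memory / (1024 * 1024)),
  }));
}

// Hand-written sample in the assumed shape, not real PM2 output.
const sample = [
  { name: 'sspp-api', pm2_env: { status: 'online', restart_time: 0 }, monit: { memory: 50 * 1024 * 1024 } },
  { name: 'sspp-worker', pm2_env: { status: 'online', restart_time: 3 }, monit: { memory: 42 * 1024 * 1024 } },
];

console.log(summarize(sample));
```

    On a server, the input would come from parsing the output of pm2 jlist instead of the sample array.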





    Testing Auto-Restart

    Let's intentionally crash the API:






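    # Note the current restart count (↺)
    pm2 list

    # Optional: recreate the API process so the restart count starts at 0
    pm2 delete sspp-api
    pm2 start ecosystem.config.js --only sspp-api

    # Now kill it brutally, using the PID that PM2 tracks for the app
    # (with script: 'npm', this is the PID of the npm wrapper process)
    kill -9 $(pm2 pid sspp-api)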







    Wait 1 second, then check:






    pm2 status







    The ↺ (restart count) increases! PM2 automatically restarted it.


    Test the API:






    curl http://localhost:3000/api/v1/health







    Still alive. 🎉
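
    The same check can be scripted, for example to run after every restart. A minimal sketch using Node's built-in http module; checkHealth is a hypothetical helper, and treating any 2xx response as healthy is an assumption about the /api/v1/health endpoint.

```javascript
const http = require('node:http');

// Resolves to true for a 2xx response and false for anything else
// (other status codes, connection refused, or a timeout).
function checkHealth(url, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the body so the socket is released
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    // The 'timeout' event does not abort the request by itself, so do it here.
    req.on('timeout', () => {
      req.destroy();
      resolve(false);
    });
    req.on('error', () => resolve(false));
  });
}

// Example (not run here):
// checkHealth('http://localhost:3000/api/v1/health').then((ok) => console.log(ok ? 'alive' : 'dead'));
```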





    Startup Script: Survive Reboots

    The app survives crashes now. But what about server reboots?






    # Generate startup script
    pm2 startup systemd

    # Follow the command it prints (looks like):
    # sudo env PATH=$PATH:/usr/bin pm2 startup systemd -u root --hp /root







    Run the sudo command that PM2 prints.


    Save the current PM2 process list:






    pm2 save







    This writes the current process list to ~/.pm2/dump.pm2 (/root/.pm2/dump.pm2 when running as root). PM2 restores the processes from this file on boot.


    Test it:






    # Reboot the server
    sudo reboot







    Wait 30 seconds, SSH back in:






    pm2 status







    Your apps are running! Without you doing anything.





    Managing Your Apps

    View Logs





    # All logs (combined)
    pm2 logs

    # Specific app
    pm2 logs sspp-api

    # Last 100 lines
    pm2 logs sspp-api --lines 100

    # Live tail only (skip the past lines)
    pm2 logs sspp-worker --lines 0







    Monitor Resources





    pm2 monit







    This opens an interactive dashboard showing:
    • CPU usage
    • Memory usage
    • Logs (live stream)


    Press Ctrl+C to exit.


    Restart/Reload





    # Restart (kills and starts)
    pm2 restart sspp-api

    # Reload (zero-downtime in cluster mode; in fork mode it behaves like a restart)
    pm2 reload sspp-api

    # Restart all
    pm2 restart all







    Stop/Delete





    # Stop (keeps in PM2 list)
    pm2 stop sspp-api

    # Delete (removes from PM2 list)
    pm2 delete sspp-api

    # Stop all
    pm2 stop all

    # Delete all
    pm2 delete all










    Cluster Mode (Bonus: Load Balancing)

    PM2 can run multiple instances of your app and load-balance between them:






    // In ecosystem.config.js
    {
      name: 'sspp-api',
      script: './dist/main.js', // Direct script, not npm
      instances: 4,             // Or 'max' for CPU count
      exec_mode: 'cluster',     // Enable cluster mode
      // ... rest of config
    }







    Recreate the processes so the new settings take effect:






    pm2 delete all
    pm2 start ecosystem.config.js







    Now you have 4 API instances behind PM2's built-in load balancer.


    Why this matters:
    • Utilizes all CPU cores
    • Automatic load distribution
    • Zero-downtime reloads (one instance at a time)
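
    Zero-downtime reloads also depend on the app shutting down cleanly. As far as I understand PM2's behavior, it sends SIGINT to the old instance and force-kills it after a timeout if it has not exited. Below is a minimal sketch of a handler that stops accepting connections and then exits. installGracefulShutdown is a hypothetical helper, and the injectable exit option exists only so the function can be exercised without ending the process.

```javascript
// Hypothetical helper: on SIGINT, stop accepting new connections,
// let in-flight requests finish, then exit with code 0.
function installGracefulShutdown(server, { exit = process.exit } = {}) {
  const onSignal = () => {
    server.close(() => exit(0));
  };
  process.once('SIGINT', onSignal);
  return onSignal; // returned so callers can trigger or remove the handler
}

// Example wiring (assumption: a plain http server; the real app may use a framework):
// const server = require('node:http').createServer(handler).listen(process.env.PORT || 3000);
// installGracefulShutdown(server);
```

    Without a handler like this, a reload still works, but requests that are in flight when the old instance is stopped may be cut off.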





    What We Solved

    With PM2, we fixed:


    • Automatic restart on crash - App crashes are now recoverable
    • Startup on boot - Server reboots don't kill your service
    • Log management - Centralized, timestamped logs
    • Resource monitoring - Know when memory leaks happen
    • Process naming - No more searching for PIDs
    • Multi-instance management - Run workers in parallel



    What We Didn't Solve

    PM2 is great, but it doesn't solve:


    • "Works on my machine" - Still manual dependency installation
    • Environment consistency - Different Node versions, OS differences
    • Multi-server scaling - PM2 is single-server only
    • Deployment strategy - Still manual git pull, restart
    • Rollback capability - No version management
    • Network complexity - How do API and Worker discover services?
    • Resource isolation - Apps can steal CPU/memory from each other


    PM2 is a massive improvement over raw processes. But we're still managing dependencies manually, and we can't easily scale to multiple servers.



    Real-World PM2 Tips

    1. Always Use Ecosystem Files

    Don't run pm2 start with inline arguments. Use ecosystem.config.js:






    # ❌ Don't do this
    pm2 start npm --name api -- start

    # ✅ Do this
    pm2 start ecosystem.config.js







    2. Set Memory Limits

    Prevent runaway processes:






    {
      max_memory_restart: '500M', // Restart if memory exceeds 500MB
    }
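
    To make the threshold concrete, the sketch below converts a size string such as '500M' into bytes and compares it with the resident memory of the current Node process. It assumes 1024-based units for K, M, and G, which is my reading of how PM2 interprets these suffixes; toBytes is a hypothetical helper, not a PM2 API.

```javascript
// Hypothetical helper: '500M' -> bytes, assuming 1024-based units.
function toBytes(size) {
  const match = /^(\d+(?:\.\d+)?)([KMG])?$/i.exec(String(size).trim());
  if (!match) throw new Error(`Unrecognized size: ${size}`);
  const units = { K: 1024, M: 1024 ** 2, G: 1024 ** 3 };
  const factor = match[2] ? units[match[2].toUpperCase()] : 1; // no suffix means bytes
  return Math.round(Number(match[1]) * factor);
}

const limitBytes = toBytes('500M');
const rssBytes = process.memoryUsage().rss;
console.log(`rss=${rssBytes} bytes, limit=${limitBytes} bytes, over limit: ${rssBytes > limitBytes}`);
```

    Logging a line like this from inside the app is one way to see how close a service normally runs to its limit before picking a value.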







    3. Use Absolute Paths

    Relative paths depend on the directory pm2 start was run from, so they break easily:






    {
      cwd: '/opt/sspp/services/api', // Absolute path
      script: 'npm',                 // Not '../../../node_modules/...'
    }
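
    One way to keep every path absolute without repeating the prefix is to build paths from a single root with Node's path module. A small sketch; ROOT and serviceDir are hypothetical names, and path.posix is used because the target is a Linux server.

```javascript
const path = require('node:path');

// Assumption: everything lives under /opt/sspp, as in the examples above.
const ROOT = '/opt/sspp';

// Hypothetical helper: absolute working directory for a service.
function serviceDir(name) {
  return path.posix.join(ROOT, 'services', name);
}

const apiApp = {
  name: 'sspp-api',
  cwd: serviceDir('api'),
  script: 'npm',
  args: 'start',
};

console.log(apiApp.cwd, path.posix.isAbsolute(apiApp.cwd));
```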







    4. Separate Logs

    Don't dump everything to one file:






    {
      error_file: '/var/log/sspp/api-error.log',
      out_file: '/var/log/sspp/api-out.log',
    }







    5. Use Log Rotation

    Logs grow forever. Set up rotation:






    pm2 install pm2-logrotate
    pm2 set pm2-logrotate:max_size 10M
    pm2 set pm2-logrotate:retain 7










    Production Checklist

    Before going live with PM2:
    • [ ] Ecosystem file configured
    • [ ] Startup script installed (pm2 startup)
    • [ ] Process list saved (pm2 save)
    • [ ] Memory limits set
    • [ ] Log rotation enabled
    • [ ] Monitoring alerts configured (e.g., PM2 Plus)
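
    Parts of this checklist can be checked mechanically. The sketch below inspects one app entry from an ecosystem file and reports what is missing. checkApp and its rules are hypothetical, and they only cover what the config file can reveal; the startup script, the saved process list, and log rotation live outside it.

```javascript
const path = require('node:path');

// Hypothetical helper: returns a list of checklist problems for one app entry.
function checkApp(app) {
  const problems = [];
  if (!app.max_memory_restart) {
    problems.push('no memory limit (max_memory_restart)');
  }
  if (!app.cwd || !path.posix.isAbsolute(app.cwd)) {
    problems.push('cwd is missing or not an absolute path');
  }
  if (!app.error_file || !app.out_file) {
    problems.push('error_file/out_file not set');
  } else if (app.error_file === app.out_file) {
    problems.push('stdout and stderr share one log file');
  }
  return problems;
}

// Example: an entry that skips most of the tips above.
console.log(checkApp({ name: 'sspp-api', cwd: 'services/api', script: 'npm', args: 'start' }));
```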





    What's Next?

    PM2 solves the "keep it running" problem beautifully. But we're still stuck with:
    • Manual dependency management (Node, PostgreSQL, Redis, Elasticsearch)
    • "Works on my machine" syndrome (different environments)
    • Single-server limitations (can't easily scale horizontally)


    In Part 3, we'll tackle these by introducing Docker: containers that package your entire application environment.





    What PM2 Does NOT Fix

    PM2 solves process management, but let's be honest about what's still broken:


    What PM2 fixes:
    • Auto-restarts on crash
    • Survives SSH disconnects
    • Starts on server boot
    • Basic logging and monitoring


    What PM2 does NOT fix:
    • Environment consistency - Still manually installing Node, PostgreSQL, Redis
    • Infrastructure drift - Every server is a unique snowflake
    • Scaling - Can't easily add more servers
    • Dependency conflicts - Node v16 on this server, v18 on that one
    • Reproducibility - "Works on my machine" still exists (just less obviously)
    • Onboarding - New devs still need 2+ hours of setup
    • Rollbacks - No easy way to undo deployments


    The hidden danger:


    PM2 makes things feel professional, which can hide deeper problems.





    Try It Yourself

    Experience what breaks next:

    1. Set up PM2 for both API and Worker services
    2. Enable startup script: pm2 startup && pm2 save
    3. Now try to set up a second server identically
    4. Notice how many steps you have to remember
    5. Notice how easy it is to have version mismatches
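
    One way to make the mismatches visible is to run the same small script on both servers and compare the output. A sketch using only Node built-ins; fingerprint is a hypothetical helper name, and it only covers the Node runtime and OS, not PostgreSQL, Redis, or Elasticsearch.

```javascript
const os = require('node:os');

// Hypothetical helper: collect the runtime details that most often differ between servers.
function fingerprint() {
  return {
    node: process.version,
    v8: process.versions.v8,
    openssl: process.versions.openssl,
    platform: `${os.platform()} ${os.release()}`,
    arch: os.arch(),
  };
}

console.log(JSON.stringify(fingerprint(), null, 2));
```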


    This pain is important. It's why Docker exists.





    Next: Making Environments Consistent

    In Part 3, we'll solve the "works on my machine" problem:


    "How do I package my app so it runs the same everywhere?"


    We'll use Docker to:
    • Freeze dependencies in time
    • Eliminate environment drift
    • Make onboarding instant
    • Enable reliable rollbacks


    But spoiler: Docker solves packaging, not operations. We'll discover what breaks when you have multiple containers.





    Previous: Part 1: The Default Way - Putting an App on a Server


    Next: Part 3: Docker - Freezing the Application in Time

    About the Author


    Building this series to demonstrate real DevOps thinking for my Proton.ai application. If you're hiring for platform engineering roles, let's connect.


