CI Guardian: Safe Human-in-the-Loop AI CI Remediation

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1


    This is a submission for the GitHub Copilot CLI Challenge


    What I Built

    CI Guardian is implemented as a GitHub CLI extension (gh ci-guardian) and runs entirely from the terminal, integrating GitHub Actions logs with GitHub Copilot CLI for safe, human-in-the-loop remediation.


    Instead of blindly applying AI-generated patches, CI Guardian analyzes real CI logs, summarizes the failure, and attempts a minimal fix only if it’s low-risk. If the fix is unclear or unsafe, it stops and leaves the decision to a human.


    The tool can:
    • Diagnose CI failures with structured root-cause analysis
    • Attempt minimal, semantic fixes
    • Automatically open PRs only when patches apply cleanly
    • Refuse unsafe or low-confidence fixes and escalate to a human when necessary
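The "fix or escalate" behavior listed above can be pictured as a small gate in front of PR creation. The following is only a sketch: the `Proposal` fields, the `decide` function, and both thresholds are hypothetical and are not taken from the CI Guardian source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Hypothetical summary of one AI-proposed fix (not CI Guardian's real type)."""
    confidence: float      # assumed 0.0-1.0 score attached to the diagnosis
    applies_cleanly: bool  # result of a dry-run patch check
    files_touched: int     # number of files the patch modifies


def decide(p: Proposal, min_confidence: float = 0.8, max_files: int = 2) -> str:
    """Return "open_pr" only when every guardrail passes, else "escalate".

    The thresholds are illustrative defaults, not values from the project.
    """
    if not p.applies_cleanly:
        return "escalate"  # never open a PR for a patch that does not apply
    if p.confidence < min_confidence:
        return "escalate"  # low-confidence fixes go to a human
    if p.files_touched > max_files:
        return "escalate"  # a wide-reaching patch is not "minimal"
    return "open_pr"
```

Every failure path in this sketch returns "escalate", so a bug in one check cannot silently turn into an automatic PR.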


    I tested CI Guardian on both a small demo repo and a real fork of Flask, including scenarios with fork permissions, pull-request-only CI, and multiple workflows.


    Demo

    Repository: sasubillis/gh-ci-guardian on GitHub



    The extension entrypoint maps directly to ci_guardian/cli.py, which handles run discovery, log extraction, Copilot prompting, patch validation, and PR creation.
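The first two stages named above, run discovery and log extraction, could be sketched with the stock `gh run list` and `gh run view --log-failed` commands. This is an assumption about how such a step might look, not the contents of ci_guardian/cli.py; the helper names and the 200-line log limit are hypothetical.

```python
import subprocess
from typing import List, Optional


def list_failed_runs_cmd(branch: Optional[str] = None, limit: int = 1) -> List[str]:
    """Build a `gh run list` command for the most recent failed runs.

    branch=None means "do not filter by branch" in this sketch.
    """
    cmd = [
        "gh", "run", "list",
        "--status", "failure",
        "--limit", str(limit),
        "--json", "databaseId,headBranch,workflowName",
    ]
    if branch:
        cmd += ["--branch", branch]
    return cmd


def failed_log_cmd(run_id: int) -> List[str]:
    """Build a `gh run view` command that prints only the failed steps' logs."""
    return ["gh", "run", "view", str(run_id), "--log-failed"]


def tail(text: str, max_lines: int = 200) -> str:
    """Keep only the last max_lines lines so the prompt stays small."""
    return "\n".join(text.splitlines()[-max_lines:])


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Keeping command construction separate from execution makes the discovery logic testable without network access or an authenticated `gh` session.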


    All screenshots below were captured against real repositories with real failing CI runs, including a fork of Flask to demonstrate behavior on a production-scale codebase.


    Example usage:




    # Diagnose the latest failing CI run
    gh ci-guardian diagnose --latest --branch all

    # Attempt a safe fix and open a PR if possible
    gh ci-guardian fix --latest --branch all





    What the demo shows:
    • CI failures diagnosed into structured JSON
    • Copilot-generated unified diffs
    • Automatic PR creation when patches are safe
    • Graceful refusal with preserved diffs when fixes are unsafe (human-in-the-loop)
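One way to judge whether a generated unified diff is small enough to be "safe" is to measure it before anything is applied. The sketch below counts files and changed lines using the hunk headers; `DiffStats` and `diff_stats` are hypothetical names, and the project may measure patches differently. A dry run such as `git apply --check` can then confirm that the patch applies cleanly.

```python
import re
from typing import NamedTuple

# Matches hunk headers such as "@@ -1,3 +1,4 @@"; an omitted count means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


class DiffStats(NamedTuple):
    files: int
    added: int
    removed: int


def diff_stats(diff_text: str) -> DiffStats:
    """Count files, added lines, and removed lines in a unified diff."""
    files = added = removed = 0
    old_left = new_left = 0  # lines still expected inside the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: the first character decides the line type, so
            # removed content like "-- comment" is not mistaken for a header.
            if line.startswith("+"):
                added += 1
                new_left -= 1
            elif line.startswith("-"):
                removed += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            continue
        match = HUNK.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
        elif line.startswith("+++ "):
            files += 1  # one "+++" header per file in the patch
    return DiffStats(files, added, removed)
```

These counts could feed a size limit, so that a patch touching many files or lines is preserved for review instead of being applied.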


    This behavior was demonstrated on a real Flask fork where CI failures only surface on pull requests, not direct pushes.


    Diagnosis on Failing CI with demo repo





    Fix made by ci-guardian on demo repo





    PR opened in GitHub by ci-guardian

    When a fix is safe and minimal, CI Guardian automatically opens a remediation pull request.




    Diagnosis on Failing CI on real repo (Flask)

    CI Guardian converts a real failing GitHub Actions run into a structured, machine-readable diagnosis using GitHub Copilot CLI.




    Human-in-the-loop Intervention

    CI Guardian safely refuses to auto-fix an ambiguous CI failure on a real Flask fork and escalates to human review.





    My Experience with GitHub Copilot CLI

    GitHub Copilot CLI was used as a reasoning engine, not a blind code generator. I used copilot -p to:
    • Summarize CI logs into structured root-cause explanations
    • Generate minimal unified diffs grounded in real failure logs
    • Draft concise pull request titles and descriptions


    The key insight was that Copilot is most effective when paired with strict guardrails. CI Guardian treats Copilot output as a proposal, not a command, and enforces safety checks before applying any change. This results in automation that accelerates debugging without sacrificing trust or correctness.
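Treating model output as a proposal can start with refusing anything that does not parse. The sketch below builds a prompt for `copilot -p` and accepts the reply only if it contains a JSON object with the expected keys. The prompt wording, the `root_cause` and `confidence` keys, and the function names are all hypothetical, not the project's actual schema.

```python
import json
from typing import List, Optional

# Hypothetical schema: keys a diagnosis must contain to be accepted.
REQUIRED_KEYS = {"root_cause", "confidence"}


def build_copilot_cmd(log_excerpt: str) -> List[str]:
    """Build a `copilot -p` invocation asking for a JSON-only diagnosis."""
    prompt = (
        "Summarize the root cause of this CI failure as a JSON object "
        "with keys root_cause and confidence. Log:\n" + log_excerpt
    )
    return ["copilot", "-p", prompt]


def extract_diagnosis(reply: str) -> Optional[dict]:
    """Return the parsed diagnosis, or None if the reply is unusable.

    None is the signal to stop and hand the failure to a human.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object in the reply at all
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None  # malformed JSON is rejected, never repaired
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None  # missing required fields
    return data
```

Rejecting malformed output outright, instead of trying to repair it, keeps the failure mode on the human-review side.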



