CI Guardian: Safe Human-in-the-Loop AI CI Remediation

**MyrinNew** · 01-27-2026, 02:51 AM

This is a submission for the GitHub Copilot CLI Challenge.

What I Built

CI Guardian is implemented as a GitHub CLI extension (gh ci-guardian) and runs entirely from the terminal, integrating GitHub Actions logs with GitHub Copilot CLI for safe, human-in-the-loop remediation.

Instead of blindly applying AI-generated patches, CI Guardian analyzes real CI logs, summarizes the failure, and attempts a minimal fix only if it’s low-risk. If the fix is unclear or unsafe, it stops and leaves the decision to a human.

The tool can:

Diagnose CI failures with structured root-cause analysis
Attempt minimal, semantic fixes
Automatically open PRs only when patches apply cleanly
Refuse unsafe or low-confidence fixes and escalate to a human when necessary
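
The "fix or escalate" rule in the list above can be sketched roughly as follows. This is an illustration only, not CI Guardian's actual code: the `Proposal` fields, the `decide` function, and the 0.8 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    OPEN_PR = "open_pr"
    ESCALATE = "escalate"


@dataclass
class Proposal:
    """Hypothetical summary of one proposed fix."""
    confidence: float      # self-reported confidence, 0.0 to 1.0 (assumed field)
    applies_cleanly: bool  # did a dry-run of the patch succeed?
    has_patch: bool        # did the model produce a diff at all?


def decide(p: Proposal, min_confidence: float = 0.8) -> Outcome:
    """Open a PR only when every check passes; otherwise hand off to a human."""
    if not p.has_patch:
        return Outcome.ESCALATE
    if not p.applies_cleanly:
        return Outcome.ESCALATE
    if p.confidence < min_confidence:
        return Outcome.ESCALATE
    return Outcome.OPEN_PR
```

The default outcome for anything uncertain is escalation, so a missing or malformed signal never results in an automatic PR.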

I tested CI Guardian on both a small demo repo and a real fork of Flask, including scenarios with fork permissions, pull-request-only CI, and multiple workflows.

Demo

Repository:

https://github.com/sasubillis/gh-ci-guardian

The extension entrypoint maps directly to ci_guardian/cli.py, which handles run discovery, log extraction, Copilot prompting, patch validation, and PR creation.
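
For readers curious what the run discovery and log extraction steps might look like, here is a minimal sketch built on the standard `gh run list` and `gh run view --log-failed` commands. The function names are hypothetical and the real `cli.py` may be organized quite differently.

```python
import json
import subprocess
from typing import List, Optional


def list_failed_runs_cmd(limit: int = 20) -> List[str]:
    """argv for listing recent failed workflow runs as JSON."""
    return [
        "gh", "run", "list",
        "--status", "failure",
        "--limit", str(limit),
        "--json", "databaseId,headBranch,workflowName,createdAt",
    ]


def pick_latest(runs_json: str) -> Optional[dict]:
    """Return the most recently created run, or None if there are none."""
    runs = json.loads(runs_json)
    if not runs:
        return None
    # createdAt is an ISO 8601 UTC timestamp, so string order matches time order.
    return max(runs, key=lambda r: r["createdAt"])


def failed_log_cmd(run_id: int) -> List[str]:
    """argv for fetching only the logs of failed steps."""
    return ["gh", "run", "view", str(run_id), "--log-failed"]


def fetch_latest_failed_log() -> Optional[str]:
    """Run both commands; needs an authenticated gh inside a repo checkout."""
    listing = subprocess.run(
        list_failed_runs_cmd(), capture_output=True, text=True, check=True
    )
    run = pick_latest(listing.stdout)
    if run is None:
        return None
    log = subprocess.run(
        failed_log_cmd(run["databaseId"]), capture_output=True, text=True, check=True
    )
    return log.stdout
```

Keeping the command construction separate from the subprocess call makes the discovery logic easy to test without network access.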

All screenshots below were captured against real repositories with real failing CI runs, including a fork of Flask to demonstrate behavior on a production-scale codebase.

Example usage:

# Diagnose the latest failing CI run
gh ci-guardian diagnose --latest --branch all

# Attempt a safe fix and open a PR if possible
gh ci-guardian fix --latest --branch all

What the demo shows:

CI failures diagnosed into structured JSON
Copilot-generated unified diffs
Automatic PR creation when patches are safe
Graceful refusal with preserved diffs when fixes are unsafe (human-in-the-loop)

This behavior was demonstrated on a real Flask fork where CI failures only surface on pull requests, not direct pushes.
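
One plausible way to implement "open a PR only if the patch applies, otherwise keep the diff for a human" is a dry run with `git apply --check`, which validates a patch without touching the working tree. The sketch below assumes that approach; the function names and the `rejected.diff` file name are hypothetical, not taken from the repository.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def applies_cleanly(patch: str, repo_dir: str, run: Callable = subprocess.run) -> bool:
    """Dry-run the patch from stdin; --check reports problems but changes nothing."""
    result = run(
        ["git", "apply", "--check"],
        input=patch, text=True, cwd=repo_dir, capture_output=True,
    )
    return result.returncode == 0


def preserve_if_rejected(
    patch: str, repo_dir: str, out_dir: str, run: Callable = subprocess.run
) -> Optional[Path]:
    """Return None when the patch is usable; otherwise save it and return the path."""
    if applies_cleanly(patch, repo_dir, run=run):
        return None
    saved = Path(out_dir) / "rejected.diff"  # hypothetical file name
    saved.write_text(patch)
    return saved
```

The `run` parameter exists only so the check can be exercised with a stand-in during testing; in normal use it defaults to `subprocess.run`.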

Diagnosis on Failing CI with demo repo

Fix made by ci-guardian on demo repo

PR opened in GitHub by ci-guardian

When a fix is safe and minimal, CI Guardian automatically opens a remediation pull request.

Diagnosis on Failing CI on real repo (Flask)

CI Guardian converts a real failing GitHub Actions run into a structured, machine-readable diagnosis using GitHub Copilot CLI.
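
A structured diagnosis is only useful if it is validated before anything acts on it. The following sketch shows one way to pull a JSON object out of model output and check it. The schema (`root_cause`, `failing_step`, `confidence`) is an assumption made for illustration; the post does not specify the actual fields.

```python
import json
from dataclasses import dataclass


@dataclass
class Diagnosis:
    root_cause: str
    failing_step: str
    confidence: float


REQUIRED_KEYS = {"root_cause", "failing_step", "confidence"}  # assumed schema


def parse_diagnosis(model_output: str) -> Diagnosis:
    """Extract the first JSON object from model output and validate it."""
    start = model_output.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _ = json.JSONDecoder().raw_decode(model_output[start:])
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        raise ValueError(f"diagnosis is missing keys: {sorted(missing)}")
    confidence = float(obj["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return Diagnosis(str(obj["root_cause"]), str(obj["failing_step"]), confidence)
```

Any parsing or validation failure raises `ValueError` (`json.JSONDecodeError` is a subclass), which a caller can treat as a signal to escalate rather than guess.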

Human-in-the-loop Intervention

CI Guardian safely refuses to auto-fix an ambiguous CI failure on a real Flask fork and escalates to human review.

My Experience with GitHub Copilot CLI

GitHub Copilot CLI was used as a reasoning engine, not a blind code generator. I used copilot -p to:

Summarize CI logs into structured root-cause explanations
Generate minimal unified diffs grounded in real failure logs
Draft concise pull request titles and descriptions
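
To make the `copilot -p` usage concrete, here is a hedged sketch of how a log-summarization prompt might be assembled and sent. The prompt wording, the 8000-character budget, and the function names are illustrative assumptions, and the sketch assumes `copilot -p <prompt>` prints its answer to stdout and exits.

```python
import subprocess

MAX_LOG_CHARS = 8000  # hypothetical budget; must be positive


def build_diagnosis_prompt(log_text: str, max_chars: int = MAX_LOG_CHARS) -> str:
    """Keep only the tail of the log, where the failure usually appears."""
    tail = log_text[-max_chars:]
    return (
        "You are diagnosing a failed CI run. "
        "Reply with a single JSON object with keys "
        "root_cause, failing_step, confidence.\n\n"
        "--- CI log (tail) ---\n" + tail
    )


def ask_copilot(prompt: str) -> str:
    """Send the prompt in non-interactive mode and return the raw response."""
    result = subprocess.run(
        ["copilot", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Truncating to the log tail keeps the prompt small while preserving the lines most likely to contain the actual error.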

The key insight was that Copilot is most effective when paired with strict guardrails. CI Guardian treats Copilot output as a proposal, not a command, and enforces safety checks before applying any change. This results in automation that accelerates debugging without sacrificing trust or correctness.
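
As an example of what such a guardrail could look like, the sketch below measures a unified diff and rejects it when it is too large or touches sensitive paths. The limits (3 files, 20 changed lines) and the protected `.github/` prefix are hypothetical values chosen for illustration, not CI Guardian's real policy.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


@dataclass
class DiffStats:
    files: List[str]
    added: int
    removed: int


def diff_stats(patch: str) -> DiffStats:
    """Count touched files and changed lines, using hunk headers to track position."""
    files: List[str] = []
    added = removed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                added += 1
                new_left -= 1
            elif line.startswith("-"):
                removed += 1
                old_left -= 1
            elif not line.startswith("\\"):  # skip "\ No newline at end of file"
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            continue
        m = HUNK.match(line)
        if m:
            old_left = int(m.group(1) or 1)
            new_left = int(m.group(2) or 1)
        elif line.startswith("+++ "):
            path = line[4:].split("\t")[0]
            if path != "/dev/null":
                files.append(path[2:] if path.startswith("b/") else path)
    return DiffStats(files, added, removed)


def violations(
    stats: DiffStats,
    max_files: int = 3,
    max_changed_lines: int = 20,
    protected_prefixes: Tuple[str, ...] = (".github/",),
) -> List[str]:
    """Return human-readable reasons to refuse the patch; empty means acceptable."""
    reasons = []
    if len(stats.files) > max_files:
        reasons.append(f"touches {len(stats.files)} files (limit {max_files})")
    if stats.added + stats.removed > max_changed_lines:
        reasons.append(f"changes {stats.added + stats.removed} lines (limit {max_changed_lines})")
    for f in stats.files:
        if f.startswith(protected_prefixes):
            reasons.append(f"modifies protected path {f}")
    return reasons
```

Returning reasons rather than a bare boolean means a refusal can explain itself to the human who picks up the escalation.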
