Using AI to Detect Flaky Tests in CI/CD Pipelines: A Practical Framework for QA Teams

**MyrinNew** · 06-11-2025, 04:00 PM

Flaky tests (those that pass and fail intermittently without any code change) are the bane of any CI/CD pipeline. They erode developer confidence in the test suite and block deployments. In this post, I’ll walk through how to detect flaky tests with machine learning and how this AI-driven approach can improve your software delivery.

Why Use AI in Testing?

With Agile and DevOps pushing rapid deployments, we need smarter testing solutions. AI brings automation and intelligence to QA by:

Predicting failure-prone areas

Optimizing test execution

Auto-generating test cases

Detecting flaky tests

Let’s focus on the last one: flaky test detection using ML.

**The Framework: Flaky Test Detection with ML**

**Step 1: Collect CI Data**

Use logs from Jenkins/GitHub Actions:

Test execution results

Commit metadata

Stack traces

Store this data in CSV or a small database.
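As a rough sketch of this step, here is one way to flatten a test report into CSV rows. It assumes your Jenkins or GitHub Actions job emits JUnit-style XML reports (many test runners can). The column names, the `junit_to_rows` and `rows_to_csv` helpers, and the sample report are all made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column layout for the collected CI data
FIELDS = ["commit", "test_id", "status", "duration_s", "trace"]

def junit_to_rows(xml_text, commit):
    """Turn one JUnit-style XML report into one dict per test case."""
    rows = []
    root = ET.fromstring(xml_text)
    for case in root.iter("testcase"):
        # Compare against None explicitly: an Element with no children is falsy
        failure = case.find("failure")
        if failure is None:
            failure = case.find("error")
        if failure is not None:
            status = "fail"
        elif case.find("skipped") is not None:
            status = "skip"
        else:
            status = "pass"
        rows.append({
            "commit": commit,
            "test_id": f"{case.get('classname', '')}::{case.get('name', '')}",
            "status": status,
            "duration_s": float(case.get("time", 0) or 0),
            "trace": (failure.text or "").strip() if failure is not None else "",
        })
    return rows

def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Made-up report, only to show the shape of the data
SAMPLE = """<testsuite name="checkout">
  <testcase classname="tests.test_cart" name="test_total" time="0.12"/>
  <testcase classname="tests.test_cart" name="test_timeout" time="5.01">
    <failure message="timed out">TimeoutError: payment stub did not respond</failure>
  </testcase>
</testsuite>"""

rows = junit_to_rows(SAMPLE, commit="abc123")
print(rows_to_csv(rows))
```

Running this once per build and appending the rows gives you a per-test history keyed by commit, which is what the later steps need.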

**Step 2: Feature Engineering**

Extract meaningful features like:

Frequency of failure

Execution time variance

Code churn (lines added/deleted)

Stack trace similarity
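The four features above could be computed per test roughly like this. It is a minimal sketch using only the standard library. The input layout (one dict per run with `test_id`, `status`, `duration_s`, `trace`, and a `churn` count of lines added plus deleted) is an assumption, and `build_features` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean, pvariance

def build_features(runs):
    """Aggregate per-run records into one feature dict per test."""
    by_test = defaultdict(list)
    for run in runs:
        by_test[run["test_id"]].append(run)

    features = {}
    for test_id, history in by_test.items():
        failures = [r["status"] == "fail" for r in history]
        durations = [r["duration_s"] for r in history]
        traces = [r["trace"] for r in history if r["trace"]]
        # Mean pairwise similarity of failure traces. This is quadratic in the
        # number of failures, so cap the history length for very noisy tests.
        pairs = list(combinations(traces, 2))
        # 0.0 is a placeholder when there are fewer than two traces to compare
        trace_sim = mean(SequenceMatcher(None, a, b).ratio() for a, b in pairs) if pairs else 0.0
        features[test_id] = {
            "failure_rate": sum(failures) / len(failures),
            "duration_variance": pvariance(durations),
            "mean_churn": mean(r["churn"] for r in history),
            "trace_similarity": trace_sim,
        }
    return features

# Made-up history: t_login fails on and off, t_math is stable
runs = [
    {"test_id": "t_login", "status": "pass", "duration_s": 1.0, "trace": "", "churn": 10},
    {"test_id": "t_login", "status": "fail", "duration_s": 3.0, "trace": "TimeoutError: auth stub", "churn": 0},
    {"test_id": "t_login", "status": "pass", "duration_s": 1.0, "trace": "", "churn": 0},
    {"test_id": "t_login", "status": "fail", "duration_s": 3.0, "trace": "TimeoutError: auth stub", "churn": 2},
    {"test_id": "t_math", "status": "pass", "duration_s": 0.5, "trace": "", "churn": 10},
    {"test_id": "t_math", "status": "pass", "duration_s": 0.5, "trace": "", "churn": 0},
]
features = build_features(runs)
print(features["t_login"])
```

Note that `t_login` fails in runs where churn is 0, i.e. with no code change. That combination of a nonzero failure rate and low churn is the pattern a classifier can pick up on.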

**Step 3: Train the Model**

Use ML classifiers like:

Random Forest

SVM

XGBoost

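```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# X: feature matrix from Step 2; y: labels from your CI history (1 = flaky, 0 = stable)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = XGBClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```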
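One caveat on the accuracy number: flaky tests are usually a small minority of the suite, so a model can score high accuracy while catching none of them. The sketch below illustrates this with made-up labels (5 flaky tests out of 100) and a hand-rolled `precision_recall` helper, which is hypothetical and written here only to keep the example dependency-free.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (flaky = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up data: 5 flaky tests out of 100
y_true = [1] * 5 + [0] * 95

# A useless "model" that labels every test as stable
always_stable = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, always_stable)) / len(y_true)
print("Accuracy:", accuracy)  # 0.95, despite finding no flaky tests
print("Precision, recall:", precision_recall(y_true, always_stable))  # (0.0, 0.0)
```

In a real project you would likely use `precision_score` and `recall_score` from `sklearn.metrics` on `y_test` and `y_pred` instead of the helper above.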

**Step 4: Integrate into CI/CD**

Build a REST API or Jenkins plugin to call your model.

Flag flaky tests during pull requests.

Alert devs via Slack, or cancel the build if flakiness exceeds a threshold you set.
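The alert-or-cancel decision could look something like the sketch below. It assumes you already have a per-test flakiness score between 0 and 1 (for example, from the model's `predict_proba` output). The `decide` function, its threshold defaults, and the test names are placeholders to tune for your own pipeline.

```python
def decide(flaky_scores, warn_at=0.5, max_flaky_ratio=0.2):
    """Map per-test flakiness scores to a pipeline action.

    flaky_scores: dict of test_id -> predicted probability of flakiness.
    warn_at and max_flaky_ratio are illustrative defaults, not recommendations.
    """
    flagged = sorted(t for t, score in flaky_scores.items() if score >= warn_at)
    ratio = len(flagged) / len(flaky_scores) if flaky_scores else 0.0
    if ratio > max_flaky_ratio:
        action = "fail_build"   # too many suspect tests to trust this run
    elif flagged:
        action = "alert"        # notify devs but let the build continue
    else:
        action = "pass"
    return {"action": action, "flagged": flagged, "flaky_ratio": ratio}

# Made-up scores for the tests touched by a pull request
scores = {
    "t_login": 0.91,
    "t_cart": 0.12,
    "t_search": 0.08,
    "t_export": 0.30,
    "t_math": 0.02,
    "t_pdf": 0.40,
}
result = decide(scores)
print(result["action"], result["flagged"])
```

A CI step can call this after the test run, post `result["flagged"]` to Slack on `"alert"`, and exit with a nonzero status on `"fail_build"`. Keeping the policy in a plain function makes it easy to wrap in a REST endpoint or a plugin later.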

Benefits

Reduce debugging time

Boost confidence in automation

Improve release stability

Save CI/CD cost

Future Work

Use GPT-style LLMs to generate test cases

Apply Reinforcement Learning for self-healing automation

Build Explainable AI to justify ML decisions to dev teams

Conclusion

AI in testing is no longer a buzzword; it's a necessity. By using ML models for flaky test detection, you bring stability, speed, and intelligence to your QA pipelines.
