How I Built and Evaluated an AI Book-Writing System with ACP and Promptfoo

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1





    Introduction

    Have you ever wondered if AI could write an entire book — from idea to polished chapters — without human help?

    What if multiple AI agents could collaborate, like a team of ghostwriters, editors, and publishers?


    That’s exactly what I explored in this project:

    ✅ ACP (Agent Communication Protocol) to build a multi-agent system

    ✅ OpenAI GPT-4o to generate and edit text

    ✅ Promptfoo to evaluate the agents’ outputs automatically


    In this post, I’ll share how I built acp-booksmith, an AI-powered book creation pipeline, how it works, and how I used Promptfoo to test it like a pro.

    What is ACP (Agent Communication Protocol)?

    ACP, developed by IBM, is an open standard that enables AI agents, apps, and humans to communicate smoothly, regardless of their underlying backend technology stack.


    Think of it as a universal language for agents.

    With ACP, I could easily connect multiple agents like:
    • outline agent → drafts book structure
    • chapter agent → writes full chapters
    • editor agent → polishes text
    • compiler agent → stitches the final book


    They all run on a local server (http://localhost:8000) and talk to each other through standardized ACP calls.
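    To make the hand-off concrete, here is a minimal sketch of the data flow between the four roles. The functions are hypothetical stand-ins (plain Python stubs, not ACP agents or acp-sdk calls); they only show which output feeds which input.

```python
# Hypothetical stand-ins for the four agents: plain functions, no ACP or OpenAI calls.
def outline_stub(title: str) -> str:
    return f"Outline for '{title}': Chapter 1, Chapter 2"


def chapter_stub(summary: str) -> str:
    return f"Draft based on [{summary}]"


def editor_stub(draft: str) -> str:
    return draft.replace("Draft", "Edited draft")


def compiler_stub(parts: list[str]) -> str:
    return "\n\n".join(parts)


def run_pipeline(title: str, chapter_count: int = 2) -> str:
    """Pass each stage's output to the next: outline -> chapter -> editor -> compiler."""
    outline = outline_stub(title)
    chapters = []
    for number in range(1, chapter_count + 1):
        draft = chapter_stub(f"{outline} - Chapter {number}")
        chapters.append(editor_stub(draft))
    return compiler_stub([outline, *chapters])


if __name__ == "__main__":
    print(run_pipeline("The Quantum Cat's Journey"))
```

    In the real system each stub is replaced by a call to the matching agent on the ACP server.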

    What is Promptfoo?

    Promptfoo is a powerful open-source framework for evaluating and stress-testing LLM systems, agents, and prompt chains.


    Think of it as your AI quality assurance toolkit — it helps you:
    • Define structured test cases (via YAML or CLI)
    • Compare model or agent outputs across providers
    • Run automated checks (e.g., “is the output non-empty?”, “does it follow the format?”)
    • Visualize results in an interactive web viewer
    • Launch red teaming campaigns to probe for safety, bias, and robustness issues


    In this project, I used Promptfoo not just to test individual OpenAI model outputs, but to evaluate the full ACP-booksmith system, covering how all the agents work together to deliver a polished, end-to-end book-writing pipeline.


    By combining ACP + Promptfoo, I got both system-level validation and security-level insights — all in one workflow.

    Resources

    Link: https://github.com/i-am-bee/acp

    Link: https://github.com/promptfoo/promptfoo

    Step-by-Step Process to Build and Evaluate an AI Book-Writing System with ACP and Promptfoo

    Step 1: Set Up the Project Environment

    Before diving in, make sure your system is ready:






    python --version # >= 3.11
    node --version # >= 20.x
    npm --version # >= 10.x
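    If you prefer to script this check, the sketch below does a rough equivalent in Python. The helper names are hypothetical, and it only confirms that node and npm are on PATH; it does not compare their versions.

```python
import shutil
import sys


def python_ok(minimum: tuple[int, int] = (3, 11)) -> bool:
    """True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def missing_tools(tools: tuple[str, ...] = ("node", "npm")) -> list[str]:
    """Return the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python >= 3.11:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```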











    Then, initialize the project:






    uv init --python '>=3.11' my_acp_project
    cd my_acp_project
    uv add acp-sdk














    Step 2: Install all required libraries & set OpenAI API key

    Install Python libraries


    Run this one command to install all needed dependencies:






    pip install \
    acp-sdk==1.0.0 \
    fastapi==0.115.0 \
    uvicorn==0.29.0 \
    openai==1.30.1 \
    gradio==4.28.3 \
    reportlab==4.1.0 \
    requests==2.32.3








    This will install:


    ✅ acp-sdk → for the multi-agent protocol

    ✅ fastapi + uvicorn → for the server

    ✅ openai → for GPT calls

    ✅ gradio → for the web interface

    ✅ reportlab → for PDF generation

    ✅ requests → for HTTP calls





    To install Promptfoo, run the following command:






    npm install -g promptfoo











    Export your OpenAI API key


    Before running anything (main.py, agent.py, or Gradio app), set your API key:






    export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx"








    (Replace with your real key from the OpenAI account)
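    A missing key otherwise only shows up later as a failed OpenAI call, so it can help to fail early. The helper below is a hypothetical sketch (it is not part of acp-booksmith) that a script could call at startup.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ, name: str = "OPENAI_API_KEY") -> str:
    """Return the key, or raise a clear error if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the agents.")
    return key
```

    Passing the environment in as a mapping keeps the check easy to test without touching the real shell environment.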


    Step 3: Write the Agents (agent.py)

    I built four key agents:
    • outline agent → Generates a detailed book outline
    • chapter agent → Writes a full chapter from a summary
    • editor agent → Edits the chapter for style and clarity
    • compiler agent → Combines all content into a single book


    These agents use openai.AsyncOpenAI under the hood and communicate via ACP.






    import asyncio
    import os
    from collections.abc import AsyncGenerator

    import openai
    from acp_sdk.models import Message
    from acp_sdk.server import Context, RunYield, RunYieldResume, Server

    # Initialize OpenAI async client using environment variable API key
    client = openai.AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    # Create ACP server instance to register agents
    server = Server()

    # Helper function to call OpenAI API with given prompt and token limit
    async def call_openai(prompt, max_tokens=1000):
        try:
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,
                max_tokens=max_tokens
            )
            return response.choices[0].message.content # Return generated text
        except Exception as e:
            print(f"[OpenAI API error]: {type(e).__name__}: {e}")
            return "[Error: Failed to generate content]"

    # Agent: Generates book outline based on title
    @server.agent()
    async def outline(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
        title = input[0].parts[0].content # Extract title from input
        prompt = f"Create a detailed book outline with chapters and sections for the book titled '{title}'."
        outline_text = await call_openai(prompt) # Get outline from OpenAI
        yield Message(parts=[{"content": outline_text, "content_type": "text/plain"}])

    # Agent: Generates full chapter text (~3000 words) from chapter summary
    @server.agent()
    async def chapter(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
        chapter_summary = input[0].parts[0].content # Extract chapter summary
        prompt = f"Write a full book chapter (~3000 words) based on this summary:\n{chapter_summary}"
        chapter_text = await call_openai(prompt, max_tokens=3000) # Get chapter draft
        yield Message(parts=[{"content": chapter_text, "content_type": "text/plain"}])

    # Agent: Edits chapter text for clarity, style, and coherence
    @server.agent()
    async def editor(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
        raw_text = input[0].parts[0].content # Extract raw chapter text
        prompt = f"Please edit and polish the following chapter for clarity, style, and coherence:\n\n{raw_text}"
        edited_text = await call_openai(prompt, max_tokens=3000) # Get edited version
        yield Message(parts=[{"content": edited_text, "content_type": "text/plain"}])

    # Agent: Compiles all parts (outline + chapters) into one full text
    @server.agent()
    async def compiler(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
        compiled = "\n\n".join(msg.parts[0].content for msg in input) # Concatenate all inputs
        yield Message(parts=[{"content": compiled, "content_type": "text/plain"}])

    # Run the ACP server to start serving agent endpoints
    server.run()







    Run them with:






    uv run agent.py








    Check they’re live:






    curl http://localhost:8000/agents
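    The same liveness check can be done from Python with only the standard library. This is a sketch: the function name is hypothetical, and it makes no assumption about the shape of the JSON the server returns, it just hands back whatever was parsed.

```python
import json
import urllib.request


def list_agents(base_url: str = "http://localhost:8000", timeout: float = 3.0):
    """Return the parsed JSON from GET /agents, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/agents", timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors
        # (URLError subclasses it); ValueError covers a body that is not valid JSON.
        return None


if __name__ == "__main__":
    agents = list_agents()
    print("Server unreachable" if agents is None else agents)
```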














    Step 4: Create the Orchestrator (orchestrator.py)

    This script:
    • Calls each agent in order
    • Collects outlines, chapters, edited content
    • Writes output to final_book.txt and final_book.pdf using reportlab


    The magic here? It acts like a project manager, coordinating the AI team.






    import asyncio

    from acp_sdk.client import Client
    from acp_sdk.models import Message, MessagePart
    from reportlab.pdfgen import canvas # Library to generate PDF files


    # Helper function to call a specific agent with input text
    async def call_agent(client, agent_name, input_text, model):
        # Sends request to ACP agent and returns the content of the response
        run = await client.run_sync(
            agent=agent_name,
            input=[Message(parts=[MessagePart(content=input_text, content_type="text/plain")])]
        )
        return run.output[0].parts[0].content

    # Main orchestrator function to run full book creation pipeline
    async def main(title="The Quantum Cat's Journey", model="gpt-4o", progress_callback=None):
        async with Client(base_url="http://localhost:8000") as client:
            if progress_callback:
                progress_callback(0.05) # Update progress bar if using UI (like Gradio)

            # Step 1: Generate book outline
            outline = await call_agent(client, "outline", title, model)
            if progress_callback:
                progress_callback(0.2)

            chapters = []
            # Step 2: Generate 3 chapters (can increase this later if desired)
            for i in range(1, 4):
                chapter_prompt = f"{outline} - Chapter {i}" # Prepare chapter input
                chapter_content = await call_agent(client, "chapter", chapter_prompt, model)
                if progress_callback:
                    progress_callback(0.2 + i * 0.15)

                # Step 3: Edit chapter using editor agent
                edited_chapter = await client.run_sync(
                    agent="editor",
                    input=[Message(parts=[MessagePart(content=chapter_content, content_type="text/plain")])]
                )
                chapters.append(edited_chapter.output[0].parts[0].content)

            # Step 4: Combine outline + chapters into full book text
            full_book = f"{outline}\n\n" + "\n\n".join(chapters)
            with open("final_book.txt", "w") as f:
                f.write(full_book)
            if progress_callback:
                progress_callback(0.85)

            # Step 5: Export final book to PDF format
            pdf = canvas.Canvas("final_book.pdf")
            pdf.setFont("Helvetica", 12)
            y = 800 # Set initial vertical position on PDF page
            for line in full_book.split("\n"):
                pdf.drawString(50, y, line[:100]) # Draw text line, truncate if too long
                y -= 15 # Move down by 15 points
                if y < 50: # If near bottom, start new page
                    pdf.showPage()
                    pdf.setFont("Helvetica", 12)
                    y = 800
            pdf.save() # Save the PDF file

            if progress_callback:
                progress_callback(1.0) # Mark as complete in UI if applicable
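    One limitation of the PDF step is that line[:100] silently drops everything past 100 characters, and generated paragraphs are usually much longer than that. A possible alternative, sketched here with the standard-library textwrap module, is to wrap lines before drawing them. The function name and the width and lines_per_page defaults are illustrative assumptions, not values taken from the orchestrator.

```python
import textwrap


def paginate(text: str, width: int = 100, lines_per_page: int = 50) -> list[list[str]]:
    """Wrap long lines to `width` characters and group them into pages."""
    lines: list[str] = []
    for raw_line in text.split("\n"):
        # textwrap.wrap returns [] for an empty line, so keep the blank line explicitly.
        lines.extend(textwrap.wrap(raw_line, width=width) or [""])
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]


if __name__ == "__main__":
    for number, page in enumerate(paginate("word " * 60, width=40, lines_per_page=5), start=1):
        print(f"--- page {number} ---")
        print("\n".join(page))
```

    Each inner list could then be drawn line by line with the same drawString and showPage calls the orchestrator already uses, one showPage per page.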







    Step 5: Build a CLI (main.py)

    To make it user-friendly, I added:
    • A CLI menu to run the full book generation pipeline
    • Option to extend later with more commands or features




    import asyncio
    import sys

    from orchestrator import (
        main as orchestrator_main, # Import the orchestrator main function
    )


    # Function to display a simple text menu in the terminal
    def print_menu():
        print("\nWelcome to acp-booksmith!")
        print("Select an option:")
        print("1. Run book generation workflow")
        print("2. Exit")

    # Main loop function for CLI (Command Line Interface)
    def main():
        while True:
            print_menu() # Show the menu options
            choice = input("Enter choice [1-2]: ") # Get user input
            if choice == "1":
                asyncio.run(orchestrator_main()) # Run orchestrator async function to generate book
                print("\n✅ Book generation completed! Check final_book.txt and final_book.pdf.\n")
            elif choice == "2":
                print("Goodbye!") # Exit message
                sys.exit() # Exit the program
            else:
                print("Invalid choice. Please enter 1 or 2.") # Handle invalid input

    # Entry point when running script directly
    if __name__ == "__main__":
        main()







    Now you can just run:






    python3 main.py








    And it’ll walk you through the process.





    After setting up the agent.py, orchestrator.py, and main.py files, we run our book system in the terminal to check if everything works end-to-end. We start the ACP server with uv run agent.py and then open another terminal to send test prompts (usually three to four), like generating an outline, drafting chapters, or editing content using curl commands. This allows us to confirm that the agents communicate correctly, OpenAI API calls succeed, and we receive polished outputs in both text and PDF formats — all orchestrated smoothly by the system.


    Prompt 1 — Generate Outline





    curl -X POST http://localhost:8000/runs -H "Content-Type: application/json" -d '{"agent_name": "outline", "input": [{"role": "user", "parts": [{"content": "The Quantum Cat'\''s Journey", "content_type": "text/plain"}]}]}'











    Prompt 2 - Chapter Agent





    curl -X POST http://localhost:8000/runs -H "Content-Type: application/json" -d '{"agent_name": "chapter", "input": [{"role": "user", "parts": [{"content": "Chapter 1: The Cat Enters the Quantum Realm", "content_type": "text/plain"}]}]}'











    Prompt 3 - Editor Agent





    curl -X POST http://localhost:8000/runs \
    -H "Content-Type: application/json" \
    -d '{
    "agent_name": "editor",
    "input": [
    {
    "role": "user",
    "parts": [
    { "content": "This is a raw chapter draft that needs editing for clarity and flow.", "content_type": "text/plain" }
    ]
    }
    ]
    }'











    Prompt 4 - Compiler Agent





    curl -X POST http://localhost:8000/runs \
    -H "Content-Type: application/json" \
    -d '{
    "agent_name": "compiler",
    "input": [
    {
    "role": "user",
    "parts": [
    { "content": "Outline content here", "content_type": "text/plain" }
    ]
    },
    {
    "role": "user",
    "parts": [
    { "content": "Chapter 1 content here", "content_type": "text/plain" }
    ]
    },
    {
    "role": "user",
    "parts": [
    { "content": "Chapter 2 content here", "content_type": "text/plain" }
    ]
    }
    ]
    }'
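    All four requests share one body shape: an agent_name plus a list of user messages, each with a single text/plain part. A small helper can build that body instead of hand-writing JSON; the function name is hypothetical, and the field names are copied from the curl commands above.

```python
import json


def build_run_payload(agent_name: str, texts: list[str]) -> dict:
    """Build the JSON body used by the curl examples: one user message per text."""
    return {
        "agent_name": agent_name,
        "input": [
            {"role": "user", "parts": [{"content": text, "content_type": "text/plain"}]}
            for text in texts
        ],
    }


if __name__ == "__main__":
    payload = build_run_payload("compiler", ["Outline content here", "Chapter 1 content here"])
    print(json.dumps(payload, indent=2))
```

    Letting json.dumps handle the quoting also avoids the '\'' shell escaping needed for the apostrophe in the first curl command.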











    Step 6: Add a Browser UI with Gradio (gradio_app.py)

    Not everyone loves the terminal, so I added a Gradio app!






    import asyncio
    import os
    import shutil

    import gradio as gr
    from orchestrator import main # Import orchestrator to run agent pipeline


    # Async function to generate the book using orchestrator and update Gradio progress bar
    async def generate_book_async(title, model, progress=gr.Progress()):
        # Clear old book files if they exist
        for file in ["final_book.txt", "final_book.pdf"]:
            if os.path.exists(file):
                os.remove(file)

        # Run the orchestrator with given title + model, passing in progress callback
        await main(title, model=model, progress_callback=progress)

        # Read final book text from generated TXT file
        with open("final_book.txt", "r") as f:
            book_text = f.read()

        # Return book text + file paths for download components
        return book_text, "final_book.txt", "final_book.pdf"

    # Wrapper to run async function inside sync Gradio button click
    def generate_book(title, model):
        return asyncio.run(generate_book_async(title, model))

    # Build Gradio interface
    with gr.Blocks() as demo:
        gr.Markdown("# 🐱 Quantum Cat Book Generator") # App title
        gr.Markdown("Enter a book title, pick a model, and generate a complete polished book with TXT and PDF downloads.")

        with gr.Row():
            title_input = gr.Textbox(label="Title", placeholder="Enter book title...") # Input box for title
            model_selector = gr.Dropdown(choices=["gpt-4o", "gpt-3.5-turbo"], value="gpt-4o", label="Model") # Model dropdown

        output_text = gr.Textbox(label="Generated Book", lines=20) # Output textbox to display book
        txt_download = gr.File(label="Download TXT") # Download button for .txt
        pdf_download = gr.File(label="Download PDF") # Download button for .pdf

        generate_btn = gr.Button("🚀 Generate Book") # Main action button

        # Link button click to generate_book function with inputs and outputs
        generate_btn.click(
            fn=generate_book,
            inputs=[title_input, model_selector],
            outputs=[output_text, txt_download, pdf_download]
        )

    # Launch the Gradio app on localhost:7860
    demo.launch(share=True)







    This lets you:
    • Enter a book title
    • Choose the OpenAI model (gpt-4o or gpt-3.5-turbo)
    • Click “Generate” and get the full book in the browser, with TXT and PDF download buttons


    Launch it with:






    python3 gradio_app.py











    Open in your browser at:

    http://localhost:7860





















    Step 7: Launch Promptfoo Interactive CLI

    Once Promptfoo is installed and the version is verified, run the following command to open the interactive CLI:






    promptfoo init








    You'll see a terminal-based interface prompting:


    "What would you like to do?"


    Use your arrow keys to navigate and select your intention. You can choose from:
    • Not sure yet (explore options)
    • Improve prompt and model performance
    • Improve RAG performance
    • Improve agent/chain of thought performance
    • Run a red team evaluation





    Step 8: Choose Your First Model Provider (We’re Only Using OpenAI Here)

    After choosing your evaluation goal, Promptfoo will ask:


    "Which model providers would you like to use?"


    In this guide, we're using OpenAI as the model provider.
    • Use the arrow keys to select OpenAI
    • Hit space to check the box
    • Then press Enter to continue





    Step 9: Initialize Promptfoo Evaluation

    Once you've selected the model provider (in this case, we’re starting with OpenAI), Promptfoo will automatically generate the necessary setup files:
    • README.md
    • promptfooconfig.yaml





    Step 10: Write Promptfoo Configuration

    promptfooconfig.yaml
    • Defines test prompts, agents, and JS-based assertions




    description: 'ACP Agent Evaluation' # Description of this evaluation suite

    prompts:
      - '{{book_title}}' # Dynamic prompt variable used in each test case

    providers:
      - id: file://./provider.py # Connects to local provider script
        label: ACP Outline Agent # Label shown in Promptfoo UI
        config:
          agent_name: outline # Tell provider.py to call the 'outline' agent

      - id: file://./provider.py
        label: ACP Chapter Agent
        config:
          agent_name: chapter # Tell provider.py to call the 'chapter' agent

      - id: file://./provider.py
        label: ACP Editor Agent
        config:
          agent_name: editor # Tell provider.py to call the 'editor' agent

    defaultTest:
      assert:
        # ✅ Check the output is a string (using JS in Promptfoo)
        - type: javascript
          value: typeof output === 'string'

        # ✅ Check the output is not an empty string
        - type: javascript
          value: output.trim().length > 0

    tests:
      - description: 'Generate outline for book' # Test outline agent
        vars:
          book_title: "The Quantum Cat's Journey"

      - description: 'Generate chapter draft' # Test chapter agent
        vars:
          book_title: "The Quantum Cat's Journey - Chapter 1"

      - description: 'Edit draft content' # Test editor agent
        vars:
          book_title: "Refine The Quantum Cat's Journey draft"
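    The two default assertions only confirm that something came back. A stricter, format-level check for the outline agent could look like the sketch below. It assumes Promptfoo's Python assertion type, which (as far as I know) loads a file and calls a get_assert(output, context) function that may return a dict with pass, score, and reason. The min_chapters threshold and the "Chapter N" pattern are my own illustrative choices, so check the Promptfoo docs before wiring it in.

```python
import re


def get_assert(output, context=None, min_chapters: int = 3):
    """Pass when the outline names at least `min_chapters` distinct numbered chapters."""
    numbers = set(re.findall(r"\bchapter\s+(\d+)", str(output), flags=re.IGNORECASE))
    passed = len(numbers) >= min_chapters
    return {
        "pass": passed,
        "score": min(len(numbers) / min_chapters, 1.0),
        "reason": f"found {len(numbers)} numbered chapter heading(s)",
    }
```

    An outline that labels its chapters differently (for example "Part One") would fail this check, so the pattern needs to match whatever format your outline prompt asks for.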







    provider.py
    • Sends HTTP POST to localhost:8000/runs for each agent
    • Extracts clean text outputs
    • Returns result to Promptfoo




    import requests # Import HTTP requests library


    def call_api(prompt, config=None, context=None):
        # Promptfoo passes the provider options as the second argument; the YAML
        # `config` block is normally nested under a "config" key, so look there
        # first and fall back to the top level.
        options = config or {}
        agent_config = options.get("config") or options
        agent_name = agent_config.get("agent_name", "outline") # Default to 'outline'
        url = "http://localhost:8000/runs" # ACP server endpoint

        payload = {
            "input": [{
                "text": prompt, # Original prompt text
                "parts": [{
                    "type": "text", # Content type (text)
                    "content": prompt # Content body
                }]
            }],
            "agent_name": agent_name # Target agent to call (outline, chapter, editor)
        }

        headers = {"Content-Type": "application/json"} # Set JSON header

        try:
            response = requests.post(url, json=payload, headers=headers) # Make POST request to ACP server
            response.raise_for_status() # Raise error on a 4xx/5xx HTTP response

            result = response.json() # Parse response JSON

            # Check if ACP server returned an error
            if result.get('error'):
                return {"output": f"[ERROR] {result['error'].get('message', 'Unknown error')}"}

            # Extract and return the first content part from output
            outputs = result.get('output', [])
            if outputs:
                first_output = outputs[0]
                if 'parts' in first_output and first_output['parts']:
                    first_part = first_output['parts'][0]
                    if 'content' in first_part:
                        return {"output": str(first_part['content']).strip()}

            return {"output": "[ERROR] No valid content found."} # Fallback if no valid output

        except Exception as e:
            return {"output": f"[ERROR] Exception during call: {e}"} # Catch and report exceptions
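    The response handling in provider.py can be exercised without a running server by pulling it into a pure function. The sketch below is a hypothetical refactor rather than code from the project, and it is slightly more lenient than the original: it scans every message and part for the first one that has content. The sample dictionaries in the usage lines are assumptions modeled on the fields provider.py reads.

```python
def extract_text(result: dict) -> str:
    """Return the first content part of an ACP run result, or an [ERROR] marker."""
    error = result.get("error")
    if error:
        return f"[ERROR] {error.get('message', 'Unknown error')}"
    for message in result.get("output") or []:
        for part in message.get("parts") or []:
            if "content" in part:
                return str(part["content"]).strip()
    return "[ERROR] No valid content found."


if __name__ == "__main__":
    sample = {"output": [{"parts": [{"content": "  Chapter 1  ", "content_type": "text/plain"}]}]}
    print(extract_text(sample))
    print(extract_text({"error": {"message": "agent not found"}}))
```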







    Step 11: Run Evaluation

    Now that everything is configured, it's time to run your first evaluation!


    In the terminal, run the following command:






    promptfoo eval




















    What Are We Testing? ACP Agents or OpenAI Models?

    When running acp-booksmith with Promptfoo, it’s important to understand what part of the system we are evaluating.


    System architecture overview


    We built a multi-agent system using IBM’s ACP (Agent Communication Protocol).


    The ACP agents are:
    • outline → generates a book outline.
    • chapter → writes detailed chapters.
    • editor → polishes the text.
    • compiler → stitches everything together.


    Each agent runs inside a Python server (agent.py) on:

    http://localhost:8000















    Inside the agents, we use:






    openai.AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))








    to call GPT-4o models.


    How Promptfoo fits into the system
    • Promptfoo does NOT connect directly to OpenAI models.
    • Instead, Promptfoo runs test cases defined in:




    promptfooconfig.yaml








    It sends these prompts to:

    http://localhost:8000/runs















    using provider.py, which talks to ACP agents.


    The ACP agents receive the request, process it, and, inside their own logic, call OpenAI’s API to generate the response.


    Why test through ACP agents?

    We want to test how well our entire system works, not just the raw OpenAI output.


    We care about:
    • Are the agents responding correctly?
    • Is the outline agent producing structured outlines?
    • Does the editor agent polish the text properly?
    • Can we stitch the book end-to-end?


    This gives us a real-world evaluation of:
    • agent design,
    • orchestration,
    • and LLM usage, all together.

    Step 12: Visualize and Analyze Agent Outputs with Promptfoo Web Viewer

    Once you’ve completed your promptfoo eval and run:






    promptfoo view








    You will see:


    A local server starts and prints its address in the terminal.















    You can open it in your browser (just press y when asked).





    What you’ll see on the web viewer

    For each test case, you’ll get:
    • Description: what was tested (e.g., generate outline, draft chapter, edit content).
    • Variables: input values, like book_title: The Quantum Cat's Journey.
    • Outputs: what the agent produced, e.g., book title, chapter summaries, edited sections.
    • Assertions (PASS/FAIL): green ✅ if the output is a string and not empty (based on your promptfooconfig.yaml).
    • Comparison view: side-by-side results from ACP Outline Agent, ACP Chapter Agent, and ACP Editor Agent.
    • Model and agent details: which agent was called and what label it was given.







    Step 13: Run Red Teaming to Stress-Test ACP Agents

    After you’ve run a basic promptfoo eval and confirmed in the viewer that your ACP agents (outline, chapter, editor) pass normal test cases, it’s time to turn up the heat and dive into red teaming.


    Promptfoo has a built-in Red Team mode designed for:
    • Surfacing weaknesses
    • Triggering edge-case behavior
    • Probing for harmful, biased, or rule-breaking outputs


    🛠️ How to Activate It:

    In the Promptfoo dashboard, look at the top bar.


    Click on the dropdown next to “Evals”.


    Choose Red team from the menu.


    Boom — you’re now in red teaming mode. The view will shift, highlighting pass/fail bars, vulnerability spread, and prompt score deltas even more clearly.




    Step 14: Set Up Your Red Team Target Configuration

    Target Type

    ➡ Select: Custom (or similar, since you’re targeting your own ACP agent)


    Target Name






    acp-booksmith-outline-agent








    (Use something meaningful. For example, if you test the chapter agent later, name it acp-booksmith-chapter-agent.)


    Target ID






    file://./provider.py








    This matches your local provider setup. It connects the Red Team to your provider script.


    Custom Configuration (JSON)






    {
    "agent_name": "outline",
    "temperature": 0.5
    }








    (Adjust "agent_name" to "chapter" or "editor" when testing other agents.)





    Additional Configuration

    Delay → leave empty or set e.g., 1 (1 second between requests)


    Extension Hook → leave empty (unless you write custom Python hooks)


    Prompts

    Add your test prompts. Example:






    Generate an outline for the book titled "The Quantum Cat's Journey"








    (You can later set a prompts list or dataset if you want to run many tests automatically.)





    Step 15: Specify Your Usage Type – Are You Testing a Model or an App?

    Now that you’ve configured your red team target, it’s time to define how you want to evaluate it — is this a model or a full application?


    What You’re Seeing:

    Promptfoo gives you two options here:


    Option 1: I'm testing a model

    Pick this only when you are red teaming a bare model directly. There is no need to simulate application flows or pass surrounding context, and you go straight into prompt injection, safety probing, and reasoning stress tests. Promptfoo confirms the choice with a note: “You don't need to provide application details. You can proceed to configure the model and test scenarios in the next steps.”


    Option 2: I'm testing an application

    This is what you want. acp-booksmith is an AI-powered product (a multi-step agent system with UI and API layers), so the red team needs its application context.


    What to Do:
    • Select “I’m testing an application” to define the red teaming context for the full ACP-booksmith system.
    • Under Main Purpose, describe that the system generates complete books via multi-agent collaboration using ACP.
    • Under Key Features, list outline generation, chapter drafting, editing, compilation, export, Gradio interface, and API endpoints.
    • Under Industry/Domain, fill in publishing, creative writing, education, AI tools, and content automation.
    • Under Specific Constraints, explain it only handles book-related prompts, uses OpenAI models via ACP, and ignores unrelated or malicious prompts.







    Step 16: Plugin Configuration

    • Go to Plugin Configuration in Promptfoo Red Team setup.
    • Review all available plugin presets (like Recommended, Minimal Test, RAG, Foundation, Guardrails Evaluation, etc.).
    • For broad, balanced coverage, select Recommended — this runs a general set of tests across safety, robustness, and compliance.
    • If you want more specialized security or risk testing, optionally choose presets like OWASP LLM Top 10, Guardrails Evaluation, or MITRE.
    • Click Next after selection to apply these plugins to your red teaming run.






    Step 17: Strategy Configuration

    • Go to the Strategy Configuration section in Promptfoo.
    • Select Custom mode to fine-tune your attack strategy selection.
    • Enable Single-shot Optimization (recommended, agent-based) — it optimizes one-turn attacks to bypass controls.
    • Enable Composite Jailbreaks (recommended) — it chains multiple attack methods for stronger testing.
    • Skip Basic or advanced multi-turn agents unless you want deeper experiments — focus on efficient, high-impact tests.




    Step 18: Review and Finalize Your Configuration

    This is the final checkpoint before Promptfoo launches the red team evaluation on your ACP-booksmith system.


    Here’s what to review:


    Plugins (39):

    You’ve selected a broad and powerful set including:
    • Bias detection (e.g., bias:age, bias:race, bias:gender, bias:disability)
    • Privacy and sensitive data (e.g., pii:direct, pii:session, pii:api-db, harmful:privacy)
    • Safety and harmful content (e.g., harmful:self-harm, harmful:misinformation-disinformation, harmful:violent-crime, harmful:specialized-advice)
    • Injection and hacking risks (e.g., hijacking, harmful:cybercrime, harmful:cybercrime:malicious-code)


    Strategies (2):

    You’ve configured high-impact testing strategies:
    • Single-shot Optimization (Agent-based, single-turn attack optimization)
    • Composite Jailbreaks (Chains multiple attack vectors for enhanced effectiveness)


    Final check:
    • Configuration description
    • All plugin categories cover your security, safety, and fairness concerns
    • Strategies are aligned with your goals




    Step 19: Run Your Configuration (CLI or Browser)

    You now have two options depending on your use case:


    Option 1: Save and Run via CLI

    Best for: Large-scale testing, automation, deeper debugging.


    Click “Save YAML” – this downloads your configuration as a .yaml file.


    On your terminal or VM where Promptfoo is installed, run:






    promptfoo redteam run








    This command picks up your saved config and starts the red teaming process.


    Why CLI?


    Supports headless runs


    Better logging and error tracing


    CI/CD and repo integration


    Option 2: Run Directly in the Browser

    Best for: Simpler tests, quick feedback, small scans.


    Click the blue “Run Now” button.


    Promptfoo will start executing the configured tests in the web UI.


    You’ll see model outputs and vulnerabilities flagged inline.








    Since we are using Option 2, Promptfoo is:


    Actively running your full configuration against the ACP-booksmith multi-agent system (powered by OpenAI models under ACP orchestration).


    Using your selected plugins (39 types), including:
    • Bias detection (age, race, gender, disability)
    • Privacy & PII (e.g., pii:direct, pii:session, harmful:privacy)
    • Security & injection risks (e.g., hijacking, cybercrime, malicious code)
    • Harmful & unsafe content filters (e.g., self-harm, misinformation, violence)


    Applying your chosen strategies:
    • Single-shot Optimization (agent-driven, one-turn attacks)
    • Composite Jailbreaks (multi-vector, chained attack paths)


    Testing 6,240 probes — a large, high-coverage scan that simulates real-world attacks on AI-driven book generation systems!














    Step 20: Review Results and Generate Vulnerability Report

    After the tests finish running, Promptfoo shows you a detailed breakdown of model performance across various security domains and plugins.








    Conclusion

    Building acp-booksmith was more than just stringing together a few API calls.

    It was about designing a collaborative system where AI agents play distinct roles — from outlining and drafting to editing and compiling —

    and making sure they communicate, coordinate, and deliver like a true creative team.


    But here’s the key insight: even the most elegant multi-agent system is only as good as its weakest link.

    That’s where Promptfoo came in — it helped me uncover blind spots,

    test the agents under pressure, and surface edge cases I would have never thought to check manually.


    By pairing ACP’s agent orchestration with Promptfoo’s evaluation and red teaming,

    I not only automated book creation — I made sure the system was robust, reliable, and responsible.


    If you’re working on your own AI pipelines or agent frameworks,

    I highly recommend adding Promptfoo to your stack —

    because in the world of AI, trust isn’t built on magic, it’s built on testing.



