Building a multi-agent Kalshi algotrader with Sentient Foundation's ROMA SDK

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5168

    #1


    I spent a weekend building a paper trading system for Kalshi's 15-minute BTC binary prediction markets. The hook was Sentient Foundation's roma-dspy Python package — their ROMA (Recursive Open Meta-Agent) framework — which I wanted to actually use for something real-ish rather than run the hello-world example and close the tab.


    This post is about what that looks like in practice: the architecture, the places ROMA genuinely helped, the places it caused problems, and how the whole thing actually behaved.





    What KXBTC15M is

    Kalshi runs a market called KXBTC15M. Every 15 minutes a new binary contract opens: will BTC's price be higher at the end of this window than it was at the start? You buy YES or NO at a price quoted in cents (1–99¢), and that price maps directly to the market's implied probability. A 38¢ YES ask means the crowd thinks there's roughly a 38% chance BTC ends the window above the strike.


    The ticker format is KXBTC15M-{YY}{MON}{DD}{HHMM}-{NN} in US Eastern Time. The floor_strike field on the market object is the BTC price to beat, set when the window opens. These markets are only live during certain hours — worth knowing before you try to test against a live environment.
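A minimal sketch of parsing that ticker format. The field widths (two-digit year, three-letter uppercase month, two-digit day, 24-hour HHMM, numeric suffix) are assumptions read off the format string, `parse_ticker` is a hypothetical helper, and the sample ticker is made up. The suffix is kept as an opaque string because the post doesn't say what NN encodes.

```python
import re
from datetime import datetime

# Field widths below are assumptions read off the format string in the post:
# two-digit year, three-letter month, two-digit day, 24-hour HHMM, numeric suffix.
TICKER_RE = re.compile(
    r"^KXBTC15M-(?P<yy>\d{2})(?P<mon>[A-Z]{3})(?P<dd>\d{2})"
    r"(?P<hh>\d{2})(?P<mm>\d{2})-(?P<nn>\d+)$"
)
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def parse_ticker(ticker: str) -> tuple[datetime, str]:
    """Return (naive datetime in US Eastern wall-clock time, raw NN suffix)."""
    m = TICKER_RE.match(ticker)
    if m is None or m["mon"] not in MONTHS:
        raise ValueError(f"not a KXBTC15M ticker: {ticker!r}")
    stamp = datetime(
        2000 + int(m["yy"]),
        MONTHS.index(m["mon"]) + 1,
        int(m["dd"]),
        int(m["hh"]),
        int(m["mm"]),
    )
    return stamp, m["nn"]


# Made-up ticker for illustration only, not a real market.
stamp, suffix = parse_ticker("KXBTC15M-26JAN151430-30")
print(stamp.isoformat(), suffix)
```

The returned datetime is deliberately naive; attach a US Eastern timezone before comparing it with UTC timestamps from a price feed.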





    Architecture overview

    Two processes, one purpose:






    Next.js 16 (port 3000)              Python FastAPI (port 8001)
    ──────────────────────              ──────────────────────────
    API routes (proxy)                  /analyze → roma-dspy solve()
    6-agent TypeScript pipeline   ←→    /reset   → circuit breaker reset
    useMarketTick (2s poll)             /health
    usePipeline (5m cycle)







    The Next.js app owns the UI, the Kalshi API calls, the price feed, and the orchestration. The Python service does exactly one thing: accept a goal + context string, run it through Sentient's roma-dspy solve(), and return the result as JSON.


    The service supports four LLM providers out of the box: Grok, Anthropic, OpenAI, and OpenRouter. OpenRouter is worth highlighting — it gives you access to any model through a single API key and pay-per-use pricing, which is useful when you're hitting per-provider rate limits (more on that shortly).





    The pipeline DAG

    Six agents, run in sequence:






    MarketDiscovery ──┐
    PriceFeed ────────┼──► SentimentAgent (ROMA) ──► ProbabilityModel (ROMA) ──► RiskManager ──► Execution
    Orderbook ────────┘











    // lib/agents/index.ts — abbreviated
    export async function runAgentPipeline(...): Promise<PipelineState> {
      const mdResult = await runMarketDiscovery(markets) // rule-based
      const pfResult = runPriceFeed(quote, strike)       // rule-based

      const sentResult = await runSentiment(...)         // ROMA

      await new Promise(r => setTimeout(r, 8_000))       // rate-limit breathing room

      const probResult = await runProbabilityModel(...)  // ROMA

      const riskResult = runRiskManager(...)             // rule-based
      const execResult = runExecution(...)               // rule-based
    }







    The design decision I'm most confident about: only the two judgment agents use ROMA. MarketDiscovery, PriceFeed, RiskManager, and Execution are all deterministic. Putting LLM reasoning in the risk manager felt like a bad idea — you want that layer to be predictable, auditable, and fast.





    How the Python ROMA service works





    # python-service/main.py
    from roma_dspy.core.engine.solve import solve, ROMAConfig

    @app.post("/analyze")
    def analyze(req: AnalyzeRequest):
        config = build_roma_config(_llm_config)
        result = solve(full_prompt, max_depth=req.max_depth, config=config)
        ...







    That's the whole thing. solve() runs the Atomizer → Planner → parallel Executors → Aggregator flow internally — the core of what Sentient Foundation built with ROMA. At max_depth=1 it tends to solve atomically — one LLM call, no decomposition — which is what I want here. Decomposing "assess BTC sentiment" into parallel calls on a rate-limited key was the source of most of my problems.


    The /reset endpoint exists for a specific reason: ROMA has internal circuit breakers. If enough LLM calls fail (429s, timeouts), the breaker opens and every subsequent call fails immediately. That's sensible in a long-running service, but frustrating when you've fixed the underlying issue and the service is stuck refusing all requests until restart. The TypeScript client detects the error message and auto-resets before retrying:






    if (text.includes('Circuit breaker is open')) {
      await resetCircuitBreakers() // POST /reset, best-effort
    }
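The detect, reset, retry flow can be sketched end to end as below. This is an illustration in Python rather than the post's TypeScript client, and everything except the error text is hypothetical: `RomaServiceError`, `analyze_with_reset`, and the two injected callables stand in for whatever actually calls /analyze and /reset.

```python
from typing import Callable

BREAKER_MARKER = "Circuit breaker is open"  # error text quoted in the post


class RomaServiceError(Exception):
    """Hypothetical error type carrying the service's error text."""


def analyze_with_reset(
    analyze: Callable[[], str],
    reset: Callable[[], None],
    max_retries: int = 1,
) -> str:
    """Call analyze(); on a breaker-open error, reset and retry up to max_retries times."""
    attempt = 0
    while True:
        try:
            return analyze()
        except RomaServiceError as err:
            # Anything other than an open breaker, or too many retries, propagates.
            if BREAKER_MARKER not in str(err) or attempt >= max_retries:
                raise
            attempt += 1
            try:
                reset()  # best-effort: a failed reset should not mask the retry
            except Exception:
                pass
```

Capping the retries matters: if the underlying 429s haven't cleared, the breaker will simply reopen, and an unbounded loop would keep resetting it.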







    Both agents have rule-based fallbacks. The UI shows which path ran — SentimentAgent (roma-dspy · grok) vs SentimentAgent (rule-based · roma-dspy unavailable). The pipeline keeps running either way.
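One way to sketch that fallback wrapper, assuming each agent exposes a ROMA-backed function and a rule-based one. `AgentResult` and `run_with_fallback` are hypothetical names; only the two label strings come from the post.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass
class AgentResult(Generic[T]):
    value: T
    source: str  # label the UI can show next to the agent name


def run_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    provider: str = "grok",
) -> AgentResult[T]:
    """Try the ROMA-backed path first; on any failure use the rule-based path."""
    try:
        return AgentResult(primary(), f"roma-dspy · {provider}")
    except Exception:
        # The pipeline keeps running; the label records which path produced the value.
        return AgentResult(fallback(), "rule-based · roma-dspy unavailable")
```

Returning the label alongside the value keeps the "which path ran" question out of the agents themselves and lets downstream code treat both results identically.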





    The rate-limit problem

    ROMA's Planner decomposes a goal into N parallel executor tasks and fires them concurrently. At max_depth=2, a single /analyze call can generate 4–6 simultaneous LLM requests. Two ROMA agents per cycle means 10–12 LLM calls within a few seconds. With a rate-limited API key, that reliably produces a 429 cascade, which trips the circuit breaker, which makes the second agent fail before it even tries.


    Two fixes: max_depth=1 (atomic solve, one LLM call instead of six) and an 8-second pause between the two agents in the orchestrator. Neither is elegant. Both work.


    If you're running this seriously, OpenRouter is the move here. Set AI_PROVIDER=openrouter and you can route to Claude, Grok, Gemini, or any other model through one key with generous shared rate limits — much better than hammering a single provider's per-minute cap with parallel executor calls.
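A rough sketch of how provider selection from AI_PROVIDER might look. The four provider names follow the list earlier in the post; the API-key variable names, the `grok` default, and `resolve_provider` itself are assumptions, not taken from the repo.

```python
from typing import Mapping

# Provider names follow the four listed in the post; the API-key variable
# names are assumptions, not taken from the repo.
KEY_VARS = {
    "grok": "XAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def resolve_provider(env: Mapping[str, str]) -> tuple[str, str]:
    """Return (provider, api_key) from an environment-like mapping."""
    # Defaulting to grok is an assumption made for this sketch.
    provider = env.get("AI_PROVIDER", "grok").strip().lower()
    if provider not in KEY_VARS:
        raise ValueError(f"unknown AI_PROVIDER: {provider!r}")
    key = env.get(KEY_VARS[provider], "")
    if not key:
        raise RuntimeError(f"{KEY_VARS[provider]} is not set")
    return provider, key
```

In a service you would call `resolve_provider(os.environ)` once at startup, so a missing key fails fast instead of surfacing as a failed LLM call mid-cycle.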





    What the two ROMA agents actually do

    SentimentAgent receives live BTC price, 1h/24h changes, strike distance, minutes until close, and top-5 orderbook levels on each side. ROMA returns natural language reasoning. A separate structured extraction call pulls out { score, label, momentum, orderbookSkew, signals }.
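A sketch of what the typed boundary after the extraction call could look like, assuming the extraction step returns a JSON string with those five keys. The field types are guesses (numeric score, momentum, and skew; a list of strings for signals), the sample payload is made up, and `Sentiment` and `parse_sentiment` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentiment:
    # Field types are assumptions; the post only names the keys.
    score: float
    label: str
    momentum: float
    orderbook_skew: float
    signals: list[str]


def parse_sentiment(raw: str) -> Sentiment:
    """Validate the extraction output and convert it to a typed object."""
    data = json.loads(raw)
    signals = data["signals"]
    if not isinstance(signals, list) or not all(isinstance(s, str) for s in signals):
        raise ValueError("signals must be a list of strings")
    return Sentiment(
        score=float(data["score"]),
        label=str(data["label"]),
        momentum=float(data["momentum"]),
        orderbook_skew=float(data["orderbookSkew"]),
        signals=signals,
    )


# Made-up payload for illustration.
sample = '{"score": 0.01, "label": "neutral", "momentum": 0.4, "orderbookSkew": -0.3, "signals": ["strong 24h momentum"]}'
print(parse_sentiment(sample).label)
```

A missing key or a non-numeric score raises here, which is a natural point to switch to the rule-based path rather than pass a malformed result downstream.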


    ProbabilityModelAgent receives the sentiment score and signals, plus the market-implied probability (yes_ask / 100). It asks ROMA to estimate true P(YES) and whether the model edge justifies a trade.


    A real output from a working cycle:






    SentimentAgent: neutral (0.01)
    — strong 24h momentum offset by bearish orderbook skew

    ProbabilityModelAgent: P(model)=32% vs P(market)=31%, edge +1%
    → NO_TRADE — edge below 3% threshold

    RiskManager: REJECTED — edge 1.0% below minimum (3%)







    That's the system working correctly. Thin edge, no trade. Right call.
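The arithmetic behind that decision is small enough to sketch. This assumes the edge is simply P(model) minus the market-implied probability (yes_ask / 100) on the YES side, and that a trade needs the edge to be at least the 3% minimum; `evaluate_edge` is a hypothetical helper, and whether the real comparison is inclusive isn't stated in the post.

```python
MIN_EDGE = 0.03  # 3% minimum edge from the post


def evaluate_edge(
    p_model: float, yes_ask_cents: int, min_edge: float = MIN_EDGE
) -> tuple[float, float, str]:
    """Return (market-implied probability, edge, decision) for the YES side."""
    p_market = yes_ask_cents / 100
    # Round to avoid float noise such as 0.010000000000000009 near the threshold.
    edge = round(p_model - p_market, 6)
    decision = "TRADE" if edge >= min_edge else "NO_TRADE"
    return p_market, edge, decision


# Numbers from the cycle shown above: P(model)=32%, yes_ask=31¢.
p_market, edge, decision = evaluate_edge(0.32, 31)
print(f"P(market)={p_market:.0%} edge={edge:+.0%} -> {decision}")
```

With the numbers from that cycle this gives an edge of +1% and NO_TRADE, matching the output above.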





    Limitations, honestly

    It's paper trading by default. Live mode exists behind a confirmation modal, but I haven't run enough live cycles to have any opinion on whether the edge estimates are real.


    KXBTC15M markets are illiquid most of the time. Outside certain hours there's often no active market. The pipeline handles this gracefully but it limits how much live testing you can actually do.


    The 3% minimum edge is conservative by design. Given that the probability estimates come from an LLM reasoning over 1h/24h momentum and orderbook depth — not a trained model — that conservatism seems right.





    What I actually think about using ROMA here

    ROMA externalizes the "how do I break this problem down" question from application code. I write a goal; the framework decides whether to decompose it. For a focused single-topic analysis, that decomposition turns out to be unnecessary: atomic solve works fine. Where it could be genuinely valuable is for broader goals that benefit from investigating several dimensions in parallel.


    The pattern I'd take from this into other projects: accept that ROMA returns natural language, and put a thin structured extraction layer at the boundary to get typed outputs. Keeps the two concerns separate. Works well.





    Full code at github.com/Julian-dev28/sentient-market-reader. If you're curious about the Kalshi auth (RSA-PSS signed headers), the circuit breaker handling, or the TS/Python boundary — drop a reply.



