I Created An Enterprise MCP Gateway

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5168

    #1


    When you start building AI applications beyond simple experiments, everything changes. Models need access to files, databases, APIs, and internal services. That's where the Model Context Protocol (MCP) comes in.


    But managing dozens of MCP servers, tools, and integrations in production quickly becomes a nightmare. I spent the last few months building an enterprise MCP gateway using Bifrost, and I want to share what I learned.








    💻 The Problem: MCP Without a Gateway is Bad

    Here's what happens without proper infrastructure:


    Your models spend precious tokens discovering available tools. Teams can't control who uses what. An engineer accidentally deletes the wrong database because the model had access it shouldn't have. API costs spike unexpectedly. You have no idea which AI workflows are running where.


    The root issue: MCP was designed for flexibility. When you scale from a chatbot to production AI systems, you need:
    • Centralized tool management instead of scattered MCP servers
    • Fine-grained access control so marketing tools don't leak into engineering
    • Rate limiting per tool to prevent API abuse and runaway costs
    • Complete audit trails for compliance and debugging





    👀 Why Bifrost?

    Bifrost is a high-performance, Go-based LLM gateway that solves these problems:






    # Quick start - 30 seconds with -p 8000
    npx -y @maximhq/bifrost

    # Opens http://localhost:8000













    Key advantages:
    • 40x lower overhead than other gateways (11 µs vs 440 µs)
    • 68% less memory usage
    • 100% success rate at 5,000 RPS
    • Code Mode - Models generate orchestration code instead of step-by-step calls
    • Semantic caching - 40-60% cost reduction on similar queries
    • Built-in control - RBAC, rate limiting, cost tracking, audit logs
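To make the semantic caching bullet concrete, here is a minimal sketch of the general idea: store an embedding per query and reuse a response when a new query is similar enough. This is not Bifrost's implementation. `SemanticCache`, the `embed` function, and the 0.9 threshold are all assumptions for illustration.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical cache: `embed` is any caller-supplied function mapping text to a vector.
class SemanticCache {
  constructor(embed, threshold = 0.9) {
    this.embed = embed;
    this.threshold = threshold; // assumed cutoff; tune per workload
    this.entries = []; // { vector, response }
  }

  // Return the response of the most similar stored query, or undefined on a miss.
  get(query) {
    const vector = this.embed(query);
    let best = null;
    for (const entry of this.entries) {
      const score = cosineSimilarity(vector, entry.vector);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, response: entry.response };
      }
    }
    return best?.response;
  }

  set(query, response) {
    this.entries.push({ vector: this.embed(query), response });
  }
}
```

A linear scan is fine for a sketch. A production cache would use a vector index and an eviction policy.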





    📦 1. Collect All MCP Servers

    Instead of giving models direct access to scattered MCP servers, register every server in one gateway configuration:






    // Gateway configuration - single entry point
    mcpConfig := &schemas.MCPConfig{
        ClientConfigs: []schemas.MCPClientConfig{
            {
                Name:           "filesystem",
                ConnectionType: schemas.MCPConnectionTypeSTDIO,
                StdioConfig: &schemas.MCPStdioConfig{
                    Command: "npx",
                    Args:    []string{"-y", "@anthropic/mcp-filesystem"},
                },
                ToolsToExecute: []string{"*"},
            },
            {
                Name:             "web_search",
                ConnectionType:   schemas.MCPConnectionTypeHTTP,
                ConnectionString: bifrost.Ptr("http://localhost:3001/mcp"),
                ToolsToExecute:   []string{"search", "fetch_url"},
            },
        },
    }

    client, err := bifrost.Init(context.Background(), schemas.BifrostConfig{
        Account:   account,
        MCPConfig: mcpConfig,
        Logger:    bifrost.NewDefaultLogger(schemas.LogLevelInfo),
    })







    Benefits:
    • Single source of truth for all tools
    • Unified security policies
    • Centralized monitoring and cost tracking
    • Consistent behavior across all models





    โš™๏ธ 2. Control Tool Access Based on Roles

    Different teams need different tool access levels. Implement role-based access control:






    // Tools each role may call ("*" grants every tool)
    const roleToToolsMapping = {
      engineering: ["filesystem", "database", "github_api"],
      marketing: ["web_search", "document_generation"],
      finance: ["cost_tracking"],
      admin: ["*"], // All tools
    };

    // Per-role, per-tool limits in calls per minute
    const roleLimits = {
      engineering: { filesystem: 1000, database: 500 },
      marketing: { web_search: 100 },
      finance: { cost_tracking: 50 },
    };

    // Check access: unknown roles get no tools, "*" matches any tool
    async function checkToolAccess(userId, role, toolName) {
      const allowedTools = roleToToolsMapping[role] ?? [];
      if (!allowedTools.includes("*") && !allowedTools.includes(toolName)) {
        throw new Error(`Tool '${toolName}' is not allowed for role '${role}'`);
      }
    }







    Real example - Access denied:






    curl -X POST http://localhost:8000/v1/mcp/tool/execute \
      -H "Content-Type: application/json" \
      -d '{
        "tool_call": {
          "tool_name": "database",
          "params": {"query": "SELECT * FROM users"}
        },
        "user_role": "marketing"
      }'

    # Response (403):
    # {
    #   "error": "Access Denied",
    #   "message": "Tool 'database' is not allowed for role 'marketing'"
    # }







    This single change prevents entire categories of security issues.





    🔎 3. Implement Rate Limiting

    An AI workflow once got stuck in a loop, hammering the database with thousands of queries per second. Costs spiked by $2,000 in 2 hours before we caught it.


    Rate limiting is your firewall against your own systems:






    class RateLimiter {
      constructor() {
        this.windows = new Map(); // "tool:user" -> timestamps of recent calls
      }

      async checkLimit(toolName, userId, limit) {
        const key = `${toolName}:${userId}`;
        const now = Date.now();
        const windowStart = now - 60000; // 1 minute sliding window

        // Drop calls that fell out of the window, then store the pruned
        // list back so the push below is not lost
        const timestamps = (this.windows.get(key) ?? [])
          .filter(t => t > windowStart);
        this.windows.set(key, timestamps);

        if (timestamps.length >= limit) {
          return {
            allowed: false,
            // Seconds until the oldest call leaves the window
            retryAfter: Math.ceil((timestamps[0] + 60000 - now) / 1000)
          };
        }

        timestamps.push(now);
        return { allowed: true, remaining: limit - timestamps.length };
      }
    }







    Real example - Rate limit exceeded:






    curl -X POST http://localhost:8000/v1/mcp/tool/execute \
      -H "Content-Type: application/json" \
      -d '{
        "tool_call": {
          "tool_name": "web_search",
          "params": {"query": "another search"}
        },
        "user_id": "user-123",
        "user_role": "marketing"
      }'

    # Response (429 - Rate Limited):
    # {
    #   "error": "Rate Limit Exceeded",
    #   "message": "Tool 'web_search' limit exceeded (100/min)",
    #   "retryAfter": 45
    # }







    In under 30 seconds, the rate limiter caught what would have been a $5,000+ incident.
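On the calling side, a client should back off when it sees that 429 instead of retrying immediately. Here is a minimal sketch, assuming the response carries `retryAfter` in seconds as in the example above. `callWithRetry`, `doCall`, and `sleep` are hypothetical names, not part of Bifrost.

```javascript
// Hypothetical client helper: retry a gateway call when it answers 429.
// `doCall` returns { status, body }; `sleep` is injectable so tests need not wait.
async function callWithRetry(doCall, { maxAttempts = 3, sleep = defaultSleep } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await doCall();
    if (response.status !== 429) {
      return response; // success, or an error that is not a rate limit
    }
    if (attempt === maxAttempts) {
      break; // out of attempts, fall through to the error below
    }
    // Wait as long as the gateway asked for (seconds); assume 1s if absent
    const waitSeconds = response.body?.retryAfter ?? 1;
    await sleep(waitSeconds * 1000);
  }
  throw new Error(`Still rate limited after ${maxAttempts} attempts`);
}

function defaultSleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
```

Capping the attempts matters here: an uncapped retry loop is exactly the kind of runaway caller the limiter exists to stop.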





    📊 4. Track Costs and Audit Everything

    Production AI systems need accountability. Who ran what? When? How much did it cost?






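    // Shape of one audit record (Go)
    type AuditLog struct {
        Timestamp time.Time
        UserId    string
        UserRole  string
        ToolName  string
        Success   bool
        Cost      float64
        Duration  time.Duration
        Error     string
    }

    // Wrap every tool call so successes and failures are both logged (JavaScript)
    async function executeTool(toolName, params, context) {
      const startTime = Date.now();

      try {
        const result = await toolExecutor.execute(toolName, params);
        const duration = Date.now() - startTime; // milliseconds
        const cost = calculateCost(toolName, params);

        await auditLogger.log({
          timestamp: new Date(startTime),
          userId: context.userId,
          userRole: context.userRole,
          toolName,
          success: true,
          cost,
          duration
        });

        return result;
      } catch (error) {
        // Record the same identifying fields on failure so incidents can be traced
        await auditLogger.log({
          timestamp: new Date(startTime),
          userId: context.userId,
          userRole: context.userRole,
          toolName,
          success: false,
          duration: Date.now() - startTime,
          error: error.message
        });
        throw error;
      }
    }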







    Example - Cost breakdown:






    GET /v1/analytics/costs?team_id=team-engineering&period=month

    {
      "total_cost": "$127.45",
      "budget": "$1000.00",
      "remaining": "$872.55",
      "usage_by_tool": [
        {
          "tool": "web_search",
          "calls": 1234,
          "cost": "$12.34"
        },
        {
          "tool": "database",
          "calls": 567,
          "cost": "$56.70"
        }
      ]
    }







    This visibility was transformative. Teams saw exactly what they were spending. Anomalies became obvious.
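A report like this can be produced by rolling up audit entries per tool. The following is a rough sketch under assumptions: `summarizeCosts` is a hypothetical helper, each entry is taken to carry `toolName` and a dollar `cost`, and the field names of the result simply mirror the JSON above.

```javascript
// Roll audit entries up into a per-tool cost report.
// `entries` is an array of { toolName, cost } and `budget` is in dollars (assumed shapes).
function summarizeCosts(entries, budget) {
  const byTool = new Map(); // toolName -> { calls, micros }
  for (const { toolName, cost = 0 } of entries) {
    const row = byTool.get(toolName) ?? { calls: 0, micros: 0 };
    row.calls += 1;
    // Sum in whole micro-dollars so sub-cent costs survive and floats do not drift
    row.micros += Math.round(cost * 1e6);
    byTool.set(toolName, row);
  }

  const totalMicros = [...byTool.values()].reduce((sum, row) => sum + row.micros, 0);
  const budgetMicros = Math.round(budget * 1e6);
  const dollars = micros => `$${(micros / 1e6).toFixed(2)}`;

  return {
    total_cost: dollars(totalMicros),
    budget: dollars(budgetMicros),
    remaining: dollars(budgetMicros - totalMicros),
    usage_by_tool: [...byTool].map(([tool, row]) => ({
      tool,
      calls: row.calls,
      cost: dollars(row.micros),
    })),
  };
}
```

In practice this aggregation would run as a query over the audit store rather than in memory, but the grouping logic is the same.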





    ๐Ÿ–‹๏ธ The Complete Flow

    Here's what tool execution looks like with all control layers:






    app.post("/v1/mcp/tool/execute", async (req, res) => {
      // Field names follow the curl examples (tool_call, user_id, user_role)
      const { tool_call: toolCall = {}, user_id: userId, user_role: userRole, team_id: teamId } = req.body;
      const { tool_name: toolName, params } = toolCall;
      const startTime = Date.now();

      try {
        // 1. Check role-based access (denials return 403 and are logged)
        try {
          await checkToolAccess(userId, userRole, toolName);
        } catch (denied) {
          await auditLogger.log({
            userId, userRole, teamId, toolName,
            success: false, error: denied.message
          });
          return res.status(403).json({ error: "Access Denied", message: denied.message });
        }

        // 2. Check rate limits (no configured limit means unlimited)
        const limit = roleLimits[userRole]?.[toolName] ?? Infinity;
        const rateLimitCheck = await limiter.checkLimit(toolName, userId, limit);
        if (!rateLimitCheck.allowed) {
          return res.status(429).json({
            error: "Rate Limit Exceeded",
            retryAfter: rateLimitCheck.retryAfter
          });
        }

        // 3. Check budget
        const cost = estimateCost(toolName, params);
        const budgetCheck = await budgetTracker.deductCost(teamId, toolName, cost);
        if (!budgetCheck.allowed) {
          return res.status(402).json({
            error: "Budget Exceeded"
          });
        }

        // 4. Execute tool
        const result = await toolExecutor.execute(toolName, params);
        const duration = Date.now() - startTime; // milliseconds

        // 5. Log the action
        await auditLogger.log({
          userId, userRole, teamId, toolName,
          success: true, cost, duration
        });

        res.json({ success: true, data: result });

      } catch (error) {
        // Log failures too
        await auditLogger.log({
          userId, userRole, teamId, toolName,
          success: false, error: error.message
        });

        res.status(400).json({ success: false, error: error.message });
      }
    });
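The handler leans on a `budgetTracker` that the post does not show. Here is a minimal in-memory sketch of what `deductCost` could look like, assuming per-team budgets in dollars. The class and its internals are hypothetical, and a real tracker would persist to a database and reset each billing period.

```javascript
// Hypothetical in-memory budget tracker keyed by team.
class BudgetTracker {
  constructor(budgets) {
    this.budgets = new Map(Object.entries(budgets)); // teamId -> budget in dollars
    this.spent = new Map(); // teamId -> { total, byTool: Map<toolName, dollars> }
  }

  // Deduct `cost` if the team can afford it; otherwise refuse without charging.
  deductCost(teamId, toolName, cost) {
    const budget = this.budgets.get(teamId) ?? 0; // unknown teams get no budget
    const usage = this.spent.get(teamId) ?? { total: 0, byTool: new Map() };

    if (usage.total + cost > budget) {
      return { allowed: false, remaining: budget - usage.total };
    }

    usage.total += cost;
    usage.byTool.set(toolName, (usage.byTool.get(toolName) ?? 0) + cost);
    this.spent.set(teamId, usage);
    return { allowed: true, remaining: budget - usage.total };
  }
}
```

Because `deductCost` returns a plain object, `await budgetTracker.deductCost(...)` in the handler works unchanged whether the tracker is synchronous like this one or backed by an async store.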










    ✅ Code Mode

    Instead of calling tools one by one, models generate TypeScript code that orchestrates them:






    // Model generates this automatically
    const tools = await listToolFiles(); // List available tools
    const githubTool = await readToolFile('github'); // Read definition

    // Execute a complete workflow
    const results = await executeToolCode(async () => {
      const repos = await github.search_repos({
        query: "golang bifrost",
        maxResults: 5
      });

      const formatted = repos.items.map(repo => ({
        name: repo.name,
        stars: repo.stargazers_count,
        url: repo.html_url
      }));

      return { repositories: formatted, count: formatted.length };
    });







    Benefits:
    • ~40% reduction in token usage
    • Single execution vs multiple calls
    • Better control and debugging
    • Faster execution





    📊 Key Metrics

    Bifrost performance at 5,000 RPS:


    Metric             Other gateway   Bifrost    Improvement
    Gateway Overhead   ~440 µs         ~11 µs     40x faster
    Memory Usage       Baseline        -68%       68% less
    Queue Wait         47 µs           1.67 µs    28x faster
    Success Rate       89%             100%       Perfect


    Why Go?
    • Goroutines: lightweight concurrency (~2 KB each)
    • Compiled binary: no startup overhead
    • Memory efficient: 68% less than the baseline gateway
    • True parallelism across CPU cores





    โš™๏ธ Getting Started





    # 1. Install Bifrost (30 seconds)
    npx -y @maximhq/bifrost

    # 2. Configure API keys (.env)
    OPENAI_API_KEY=sk-...
    ANTHROPIC_API_KEY=sk-ant-...

    # 3. Open dashboard
    open http://localhost:8000

    # 4. Make your first call
    curl -X POST http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello Bifrost!"}]
    }'

    # 5. Drop-in replacement
    # Change this:
    base_url = "https://api.openai.com"
    # To this:
    base_url = "http://localhost:8000/openai"










    💻 What I'd Do Differently

    1. Start with cost tracking from day one - Retrofitting is painful
    2. Make rate limits configurable - Teams have different needs
    3. Implement caching aggressively - Semantic caching saves 40%+
    4. Build hierarchical permissions - Flat models don't scale
    5. Set up real-time alerting - Don't wait for weekly reviews
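To illustrate point 4, here is one way hierarchical permissions could look: each role names a parent and inherits its tools. This is only a sketch. The role names other than `engineering`, the `roles` table, and the helpers `resolveTools` and `canUse` are all made up for the example.

```javascript
// Hypothetical role tree: each role lists its own tools and an optional parent.
const roles = {
  employee: { tools: ["web_search"] },
  engineering: { parent: "employee", tools: ["filesystem"] },
  "engineering-oncall": { parent: "engineering", tools: ["database"] },
};

// Collect a role's tools plus everything inherited from its ancestors.
function resolveTools(roleName, roleTree = roles) {
  const tools = new Set();
  const seen = new Set(); // guards against cycles in a misconfigured tree
  let current = roleName;
  while (current && roleTree[current] && !seen.has(current)) {
    seen.add(current);
    for (const tool of roleTree[current].tools) tools.add(tool);
    current = roleTree[current].parent;
  }
  return tools;
}

// "*" on any role in the chain grants every tool.
function canUse(roleName, toolName, roleTree = roles) {
  const tools = resolveTools(roleName, roleTree);
  return tools.has("*") || tools.has(toolName);
}
```

With a tree like this, adding a tool for every employee is a single edit on the root role, where a flat map would need every role touched.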





    ✅ The Real Benefit

    At the end of the day, the gateway isn't about being fancy. It's about control.


    When you centralize tool management, you get:
    • Security - Tools isolated by role, mistakes bounded
    • Visibility - Every action logged and costs tracked
    • Optimization - See what's expensive and fix it
    • Debugging - Complete audit trail for incidents


    For us, this infrastructure turned AI from "a cool demo" into something we could deploy to production with confidence.











    Are you building AI infrastructure at scale? Let me know in the comments!


    Thanks for reading!



