Oracle Forge — Multi-Database Data Agent
Team PaLM's multi-database analytics agent built for the UC Berkeley DataAgentBench benchmark. Answers natural-language questions across PostgreSQL, MongoDB, SQLite, and DuckDB using MCP as the database access layer with self-correcting execution and traceable audit logs.
Problem
The UC Berkeley DataAgentBench (DAB) benchmark requires agents to answer complex natural-language questions across heterogeneous databases (PostgreSQL, MongoDB, SQLite, DuckDB) spanning domains like Yelp reviews, book reviews, financial data, and genomics. The challenge is building a system that routes queries to the right databases, handles multi-database joins, and self-corrects on failures — all with measurable, traceable execution.
Dataset
UC Berkeley DataAgentBench datasets: yelp, bookreview, googlelocal, agnews, crmarenapro, stockindex, PANCANCER_ATLAS, DEPS_DEV_V1, and GITHUB_REPOS. Each dataset is served through MCP-backed database connections (PostgreSQL, MongoDB, SQLite, DuckDB).
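One way to picture the MCP-backed wiring is a registry that maps each dataset to its backend and resolves an MCP connection id. The dataset names come from the benchmark; the backend assignments and the naming scheme below are hypothetical placeholders, not the project's actual configuration.

```python
# Illustrative dataset-to-backend registry. The backend for each dataset
# is an assumption for the sketch, not the project's real assignment.
DATASET_BACKENDS = {
    "yelp": "postgresql",
    "bookreview": "mongodb",
    "googlelocal": "sqlite",
    "stockindex": "duckdb",
}

def mcp_connection_name(dataset: str) -> str:
    """Resolve an MCP connection id for a dataset (hypothetical scheme)."""
    try:
        return f"{DATASET_BACKENDS[dataset]}/{dataset}"
    except KeyError:
        raise ValueError(f"unknown dataset: {dataset}") from None
```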
Architecture
Four runtime layers:
1. OracleForgeAgent: assembles context and handles the user question using explicit context layers instead of prompt sprawl.
2. QueryRouter: selects the right databases and decomposes multi-database work.
3. ExecutionEngine: dispatches database reads through Google MCP Toolbox and post-processing through a Cloudflare Worker sandbox supporting extract, merge, transform, and validate operations.
4. Evaluation layer: traces all tool calls and stores benchmark-style run artifacts.
MCP is the exclusive database access layer; the agent never connects directly to databases.
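The router-then-engine flow can be sketched as follows. The class names echo the layers above, but the keyword-matching router, the catalog shape, and the callback signatures are illustrative assumptions; the real system plans with an LLM and calls MCP and the Worker sandbox over the network.

```python
from dataclasses import dataclass

@dataclass
class SubQuery:
    database: str   # target database connection
    query: str      # query text for that database

class QueryRouter:
    """Illustrative router: picks databases whose catalog keywords match
    the question and emits one sub-query per selected database."""
    def route(self, question: str, catalog: dict[str, list[str]]) -> list[SubQuery]:
        q = question.lower()
        return [SubQuery(db, question)
                for db, keywords in catalog.items()
                if any(k in q for k in keywords)]

class ExecutionEngine:
    """Dispatches reads via an MCP callback and merges results via a
    sandbox callback; never touches a database driver directly."""
    def __init__(self, mcp_call, sandbox_call):
        self.mcp_call = mcp_call          # (database, query) -> rows, via MCP
        self.sandbox_call = sandbox_call  # (operation, data) -> result

    def run(self, subqueries: list[SubQuery]):
        partials = [self.mcp_call(sq.database, sq.query) for sq in subqueries]
        return self.sandbox_call("merge", partials)
```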
Training
No model training. The agent uses OpenRouter LLMs (Gemini) for natural-language understanding and query planning. Self-correction is implemented via iterative re-execution with trace-based error diagnosis. The sandbox provides a safe execution environment for post-retrieval operations outside the LLM.
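The self-correction loop described above (iterative re-execution with trace-based error diagnosis) might look like the sketch below. The `execute`, `diagnose`, and `revise` callbacks and the trace shape are assumptions for illustration; in the real agent, diagnosis and plan revision would be delegated to the LLM.

```python
def execute_with_self_correction(plan, execute, diagnose, revise, max_attempts=3):
    """Run a query plan; on failure, diagnose from the accumulated trace
    and revise the plan before retrying (sketch, hypothetical interfaces)."""
    trace = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = execute(plan)
            trace.append({"attempt": attempt, "status": "ok"})
            return result, trace
        except Exception as exc:
            trace.append({"attempt": attempt, "status": "error", "error": str(exc)})
            # Feed the trace into diagnosis, then revise the plan and retry.
            plan = revise(plan, diagnose(trace))
    raise RuntimeError(f"plan failed after {max_attempts} attempts")
```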
Results
Successfully handles multi-database queries across all DAB datasets. MCP-backed tool calls are fully traceable via trace_events in run artifacts. The sandbox contract enforces structured extract/merge/transform/validate operations with PASSED/FAILED/ERROR validation status on every execution. Deployed on a shared server with HTTP API access for benchmark evaluation.
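The PASSED/FAILED/ERROR contract on every sandbox execution can be captured with a small wrapper, sketched below under assumed handler signatures (each handler returns an ok-flag plus output); the real Cloudflare Worker sandbox enforces this contract server-side.

```python
def run_sandbox_op(op: str, payload, handlers):
    """Wrap a sandbox operation (extract/merge/transform/validate) so every
    execution yields an explicit PASSED, FAILED, or ERROR status (sketch)."""
    try:
        ok, output = handlers[op](payload)
        return {"operation": op,
                "status": "PASSED" if ok else "FAILED",
                "output": output}
    except Exception as exc:
        # Handler crash or unknown operation: surface as ERROR, never raise.
        return {"operation": op, "status": "ERROR", "error": str(exc)}
```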