The Automaton Auditor

A LangGraph-based forensic multi-agent system that audits code repositories against technical rubrics using parallel detective and judge agents, producing objective evidence-based evaluation reports.

Python · LangGraph · LangChain · Google Gemini · OpenRouter · FAISS · Docker

Problem

Evaluating code repositories against technical rubrics is time-consuming, subjective, and inconsistent when done manually. The challenge is building an automated auditing system that applies multiple analytical perspectives in parallel, synthesizes conflicting opinions, and produces deterministic, evidence-based reports.

Dataset

Code repositories and accompanying PDF technical reports submitted for evaluation. The system analyzes Python AST structures, Git history, and documentation quality, and assesses theoretical depth in the PDF reports via retrieval-augmented generation (RAG).

Architecture

A hierarchical multi-agent graph with parallel fan-out/fan-in:

1. Context Builder loads the rubric and initializes the shared audit state.
2. Two Detective agents run in parallel: RepoInvestigator analyzes the AST, Git history, and tool safety; DocAnalyst performs RAG-based analysis of the PDF report.
3. Three Judge agents (Prosecutor, Defense, Tech Lead) evaluate the collected evidence from different perspectives, also in parallel.
4. A Chief Justice node applies deterministic constitutional rules to synthesize the final AuditReport.

An LLM factory enables seamless switching between Gemini and OpenRouter backends.
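The fan-out/fan-in pattern at the heart of the graph can be sketched in plain Python. This is not the project's actual LangGraph code; the agent stubs and field names (`repo_evidence`, `doc_evidence`) are hypothetical, and the real system would wire these nodes into a StateGraph instead of a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical detective stubs standing in for the real agents.
def repo_investigator(state):
    # Would run AST, Git-history, and tool-safety checks on the repo.
    return {"repo_evidence": f"AST/Git findings for {state['repo']}"}

def doc_analyst(state):
    # Would run RAG-based analysis over the PDF report.
    return {"doc_evidence": f"RAG findings for {state['report_pdf']}"}

def fan_out_fan_in(state, agents):
    """Run all agents in parallel on the same state (fan-out),
    then merge their partial results back into one state (fan-in)."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda agent: agent(state), agents))
    merged = dict(state)
    for partial in results:
        merged.update(partial)
    return merged

state = {"repo": "sample-repo", "report_pdf": "report.pdf"}
state = fan_out_fan_in(state, [repo_investigator, doc_analyst])
```

The same shape repeats for the three Judge agents: each reads the merged evidence and contributes an independent verdict before the Chief Justice node runs.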

Training

No training is required. The system uses local HuggingFace embeddings (all-MiniLM-L6-v2) for quota-resistant RAG and LLM-aided Python AST parsing for high-precision code analysis. Audit rules are defined in a JSON rubric.
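The AST side of the analysis needs no model at all for structural checks. As a minimal illustration (the specific `shell=True` rule is an assumed example of a tool-safety check, not necessarily one the project implements), the standard-library `ast` module can flag risky call patterns precisely:

```python
import ast

SOURCE = '''
import subprocess

def run_tool(cmd):
    subprocess.run(cmd, shell=True)  # flagged: shell=True
'''

def find_shell_calls(source):
    """Return line numbers of calls passing shell=True,
    a common tool-safety red flag in code audits."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (kw.arg == "shell"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True):
                    findings.append(node.lineno)
    return findings

print(find_shell_calls(SOURCE))  # → [5]
```

Deterministic checks like this provide hard evidence; the LLM layer then interprets and contextualizes the findings rather than guessing at the code's structure.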

Results

The hierarchical architecture mitigates single-point-of-view bias by requiring consensus across three judicial perspectives. Deterministic constitutional rules (security capping, evidence necessity) keep final reports consistent despite LLM non-determinism. The system is containerized with Docker for reproducible deployments.
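The constitutional rules can be sketched as pure functions over the judges' scores, which is what makes the final report deterministic. The rule names come from the text above, but the scoring scale, cap value, and data shapes here are illustrative assumptions:

```python
from statistics import mean

def synthesize(judge_scores, evidence, security_ok):
    """Deterministic Chief Justice sketch (illustrative shapes and values).

    - Evidence necessity: a criterion with no cited evidence scores 0.
    - Security capping: a security violation caps the overall score
      regardless of judge consensus (cap of 2.0 is an assumed value).
    """
    final = {}
    for criterion, scores in judge_scores.items():
        if not evidence.get(criterion):
            final[criterion] = 0.0                     # evidence necessity
        else:
            final[criterion] = round(mean(scores), 2)  # judge consensus
    overall = round(mean(final.values()), 2)
    if not security_ok:
        overall = min(overall, 2.0)                    # security capping
    return final, overall

judge_scores = {"architecture": [4, 3, 4], "docs": [5, 4, 3]}
evidence = {"architecture": ["graph.py defines the fan-out"], "docs": []}
final, overall = synthesize(judge_scores, evidence, security_ok=True)
```

Because no LLM call sits between the judges' outputs and the report, rerunning synthesis on the same evidence always yields the same scores.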

GitHub Repository

View on GitHub