Secret-Service: Multi-Agent MCP Server for Structured Problem-Solving
Problem
Debugging across composed LLM tool layers (skills, hooks, subagents, MCP servers) lacks observability and traceability. Individual pieces work; their interactions don't.
What It Is
A Python MCP server that routes problems through 7 specialised LLM agents in a structured pipeline. Multiple strategies compete in parallel; the best wins. Every step is a structured, queryable event. Runs fully local — SQLite + sqlite-vss for storage/vector search, sentence-transformers for embeddings, MCP sampling for LLM inference. No external services, no API keys.
Architecture
- Blackboard pattern: all agents read/write a shared SQLite DB — no hidden state.
- Parallel fan-out: the Strategist generates N strategies (default 3), each executed concurrently in its own branch.
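The blackboard idea can be sketched in a few lines. This is illustrative only — the table layout, function names, and agent labels are assumptions, not the server's actual 13-table schema:

```python
import sqlite3

# Minimal blackboard sketch: every agent reads and writes the same
# SQLite database, so there is no hidden per-agent state.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blackboard (agent TEXT, key TEXT, value TEXT)")

def post(agent: str, key: str, value: str) -> None:
    """An agent publishes a finding to the shared blackboard."""
    db.execute("INSERT INTO blackboard VALUES (?, ?, ?)", (agent, key, value))

def read(key: str) -> list[tuple[str, str]]:
    """Any agent can read every other agent's findings for a key."""
    return db.execute(
        "SELECT agent, value FROM blackboard WHERE key = ?", (key,)
    ).fetchall()

post("reception", "problem", "flaky MCP tool composition")
post("strategist", "strategy-1", "bisect the tool chain")
```

Because all coordination goes through the database, any agent's view of the world can be reconstructed after the fact.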
Pipeline (14 LLM calls for 3 strategies):
- Intake (sequential): Reception → Master → Strategist
- Execution (parallel per strategy): Taktik Planner → Judge → Mission (Judge rejects? Retry up to 3×)
- Evaluation (sequential): Jury scores all Missions → Master synthesises final answer
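The three stages above can be sketched with asyncio. Here `call_llm` stands in for an MCP sampling round-trip, and the accept/reject check is a placeholder — this is a shape sketch under those assumptions, not the server's code:

```python
import asyncio

async def call_llm(agent: str, prompt: str) -> str:
    await asyncio.sleep(0)           # placeholder for a sampling round-trip
    return f"{agent}:{prompt}"

async def run_strategy(strategy: str, max_retries: int = 3) -> str:
    # Taktik Planner drafts a plan; the Judge may reject it up to 3 times.
    for _attempt in range(max_retries):
        plan = await call_llm("taktik_planner", strategy)
        verdict = await call_llm("judge", plan)
        if verdict:                  # stand-in for a real accept/reject check
            return await call_llm("mission", plan)
    return "failed"

async def solve(problem: str) -> str:
    # Intake is sequential: Reception -> Master -> Strategist.
    framed = await call_llm("reception", problem)
    brief = await call_llm("master", framed)
    strategies = [f"{brief}/s{i}" for i in range(3)]
    # Execution fans out: each strategy runs concurrently in its own branch.
    missions = await asyncio.gather(*(run_strategy(s) for s in strategies))
    # Evaluation is sequential again: Jury scores, Master synthesises.
    scored = await call_llm("jury", "|".join(missions))
    return await call_llm("master", scored)

result = asyncio.run(solve("debug tool composition"))
```

With 3 strategies and no retries this is exactly 14 calls: 3 intake + 3×3 execution + 1 jury + 1 synthesis.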
The 7 Agents
- Reception (temp 0.1) — precise problem intake
- Master (0.3) — orchestration and synthesis
- Strategist (0.9) — divergent strategy generation
- Taktik Planner (0.8) — creative step planning
- Judge (0.1) — rigorous plan verification (gatekeeper before execution)
- Mission (0.2) — faithful plan execution
- Jury (0.2) — consistent multi-dimensional scoring
High temp = creativity. Low temp = quality control.
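The roster above maps naturally to a small config table. The dataclass shape and field names here are assumptions about how such a config might look, not the server's actual structure; the temperatures and roles come from the list above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    name: str
    temperature: float
    role: str

AGENTS = [
    AgentConfig("reception", 0.1, "precise problem intake"),
    AgentConfig("master", 0.3, "orchestration and synthesis"),
    AgentConfig("strategist", 0.9, "divergent strategy generation"),
    AgentConfig("taktik_planner", 0.8, "creative step planning"),
    AgentConfig("judge", 0.1, "rigorous plan verification"),
    AgentConfig("mission", 0.2, "faithful plan execution"),
    AgentConfig("jury", 0.2, "consistent multi-dimensional scoring"),
]

# Only the divergent/creative stages run hot; every gatekeeper runs cold.
creative = [a.name for a in AGENTS if a.temperature >= 0.8]
```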
Memory System
Learns from every session:
- High-scoring successes → `good_practice` + `pattern` memories (permanent, reusable)
- Failures → `bad_practice` / `anti_pattern` memories
- Memories are embedded and recalled via similarity search for future strategy generation
- Integrity: confidence decay on contradictions, near-duplicate supersession, relevance tracking
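In the real server, sqlite-vss and sentence-transformers handle storage and embeddings; the sketch below only shows the recall idea, using toy 3-dimensional vectors and plain cosine similarity (memory kinds are taken from the list above, everything else is assumed):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# (kind, text, toy embedding) — real embeddings would come from a model.
memories = [
    ("good_practice", "bisect the tool chain first", [0.9, 0.1, 0.0]),
    ("anti_pattern", "grep raw logs across layers", [0.1, 0.9, 0.0]),
]

def recall(query_vec: list[float], k: int = 1) -> list[tuple[str, str]]:
    """Return the k memories most similar to the query embedding."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[2]), reverse=True)
    return [(kind, text) for kind, text, _ in ranked[:k]]

top = recall([1.0, 0.0, 0.0])
```

Recalled memories are then fed to the Strategist so past wins and failures shape new strategies.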
Observability
Append-only structured event stream (16 event types) with typed payloads. inspect_session() returns the full causal chain from problem to solution. No log grepping.
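A minimal version of the append-only stream fits in a single table. Event names and the table layout here are assumptions, not the server's actual 16-type schema:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE events (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " session TEXT, type TEXT, payload TEXT)"
)

def append_event(session: str, etype: str, payload: dict) -> None:
    # Events are only ever inserted — never updated or deleted.
    db.execute(
        "INSERT INTO events (session, type, payload) VALUES (?, ?, ?)",
        (session, etype, json.dumps(payload)),
    )

def inspect_session(session: str) -> list[tuple[str, dict]]:
    """Replay the full causal chain for a session, in order."""
    rows = db.execute(
        "SELECT type, payload FROM events WHERE session = ? ORDER BY seq",
        (session,),
    ).fetchall()
    return [(t, json.loads(p)) for t, p in rows]

append_event("s1", "problem_received", {"text": "flaky composition"})
append_event("s1", "strategy_generated", {"n": 3})
chain = inspect_session("s1")
```

Because every step is a typed row, "what happened and why" is a query, not a grep.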
Scope Limits
- Does not write code, run tests, or modify files — returns structured recommendations
- Does not call LLM providers directly — requires MCP sampling support from the client
- Does not replace human judgment
Quick Start
git clone <repo-url> && cd Secret-Service
pip install -e ".[dev]"
Workflow: solve() → poll get_events() → get_result() → optionally inspect_session().
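The workflow reads as a short polling loop. The stub below stands in for a real MCP client; the tool names match the workflow above, but the stub's behaviour and return values are assumptions for illustration:

```python
import time

class StubClient:
    """Stands in for an MCP client connected to the server."""
    def __init__(self):
        self._events: list[str] = []
    def solve(self, problem: str) -> str:
        # A real client would start a session and return its id.
        self._events = ["problem_received", "session_completed"]
        return "session-1"
    def get_events(self, session_id: str) -> list[str]:
        return self._events
    def get_result(self, session_id: str) -> str:
        return "structured recommendation"

client = StubClient()
session = client.solve("why does my hook chain drop context?")
while "session_completed" not in client.get_events(session):
    time.sleep(0.1)                  # poll until the pipeline finishes
result = client.get_result(session)
```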
Stats
7 agents · 14 LLM calls/session · 13 DB tables · 8 MCP tools · 239 tests · 0 external services · ~2,500 lines of Python.