MasteryMade // Internal PRD // v2.0 — Infrastructure Corrected

ATLAS.GRAPHITI.FOLKLOREDB.FORGE

Sovereign AI Stack — Three-Layer Retrieval Architecture
DRAFT — FORGE REVIEW PENDING
v2.0 — INFRA CORRECTED
2026-03-27
HC:JMD-FORGE-2026
⬛ Forge Brief v2 — Adversarial Assessment Required

Read this. Read the ATLAS codebase. Then read Graphiti. Push back hard on all of it.

PRD v1 had infrastructure errors now corrected. v2 adds two new components: FolkloreDB (graph/narrative store) and Graphiti (Zep's temporal knowledge graph). The core hardware/ATLAS thesis carries forward, but the retrieval architecture is now significantly more complex. Your job: validate ATLAS codebase AND assess whether the 3-layer retrieval stack (Qdrant vector + FolkloreDB graph + Graphiti temporal) creates compounding retrieval or compounding complexity.

Context: Jason is evaluating ~$990 GPU hardware for local ATLAS deployment + Forge migration, with FolkloreDB and Graphiti as the knowledge layers that make expert clones and JV validation genuinely more powerful than pure vector RAG. The architecture claim needs stress-testing.

  1. Read worker.py — does the Ralph Loop actually accumulate error context across retries or just retry with higher temperature? Critical distinction for code quality.
  2. LoRA trainer claims CPU training on 14B model nightly. What are the actual time and RAM requirements? Does it complete in an overnight window?
  3. OpenAI-compatible proxy — does it pass streaming, function calling, and system prompts correctly? LiteLLM cascade integration depends on this.
  4. install.sh — what's the real complexity? K3s + NVIDIA consumer GPU + CUDA 12.8. What breaks in practice?
  5. Novelty check — is there anything architecturally novel in ATLAS or purely integration work? Claude said no. Confirm or challenge.
  6. What would YOU build differently for Forge specifically?
  7. NEW: Three retrieval layers (Qdrant + FolkloreDB + Graphiti) — does this compound retrieval quality or compound operational complexity? What's the realistic failure mode at the intersection of all three?
  8. NEW: Graphiti as Forge's temporal memory — does it actually solve the "Forge forgets between sessions" problem, or does it introduce consistency and hallucination risks that make it worse than stateless operation?
atlas/worker.py · atlas/rag_api.py · scripts/install.sh · docs/ARCHITECTURE.md · atlas.conf.example · embedding-service/ · llm-proxy/

github.com/itigges22/ATLAS · github.com/getzep/graphiti · github.com/itigges22/opencode
00 //

v1 Corrections (Updated)

Infrastructure Correction

v1 PRD incorrectly placed web pages on Hetzner. Actual state: Hetzner is compute-only (Forge, n8n, LiteLLM, agent infrastructure). Web pages and frontend are on Vercel + Hostinger. Auth and relational data on Supabase. There is no public web layer on Hetzner to protect. This simplifies the hybrid architecture — local box connects directly to Hetzner compute via Tailscale. No need to route around a web tier.

New Components Added

FolkloreDB — graph/knowledge database for storing narratives and relationships. Exists in Jason's stack. Not mentioned in v1. Graphiti (not "Graphite") — Zep's temporal knowledge graph library for AI agents. Stores facts as graph nodes/edges with time-awareness. Built specifically for agent persistent memory. Together these add a second and third retrieval layer on top of ATLAS's Qdrant vector store, changing the expert clone and Forge memory architecture fundamentally.

01 //

What ATLAS Is (Unchanged)

ATLAS — self-hosted AI coding agent stack. Consumer GPU (RTX 5060 Ti 16GB). Six components: LLM Proxy, RAG API, llama-server (Qwen3-14B), Embeddings (MiniLM-L6-v2), Qdrant (vector DB), Task Worker (Ralph Loop). Redis queues. Nightly LoRA fine-tuner. OpenAI-compatible API.

99.5% task success (Ralph Loop) · 3 retrieval layers (v2 architecture) · 14B model params (Qwen3) · ~$990 hardware cost (all-in)
⚠ Still True From v1

ATLAS is not novel. Every component existed before 2025. Zero stars, zero forks. Single developer. The Ralph Loop is elegant applied math, not invention. The moat is in how you deploy it against your specific expert domains — not in the stack itself. Do not oversell.

02 //

The Three-Layer Retrieval Stack (New in v2)

This is the architectural insight that v2 introduces. Pure vector RAG (what ATLAS ships with) answers the question: "what chunks of text are semantically similar to this query?" That's useful but shallow. Adding FolkloreDB and Graphiti creates two additional retrieval dimensions that are qualitatively different — and together they change what expert clones and Forge can do.

Retrieval Stack — Three Layers
L1
Qdrant — Vector Similarity (ATLAS native)

Semantic nearest-neighbor search across embedded knowledge chunks. Answers: "what content is most similar to this query?" Fast, scalable, stateless. Already in ATLAS. 100GB storage, HNSW index, MiniLM-L6-v2 embeddings (384 dims). This is the baseline — the floor, not the ceiling.

VECTOR
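What L1 does is ordinary nearest-neighbor search. A dependency-free sketch, using toy 3-dimensional vectors in place of 384-dim MiniLM embeddings (the real index is Qdrant's HNSW, not a linear scan; all names here are illustrative):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy pre-embedded chunks; in ATLAS these are 384-dim MiniLM
# vectors living in Qdrant's HNSW index.
corpus = {
    "qualification": [0.9, 0.1, 0.0],
    "discovery":     [0.7, 0.6, 0.1],
    "pricing":       [0.1, 0.2, 0.9],
}

def l1_search(query_vec, top_k=2):
    """Answer L1's question: which chunks are most similar to the query?"""
    ranked = sorted(corpus, key=lambda name: cosine(query_vec, corpus[name]), reverse=True)
    return ranked[:top_k]

print(l1_search([1.0, 0.0, 0.0]))  # ['qualification', 'discovery']
```

The point of the toy version: L1 ranks by surface similarity only, which is exactly the limitation L2 and L3 are meant to address.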
L2
FolkloreDB — Graph Traversal (Narrative + Relationships)

Stores concepts, narratives, and relationships as graph nodes and edges. Answers: "what concepts connect to this, and how?" Where Qdrant finds similar text, FolkloreDB finds related ideas through explicit relationship paths. For expert clones: Brad's TIGER QUEST methodology isn't just chunks of text — it's a graph of connected sales concepts (qualification → discovery → close), each with narrative context. Graph traversal retrieves the relational structure, not just the surface content.

Forge application: JV ideas stored as narrative graphs — "this idea connects to this market, which connects to this expert, which has these constraints." Validation subagents traverse the graph to find second-order implications, not just vector-similar concepts.

GRAPH
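FolkloreDB's actual schema and API are internal to Jason's stack; this pure-Python breadth-first traversal only illustrates the retrieval pattern the section describes, relational paths rather than text similarity (node names mirror the qualification → discovery → close chain above and are illustrative):

```python
from collections import deque

# Toy slice of a methodology graph: node -> related concepts.
edges = {
    "qualification": ["discovery"],
    "discovery": ["objection_handling", "close"],
    "objection_handling": ["close"],
    "close": [],
}

def l2_traverse(start, max_hops=2):
    """Answer L2's question: what connects to this concept, and how far away?"""
    seen = {start: 0}          # node -> hop distance from the start concept
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue           # don't expand beyond the hop budget
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen

print(l2_traverse("qualification"))
```

Note what vector search cannot return here: "close" is retrieved because it is two relationship hops from "qualification", regardless of whether their texts look similar.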
L3
Graphiti — Temporal Knowledge Graph (Agent Memory)

Zep's temporal knowledge graph. Stores facts as graph nodes/edges with time-awareness — every fact has a "valid from / valid until" timestamp. Answers: "what do we know, when did we learn it, and has it changed?" This is the layer that gives Forge genuine memory across sessions. Not just "here's what's in the vector store" but "here's what Forge decided about Brad's JV on March 15th, and here's how that evolved by March 27th."

Expert clone application: A clone doesn't just know Brad's methodology — it knows that Brad updated his TIGER QUEST rubric in Week 3, that a specific sales objection pattern was added after a real call, and that certain concepts have higher recency weight. The clone answers from the most temporally relevant state of Brad's knowledge, not a static snapshot.

Forge application: Forge's persistent memory. Every JV evaluation, every build decision, every dead end becomes a temporal fact node. Forge doesn't repeat analysis it already did — it queries Graphiti first, finds prior reasoning, and either builds on it or updates it with new context.

TEMPORAL
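Graphiti's real API differs (it is a library over a graph database); this self-contained sketch shows only the temporal-validity idea the section relies on: facts carrying valid-from/valid-until windows, and queries resolving "as of" a date. The subjects and dates are illustrative, drawn from the Week 3 rubric example above:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    subject: str
    claim: str
    valid_from: date
    valid_until: Optional[date] = None  # None = still current

facts = [
    Fact("brad/tiger-quest", "rubric v1", date(2026, 3, 1), date(2026, 3, 15)),
    Fact("brad/tiger-quest", "rubric v2 (Week 3 update)", date(2026, 3, 15)),
]

def l3_current(subject, as_of):
    """Facts about `subject` that were valid on `as_of`."""
    return [
        f.claim for f in facts
        if f.subject == subject
        and f.valid_from <= as_of
        and (f.valid_until is None or as_of < f.valid_until)
    ]

print(l3_current("brad/tiger-quest", date(2026, 3, 27)))  # ['rubric v2 (Week 3 update)']
```

Querying the same subject as of March 10th returns "rubric v1" instead: the store answers from the temporally relevant state, not a static snapshot.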

The compound query: When Forge evaluates a new JV idea, a full retrieval hit queries all three layers simultaneously — vector similarity for related content (L1), graph traversal for concept relationships (L2), temporal facts for what Forge has already learned and when (L3). The answer is richer than any single layer could produce. The risk is orchestration complexity and latency. Forge assesses whether the compound gain justifies the compound overhead.
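The merge step is left unspecified in this PRD. Reciprocal rank fusion is one standard way to combine ranked lists from heterogeneous retrievers without calibrating their incompatible score scales; a sketch with toy inputs:

```python
def merge_and_rank(*ranked_lists, k=60):
    """Reciprocal rank fusion: each item scores sum(1 / (k + rank))
    over every list it appears in, so cross-layer agreement wins."""
    scores = {}
    for results in ranked_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

l1 = ["discovery", "qualification"]   # vector-similar chunks
l2 = ["qualification", "close"]       # graph neighbors
l3 = ["qualification"]                # currently valid temporal facts
print(merge_and_rank(l1, l2, l3))     # ['qualification', 'discovery', 'close']
```

"qualification" wins because all three layers surface it, which is the compounding claim in concrete form; the latency cost is that all three queries must return before fusion can run.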

03 //

The Case For Integration

Leverage #1

Parallel JV Validation at Zero Marginal Cost

Forge spawns subagents to evaluate multiple JV ideas simultaneously. Against local ATLAS: free. Rate limits disappear. The marginal cost per validated idea collapses toward zero.

Leverage #2

Expert Clones That Compound Nightly

LoRA trainer turns 4+ rated sessions into training data. Clones improve from usage automatically. The asset appreciates while you sleep — without Sumit sprints or Jason intervention.

Leverage #3 — New

Graphiti Gives Forge Persistent Memory

Today Forge forgets between sessions. Graphiti makes every decision, evaluation, and dead-end a persistent temporal fact. Forge doesn't re-analyze what it already analyzed — it builds on prior reasoning. Compound intelligence, not reset intelligence.

Leverage #4 — New

FolkloreDB Turns Expert IP Into Relational Knowledge

Brad's TIGER QUEST isn't just text chunks — it's a graph of connected sales concepts. Graph retrieval finds structural relationships vector search misses. Richer clone responses, better expert IP preservation.

Leverage #5

Sovereign Infra = Premium Tier Unlock

"Your expert clone runs on our sovereign hardware, no data leaves our stack." Healthcare, finance, legal clients have data residency requirements. SaaS wrappers can't offer this. You can.

Leverage #6

Cascade Routing Eliminates Commodity API Spend

ATLAS Tier 0 handles routine coding + retrieval. Claude Sonnet is the exception. At volume, API spend drops significantly. Break-even on hardware by Month 4 vs. cloud GPU rental.

04 //

The Case Against — Hard Challenges

Challenge #1 — Three Retrieval Layers = Three Failure Surfaces

v1 had one retrieval system (Qdrant). v2 has three. Each adds operational overhead, latency, and failure modes. If FolkloreDB returns stale graph paths, or Graphiti's temporal index has consistency issues, the compound query returns worse results than a single clean vector search. Complexity must earn its keep. Forge stress-tests whether L2 + L3 compound the answer or compound the noise.

Challenge #2 — Graphiti as Agent Memory: Hallucination Risk

Graphiti stores facts Forge generates — which means it stores Forge's mistakes too. If Forge made an incorrect assessment of a JV opportunity in Week 1, that incorrect fact lives in the temporal graph and influences Week 4 reasoning. Temporal memory amplifies good reasoning AND bad reasoning equally. The contamination problem compounds over time.
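One mitigation sketch, anticipating the contamination guard in phase P5 (all names and the threshold are hypothetical): gate every write with a confidence score, quarantining low-confidence facts for review instead of persisting them:

```python
LOW_CONFIDENCE = 0.6  # illustrative threshold

def guarded_write(store, fact, confidence, reviewed=False):
    """Quarantine low-confidence facts instead of persisting them,
    so a Week 1 mistake can't silently steer Week 4 reasoning."""
    if confidence < LOW_CONFIDENCE and not reviewed:
        store.setdefault("quarantine", []).append(fact)
        return False
    store.setdefault("graph", []).append(fact)
    return True

store = {}
guarded_write(store, "JV X is viable", confidence=0.9)
guarded_write(store, "market Y is dead", confidence=0.3)
print(store)  # the 0.3-confidence claim lands in quarantine, not the graph
```

The guard doesn't eliminate contamination (a confidently wrong fact still passes); it bounds the blast radius and creates a review queue.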

Challenge #3 — FolkloreDB Maintenance Overhead

Graph databases require schema design and relationship maintenance. Who defines the narrative/relationship schema for each expert's IP? Who updates it when Brad evolves his methodology? This is not a technical problem — it's an ongoing knowledge engineering problem that requires human attention. Vector RAG is relatively maintenance-free. Graph RAG is not.

Challenge #4 — ATLAS Codebase Not Validated

Carries forward from v1. Zero production deployments. Single developer. The entire stack is contingent on Forge's code assessment. Don't architect a three-layer retrieval system on top of an unvalidated inference engine.

Challenge #5 — Qwen3-14B Quality Gap for Complex Reasoning

Three-layer retrieval produces richer context. A weaker model may produce worse results with richer context than a stronger model with simple context — more rope to hang itself. Validate model quality before investing in retrieval sophistication.

Challenge #6 — Volume Threshold Still Applies

$990 hardware + electricity only beats cloud GPU rental at sufficient task volume. Adding Graphiti + FolkloreDB doesn't change the break-even math. It adds architectural complexity before the economics are proven.

05 //

Second-Order Effects

Graphiti + Expert Clones → The Clone Knows What You've Tried

Today, every interaction with an expert clone starts from scratch. With Graphiti, a clone knows: "This user asked about TIGER QUEST objection handling three sessions ago, tried the reframe approach, and it didn't land." The clone's next response incorporates temporal learning about that specific user — not just about Brad's methodology in general.

Second-order: Clones that track individual user progress over time become coaching systems, not just Q&A systems. The monetization model shifts from "per query" to "ongoing coaching engagement." Higher LTV, higher retention.

FolkloreDB + Forge → JV Idea Graph

Every JV concept, expert, market, and constraint stored as a graph. When Forge evaluates a new JV idea, it traverses the graph: "This idea is structurally similar to the Samuel/Align360 opportunity (same market, different IP), which had these constraints, which led to this outcome." Forge learns from your portfolio history, not just the current pitch.

Second-order: Over time, the JV idea graph becomes the most valuable asset in MasteryMade — a proprietary map of what works, what doesn't, and why, across every domain you've evaluated. That's not replicable by a competitor starting fresh.

Graphiti Temporal Layer → Regulatory Compliance Surface

Temporal knowledge graphs that track "what did the AI know and when" are exactly what regulators are starting to require for AI systems making consequential recommendations. Graphiti gives you an audit trail by default. Not a reason to build it — but a reason it ages well if AI regulation tightens.

Three-Layer Stack → Competitive Moat Narrative

Vector RAG is commodity. Graph + temporal is not. The MasteryOS pitch to JV partners becomes: "We don't just store your IP — we model your methodology as a knowledge graph that evolves over time and learns from every interaction." That's a qualitatively different product conversation than "we built a ChatGPT wrapper on your transcripts."

LoRA + Graphiti Drift Risk → Compounding Contamination

The nightly LoRA trainer improves the base model from user ratings. Graphiti stores Forge's reasoning over time. If bad reasoning enters Graphiti early, it influences future reasoning, which influences future LoRA training data, which degrades the base model. A contamination feedback loop across two separate systems. The ground truth validation layer (expert-clone-scorer) must intercept both — not just LoRA inputs but Graphiti write operations.

Supabase + Qdrant + FolkloreDB + Graphiti → Four Storage Systems

Supabase (relational + auth), Qdrant (vectors), FolkloreDB (graph), Graphiti (temporal graph). Four storage systems to maintain, back up, keep consistent, and query across. The operational overhead is real. This is not an argument against — it's an argument for sequencing. Don't run four systems simultaneously from day one. Phase them in with validation gates between each.

06 //

Proposed Architecture (v2, Corrected)

PUBLIC LAYER (Vercel + Hostinger — always on, no Hetzner)
Vercel (web apps) · Hostinger (web pages) · Supabase (auth + relational)
COMPUTE LAYER (Hetzner VPS — Forge, n8n, LiteLLM)
Hetzner: n8n Webhooks · LiteLLM Cascade Router · Forge Agent
TUNNEL (Tailscale — Hetzner ↔ Local)
Hetzner VPS ←— Tailscale private network —→ Local Box (GPU)
INFERENCE LAYER (Local Box — Tier 0)
ATLAS :8000 (Qwen3-14B) fallback→ Claude Sonnet (Tier 2) fallback→ Claude Opus (Tier 3)
RETRIEVAL LAYER L1 — Vector (Local)
Qdrant :6333 (100GB HNSW) ←→ ATLAS Embeddings :8080 (MiniLM) ←→ RAG API :8001
RETRIEVAL LAYER L2 — Graph (FolkloreDB)
FolkloreDB (narratives + relationships) ←→ RAG Orchestrator
RETRIEVAL LAYER L3 — Temporal (Graphiti)
Graphiti (temporal knowledge graph) ←→ Forge Agent Memory
LEARNING LAYER (Nightly 2am — Local)
Redis (training queue) · LoRA Trainer · Hot-swap adapter → llama-server
```yaml
# litellm_config.yaml — full cascade
model_list:
  - model_name: "tier0-local"
    litellm_params:
      model: "openai/qwen3-14b"
      api_base: "http://local-box:8000/v1"  # Tailscale addr
      api_key: "atlas-local-key"
  - model_name: "tier2-claude"
    litellm_params:
      model: "claude-sonnet-4-6"
      api_key: os.environ/ANTHROPIC_API_KEY
router_settings:
  routing_strategy: "usage-based-routing-v2"
  fallbacks: [{"tier0-local": ["tier2-claude"]}]
  allowed_fails: 3
  cooldown_time: 60
```

```python
# Forge retrieval query pattern (pseudo)
def retrieve(query, context):
    l1 = qdrant.search(query, top_k=5)   # vector
    l2 = folkloredb.traverse(query)      # graph
    l3 = graphiti.get_facts(query, k=5)  # temporal
    return merge_and_rank(l1, l2, l3)
```
07 //

Hardware Specification (Unchanged)

| Component | Spec | Rationale | Cost |
|---|---|---|---|
| GPU | RTX 3090 24GB (used) | 24GB VRAM for Qwen3-14B + headroom. Future-proof. Mature drivers. Better value than new 5060 Ti. | $400–450 |
| CPU | Ryzen 7 5700X | 8 cores. LoRA CPU training + subagent orchestration. CPU bottlenecks the trainer. | $120 |
| RAM | 32GB DDR4 | LoRA on 14B model needs RAM headroom. 16GB is risky. | $60 |
| Storage | 1TB NVMe SSD | Qdrant 100GB + model weights ~28GB + FolkloreDB + system. NVMe for vector I/O. | $70 |
| Motherboard | B550 PCIe 4.0 | PCIe 4.0 GPU bandwidth. B550 sweet spot. | $100 |
| PSU | 750W 80+ Gold | RTX 3090 TDP 350W. 750W safe headroom. | $80 |
| Case | Mid-tower ATX | Airflow for 24/7 GPU load. | $60 |
| UPS | APC 1000VA | Non-negotiable. Power blinks corrupt Redis queues and LoRA checkpoints. | $100 |
| **Total** | | | **~$990** |

Break-even vs. $250/mo cloud GPU rental: Month 4. Year 2 delta: ~$2,400 saved. Graphiti and FolkloreDB run on existing Hetzner or Supabase — no additional hardware required for L2/L3 retrieval layers.
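The quoted figures are recoverable with simple arithmetic. The electricity number below is an assumption chosen to reproduce the stated year-2 delta, not a measurement:

```python
hardware = 990     # all-in build cost from the spec table
cloud = 250        # $/mo cloud GPU rental baseline
electricity = 50   # assumed $/mo at 24/7 load (not measured)

print(hardware / cloud)            # 3.96: break-even inside month 4, power excluded
print(12 * (cloud - electricity))  # 2400: matches the ~$2,400 year-2 delta
```

Netting electricity pushes break-even closer to month 5, so the Month 4 claim holds only under the no-power-cost reading.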

08 //

Implementation Phases v2

P0

Forge Code Review — Hard Gate

Nothing proceeds until Forge validates ATLAS codebase and the three-layer retrieval architecture. The v2 questions in the Forge Brief above are mandatory. Net-positive verdict required before P1.

  • Forge reads ATLAS codebase (questions 1–6 in the Forge Brief)
  • Forge reads Graphiti codebase (questions 7–8, new in v2)
  • Forge returns adversarial assessment with code references
  • Jason + Claude review — proceed, pivot, or kill
P1

$3 RunPod Validation

Spin RTX 4090 on RunPod (~$0.70/hr, 4 hours). Deploy ATLAS. Run 20 real Forge tasks. Measure success rate, latency, quality vs. Claude Sonnet.

  • Deploy ATLAS via install.sh — document all friction
  • Wire LiteLLM Forge → ATLAS Tier 0
  • Run 20 real tasks — code gen, RAG retrieval, framework building
  • Quality gate: ATLAS output must pass at ≥70% of tasks without Claude fallback
  • Total cost: ~$3
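The ≥70% gate is easy to make mechanical; a sketch of the pass/fail check over the 20-task run (field names and the simulated outcomes are hypothetical):

```python
def passes_quality_gate(results, threshold=0.70):
    """Share of tasks ATLAS completed without Claude fallback."""
    rate = sum(1 for r in results if r["completed_locally"]) / len(results)
    return rate >= threshold, rate

# 20 simulated task outcomes; True means no fallback was needed.
runs = [{"completed_locally": i % 10 != 0} for i in range(20)]  # 18 of 20 pass
ok, rate = passes_quality_gate(runs)
print(ok, rate)  # True 0.9
```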
P2

Hardware + Local ATLAS Deployment

Only if P1 validates. Order hardware. Deploy. Migrate Forge to local. Wire Tailscale. Keep Hetzner for n8n and webhooks.

  • Order RTX 3090 used (eBay / r/hardwareswap)
  • Deploy ATLAS on local box — replicate P1 results
  • Tailscale: Hetzner VPS ↔ local box private network
  • Migrate Forge agent compute to local, ATLAS Tier 0
  • Keep Hetzner: n8n, LiteLLM router, public webhooks
P3

L1 Migration — Expert Factory to Qdrant

Move Expert Factory vectors from Supabase/pgvector to ATLAS Qdrant. Local embeddings. Validate retrieval quality before proceeding to L2.

  • Export expert knowledge bases from Supabase
  • Re-embed via local MiniLM-L6-v2
  • Validate retrieval quality vs. pgvector baseline on sample queries
  • Wire expert clone inference through ATLAS RAG API
  • Legal review: IP co-location obligations before local move
P4

L2 Activation — FolkloreDB Graph Layer

Model expert methodologies as graph structures in FolkloreDB. Wire into RAG orchestrator alongside Qdrant. Validate that graph traversal adds measurable retrieval quality over vector-only.

  • Design graph schema for expert methodology (concepts, relationships, narrative threads)
  • Ingest Brad's TIGER QUEST as first graph — manual review
  • A/B test: L1-only vs. L1+L2 retrieval quality on identical queries
  • Proceed only if L2 measurably improves answer quality — don't add complexity for its own sake
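The P4 A/B gate can be a one-function check. A sketch, assuming answers are graded in [0, 1] (e.g. by the expert-clone-scorer); the lift threshold and grades are illustrative:

```python
def ab_gate(scores_l1, scores_l1l2, min_lift=0.05):
    """Keep L2 only if mean answer quality on identical queries
    improves by at least min_lift over the L1-only baseline."""
    base = sum(scores_l1) / len(scores_l1)
    combo = sum(scores_l1l2) / len(scores_l1l2)
    return combo - base >= min_lift, base, combo

# Illustrative grades for the same queries, L1-only vs. L1+L2.
keep, base, combo = ab_gate([0.6, 0.7, 0.65, 0.7], [0.75, 0.8, 0.7, 0.78])
print(keep)  # True: L1+L2 clears the lift threshold
```

A fixed min_lift makes "measurably improves" a hard gate rather than a judgment call, which is the spirit of the P4 bullet.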
P5

L3 Activation — Graphiti Temporal Memory

Wire Graphiti as Forge's persistent session memory. Every JV evaluation, build decision, and dead end becomes a temporal fact node. Build contamination guard — validate Graphiti writes before they persist.

  • Deploy Graphiti on Hetzner (runs without GPU)
  • Wire Forge to write decisions + outcomes to Graphiti post-session
  • Build contamination guard: flag low-confidence facts before write
  • Test: does Forge's Week 2 reasoning improve from Week 1 Graphiti context?
  • JV idea graph: all evaluated opportunities as connected temporal nodes
P6

Continuous Learning Activation

Nightly LoRA from user interactions. Ground truth validation gate before adapter deployment. Drift monitoring across all three retrieval layers.

  • 1–5 rating UI on expert clone responses
  • Wire 4+ ratings → Redis training queue
  • First nightly run — validate trainer completes in window
  • Automated expert-clone-scorer pre-deployment check (LoRA drift guard)
  • Graphiti write validation: sample 10% of facts against ground truth before persistence
  • Weekly drift monitoring across L1 + L2 + L3
09 //

Risk Matrix v2

| Risk | Detail | Severity | Disposition |
|---|---|---|---|
| ATLAS codebase doesn't deliver on README claims | Zero production deployments. Single developer. P0 Forge code read is the only resolution. | HIGH | KILL SWITCH |
| Three retrieval layers compound complexity, not quality | Each layer adds latency, maintenance, and failure surfaces. Layers justify themselves only if measurable quality improvement is demonstrated at each gate. | HIGH | GATEABLE |
| Graphiti contamination loop | Bad Forge reasoning stored as temporal facts → influences future reasoning → degrades LoRA training data. Contamination amplifies across two systems simultaneously. | HIGH | MITIGABLE |
| FolkloreDB schema maintenance is a knowledge engineering problem | Graph databases need ongoing schema design and curation. This requires human attention, not a one-time setup. Who owns this as the expert roster grows? | MED | PLAN REQUIRED |
| Qwen3-14B quality gap for agentic reasoning | Richer context from three-layer retrieval amplifies model capability, but also amplifies model errors. Validate model quality before investing in retrieval sophistication. | HIGH | MEASURABLE |
| Four storage systems operational overhead | Supabase + Qdrant + FolkloreDB + Graphiti. Each needs backup, consistency management, and monitoring. Phase these in; never run all four before any single one is proven. | MED | SEQUENCING SOLVES |
| IP co-location legal exposure | Multiple expert IPs on local hardware create obligations if a JV dissolves. Legal architecture review required before P3. | LOW | WORTH FLAGGING |
| Volume threshold for hardware ROI | Economics only work at sufficient task volume. At low current volume, hardware may not pay back before you upgrade it. | MED | MEASURABLE |
10 //

Verdict v2

PRD Status — Conditional, Sequenced
The thesis got stronger.
The sequence got longer.
Adding Graphiti and FolkloreDB makes the end-state architecture genuinely compelling — three-layer retrieval, temporal agent memory, graph-structured expert IP. But it adds two additional systems on top of an unvalidated inference engine. P0 Forge code read is still mandatory. Then $3 RunPod test. Then hardware. Then each retrieval layer, one at a time, with a measurable quality gate before advancing to the next. No layer earns its operational overhead until it proves it in production.
If Forge Says No on ATLAS

Steal algorithms, keep the retrieval vision

Ralph Loop in Forge directly. Qdrant standalone, no K3s. Graphiti on Hetzner now (no GPU needed). FolkloreDB schema design as a planning exercise. The retrieval architecture survives even if ATLAS doesn't.

If Forge Says Yes

$3 RunPod → hardware → phase by phase

Never run ahead of validated layers. Each phase gate is a deliberate quality check, not a formality. The three-layer retrieval stack is the destination — but you earn each layer sequentially.

Regardless of Forge Verdict

Graphiti now, no hardware required

Graphiti runs on Hetzner without a GPU. Forge's persistent memory problem is solvable today, independent of the hardware decision. This is the highest-leverage immediate action regardless of what Forge finds in the ATLAS codebase.

11 //

For Forge — Conversation Context

This PRD was built from a full conversation covering: the $500 GPU HN story, Cursor parallel subagents signal, ATLAS architecture deep-dive, VPS vs. hardware economics, Forge migration feasibility, FolkloreDB graph layer, Graphiti temporal memory, and second-order effects across the full stack. Infrastructure was corrected in v2 from v1 (Hetzner is compute-only; Vercel + Hostinger + Supabase are the actual web/data layer).

Claude's analysis is based on README reads, not code inspection. Forge goes to the actual code. The eight questions in the Forge Brief are mandatory. Forge's pushback determines whether this proceeds, pivots, or gets killed. The "Graphiti now" card above is the one action Forge should evaluate independently of ATLAS — it may be the most immediately valuable regardless of the hardware decision.

ATLAS Codebase · Graphiti (Zep) · OpenCode Client