PRD v1 had infrastructure errors that are now corrected. v2 adds two new components: FolkloreDB (graph/narrative store) and Graphiti (Zep's temporal knowledge graph). The core hardware/ATLAS thesis carries forward, but the retrieval architecture is now significantly more complex. Your job: validate the ATLAS codebase AND assess whether the 3-layer retrieval stack (Qdrant vector + FolkloreDB graph + Graphiti temporal) creates compounding retrieval or compounding complexity.
Context: Jason is evaluating ~$990 GPU hardware for local ATLAS deployment + Forge migration, with FolkloreDB and Graphiti as the knowledge layers that make expert clones and JV validation genuinely more powerful than pure vector RAG. The architecture claim needs stress-testing.
v1 PRD incorrectly placed web pages on Hetzner. Actual state: Hetzner is compute-only (Forge, n8n, LiteLLM, agent infrastructure). Web pages and frontend are on Vercel + Hostinger. Auth and relational data on Supabase. There is no public web layer on Hetzner to protect. This simplifies the hybrid architecture — local box connects directly to Hetzner compute via Tailscale. No need to route around a web tier.
FolkloreDB — graph/knowledge database for storing narratives and relationships. Exists in Jason's stack. Not mentioned in v1. Graphiti (not "Graphite") — Zep's temporal knowledge graph library for AI agents. Stores facts as graph nodes/edges with time-awareness. Built specifically for agent persistent memory. Together these add a second and third retrieval layer on top of ATLAS's Qdrant vector store, changing the expert clone and Forge memory architecture fundamentally.
ATLAS — self-hosted AI coding agent stack. Consumer GPU (RTX 5060 Ti 16GB). Six components: LLM Proxy, RAG API, llama-server (Qwen3-14B), Embeddings (MiniLM-L6-v2), Qdrant (vector DB), Task Worker (Ralph Loop). Redis queues. Nightly LoRA fine-tuner. OpenAI-compatible API.
ATLAS is not novel. Every component existed before 2025. Zero stars, zero forks. Single developer. The Ralph Loop is elegant applied math, not invention. The moat is in how you deploy it against your specific expert domains — not in the stack itself. Do not oversell.
This is the architectural insight that v2 introduces. Pure vector RAG (what ATLAS ships with) answers the question: "what chunks of text are semantically similar to this query?" That's useful but shallow. Adding FolkloreDB and Graphiti creates two additional retrieval dimensions that are qualitatively different — and together they change what expert clones and Forge can do.
Semantic nearest-neighbor search across embedded knowledge chunks. Answers: "what content is most similar to this query?" Fast, scalable, stateless. Already in ATLAS. 100GB storage, HNSW index, MiniLM-L6-v2 embeddings (384 dims). This is the baseline — the floor, not the ceiling.
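The L1 primitive is just cosine-similarity nearest-neighbor search. A minimal stdlib sketch (brute force standing in for Qdrant's HNSW index; the 384-dim vectors mirror MiniLM-L6-v2's output, but the data here is random, not real embeddings):

```python
import math, random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_similar(query, docs, k=3):
    # Brute-force scan; Qdrant's HNSW approximates this at scale.
    scored = sorted(((cosine(query, d), i) for i, d in enumerate(docs)), reverse=True)
    return [(i, round(s, 3)) for s, i in scored[:k]]

random.seed(0)
docs = [[random.gauss(0, 1) for _ in range(384)] for _ in range(10)]  # 384 dims, like MiniLM-L6-v2
query = [x + 0.01 * random.gauss(0, 1) for x in docs[4]]  # near-duplicate of doc 4
print(top_k_similar(query, docs, k=1))  # doc 4 should rank first
```

This is the whole contract of L1: similar text in, similar text out. Everything the next two layers add is structure this primitive cannot see.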
Stores concepts, narratives, and relationships as graph nodes and edges. Answers: "what concepts connect to this, and how?" Where Qdrant finds similar text, FolkloreDB finds related ideas through explicit relationship paths. For expert clones: Brad's TIGER QUEST methodology isn't just chunks of text — it's a graph of connected sales concepts (qualification → discovery → close), each with narrative context. Graph traversal retrieves the relational structure, not just the surface content.
Forge application: JV ideas stored as narrative graphs — "this idea connects to this market, which connects to this expert, which has these constraints." Validation subagents traverse the graph to find second-order implications, not just vector-similar concepts.
Zep's temporal knowledge graph. Stores facts as graph nodes/edges with time-awareness — every fact has a "valid from / valid until" timestamp. Answers: "what do we know, when did we learn it, and has it changed?" This is the layer that gives Forge genuine memory across sessions. Not just "here's what's in the vector store" but "here's what Forge decided about Brad's JV on March 15th, and here's how that evolved by March 27th."
Expert clone application: A clone doesn't just know Brad's methodology — it knows that Brad updated his TIGER QUEST rubric in Week 3, that a specific sales objection pattern was added after a real call, and that certain concepts have higher recency weight. The clone answers from the most temporally relevant state of Brad's knowledge, not a static snapshot.
Forge application: Forge's persistent memory. Every JV evaluation, every build decision, every dead end becomes a temporal fact node. Forge doesn't repeat analysis it already did — it queries Graphiti first, finds prior reasoning, and either builds on it or updates it with new context.
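The temporal mechanic is simple to sketch. This is a toy store in the style Graphiti uses (valid-from / valid-until fact versioning), not Graphiti's actual API; the Brad JV statements are illustrative placeholders:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    subject: str
    statement: str
    valid_from: date
    valid_until: Optional[date] = None  # None = still current

class TemporalStore:
    def __init__(self):
        self.facts: list[Fact] = []

    def assert_fact(self, subject, statement, on):
        # Close out the currently-valid fact for this subject, then record the new one.
        for f in self.facts:
            if f.subject == subject and f.valid_until is None:
                f.valid_until = on
        self.facts.append(Fact(subject, statement, on))

    def as_of(self, subject, on):
        """What did we believe about `subject` on date `on`?"""
        for f in self.facts:
            if f.subject == subject and f.valid_from <= on and (f.valid_until is None or on < f.valid_until):
                return f.statement
        return None

store = TemporalStore()
store.assert_fact("brad_jv", "promising: strong IP, weak funnel", date(2025, 3, 15))
store.assert_fact("brad_jv", "validated: funnel fixed after call #2", date(2025, 3, 27))
print(store.as_of("brad_jv", date(2025, 3, 20)))  # the March 15th assessment
print(store.as_of("brad_jv", date(2025, 4, 1)))   # the evolved assessment
```

Nothing is overwritten: the March 15th assessment stays queryable after the March 27th update. That retention is both the memory feature and, as the risk section notes, the contamination surface.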
The compound query: When Forge evaluates a new JV idea, a full retrieval hit queries all three layers simultaneously — vector similarity for related content (L1), graph traversal for concept relationships (L2), temporal facts for what Forge has already learned and when (L3). The answer is richer than any single layer could produce. The risk is orchestration complexity and latency. Forge assesses whether the compound gain justifies the compound overhead.
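One way to bound the latency risk is to fan out to all three layers with a budget and degrade gracefully when a layer is slow. A sketch with stub retrievers (the layer functions, scores, and budget are all assumptions, not ATLAS code):

```python
import asyncio

# Stub retrievers standing in for Qdrant, FolkloreDB, and Graphiti.
async def vector_layer(q):   return [("chunk:similar_pitch", 0.82)]
async def graph_layer(q):    return [("path:idea->market->brad", 0.70)]
async def temporal_layer(q): return [("fact:prior_eval_2025-03-15", 0.90)]

async def compound_query(q, budget_s=0.5):
    """Query all three layers concurrently; a layer that misses the
    latency budget degrades the answer instead of blocking it."""
    tasks = [asyncio.create_task(f(q)) for f in (vector_layer, graph_layer, temporal_layer)]
    done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for t in pending:
        t.cancel()
    hits = [h for t in done for h in t.result()]
    return sorted(hits, key=lambda h: -h[1])  # naive score merge across layers

print(asyncio.run(compound_query("new JV idea: Brad x SMB sales")))
```

The hard open question is the merge step: scores from three heterogeneous retrievers are not directly comparable, and a naive sort like this is exactly where "compounding noise" would enter.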
Forge spawns subagents to evaluate multiple JV ideas simultaneously. Against local ATLAS, marginal cost is near zero. Rate limits disappear. Cost per validated idea collapses toward zero.
LoRA trainer turns 4+ rated sessions into training data. Clones improve from usage automatically. The asset appreciates while you sleep — without Sumit sprints or Jason intervention.
Today Forge forgets between sessions. Graphiti makes every decision, evaluation, and dead end a persistent temporal fact. Forge doesn't re-analyze what it already analyzed; it builds on prior reasoning. Compound intelligence, not reset intelligence.
Brad's TIGER QUEST isn't just text chunks — it's a graph of connected sales concepts. Graph retrieval finds structural relationships vector search misses. Richer clone responses, better expert IP preservation.
"Your expert clone runs on our sovereign hardware, no data leaves our stack." Healthcare, finance, legal clients have data residency requirements. SaaS wrappers can't offer this. You can.
ATLAS Tier 0 handles routine coding and retrieval; Claude Sonnet handles only the exceptions. At volume, API spend drops significantly. Break-even on hardware by Month 4 vs. cloud GPU rental.
v1 had one retrieval system (Qdrant). v2 has three. Each adds operational overhead, latency, and failure modes. If FolkloreDB returns stale graph paths, or Graphiti's temporal index has consistency issues, the compound query returns worse results than a single clean vector search. Complexity must earn its keep. Forge stress-tests whether L2 + L3 compound the answer or compound the noise.
Graphiti stores facts Forge generates — which means it stores Forge's mistakes too. If Forge made an incorrect assessment of a JV opportunity in Week 1, that incorrect fact lives in the temporal graph and influences Week 4 reasoning. Temporal memory amplifies good reasoning AND bad reasoning equally. The contamination problem compounds over time.
Graph databases require schema design and relationship maintenance. Who defines the narrative/relationship schema for each expert's IP? Who updates it when Brad evolves his methodology? This is not a technical problem — it's an ongoing knowledge engineering problem that requires human attention. Vector RAG is relatively maintenance-free. Graph RAG is not.
Carries forward from v1. Zero production deployments. Single developer. The entire stack is contingent on Forge's code assessment. Don't architect a three-layer retrieval system on top of an unvalidated inference engine.
Three-layer retrieval produces richer context. A weaker model may produce worse results with richer context than a stronger model with simple context — more rope to hang itself. Validate model quality before investing in retrieval sophistication.
$990 hardware + electricity only beats cloud GPU rental at sufficient task volume. Adding Graphiti + FolkloreDB doesn't change the break-even math. It adds architectural complexity before the economics are proven.
Today, every interaction with an expert clone starts from scratch. With Graphiti, a clone knows: "This user asked about TIGER QUEST objection handling three sessions ago, tried the reframe approach, and it didn't land." The clone's next response incorporates temporal learning about that specific user — not just about Brad's methodology in general.
Second-order: Clones that track individual user progress over time become coaching systems, not just Q&A systems. The monetization model shifts from "per query" to "ongoing coaching engagement." Higher LTV, higher retention.
Every JV concept, expert, market, and constraint stored as a graph. When Forge evaluates a new JV idea, it traverses the graph: "This idea is structurally similar to the Samuel/Align360 opportunity (same market, different IP), which had these constraints, which led to this outcome." Forge learns from your portfolio history, not just the current pitch.
Second-order: Over time, the JV idea graph becomes the most valuable asset in MasteryMade — a proprietary map of what works, what doesn't, and why, across every domain you've evaluated. That's not replicable by a competitor starting fresh.
Temporal knowledge graphs that track "what did the AI know and when" are exactly what regulators are starting to require for AI systems making consequential recommendations. Graphiti gives you an audit trail by default. Not a reason to build it — but a reason it ages well if AI regulation tightens.
Vector RAG is commodity. Graph + temporal is not. The MasteryOS pitch to JV partners becomes: "We don't just store your IP — we model your methodology as a knowledge graph that evolves over time and learns from every interaction." That's a qualitatively different product conversation than "we built a ChatGPT wrapper on your transcripts."
The nightly LoRA trainer improves the base model from user ratings. Graphiti stores Forge's reasoning over time. If bad reasoning enters Graphiti early, it influences future reasoning, which influences future LoRA training data, which degrades the base model. A contamination feedback loop across two separate systems. The ground truth validation layer (expert-clone-scorer) must intercept both — not just LoRA inputs but Graphiti write operations.
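The interception point the loop demands can be sketched as a single write gate that both Graphiti writes and LoRA training rows must pass. Everything here is hypothetical scaffolding: the stub scorer stands in for the expert-clone-scorer ground-truth layer named above:

```python
# One validation gate for both sinks: facts that fail never reach the
# temporal graph, so they can never seed future reasoning or training data.
GRAPH = []       # stand-in for Graphiti writes
QUARANTINE = []  # held for human review

def ground_truth_score(fact: dict) -> float:
    # Stub: the real scorer would check the fact against expert-validated answers.
    return fact.get("confidence", 0.0)

def write_fact(fact: dict, threshold: float = 0.7):
    if ground_truth_score(fact) >= threshold:
        GRAPH.append(fact)       # persisted: retrievable AND eligible as LoRA data
        return "persisted"
    QUARANTINE.append(fact)      # quarantined: never enters graph or training set
    return "quarantined"

print(write_fact({"claim": "JV viable", "confidence": 0.9}))
print(write_fact({"claim": "JV dead",   "confidence": 0.3}))
```

The design point is that there is one gate, not two: splitting validation across the Graphiti path and the LoRA path is how a bad fact slips through one and contaminates the other.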
Supabase (relational + auth), Qdrant (vectors), FolkloreDB (graph), Graphiti (temporal graph). Four storage systems to maintain, back up, keep consistent, and query across. The operational overhead is real. This is not an argument against — it's an argument for sequencing. Don't run four systems simultaneously from day one. Phase them in with validation gates between each.
| Component | Spec | Rationale | Cost |
|---|---|---|---|
| GPU | RTX 3090 24GB (used) | 24GB VRAM for Qwen3-14B + headroom. Future-proof. Mature drivers. Better value than new 5060 Ti. | $400–450 |
| CPU | Ryzen 7 5700X | 8 cores. LoRA CPU training + subagent orchestration. CPU bottlenecks the trainer. | $120 |
| RAM | 32GB DDR4 | LoRA on 14B model needs RAM headroom. 16GB is risky. | $60 |
| Storage | 1TB NVMe SSD | Qdrant 100GB + model weights ~28GB + FolkloreDB + system. NVMe for vector I/O. | $70 |
| Motherboard | B550 PCIe 4.0 | PCIe 4.0 GPU bandwidth. B550 sweet spot. | $100 |
| PSU | 750W 80+ Gold | RTX 3090 TDP 350W. 750W safe headroom. | $80 |
| Case | Mid-tower ATX | Airflow for 24/7 GPU load. | $60 |
| UPS | APC 1000VA | Non-negotiable. Power blinks corrupt Redis queues and LoRA checkpoints. | $100 |
| Total | | | ~$990 |
Break-even vs. $250/mo cloud GPU rental: Month 4. Year 2 delta: ~$2,400 saved. Graphiti and FolkloreDB run on existing Hetzner or Supabase — no additional hardware required for L2/L3 retrieval layers.
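The break-even math, made explicit. The PRD's Month-4 figure excludes electricity; the ~$55/mo electricity estimate below is an assumption (roughly a 350W GPU under heavy 24/7 load at ~$0.15/kWh), not a PRD number:

```python
def break_even_month(hardware, cloud_monthly, electricity_monthly=0):
    # First month where cumulative cloud spend exceeds cumulative local spend.
    month, local_cum, cloud_cum = 0, hardware, 0
    while local_cum >= cloud_cum:
        month += 1
        local_cum += electricity_monthly
        cloud_cum += cloud_monthly
    return month

print(break_even_month(990, 250))       # 4: the PRD figure, electricity excluded
print(break_even_month(990, 250, 55))   # 6: assumed electricity pushes it out
print(12 * (250 - 55))                  # 2340: year-2 delta, near the PRD's ~$2,400
```

Even with the electricity assumption, break-even lands inside the first half-year, so the sequencing argument holds; the sensitivity worth tracking is cloud rental price, not power cost.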
Nothing proceeds until Forge validates ATLAS codebase and the three-layer retrieval architecture. The v2 questions in the Forge Brief above are mandatory. Net-positive verdict required before P1.
Spin up an RTX 4090 on RunPod (~$0.70/hr, ~4 hours). Deploy ATLAS. Run 20 real Forge tasks. Measure success rate, latency, and quality vs. Claude Sonnet.
Only if P1 validates. Order hardware. Deploy. Migrate Forge to local. Wire Tailscale. Keep Hetzner for n8n and webhooks.
Move Expert Factory vectors from Supabase/pgvector to ATLAS Qdrant. Local embeddings. Validate retrieval quality before proceeding to L2.
Model expert methodologies as graph structures in FolkloreDB. Wire into RAG orchestrator alongside Qdrant. Validate that graph traversal adds measurable retrieval quality over vector-only.
Wire Graphiti as Forge's persistent session memory. Every JV evaluation, build decision, and dead end becomes a temporal fact node. Build contamination guard — validate Graphiti writes before they persist.
Nightly LoRA from user interactions. Ground truth validation gate before adapter deployment. Drift monitoring across all three retrieval layers.
Run the Ralph Loop in Forge directly. Run Qdrant standalone, no K3s. Deploy Graphiti on Hetzner now (no GPU needed). Treat FolkloreDB schema design as a planning exercise. The retrieval architecture survives even if ATLAS doesn't.
Never run ahead of validated layers. Each phase gate is a deliberate quality check, not a formality. The three-layer retrieval stack is the destination — but you earn each layer sequentially.
Graphiti runs on Hetzner without a GPU. Forge's persistent memory problem is solvable today, independent of the hardware decision. This is the highest-leverage immediate action regardless of what Forge finds in the ATLAS codebase.
This PRD was built from a full conversation covering: the $500 GPU HN story, Cursor parallel subagents signal, ATLAS architecture deep-dive, VPS vs. hardware economics, Forge migration feasibility, FolkloreDB graph layer, Graphiti temporal memory, and second-order effects across the full stack. Infrastructure was corrected in v2 from v1 (Hetzner is compute-only; Vercel + Hostinger + Supabase are the actual web/data layer).
Claude's analysis is based on README reads, not code inspection. Forge goes to the actual code. The 10 questions in the Forge Brief are mandatory. Forge's pushback determines whether this proceeds, pivots, or gets killed. The amber card above ("Graphiti now") is the one action Forge should evaluate independently of ATLAS — it may be the most immediately valuable regardless of the hardware decision.
- ATLAS Codebase
- Graphiti (Zep)
- OpenCode Client