6-Layer Pipeline
Every claim passes through a deterministic pipeline before becoming a memory. No raw data is stored without validation.
Overview
The pipeline comprises six numbered layers (L0–L5) plus a fast pre-filter stage (L0.5) — seven stages in total — each responsible for a specific validation or transformation step.
L0: Policy Firewall
The first gate classifies the incoming claim into one of several types and applies policy rules:
| Claim Type | Examples | Policy |
|---|---|---|
| identity | "My name is Huy" | Always allow, high priority |
| occupational | "I work at VNG" | Allow, stable decay |
| preference | "I prefer dark mode" | Allow, stable decay |
| behavioral | "I run 5km daily" | Allow, moderate decay |
| temporal | "Meeting tomorrow at 2pm" | Allow, ephemeral decay |
| relational | "My wife is named Lan" | Allow, stable decay |
| pii | Credit card, SSN | Block |
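The policy table above can be sketched as a simple lookup. Type names, field names, and the `applyPolicy` helper below are illustrative assumptions, not the actual implementation:

```typescript
// Hypothetical sketch of the L0 policy firewall lookup (names assumed).
type ClaimKind =
  | "identity" | "occupational" | "preference"
  | "behavioral" | "temporal" | "relational" | "pii";

type Policy = { allow: boolean; decay?: "stable" | "moderate" | "ephemeral" };

const POLICY_TABLE: Record<ClaimKind, Policy> = {
  identity:     { allow: true, decay: "stable" },   // always allow, high priority
  occupational: { allow: true, decay: "stable" },
  preference:   { allow: true, decay: "stable" },
  behavioral:   { allow: true, decay: "moderate" },
  temporal:     { allow: true, decay: "ephemeral" },
  relational:   { allow: true, decay: "stable" },
  pii:          { allow: false },                   // blocked outright, never stored
};

function applyPolicy(kind: ClaimKind): Policy {
  return POLICY_TABLE[kind];
}
```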
L0.5: Sentence Classifier
A fast regex pre-filter removes non-storable content before expensive LLM processing:
- Greetings: "xin chào", "hello", "hi there" → DROP
- Questions: "what is X?", "bao nhiêu?" → DROP (questions are not claims)
- Commands: "please do X", "hãy làm" → DROP
- Code blocks: Fenced code → DROP (code is not a claim)
- Assertions: Anything containing factual content → PASS to L1
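A minimal sketch of this pre-filter follows; the specific regexes are assumptions for illustration, not the production patterns:

```typescript
// Illustrative L0.5 sentence classifier: cheap regex checks before any LLM call.
const DROP_PATTERNS: RegExp[] = [
  /^(hi|hello|hey|xin chào)\b/i, // greetings
  /\?\s*$/,                      // questions are not claims
  /^(please|hãy)\b/i,            // commands
  /```/,                         // fenced code blocks
];

function classifySentence(text: string): "DROP" | "PASS" {
  // Anything not caught by a drop pattern is treated as a potential assertion.
  return DROP_PATTERNS.some((p) => p.test(text.trim())) ? "DROP" : "PASS";
}
```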
L1: Claim Normalizer
The normalizer transforms raw text into a structured triple. It uses a two-stage approach:
- Regex fast-path — handles ~70% of common patterns (e.g., "I am X", "My X is Y", "I like X"). Supports English and Vietnamese.
- LLM fallback — for complex or ambiguous sentences, the LLM extracts the triple with kind and decay class assignment.
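The regex fast-path can be sketched for a single English pattern; the `fastPathNormalize` helper and its regex are hypothetical, standing in for the fuller bilingual pattern set:

```typescript
// Minimal sketch of the L1 regex fast-path for one pattern ("I work at X").
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

function fastPathNormalize(text: string): Triple | null {
  const m = /^I work at (.+)$/i.exec(text.trim());
  if (m) return { subject: "user", predicate: "works at", object: m[1] };
  return null; // no match: fall through to the LLM extractor
}
```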
```js
// Output of L1 normalizer
{
  subject: "user",        // who the claim is about
  predicate: "works at",  // the relationship
  object: "Google",       // the value
  kind: "occupational",   // claim classification
  decayClass: "STABLE"    // how fast it should decay
}
```
L2: Confidence Scorer
Confidence is computed using a sigmoid function with 4 factors:
C = σ(α·S + β·K − γ·F + δ·T)
Where:
- S (Source weight) — user_explicit: 1.0, agent_inferred: 0.3, group_chat: 0.5
- K (Corroboration) — number of supporting memories (0–1 normalized)
- F (Conflict penalty) — number of contradicting memories
- T (Kind bonus) — identity: 1.0, occupational: 0.9, preference: 0.8, behavioral: 0.7, temporal: 0.4
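In code, the formula C = σ(α·S + β·K − γ·F + δ·T) looks like the sketch below. The source does not specify the coefficient values, so the α, β, γ, δ defaults here are purely illustrative:

```typescript
// Sketch of the L2 confidence scorer; coefficient values are assumptions.
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

function confidence(
  S: number, // source weight, e.g. user_explicit = 1.0
  K: number, // corroboration, normalized to 0–1
  F: number, // conflict penalty (contradicting memories)
  T: number, // kind bonus, e.g. identity = 1.0
  [alpha, beta, gamma, delta] = [2.0, 1.0, 1.5, 1.0], // assumed α, β, γ, δ
): number {
  return sigmoid(alpha * S + beta * K - gamma * F + delta * T);
}
```

Because σ squashes to (0, 1), the result maps directly onto the tier thresholds used in L5.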
L3: Conflict Detector
Searches for contradictions using hybrid matching:
- Vector similarity search (cosine, top-5 results with threshold 0.85)
- Keyword overlap on subject + predicate
- Semantic contradiction detection via LLM comparison
When a conflict is found:
- The older claim is moved to the `CHALLENGED` tier
- Both claims get a `conflictGroup` link
- System entropy increases
- If `conflictAutoResolve` is enabled, the higher-confidence claim wins automatically
L4: Embedding & Storage
The claim is embedded using OpenAI's text-embedding-3-small (1536 dimensions) and stored in LanceDB with 36 columns including metadata, confidence history, and linking information.
L5: Tier Router
The final stage assigns a tier based on confidence thresholds:
| Tier | Range | Auto-Inject? | Behavior |
|---|---|---|---|
| QUARANTINE | 0.00–0.29 | No | Hidden, awaiting evidence |
| CANDIDATE | 0.30–0.49 | No | Searchable, not injected |
| WORKING | 0.50–0.89 | Yes | Active working memory |
| FACT | 0.90–1.00 | Yes (priority) | Verified, high confidence |
| CHALLENGED | Any | No | In conflict, needs resolution |
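The thresholds above translate directly into a routing function. Only the function and parameter names are assumed; the ranges come straight from the table:

```typescript
// L5 tier router: thresholds taken from the table above. A claim already
// flagged as conflicting routes to CHALLENGED regardless of confidence.
type Tier = "QUARANTINE" | "CANDIDATE" | "WORKING" | "FACT" | "CHALLENGED";

function routeTier(confidence: number, inConflict: boolean): Tier {
  if (inConflict) return "CHALLENGED";
  if (confidence < 0.30) return "QUARANTINE"; // hidden, awaiting evidence
  if (confidence < 0.50) return "CANDIDATE";  // searchable, not injected
  if (confidence < 0.90) return "WORKING";    // active working memory
  return "FACT";                              // verified, injected with priority
}
```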