First Memory Deep Dive

Follow a single claim through every pipeline layer and understand the data transformations that make epistemic memory work.

The Claim

A user sends in chat:

"Tôi thích uống cà phê đen mỗi sáng"
(I like drinking black coffee every morning)

The AI agent decides this is worth remembering and calls epistemic_store.

L0: Policy Firewall

The claim is first classified by type:

Check            Result
Is PII?          No (food preference)
Is garbage?      No
Claim type       behavioral
Policy verdict   PASS
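In code, the L0 checks might look like the sketch below. The PII patterns, the garbage heuristic, and the `policyVerdict` name are illustrative assumptions for this walkthrough, not the pipeline's actual rule set:

```typescript
// Hypothetical L0 policy checks (illustrative assumptions, not the real rules).
const PII_PATTERNS = [
  /\b\d{3}-\d{2}-\d{4}\b/, // SSN-shaped number
  /\b\d{16}\b/,            // card-number-shaped number
];

function policyVerdict(text: string): "PASS" | "REJECT" {
  if (PII_PATTERNS.some((p) => p.test(text))) return "REJECT"; // PII check
  if (text.trim().length < 3) return "REJECT";                 // garbage check (simplified)
  return "PASS";
}
```

Our coffee claim contains no PII-shaped tokens and is non-trivial text, so it passes.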

L0.5: Sentence Classifier

Fast regex checks filter non-storable content:

  • Not a greeting ("xin chào", "hello") → ✅
  • Not a question → ✅
  • Not a command ("hãy làm", "please do") → ✅
  • Contains factual assertion → ✅ proceed to L1
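A minimal sketch of these fast-path filters, assuming regex rules like the examples above (the exact pattern lists and the `isStorable` name are assumptions):

```typescript
// Hypothetical L0.5 sentence filters; the real rule set is more extensive.
const GREETING = /\b(xin chào|hello|hi)\b/i;       // greetings carry no facts
const COMMAND = /\b(hãy làm|please do)\b/i;        // commands are instructions

function isStorable(text: string): boolean {
  if (GREETING.test(text)) return false;
  if (text.trim().endsWith("?")) return false;     // questions are not claims
  if (COMMAND.test(text)) return false;
  return true;                                     // treat the rest as assertions
}
```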

L1: Claim Normalizer

The Vietnamese text is parsed into a structured triple:

{
  "subject": "user",
  "predicate": "likes drinking",
  "object": "black coffee every morning",
  "kind": "behavioral",
  "decayClass": "STABLE"
}

The normalizer tries a regex fast path first (covering ~70% of patterns), then falls back to LLM extraction for complex sentences.
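The regex fast path can be sketched as follows. The pattern and the `normalizeFast` name are assumptions; for simplicity this sketch keeps the object in Vietnamese, whereas the real normalizer also produces the English form shown above:

```typescript
interface Triple { subject: string; predicate: string; object: string; }

// Hypothetical fast-path pattern for "Tôi thích uống/ăn ..." (I like drinking/eating ...).
const LIKE_PATTERN = /^Tôi thích (uống|ăn) (.+)$/i;

function normalizeFast(text: string): Triple | null {
  const m = LIKE_PATTERN.exec(text);
  if (!m) return null; // no fast-path match → caller falls back to LLM extraction
  const verb = m[1] === "uống" ? "drinking" : "eating";
  return { subject: "user", predicate: `likes ${verb}`, object: m[2] };
}
```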

L2: Confidence Scorer

Confidence is computed via a multi-factor sigmoid function:

confidence = σ(α·source + β·corroboration − γ·conflict + δ·kind)
Factor                   Value               Weight    Contribution
source = user_explicit   1.0                 α = 0.4   +0.40
corroboration            0 (first mention)   β = 0.2   +0.00
conflict                 0 (none found)      γ = 0.3   −0.00
kind = behavioral        0.8                 δ = 0.1   +0.08

Weighted sum: 0.40 + 0.00 − 0.00 + 0.08 = 0.48 → σ(0.48) ≈ 0.618 → rounded and adjusted: 0.741
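The formula above maps directly to code; weights are taken from the table, and the `rawConfidence` name is an assumption (the final rounding/adjustment step is not reproduced here):

```typescript
// Multi-factor sigmoid confidence, using the α/β/γ/δ weights from the table.
const sigmoid = (x: number) => 1 / (1 + Math.exp(-x));

function rawConfidence(
  source: number, corroboration: number, conflict: number, kind: number,
): number {
  const alpha = 0.4, beta = 0.2, gamma = 0.3, delta = 0.1;
  return sigmoid(alpha * source + beta * corroboration - gamma * conflict + delta * kind);
}
```

For our claim, `rawConfidence(1.0, 0, 0, 0.8)` gives σ(0.48) ≈ 0.618.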

L3: Conflict Detector

Vector and keyword search checks for contradictions:

  • Search: "user likes drinking black coffee every morning" → 0 existing matches
  • No contradictions found → claim proceeds cleanly
  • Entropy delta: +0.00
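The vector half of this search reduces to a nearest-neighbor similarity check. A minimal cosine-similarity helper, assuming plain number arrays rather than LanceDB's native search API:

```typescript
// Cosine similarity between two embedding vectors; a conflict detector
// would flag stored claims whose similarity exceeds some threshold.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```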

L4: Embedding + Storage

The claim is embedded and stored in LanceDB:

// 36-column record written to LanceDB
{
  id: "mem_a1b2c3d4",
  claim: "Tôi thích uống cà phê đen mỗi sáng",
  subject: "user",
  predicate: "likes drinking",
  object: "black coffee every morning",
  kind: "behavioral",
  confidence: 0.741,
  tier: "WORKING",
  decayClass: "STABLE",
  source: "user_explicit",
  channelId: "telegram:123456",
  storedAt: "2026-01-15T10:30:00.000Z",
  lastAccessed: "2026-01-15T10:30:00.000Z",
  vector: Float32Array[1536],
  // ... 22 more columns
}

L5: Tier Router

The confidence score determines the memory's tier:

Tier         Confidence Range   Behavior
QUARANTINE   < 0.30             Hidden, never injected
CANDIDATE    0.30 – 0.49        Available on search only
WORKING      0.50 – 0.89        Auto-injected into prompts
FACT         ≥ 0.90             Permanent, high-priority injection

Our coffee claim has confidence 0.741 → WORKING tier. It will be auto-injected into future conversations.
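The tier table translates directly into a threshold ladder; the thresholds below come from the table, while the `routeTier` name is an assumption:

```typescript
type Tier = "QUARANTINE" | "CANDIDATE" | "WORKING" | "FACT";

// Route a confidence score to a tier, checking from highest tier down.
function routeTier(confidence: number): Tier {
  if (confidence >= 0.90) return "FACT";
  if (confidence >= 0.50) return "WORKING";
  if (confidence >= 0.30) return "CANDIDATE";
  return "QUARANTINE";
}
```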

Tip: Use epistemic_promote to manually boost memories to FACT tier, or epistemic_demote to drop them back.

Summary

A single claim traverses 7 processing stages in under 200 ms, resulting in a richly annotated memory with confidence scoring, decay classification, conflict checking, and automatic tier assignment. This is what makes epistemic memory fundamentally different from key-value storage.