Engram: Cognitive-Inspired AI Memory Architecture

Abstract
Most production AI memory is retrieval-augmented generation (RAG): embed old conversations, search on similarity, inject into context. This works, but it bears little resemblance to human memory.
Human memory consolidates similar experiences over time. It allows unused information to decay. It updates retrieved memories with new information. It strengthens repeatedly accessed facts.
I built Engram to test whether applying cognitive science principles to AI memory improves long-term conversational coherence. After two weeks in production managing 3,030 memories with 87.9% entity coverage and temporal relationship tracking, the system demonstrates measurable improvements in context continuity, contradiction prevention, and memory precision.
The Problem
Standard RAG implementations suffer from several fundamental issues:
No Consolidation: Duplicate or similar information accumulates without merging. "Ben prefers dark mode" stored three times creates redundancy and retrieval noise.
No Decay: All memories persist with equal weight regardless of relevance. Debugging output from January competes with critical decisions from February.
No Reconsolidation: When facts change, most systems either duplicate or miss the update entirely. "Ali works at Google" and "Ali works at Meta" both exist with no indication which is current.
No Temporal Context: RAG can't answer "What did I think about X in January?" or "When did this fact change?"
No Certainty Tracking: "Ben said X" and "Ben probably thinks X" carry equal weight.
Cognitive Science Principles
Engram implements five mechanisms from memory research:
- Consolidation — Merge similar experiences over time
- Decay — Deprioritize unused information
- Reconsolidation — Update retrieved memories with new information
- Temporal Validity — Track when facts were true vs when they changed
- Uncertainty Tracking — Distinguish facts from inferences
Architecture
Engram uses dual storage: Qdrant for semantic search and markdown files for human readability.
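As a rough sketch of the dual-write path, assuming the Qdrant JS client and an append-only markdown log (the collection and file names here are illustrative, not Engram's actual layout):

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";
import { randomUUID } from "crypto";
import { appendFile } from "fs/promises";

const qdrant = new QdrantClient({ url: "http://localhost:6333" });

// Dual write: a Qdrant point for semantic search, plus a markdown line
// that a human can read, grep, and correct by hand.
async function storeMemory(text: string, vector: number[]): Promise<void> {
  await qdrant.upsert("memories", {
    points: [{ id: randomUUID(), vector, payload: { text, timestamp: Date.now() } }],
  });
  await appendFile("memories.md", `- ${new Date().toISOString()} ${text}\n`);
}
```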
Memory Metadata Schema
Every memory in Qdrant carries structured metadata:
```typescript
interface StoredMemory {
  text: string;
  source: string;
  timestamp: number;
  entities: StoredEntity[];
  relationships: StoredRelationship[];
  importance: number;
  certainty: number;
  access_count: number;
  last_accessed: number;
  forgotten: boolean;
}
```
The certainty field distinguishes stated facts (1.0) from strong inferences (0.7) from weak guesses (0.4). Temporal validity fields (validFrom, validUntil) on relationships enable tracking fact changes over time.
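The relationship schema isn't shown above; here is a plausible shape, consistent with the reconsolidation code later in this post (field names beyond validFrom/validUntil are assumptions):

```typescript
interface StoredRelationship {
  source: string;      // subject entity, e.g. "Ali"
  type: string;        // relationship type, e.g. "works_at"
  target: string;      // object entity, e.g. "Google"
  certainty: number;   // 1.0 stated, 0.7 inferred, 0.4 guessed
  validFrom: number;   // when the fact became true (epoch ms)
  validUntil?: number; // unset while the fact is still current
}
```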
Entity Resolution and Deduplication
Entities are identified by deterministic IDs: SHA-256(lowercase(name) + type).slice(0,16).
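In Node.js that could look like the following (the exact normalization is an assumption beyond what's stated above):

```typescript
import { createHash } from "crypto";

// Deterministic ID: hash of normalized name + type, truncated to 16 hex chars,
// so the same entity always resolves to the same ID.
function entityId(name: string, type: string): string {
  return createHash("sha256")
    .update(name.toLowerCase() + type)
    .digest("hex")
    .slice(0, 16);
}

entityId("Ali", "person"); // stable across runs and machines
```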
Deduplication strategies:
- Exact name match (case-insensitive) — Auto-merge
- Same name, multiple types — Auto-merge to canonical type
- Fuzzy string matching (>75% Levenshtein similarity, sketched below) — Human review
- Singular/plural variants — Auto-merge
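For the fuzzy case, a plain one-row Levenshtein ratio is enough (the normalization to a 0-1 ratio is an assumption; Engram may compute it differently):

```typescript
// Levenshtein similarity as 1 - distance / max(len); >0.75 goes to human review.
function levenshteinRatio(a: string, b: string): number {
  const m = a.length, n = b.length;
  const dp: number[] = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    let prev = dp[0];
    dp[0] = i;
    for (let j = 1; j <= n; j++) {
      const tmp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1,                              // deletion
        dp[j - 1] + 1,                          // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
      prev = tmp;
    }
  }
  return 1 - dp[n] / Math.max(m, n, 1);
}

levenshteinRatio("anthropic", "antropic"); // ≈ 0.89 → review queue
```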
Production results (Feb 14, 2026):
- Before: 3,163 entities, 504 potential duplicates
- After: 3,116 unique entities (48 merged)
Implementation
Auto-Capture (After Each Turn)
A local LLM (GLM-4.7-Flash, 30B MoE) extracts 0-5 key facts worth remembering. Average extraction latency: 4 seconds per turn (asynchronous, non-blocking).
Extraction prompt focuses on:
- Preferences, decisions, personal info, project updates, opinions, plans
- Skips greetings, tool outputs, temporary debugging
Deduplication before storage: Before storing a new fact, embed it and search Qdrant for near-duplicates (cosine similarity >0.85). If found, update the existing memory instead of creating clutter.
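A sketch of that check against the illustrative Qdrant setup above (embed() is an assumed embedding helper):

```typescript
// Search for near-duplicates first; above 0.85 cosine similarity we
// update the existing point instead of inserting a new one.
async function storeFact(text: string): Promise<void> {
  const vector = await embed(text); // assumed helper
  const [nearest] = await qdrant.search("memories", {
    vector,
    limit: 1,
    score_threshold: 0.85, // only returns hits above the dedup threshold
  });
  if (nearest) {
    await qdrant.setPayload("memories", {
      points: [nearest.id],
      payload: { text, timestamp: Date.now() }, // update in place
    });
  } else {
    await storeMemory(text, vector); // dual write from the earlier sketch
  }
}
```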
Auto-Recall (Before Each Turn)
Semantic search retrieves relevant memories and injects them as context. Each successful retrieval:
- Increments access_count (reinforcement)
- Updates last_accessed (recency tracking)

Memory recall adds 50-150ms to turn latency (acceptable for conversational AI).
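Continuing the same illustrative sketch, recall can reinforce each hit as it returns it:

```typescript
// Retrieve relevant memories, then bump reinforcement counters so
// frequently recalled facts decay more slowly.
async function recall(query: string, limit = 5): Promise<string[]> {
  const hits = await qdrant.search("memories", {
    vector: await embed(query), // assumed helper, as above
    limit,
    score_threshold: 0.5, // raised from 0.3 to cut irrelevant recalls
    with_payload: true,
  });
  for (const hit of hits) {
    await qdrant.setPayload("memories", {
      points: [hit.id],
      payload: {
        access_count: ((hit.payload?.access_count as number) ?? 0) + 1,
        last_accessed: Date.now(),
      },
    });
  }
  return hits.map((hit) => hit.payload?.text as string);
}
```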
Temporal Validity and Reconsolidation
When a new fact contradicts an existing one, auto-invalidation fires:
```typescript
async function invalidateConflictingRelationships(
  db: RelationshipStore, // store handle, matching searchRelationships below
  source: string,
  type: string,
  newTarget: string
): Promise<void> {
  const existing = await searchRelationships(db, source, type);
  for (const rel of existing) {
    // Close any still-open relationship that points at a different target
    if (rel.target !== newTarget && !rel.validUntil) {
      rel.validUntil = Date.now(); // invalidate the old relationship
      await saveRelationship(db, rel); // assumed helper to persist the change
    }
  }
}
```
Example:
- Store: "Ali works_at Stanford" (Jan 2024)
- Store: "Ali works_at Google" (Feb 2025) → Auto-invalidates Stanford relationship
- Store: "Ali works_at Meta" (Feb 2026) → Auto-invalidates Google relationship
Query "Where does Ali work?" retrieves only the active relationship (Meta).
Uncertainty Tracking
GLM classifies certainty during extraction. Production distribution:
- 1.0 (stated facts): 42%
- 0.7 (strong inferences): 51%
- 0.4 (weak guesses): 7%
Results
Production Metrics (Feb 14, 2026)
| Metric | Value |
|---|---|
| Total memories | 3,030 |
| Unique entities | 3,116 |
| Entity coverage | 87.9% |
| Entities merged | 48 |
| Auto-capture latency | 4s (async) |
| Auto-recall latency | 50-150ms |
Qualitative Improvements
- Context continuity: The AI references week-old decisions without re-explanation
- Self-correction: When facts change, old memories update instead of creating contradictions
- Reduced noise: Tuning similarity thresholds (0.3 → 0.5) cut irrelevant recalls by ~40%
- Certainty awareness: Responses now qualify inferences with confidence levels
Lessons Learned
Dual Storage Is Worth the Complexity
Vector search is fast but opaque. Markdown is slow but debuggable. Having both means you can inspect what the AI "remembers" and fix it when it's wrong.
Forgetting Is a Feature, Not a Bug
The system intentionally deprioritizes unused memories. This prevents context pollution—old, irrelevant information crowding out what actually matters today.
Reconsolidation Solves a Real Problem
When facts change, most AI systems either duplicate information or miss the update entirely. Reconsolidation prevents both. It's one of the highest-ROI features.
Cognitive Science Provides Better Priors
I didn't invent these mechanisms. I borrowed them from decades of memory research: consolidation, decay, reconsolidation, spaced retrieval, the testing effect. Applying them to AI memory wasn't novel—but it was effective.
Open Questions
How do you measure memory "usefulness" beyond recall frequency? Frequency isn't the whole story—some important memories (passwords, emergency contacts) are rarely accessed.
Can uncertainty tracking guide active learning? If the system knows it's uncertain, should it ask clarifying questions?
What's the right decay curve for professional vs personal contexts? Work information might need faster decay than personal relationships.
Certainty-weighted retrieval: Future enhancement where final_score = semantic_score × certainty × importance × recency_boost.
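A sketch of how that could look (the 30-day recency constant is illustrative; this isn't implemented yet):

```typescript
// Candidate scoring: combine semantic similarity with memory metadata.
function finalScore(m: StoredMemory, semanticScore: number): number {
  const ageDays = (Date.now() - m.last_accessed) / 86_400_000;
  const recencyBoost = Math.exp(-ageDays / 30); // illustrative 30-day decay
  return semanticScore * m.certainty * m.importance * recencyBoost;
}
```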
Conclusion
Engram demonstrates that applying cognitive science principles to AI memory improves conversational coherence. The system isn't perfect—decay tuning needs work, certainty-weighted retrieval isn't implemented yet—but the core hypothesis holds: memory mechanisms from cognitive science translate directly to production AI systems with measurable benefits.