AI Research

Engram: Cognitive-Inspired AI Memory Architecture

Ben Zanghi
February 14, 2026
15 min read
Engram: Cognitive-Inspired AI Memory Architecture

Abstract

Most production AI memory is RAG: embed, search, inject. It works, but it's not how human memory works. I built Engram to apply cognitive science principles to AI memory—with measurable improvements in context continuity and contradiction prevention.

Most production AI memory is retrieval-augmented generation (RAG): embed old conversations, search on similarity, inject into context. This works, but it's not how human memory works.

Human memory consolidates similar experiences over time. It allows unused information to decay. It updates retrieved memories with new information. It strengthens repeatedly accessed facts.

I built Engram to test whether applying cognitive science principles to AI memory improves long-term conversational coherence. After two weeks in production managing 3,030 memories with 87.9% entity coverage and temporal relationship tracking, the system demonstrates measurable improvements in context continuity, contradiction prevention, and memory precision.

The Problem

Standard RAG implementations suffer from several fundamental issues:

No Consolidation: Duplicate or similar information accumulates without merging. "Ben prefers dark mode" stored three times creates redundancy and retrieval noise.

No Decay: All memories persist with equal weight regardless of relevance. Debugging output from January competes with critical decisions from February.

No Reconsolidation: When facts change, most systems either duplicate or miss the update entirely. "Ali works at Google" and "Ali works at Meta" both exist with no indication which is current.

No Temporal Context: RAG can't answer "What did I think about X in January?" or "When did this fact change?"

No Certainty Tracking: "Ben said X" and "Ben probably thinks X" carry equal weight.

Cognitive Science Principles

Engram implements five mechanisms from memory research:

  1. Consolidation — Merge similar experiences over time
  2. Decay — Deprioritize unused information
  3. Reconsolidation — Update retrieved memories with new information
  4. Temporal Validity — Track when facts were true vs when they changed
  5. Uncertainty Tracking — Distinguish facts from inferences

Architecture

Engram uses dual storage: Qdrant for semantic search and markdown files for human readability.

Memory Metadata Schema

Every memory in Qdrant carries structured metadata:

interface StoredMemory {
  text: string;
  source: string;
  timestamp: number;
  entities: StoredEntity[];
  relationships: StoredRelationship[];
  importance: number;
  certainty: number;
  access_count: number;
  last_accessed: number;
  forgotten: boolean;
}

The certainty field distinguishes stated facts (1.0) from strong inferences (0.7) from weak guesses (0.4). Temporal validity fields (validFrom, `validUntil)) enable tracking fact changes over time.

Entity Resolution and Deduplication

Entities are identified by deterministic IDs: SHA-256(lowercase(name) + type).slice(0,16).

Deduplication strategies:

  1. Exact name match (case-insensitive) — Auto-merge
  2. Same name, multiple types — Auto-merge to canonical type
  3. Fuzzy string matching (>75% Levenshtein) — Human review
  4. Singular/plural variants — Auto-merge

Production results (Feb 14, 2026):

Implementation

Auto-Capture (After Each Turn)

A local LLM (GLM-4.7-Flash, 30B MoE) extracts 0-5 key facts worth remembering. Average extraction latency: 4 seconds per turn (asynchronous, non-blocking).

Extraction prompt focuses on:

Deduplication before storage: Before storing a new fact, embed it and search Qdrant for near-duplicates (cosine similarity >0.85). If found, update the existing memory instead of creating clutter.

Auto-Recall (Before Each Turn)

Semantic search retrieves relevant memories and injects them as context. Each successful retrieval:

Temporal Validity and Reconsolidation

When a new fact contradicts an existing one, auto-invalidation fires:

async function invalidateConflictingRelationships(
  source: string, type: string, newTarget: string
): Promise<void> {
  const existing = await searchRelationships(db, source, type);
  for (const rel of existing) {
    if (rel.target !== newTarget && !rel.validUntil) {
      rel.validUntil = Date.now();  // Invalidate old relationship
    }
  }
}

Example:

  1. Store: "Ali works_at Stanford" (Jan 2024)
  2. Store: "Ali works_at Google" (Feb 2025) → Auto-invalidates Stanford relationship
  3. Store: "Ali works_at Meta" (Feb 2026) → Auto-invalidates Google relationship

Query "Where does Ali work?" retrieves only the active relationship (Meta).

Uncertainty Tracking

GLM classifies certainty during extraction. Production distribution:

Results

Production Metrics (Feb 14, 2026)

MetricValue
Total memories3,030
Unique entities3,116
Entity coverage87.9%
Entities merged48
Auto-capture latency4s (async)
Auto-recall latency50-150ms

Qualitative Improvements

Lessons Learned

Dual Storage Is Worth the Complexity

Vector search is fast but opaque. Markdown is slow but debuggable. Having both means you can inspect what the AI "remembers" and fix it when it's wrong.

Forgetting Is a Feature, Not a Bug

The system intentionally deprioritizes unused memories. This prevents context pollution—old, irrelevant information crowding out what actually matters today.

Reconsolidation Solves a Real Problem

When facts change, most AI systems either duplicate information or miss the update entirely. Reconsolidation prevents both. It's one of the highest-ROI features.

Cognitive Science Provides Better Priors

I didn't invent these mechanisms. I borrowed them from decades of memory research: consolidation, decay, reconsolidation, spaced retrieval, the testing effect. Applying them to AI memory wasn't novel—but it was effective.

Open Questions

How do you measure memory "usefulness" beyond recall frequency? Frequency isn't the whole story—some important memories (passwords, emergency contacts) are rarely accessed.

Can uncertainty tracking guide active learning? If the system knows it's uncertain, should it ask clarifying questions?

What's the right decay curve for professional vs personal contexts? Work information might need faster decay than personal relationships.

Certainty-weighted retrieval: Future enhancement where final_score = semantic_score × certainty × importance × recency_boost.

Conclusion

Engram demonstrates that applying cognitive science principles to AI memory improves conversational coherence. The system isn't perfect—decay tuning needs work, certainty-weighted retrieval isn't implemented yet—but the core hypothesis holds: memory mechanisms from cognitive science translate directly to production AI systems with measurable benefits.

Back to all articles