[GENESIS]: Sunaiva research is back. Major finding — there's a **GENESIS_RLM_INTEGRATION_ULTIMATE.md** (471 lines) that describes exactly the transformation from static files to living relational semantic memory. Plus a complete **SUNAIVA_EVOLUTION_PLAN.md** (1,234 lines) with the full roadmap. Also surfaced: the **SUNAIVA_MASTER_VISION.md** — this is a $500M+ patent-protected AI OS play with 9 validation patents creating a 20-year competitive moat. Much bigger than just memory uploads.

Still waiting on:
- **Opus agent 1** — RLM workers deep-dive (still running, heavy)
- **Opus agent 2** — Full architecture component map (still running)

Once both land, I'll synthesize everything into the definitive architecture map showing: what's built, what's missing, how RLM makes it "living," and the exact build sequence.

[KINAN]: a558c4f completed Agent "Research RLM and living memory roadmap" completed

# Genesis Living Memory Architecture & RLM Workers: Comprehensive Research Summary

## 1. What ARE the RLM Workers? What Do They Do?

The RLM (Reinforcement Learning from Memory) workers are a **5-module RLHF pipeline** located at `/mnt/e/genesis-system/AIVA/queen_outputs/rlm/`. They are designed to make AIVA Queen learn from human and AI preferences about what memory is valuable, improving memory quality over time through reinforcement learning.

| Module | File | Purpose | Lines |
|--------|------|---------|-------|
| **Preference Learning** | `rlm_01_preference_learning.py` | Collects human preference pairs (A vs B), fits a Bradley-Terry model to predict which responses/memories humans prefer. Includes attention checks, spam filtering, active preference querying (information-theoretic selection). PostgreSQL-backed (`pl_preference_pairs` table). | ~1759 |
| **Reward Model** | `rlm_02_reward_model.py` | Neural network (numpy-based MLP with GELU, layer norm, dropout) trained on preference data to predict reward scores. Adam optimizer, early stopping, Monte Carlo dropout for confidence estimates. Real-time inference with caching. PostgreSQL table `rlm_preference_pairs`. | ~2280 |
| **PPO Engine** | `rlm_03_ppo_engine.py` | Full Proximal Policy Optimization training pipeline. Clipped surrogate objective, Generalized Advantage Estimation (GAE), adaptive KL divergence controller, experience replay buffer. Custom `MathOps` class (no numpy dependency). | ~1722 |
| **DPO Trainer** | `rlm_04_dpo_trainer.py` | Direct Preference Optimization -- a more efficient alternative to PPO-based RLHF. Eliminates the need for a separate reward model. Supports sigmoid/hinge/IPO loss variants. Multi-format data loading (JSONL/JSON/CSV). Checkpoint save/load. | ~1392 |
| **Constitutional AI** | `rlm_05_constitutional_ai.py` | Self-governance system for AIVA. 10 default constitutional principles, self-critique loops, iterative revision, red teaming (6 attack categories), harm checking (10 harm types). Keyword-based detection (no LLM evaluator connected yet). | ~1961 |

**Status**: All 5 modules are **BUILT as production-quality Python code** with PostgreSQL integration via Elestio. However, they are **NOT yet connected to live memory feedback loops**. They exist as standalone components awaiting integration into the memory pipeline.

**Important naming distinction**: "RLM" in the AIVA queen outputs means the RLHF pipeline above. The Knowledge Graph entity `ENT-rlm-framework-001` refers to MIT's **Recursive Language Models** framework (arXiv:2512.24601) -- a completely different concept about treating large datasets as searchable environments with sub-agent spawning. These are two separate things sharing the same acronym.

---

## 2. How Do They Make Memory a Living System vs. a Static Snapshot?

The RLM workers are the **feedback mechanism** that transforms memory from static storage into a learning system.
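That gradient signal starts with RLM-01's Bradley-Terry fit over preference pairs. A minimal sketch of the idea, assuming plain-Python gradient ascent and hypothetical function names (this is an illustration of the technique, not code from `rlm_01_preference_learning.py`):

```python
import math

def fit_bradley_terry(pairs, n_items, lr=0.1, epochs=200):
    """Fit latent quality scores from preference pairs.

    pairs: list of (winner_idx, loser_idx) tuples.
    Model: P(i preferred over j) = sigmoid(s_i - s_j).
    Plain gradient ascent on the log-likelihood.
    """
    s = [0.0] * n_items
    for _ in range(epochs):
        grad = [0.0] * n_items
        for w, l in pairs:
            # probability the loser beats the winner under current scores
            p_upset = 1.0 / (1.0 + math.exp(s[w] - s[l]))
            grad[w] += p_upset   # push winner's score up
            grad[l] -= p_upset   # push loser's score down
        s = [si + lr * gi for si, gi in zip(s, grad)]
    return s

# Memory 0 was preferred over 1 and 2; memory 1 over 2.
scores = fit_bradley_terry([(0, 1), (0, 2), (1, 2)], n_items=3)
assert scores[0] > scores[1] > scores[2]
```

Once such scores exist, they become training targets for the reward model (RLM-02), which generalizes from rated pairs to unseen memories.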
Here is how:

**Static memory** (current state): Every memory gets the same surprise score (0.5), goes to the same tier, and sits there until a simple pruner (`memory_digestion.py`) deletes low-score items older than 7 days. No learning. No improvement. No adaptation.

**Living memory** (target state): The RLM workers close the feedback loop:

```
Conversation → Memory Stored →
Human rates memory quality (preference pairs) →
RLM-01 collects preferences →
RLM-02 trains reward model →
Reward model scores new memories →
Better routing decisions →
RLM-03/04 optimize the scoring policy →
Constitutional AI (RLM-05) ensures safety →
Improved memory → Better conversations →
More preference data → REPEAT
```

The key insight: **memory becomes living when it has a gradient signal**. The RLM workers provide that gradient by learning what humans consider valuable memory, then using that learned signal to improve memory routing, retention, and retrieval.

---

## 3. The Full Pipeline: Conversation to Living Memory

### Current Pipeline (Implemented)

```
Conversation/Input
  ↓
MemoryCortex.remember() [genesis_memory_cortex.py, line ~900+]
  ↓
Surprise Scoring [STUB: always returns 0.5]
  ↓
Tier Routing (based on surprise score):
  - DISCARD (<0.3)      [never triggers - score always 0.5]
  - WORKING (0.3-0.5)   [never triggers]
  - EPISODIC (0.5-0.8)  [EVERYTHING goes here]
  - SEMANTIC (>=0.8)    [never triggers]
  ↓
Storage to backend:
  - Redis (WorkingMemoryCache)       [circuit breaker, adaptive TTL]
  - PostgreSQL (EpisodicMemoryStore) [em_episodic_memories table]
  - Qdrant (vector search)           [via MCP]
  - FalkorDB (knowledge graph)       [via MCP]
  - Supermemory (external)           [via MCP]
  ↓
Night-Cycle Pruning [memory_digestion.py]
  - Deletes score < 0.4 older than 7 days
  - Placeholder consolidation (does nothing)
```

### Target Pipeline (Designed in MEMORY_ARCHITECTURE_EVOLUTION.md)

```
Conversation/Input
  ↓
MemoryGateway (NEW) [unified entry point]
  ↓
3-Layer Dedup Engine (NEW):
  1. Hash dedup (exact match)
  2.
     MinHash LSH (near-duplicate)
  3. Semantic dedup (cosine similarity > 0.92)
  ↓
Titan Surprise Engine (NEW, replaces stub):
  1. Predict expected content (based on context)
  2. Compare prediction vs actual (embedding cosine distance)
  3. Surprise score = prediction error
  ↓
VoI (Value of Information) Scoring:
  S = 0.30*V + 0.30*N + 0.25*I + 0.15*R
  (Volatility, Novelty, Impact, Recency)
  ↓
Intelligent Tier Routing:
  - DISCARD (<0.3):     Noise, already known
  - WORKING (0.3-0.5):  Redis, short TTL
  - EPISODIC (0.5-0.8): PostgreSQL, medium retention
  - SEMANTIC (>=0.8):   Knowledge Graph, permanent
  ↓
Storage (same 5 backends)
  ↓
Decay Daemon (NEW, 6-hour cycles):
  - Ebbinghaus forgetting curve with importance floors
  - Memories decay unless reinforced by access
  - Below threshold → demote or delete
  ↓
Context Selector (NEW):
  - Knapsack-based optimization
  - Selects highest-value memories for context window injection
  - Respects token budget constraints
  ↓
Memory Bus (NEW, Redis pub/sub):
  - Cross-agent memory sharing
  - Agent A stores insight → Agent B receives it in real-time
  ↓
Decision Advisor (NEW):
  - First-class decision objects with outcome tracking
  - Decisions become memories with feedback loops
  ↓
RLM Preference Learning (EXISTING but unconnected):
  - Human rates memory quality → preference pairs
  - Reward model improves scoring
  - PPO/DPO optimize the routing policy
  - Constitutional AI ensures safety constraints
  ↓
FEEDBACK LOOP → Improved surprise scoring → Better routing → REPEAT
```

---

## 4. What Exists vs. What Needs to Be Built

### EXISTS (Operational)

| Component | File | Status |
|-----------|------|--------|
| MemoryCortex (orchestrator) | `core/genesis_memory_cortex.py` (1065 lines) | OPERATIONAL but hobbled by stub surprise |
| WorkingMemoryCache | Same file | Redis-backed, circuit breaker, adaptive TTL |
| EpisodicMemoryStore | Same file | PostgreSQL-strict, wraps PostgreSQLStore |
| SemanticMemoryStore | Same file | MCP-based, queues to JSON log |
| Night-cycle pruner | `core/memory_digestion.py` (62 lines) | Minimal -- just deletes old low-score items |
| 5 RLM Workers | `AIVA/queen_outputs/rlm/rlm_01-05` | BUILT but not connected to live data |
| 5 Storage Backends | Redis, PostgreSQL, Qdrant, FalkorDB, Supermemory | All on Elestio, all accessible |
| Knowledge Graph | `KNOWLEDGE_GRAPH/` | 434+ axioms, 42+ entities, 125 relationships |
| Memory Architecture Plan | `plans/MEMORY_ARCHITECTURE_EVOLUTION.md` (1612 lines) | DESIGN COMPLETE |

### NEEDS TO BE BUILT (Designed but Not Implemented)

| Component | Planned File | What It Does | Priority |
|-----------|-------------|--------------|----------|
| **Titan Surprise Engine** | `core/titan_surprise_engine.py` | Replace 0.5 stub with real prediction-error scoring | **CRITICAL** (everything depends on this) |
| **Memory Gateway** | `core/memory_gateway.py` | Unified entry point with validation, dedup, routing | HIGH |
| **3-Layer Dedup Engine** | `core/memory_dedup.py` | Hash + MinHash LSH + Semantic dedup | HIGH |
| **Decay Daemon** | `core/memory_decay_daemon.py` | 6-hour Ebbinghaus decay cycles | MEDIUM |
| **Context Selector** | `core/memory_context_selector.py` | Knapsack optimization for context injection | MEDIUM |
| **Memory Bus** | `core/memory_bus.py` | Redis pub/sub cross-agent sharing | MEDIUM |
| **Decision Advisor** | `core/memory_decision_advisor.py` | Decision objects with outcome tracking | LOWER |
| **RLM Integration** | Wire rlm_01-05 to live data | Connect preference learning to actual memory scoring | LOWER (needs data first) |

**Total estimated new code**: ~2,400 lines across 12 new files, plus modifications to 6 existing files and 2 database migrations.

---

## 5. The Roadmap: Current State to Living Memory

From `MEMORY_ARCHITECTURE_EVOLUTION.md`, the implementation plan is **4 phases over 8 weeks**:

### Phase 1: Foundation (Weeks 1-2)
- Build Titan Surprise Engine (replace the 0.5 stub)
- Build 3-Layer Dedup Engine
- Database migrations for surprise scores and dedup hashes
- **Milestone**: Memories routed to correct tiers based on real surprise scores

### Phase 2: Lifecycle (Weeks 3-4)
- Build Decay Daemon (6-hour Ebbinghaus cycles)
- Build Context Selector (knapsack optimization)
- **Milestone**: Memories decay naturally; context windows filled optimally

### Phase 3: Social (Weeks 5-6)
- Build Memory Bus (Redis pub/sub)
- Build Decision Advisor
- Cross-agent memory sharing
- **Milestone**: Agents share knowledge in real-time; decisions tracked

### Phase 4: Intelligence (Weeks 7-8)
- Wire RLM workers to live preference data
- Train reward model on real memory quality ratings
- Close the feedback loop (scoring improves from human input)
- **Milestone**: Memory system learns and improves autonomously

**Current status**: The plan is at **Phase 0** -- design is complete, implementation has not started.

---

## 6. The Titan Memory / Surprise Engine

The Titan Memory concept comes from **Google DeepMind** (documented in `KNOWLEDGE_GRAPH/entities/titan_memory.jsonl`):

**Core mechanism**: Surprise = prediction error between what the model expected and what actually arrived. High surprise means the information is novel and worth remembering. Low surprise means it is already known and can be discarded.

**How it works in Genesis's design**:

1. **Predict** expected content based on current context
2. **Compare** prediction vs actual using embedding cosine distance
3. **Score** = magnitude of prediction error (cosine distance)
4. **Route** based on score:
   - High surprise (>=0.8) → Long-term/semantic memory (permanent)
   - Medium surprise (0.5-0.8) → Episodic memory (compressed summaries)
   - Low surprise (0.3-0.5) → Working memory (may be discarded)
   - Minimal surprise (<0.3) → Discard (already known)

**Current gap**: The file `core/surprise_memory.py` is a **101-line stub that returns 0.5 for everything**. This is the single most critical blocker. With surprise always at 0.5, every memory goes to the same tier, making the entire routing system meaningless.

The key innovation from Titan is that the memory module receives **gradient updates during inference**, not just during training. This means the system learns what to remember in real time, not in batch training jobs.

---

## 7. How memory_digestion.py Works

File: `/mnt/e/genesis-system/core/memory_digestion.py` (62 lines)

This is a **minimal night-cycle pruning script**, not a real digestion system:

```python
# Simplified logic (the SQL is run through a DB-API cursor here for
# illustration; the real file's helper names may differ):
class MemoryDigestion:
    def run_nightly(self):
        # 1. Delete memories with score < 0.4 that are older than 7 days
        self.cursor.execute("""
            DELETE FROM em_episodic_memories
            WHERE importance_score < 0.4
              AND created_at < NOW() - INTERVAL '7 days'
        """)
        # 2. Placeholder for semantic consolidation
        self._consolidate_semantic()  # Does nothing (pass)
```

It connects to PostgreSQL via Elestio config, runs a single DELETE query, and has a placeholder consolidation method that does nothing. The MEMORY_ARCHITECTURE_EVOLUTION.md plan replaces this with the full **Decay Daemon**, which implements Ebbinghaus forgetting curves with importance floors, access-count reinforcement, and tier demotion rather than simple deletion.

---

## 8. The Knowledge Graph's Role in Living Memory

The Knowledge Graph (`KNOWLEDGE_GRAPH/`) serves as the **semantic/permanent tier** of the memory hierarchy:

- **Axioms** (434+): Validated operational truths learned from evolution cycles.
These are the highest-confidence memories that have been verified through Alpha Evolve's recursive improvement loop.
- **Entities** (42+): Structured knowledge about systems, frameworks, and concepts (e.g., Titan Memory, RLM Framework, AIVA, voice infrastructure).
- **Relationships** (125): How entities connect -- COMPLEMENTS, ENABLES, PROVIDES_MEMORY_FOR, etc.

In the living memory architecture, the KG plays three roles:

1. **Permanent storage**: Memories with surprise >= 0.8 get promoted to KG entities/axioms (the semantic tier)
2. **Context enrichment**: The Context Selector draws from KG entities to inject relevant permanent knowledge into agent context windows
3. **Feedback signal**: KG relationships inform the surprise engine -- if information contradicts or extends existing KG entities, surprise is higher

The KG is also where the **Bloodstream Memory** vision connects: all agents tap into the KG via MCP, drawing exactly the knowledge they need without stuffing it into their context window.
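The prediction-error scoring from section 6, combined with the tier thresholds from section 3, fits in a few lines. This is a hedged sketch: `surprise_score` and `route_tier` are hypothetical names, the toy vectors stand in for real embeddings, and nothing here is taken from the planned `core/titan_surprise_engine.py`:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def surprise_score(predicted_vec, actual_vec):
    """Titan-style surprise: prediction error as embedding cosine distance."""
    return cosine_distance(predicted_vec, actual_vec)

def route_tier(score):
    """Map a surprise score onto the four tiers from the design doc."""
    if score >= 0.8:
        return "SEMANTIC"   # Knowledge Graph, permanent
    if score >= 0.5:
        return "EPISODIC"   # PostgreSQL, medium retention
    if score >= 0.3:
        return "WORKING"    # Redis, short TTL
    return "DISCARD"        # already known, not worth storing

# Identical prediction and observation -> zero surprise -> discard
assert route_tier(surprise_score([1.0, 0.0], [1.0, 0.0])) == "DISCARD"
# Orthogonal vectors -> cosine distance 1.0 -> semantic tier
assert route_tier(surprise_score([1.0, 0.0], [0.0, 1.0])) == "SEMANTIC"
```

The stub this replaces would be the degenerate case `surprise_score = lambda *_: 0.5`, which routes everything to EPISODIC, exactly the failure mode described above.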
---

## Architecture Diagram: The Full Living Memory System

```
┌─────────────────────┐
│   Conversations /   │
│   Voice / Events    │
└─────────┬───────────┘
          │
┌─────────▼───────────┐
│   Memory Gateway    │ ← Unified entry (TO BUILD)
│   (validation)      │
└─────────┬───────────┘
          │
┌─────────▼───────────┐
│   3-Layer Dedup     │ ← Hash+MinHash+Semantic (TO BUILD)
└─────────┬───────────┘
          │
┌─────────▼───────────┐
│   Titan Surprise    │ ← Prediction error scoring (TO BUILD)
│   Engine            │   Currently: stub returning 0.5
└─────────┬───────────┘
          │
  ┌───────┴───────┬───────────────┐
  │               │               │
┌─▼──────────┐ ┌──▼─────────┐ ┌───▼────────────┐
│  Working   │ │  Episodic  │ │   Semantic     │
│  (Redis)   │ │ (Postgres) │ │  (FalkorDB/    │
│  TTL-based │ │  Scored    │ │   Qdrant/KG)   │
└─┬──────────┘ └──┬─────────┘ └───┬────────────┘
  │               │               │
  └───────┬───────┴───────────────┘
          │
┌─────────▼───────────┐
│    Decay Daemon     │ ← Ebbinghaus curves (TO BUILD)
│   (6-hour cycles)   │   Currently: nightly DELETE
└─────────┬───────────┘
          │
  ┌───────┴───────┬───────────────┐
  │               │               │
┌─▼──────────┐ ┌──▼─────────┐ ┌───▼────────────┐
│  Context   │ │  Memory    │ │   Decision     │
│  Selector  │ │  Bus       │ │   Advisor      │
│ (knapsack) │ │ (pub/sub)  │ │  (outcomes)    │
└────────────┘ └────────────┘ └────────────────┘
  (TO BUILD)     (TO BUILD)      (TO BUILD)
          │
┌─────────▼───────────┐
│     RLM Workers     │ ← BUILT but not connected
│     (rlm_01-05)     │
│   Preference →      │
│   Reward Model →    │
│   PPO/DPO →         │
│   Constitutional    │
└─────────┬───────────┘
          │
┌─────────▼───────────┐
│   FEEDBACK LOOP     │
│  Improved scoring   │
│  → Better routing   │
│  → Better memory    │
└─────────────────────┘
```

---

## Critical Path Summary

The single most important action to make memory "living" is **replacing the surprise stub**. Everything else cascades from real surprise scores:

1. **Build Titan Surprise Engine** → memories route to correct tiers
2. **Build Dedup Engine** → no duplicate memories polluting the system
3. **Build Decay Daemon** → memories naturally age and consolidate
4. **Wire RLM workers** → system learns what humans value
5. **Build Memory Bus** → agents share knowledge in real-time
6. **Build Context Selector** → optimal memory injection into context windows

The design is complete (`MEMORY_ARCHITECTURE_EVOLUTION.md`, 1612 lines). The RLM worker code is built (9,114 lines across 5 files). The storage backends are operational (5 systems on Elestio). The gap is **integration and the surprise engine** -- the connective tissue that turns components into a living system.

total_tokens: 44083 tool_uses: 35 duration_ms: 309581
Full transcript available at: /tmp/claude-1000/-mnt-e-genesis-system/tasks/a558c4f.output

[GENESIS]: RLM research is back. Here's the critical answer to your question: **Yes — the RLM workers ARE the component that makes it living.** And they're already built (9,114 lines of code across 5 modules). They're just not connected yet.

**The pipeline for living memory:**

```
Conversation happens →
Memory Gateway (validates, deduplicates) →
Titan Surprise Engine (scores novelty: "is this new?") →
Routes to correct tier (discard/working/episodic/semantic) →
Decay Daemon (memories fade unless reinforced) →
RLM Workers learn what YOU value (preference learning) →
Feedback loop improves scoring → REPEAT
```

**The single biggest blocker:** `surprise_memory.py` is a **101-line stub that returns 0.5 for everything**. Every memory gets the same score, goes to the same tier. The entire routing system is meaningless right now.
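The "memories fade unless reinforced" step can be made concrete with an Ebbinghaus-style sketch. A hedged illustration only: the half-life formula, the `importance_floor`, and the demote/delete thresholds are plausible assumptions about the planned Decay Daemon, not code from `core/memory_decay_daemon.py` (which does not exist yet):

```python
import math

def decayed_score(initial_score, hours_elapsed, access_count=0,
                  importance_floor=0.1, base_half_life_hours=72.0):
    """Ebbinghaus-style exponential decay with an importance floor.

    Each access stretches the half-life, so frequently used memories
    fade more slowly. The score never drops below `importance_floor`.
    """
    half_life = base_half_life_hours * (1 + access_count)
    retention = math.exp(-math.log(2) * hours_elapsed / half_life)
    return max(importance_floor, initial_score * retention)

def decay_action(score, demote_below=0.3, delete_below=0.05):
    """Tier demotion rather than outright deletion, as the plan specifies."""
    if score < delete_below:
        return "DELETE"
    if score < demote_below:
        return "DEMOTE"
    return "KEEP"

fresh = decayed_score(0.9, hours_elapsed=0)
old = decayed_score(0.9, hours_elapsed=720)                    # 30 days, never accessed
old_but_used = decayed_score(0.9, hours_elapsed=720, access_count=9)
assert fresh == 0.9
assert old < old_but_used < fresh
```

Stretching the half-life by access count is just one simple way to encode "reinforced by access"; the real daemon may use a different reinforcement rule.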
**What's built vs what's missing:**

| Component | Status |
|---|---|
| 5 RLM Workers (preference, reward, PPO, DPO, constitutional) | **BUILT** (9,114 lines) |
| Memory Cortex (orchestrator, tier routing) | **BUILT** (1,065 lines) |
| 5 Storage Backends (PG, Qdrant, Redis, FalkorDB, Supermemory) | **OPERATIONAL** |
| Knowledge Graph | **OPERATIONAL** (434+ axioms) |
| Full Architecture Design | **COMPLETE** (1,612 line plan) |
| Titan Surprise Engine | **STUB** — returns 0.5 always |
| Memory Gateway + Dedup | **NOT BUILT** |
| Decay Daemon | **NOT BUILT** (current: simple DELETE query) |
| Context Selector | **NOT BUILT** |
| Memory Bus (cross-agent sharing) | **NOT BUILT** |
| RLM ↔ Live Data Wiring | **NOT CONNECTED** |

**So the product tiers map to this reality:**

- **Memory Vault (static, $97/mo)**: Upload → Extract → MCP read. What works TODAY.
- **Living Memory (advanced, $297+/mo)**: Surprise engine + decay + RLM feedback loop + real-time learning. Needs ~2,400 lines of new code + integration across ~8 weeks.

Still waiting on the third agent (full architecture component map). That'll complete the picture.