04 · Canonical Data Model

Aevum Canonical Data Model

This document defines the authoritative system entities, their relationships, provenance rules, lifecycle states, and model constraints for Aevum 1.0. It is the single source of truth for how information exists, changes, and links inside the product.

Document Type: System Truth
Authority: Product / Architecture / Engineering
Scope: Local-First 1.0 Baseline
Applies To: Storage, Ingestion, Retrieval, Testing

1. Purpose

The canonical data model exists to prevent schema drift, duplicated semantics, and ungoverned entity growth. All modules, APIs, ingestion paths, tests, and compliance controls must conform to this model.

  • No team may invent alternative entity meanings without explicit approval.
  • No UI surface may persist state outside the canonical model without documentation and approval.
  • No test may assert behavior against hidden schema assumptions not defined here.

2. Modeling Principles

  1. One root representation per user-triggered input: no duplicate root nodes for the same first-party foreground capture.
  2. Provenance must be preserved: source type cannot be flattened for convenience.
  3. Derived intelligence is distinct from user-authored thought: system-generated inference must remain distinguishable.
  4. Lifecycle is explicit: entities must have valid creation, update, and archival rules.
  5. Deletion must be possible: data sovereignty requires that user-owned content can be removed cleanly.

3. Entity Map

User-Origin Entities

  • MemoryEntry (root thought, imported item, explicit user log)
  • PersonaProfile
  • SessionState

System-Origin Entities

  • AevumEvent
  • ProcessedEvent
  • GraveyardEvent
  • InsightNode
  • StoredPrompt

4. Core Entities

4.1 MemoryEntry

The canonical record of captured or system-derived memory content. For user-triggered capture, this is the root persisted entry.

{
  "id": "UUID",
  "semanticType": "enum",
  "value": "String",
  "source": "MemorySource",
  "confidence": "Double",
  "createdAt": "Date",
  "updatedAt": "Date",
  "qjlHash": "UInt64?",
  "semanticHashString": "String?",
  "sourceEventIDs": ["UUID"],
  "linkTargetID": "UUID?"
}

Required Rules

  • User-triggered foreground input must create exactly one root MemoryEntry.
  • Derived or linked MemoryEntry records must not pretend to be root user input.
  • `value` is human-readable content, not opaque transport state.
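The schema and rules above could be expressed as a Swift value type along the following lines. This is a minimal sketch, not the product's actual declaration: the `SemanticType` cases, the `isRoot` convention (no link target means root), and the `makeRootEntry` helper are assumptions introduced for illustration only.

```swift
import Foundation

// Stored provenance values, as listed in section 6.
enum MemorySource: String, Codable {
    case userTap, userVoice, systemInference
}

// HYPOTHETICAL: the schema only says "enum" for semanticType,
// so these cases are placeholders, not the real set.
enum SemanticType: String, Codable {
    case thought, importedItem, userLog
}

// Field names and types follow the MemoryEntry schema.
struct MemoryEntry: Codable, Identifiable, Equatable {
    let id: UUID
    var semanticType: SemanticType
    var value: String              // human-readable content
    let source: MemorySource       // provenance is fixed at creation
    var confidence: Double
    let createdAt: Date
    var updatedAt: Date
    var qjlHash: UInt64?
    var semanticHashString: String?
    var sourceEventIDs: [UUID]
    var linkTargetID: UUID?

    // ASSUMPTION: an entry without a link target is treated as a root.
    var isRoot: Bool { linkTargetID == nil }
}

// HYPOTHETICAL helper: builds the single root entry for a typed capture.
func makeRootEntry(text: String, now: Date = Date()) -> MemoryEntry {
    MemoryEntry(
        id: UUID(),
        semanticType: .thought,
        value: text,
        source: .userTap,
        confidence: 1.0,
        createdAt: now,
        updatedAt: now,
        qjlHash: nil,
        semanticHashString: nil,
        sourceEventIDs: [],
        linkTargetID: nil
    )
}
```

Declaring `source` as a constant is one way to make the "must not pretend to be root user input" rule harder to violate by accident, since provenance cannot be rewritten after the entry exists.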

4.2 PersonaProfile

The persisted user persona configuration and evolving cognitive profile baseline.

{
  "id": "UUID",
  "role": "String?",
  "secondaryRole": "String?",
  "personaType": "CognitivePersonaType",
  "secondaryPersonaTypeRaw": "String?",
  "hasCompletedPersonaSelection": "Bool",
  "engagementBaseline": "Float",
  "recurringThemes": ["String"],
  "promptResponseScores": {"String": "Double"}
}

  • `role` is the primary visible persona label.
  • `secondaryRole` is optional.
  • This entity influences companion initialization and grounded response behavior.

4.3 SessionState

The current or most recent interaction state used to shape continuity and response strategy.

{
  "id": "UUID",
  "lastActivityTime": "Date",
  "currentFocus": "String?",
  "engagementLevel": "Float",
  "activeMode": "String?"
}

5. Supporting Entities

| Entity | Purpose | Notes |
| --- | --- | --- |
| AevumEvent | Queue/event representation for ingestion, processing, and enrichment. | Supports pending/committed/error lifecycle. |
| ProcessedEvent | Durable record of completed event processing. | Supports auditability and idempotency. |
| GraveyardEvent | Rejected or invalid event payload archive. | Used for hard validation failures. |
| InsightNode | System-derived insight generated from validated memory patterns. | Must remain distinguishable from user input. |
| StoredPrompt | Prompt inventory used by proactive or guided companion flows. | Not user-authored memory. |

6. Provenance Model

Input Provenance

Aevum distinguishes where input came from before mapping it to stored memory source.

| InputSource | Meaning | Canonical Use |
| --- | --- | --- |
| text | User-authored typed input | Dashboard, onboarding, manual capture |
| voice | User-authored dictated/transcribed input | Dashboard voice, capture sheet voice, audio import output |
| importBatch | User-triggered batch import | Documents, OCR batches, archive ingest |
| system | Explicit system-level action/event | Habit logs, legacy initialization, internal operational entries |
| deferredEnrichment | Asynchronous post-root processing payload | Non-root enrichment only |

Stored Provenance

When an entry is persisted, its input provenance (InputSource) may be mapped to a canonical storage provenance (MemorySource).

| MemorySource | Meaning |
| --- | --- |
| userTap | User typed / directly authored text |
| userVoice | User voice/dictation/transcription root input |
| systemInference | System-generated or system-mapped operational record |
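One way to keep the input-to-storage mapping explicit is a single exhaustive `switch`, so that adding a new InputSource fails to compile until its mapping is decided. The sketch below is illustrative only. The `text`, `voice`, and `system` rows follow the meanings given in the two tables; the `importBatch` row is an assumption, because this document does not state which MemorySource a batch import receives.

```swift
enum InputSource: String, CaseIterable {
    case text, voice, importBatch, system, deferredEnrichment
}

enum MemorySource: String {
    case userTap, userVoice, systemInference
}

// Returns the stored provenance for a ROOT entry, or nil when the
// input is not allowed to create a root entry at all.
func rootMemorySource(for input: InputSource) -> MemorySource? {
    switch input {
    case .text:
        return .userTap
    case .voice:
        return .userVoice
    case .system:
        return .systemInference
    case .importBatch:
        // ASSUMPTION: not specified in this document. Chosen here only
        // because MemorySource has no dedicated import value and
        // systemInference covers "system-mapped" records. Confirm
        // before relying on it.
        return .systemInference
    case .deferredEnrichment:
        // Non-root enrichment only: never yields a root provenance.
        return nil
    }
}
```

Because the function has no `default` branch, tests and UI copy can enumerate `InputSource.allCases` and assert against exactly this table instead of re-deriving the mapping.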

7. Lifecycle Rules

MemoryEntry Lifecycle

  1. Created as root user or system entry
  2. Optionally linked to event or parent
  3. Optionally enriched by deferred processing
  4. Optionally merged or reinforced if semantically duplicate and eligible
  5. May be deleted under user data sovereignty rules

AevumEvent Lifecycle

  1. Pending
  2. Dequeued
  3. Validated
  4. Committed or Shunted to Graveyard

  • Deferred enrichment events must never become fresh root capture events.
  • Event processing must be idempotent: reprocessing an already-processed event must not produce a second write.
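The AevumEvent lifecycle can be read as a small state machine with an idempotency ledger beside it. The following is a sketch under stated assumptions: the type names `EventState`, `LifecycleError`, and `ProcessedEventLedger` are hypothetical, and allowing a move to the graveyard directly from `dequeued` (a failed validation) is an interpretation of step 4, not something the list states.

```swift
import Foundation

enum EventState: Equatable {
    case pending, dequeued, validated, committed, graveyard

    // Transitions permitted from each state.
    var allowedNext: [EventState] {
        switch self {
        case .pending:   return [.dequeued]
        // ASSUMPTION: a payload that fails validation is shunted
        // to the graveyard without ever becoming "validated".
        case .dequeued:  return [.validated, .graveyard]
        case .validated: return [.committed, .graveyard]
        case .committed, .graveyard: return []   // terminal states
        }
    }
}

struct LifecycleError: Error {
    let from: EventState
    let to: EventState
}

struct AevumEvent {
    let id: UUID
    private(set) var state: EventState = .pending

    init(id: UUID = UUID()) { self.id = id }

    // Moves to `next` only if the lifecycle allows it.
    mutating func advance(to next: EventState) throws {
        guard state.allowedNext.contains(next) else {
            throw LifecycleError(from: state, to: next)
        }
        state = next
    }
}

// HYPOTHETICAL in-memory stand-in for the durable ProcessedEvent record.
final class ProcessedEventLedger {
    private var processed = Set<UUID>()

    // Returns true only the first time an event ID is recorded,
    // so callers can skip the write on a replay.
    func markProcessed(_ id: UUID) -> Bool {
        processed.insert(id).inserted
    }
}
```

A consumer would call `markProcessed` before committing and treat a `false` result as "already handled", which is what keeps a redelivered event from writing twice.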

8. Relationship Rules

  • A root MemoryEntry may have zero or more linked derived entries.
  • A derived system entry must link back to the root entry that caused it when applicable.
  • InsightNode must reference the memory context it was derived from.
  • PersonaProfile and SessionState influence interpretation and response, but are not themselves memory content.
  • Event-to-memory traceability must remain possible for audit and debugging.
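Event-to-memory traceability can be checked with a reverse index built from the `sourceEventIDs` field of MemoryEntry. The snippet is a minimal sketch; `TracedEntry` and `buildEventIndex` are hypothetical names, and the struct carries only the two fields the lookup needs.

```swift
import Foundation

// Reduced view of a MemoryEntry: just what tracing requires.
struct TracedEntry {
    let id: UUID
    let sourceEventIDs: [UUID]
}

// Maps each event ID to the IDs of the entries it contributed to,
// preserving the order in which entries were supplied.
func buildEventIndex(_ entries: [TracedEntry]) -> [UUID: [UUID]] {
    var index: [UUID: [UUID]] = [:]
    for entry in entries {
        for eventID in entry.sourceEventIDs {
            index[eventID, default: []].append(entry.id)
        }
    }
    return index
}
```

An audit or debugging tool could use such an index to answer "which entries did this event touch?" without scanning every entry per query.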

9. Canonical Data Contracts

Contract A — Single Root Write

Each first-party user-triggered capture creates exactly one root MemoryEntry.

Contract B — Deferred Enrichment

Deferred enrichment references a root entry and may add links, metadata, or derived artifacts, but may not create a second root representation of the same capture.
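Contracts A and B can be enforced at the write boundary rather than checked after the fact. The sketch below assumes a hypothetical `captureID` that identifies one foreground capture; this document does not define such a key, so treat it, `CaptureStore`, and `ContractError` as illustrative names only.

```swift
import Foundation

struct StoredEntry {
    let id: UUID
    let captureID: UUID        // HYPOTHETICAL key for one capture
    let isRoot: Bool
    let linkTargetID: UUID?    // set on derived entries only
    let value: String
}

enum ContractError: Error, Equatable {
    case duplicateRoot   // Contract A violated
    case missingRoot     // Contract B: nothing to enrich
}

final class CaptureStore {
    private(set) var entries: [StoredEntry] = []
    private var rootByCapture: [UUID: UUID] = [:]

    // Contract A: exactly one root per capture.
    @discardableResult
    func writeRoot(captureID: UUID, value: String) throws -> UUID {
        guard rootByCapture[captureID] == nil else {
            throw ContractError.duplicateRoot
        }
        let id = UUID()
        entries.append(StoredEntry(id: id, captureID: captureID,
                                   isRoot: true, linkTargetID: nil,
                                   value: value))
        rootByCapture[captureID] = id
        return id
    }

    // Contract B: enrichment links to the existing root and can
    // only ever add non-root entries.
    @discardableResult
    func enrich(captureID: UUID, derivedValue: String) throws -> UUID {
        guard let rootID = rootByCapture[captureID] else {
            throw ContractError.missingRoot
        }
        let id = UUID()
        entries.append(StoredEntry(id: id, captureID: captureID,
                                   isRoot: false, linkTargetID: rootID,
                                   value: derivedValue))
        return id
    }

    func rootCount(for captureID: UUID) -> Int {
        entries.filter { $0.captureID == captureID && $0.isRoot }.count
    }
}
```

Because `enrich` has no code path that sets `isRoot` to true, a deferred enrichment payload cannot produce a second root even if it is delivered more than once.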

Contract C — Visibility

User-triggered imports must be visible in logs immediately after root persistence.

Contract D — Source Truth

Tests and UI copy must align to canonical provenance mapping. No hidden reinterpretation is allowed.

10. Integrity Constraints

  • Required: no duplicate root node for same first-party foreground capture.
  • Required: no queue-only success path for explicit user-triggered import.
  • Required: no system-derived node may masquerade as user-authored source.
  • Required: merge/deduplication heuristics must be deterministic enough for repeatable validation.
  • Rejected if: tests rely on hidden actor/state bleed or stale provenance assumptions.
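The determinism requirement for deduplication has a practical consequence in Swift: the standard `Hasher` is randomly seeded per process, so `hashValue` cannot be used for keys that must match across runs. The sketch below swaps in a plain FNV-1a 64-bit hash over normalized text as a stand-in. It is not the algorithm behind `qjlHash` or `semanticHashString`, which this document does not describe, and the normalization rules (lowercase, collapse whitespace) are assumptions.

```swift
// Normalizes text so trivially different captures compare equal:
// lowercases and collapses any run of whitespace to one space.
func normalizedForDedup(_ text: String) -> String {
    text.lowercased()
        .split(whereSeparator: { $0.isWhitespace })
        .joined(separator: " ")
}

// FNV-1a, 64-bit. Stable across processes and platforms,
// unlike Swift's randomly seeded Hasher.
func stableHash(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325      // FNV offset basis
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3           // FNV prime, wrapping multiply
    }
    return hash
}

// HYPOTHETICAL dedup key: same input text always yields the same key.
func dedupKey(for value: String) -> UInt64 {
    stableHash(normalizedForDedup(value))
}
```

With a key like this, a test can assert that two captures merge (or do not) and expect the same outcome on every run and every machine.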

11. Retention and Deletion

All primary user-authored memory content is user-owned data. The system must support clear deletion semantics consistent with local-first privacy and future regulatory requirements.

  • User-owned MemoryEntry roots must be deletable.
  • Derived entries linked exclusively to deleted roots should be removable or invalidated according to deletion policy.
  • Operational artifacts retained for diagnostics must be governed by security/privacy policy.
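Deleting a root and the entries derived from it amounts to a transitive walk over link targets. The following sketch assumes the deletion policy is removal (the text above also allows invalidation) and that each derived entry has a single `linkTargetID`, as in the MemoryEntry schema; `Entry` and `deletingRoot` are hypothetical names.

```swift
import Foundation

struct Entry {
    let id: UUID
    let linkTargetID: UUID?   // nil for roots (assumption)
}

// Returns the entries that survive deleting `rootID`: the root itself
// and everything linked to it, directly or through other derived
// entries, is dropped. Unrelated roots and their derivations remain.
func deletingRoot(_ rootID: UUID, from entries: [Entry]) -> [Entry] {
    var doomed: Set<UUID> = [rootID]
    var changed = true
    // Repeat until no new dependents are found, so chains such as
    // root <- derived <- derived-of-derived are fully collected.
    while changed {
        changed = false
        for entry in entries {
            if let target = entry.linkTargetID,
               doomed.contains(target),
               !doomed.contains(entry.id) {
                doomed.insert(entry.id)
                changed = true
            }
        }
    }
    return entries.filter { !doomed.contains($0.id) }
}
```

Returning a new array instead of mutating in place keeps the function easy to test; a real store would apply the same set of doomed IDs inside one transaction.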

12. Acceptance Criteria

  • Required: entity meanings are stable across product, code, and tests.
  • Required: provenance mapping is explicit and documented.
  • Required: canonical contracts are enforced across ingestion and enrichment paths.
  • Required: no team builds a parallel schema without approval.
  • Rejected if: UI, tests, and storage each interpret the same entity differently.