Observability

Memoturn’s operability contract is a small set of latency SLOs plus one queue-depth signal (segment-shipping backlog). This page lists the targets, what the deployment ships to watch them, and the measured numbers behind them.

The targets the architecture is designed and dashboarded against:

| SLO | Target |
| --- | --- |
| provision latency p50 | < 100 ms (target ~10 ms) |
| cold-wake p50 / p99 (≤16 MB) | < 200 ms / < 1 s |
| hot KV read p50 | < 1 ms |
| SQL / document write p50 | < 5 ms |
| branch create/rewind p50 | < 50 ms |
| replication lag p99 (in-region) | < 1 s |
| lease failover (kill → writes resume) | < 15 s |
| segment-shipping backlog | ~0 sustained |

Segment-shipping backlog is the one to alarm on: sustained backlog means object storage is not keeping up, which widens the Standard-durability RPO window (see Architecture). Durable-mode writes — MEMOTURN_DURABILITY=durable, or the per-request Memoturn-Durability header — are immune to backlog by construction: they ack only after the ship completes.
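To make "sustained" concrete, here is a minimal sketch of an alarm condition that fires only when every sample in a recent window is above a threshold, so a single transient spike does not page anyone. The class name, the threshold of 0, and the window length are illustrative assumptions; this page does not specify how the backlog signal is exported or sampled.

```python
from collections import deque


class SustainedBacklogAlarm:
    """Hypothetical helper: fires when the last `window` samples all exceed `threshold`."""

    def __init__(self, threshold: int = 0, window: int = 5):
        self.threshold = threshold
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)

    def observe(self, backlog: int) -> bool:
        """Record one backlog sample and report whether the alarm should fire."""
        self.samples.append(backlog)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s > self.threshold for s in self.samples)


# Made-up readings: two short spikes, then a backlog that persists.
alarm = SustainedBacklogAlarm(threshold=0, window=3)
readings = [0, 4, 0, 2, 5, 9]
fired = [alarm.observe(r) for r in readings]
print(fired)
```

With these made-up readings the alarm stays quiet through the isolated spike and fires only on the last sample, once three consecutive readings are nonzero. In a Prometheus-based setup the same idea is usually expressed as an alerting rule with a hold duration rather than in application code.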

The Helm deployment design includes an optional observability subchart: kube-prometheus-stack, an OpenTelemetry collector, and Grafana dashboards with the SLO panel above. The current prototype chart ships the data plane and MinIO; the observability subchart lands with the full cell (gateway, control API, etcd) — see Deployment and the Roadmap.

Available today on every node:

  • Structured logs via tracing, filtered with RUST_LOG (default memoturnd=info,memoturn_api=info,memoturn_replication=info).
  • /healthz — the readiness probe the chart wires up.
  • txid on every read response — replication lag is directly observable from the client side by comparing writer and replica txids (see Consistency).
  • Request-surface guardrails — body cap (413), request timeout, global concurrency cap, a control-endpoint rate limit (429), and a per-database write-queue cap (429 + Retry-After), all tunable per node — see Configuration.
  • Write-pressure logs — a database whose write queue is backing up (or shedding) gets a per-database write pressure warning every 30 s with queue depth, sheds, and the group-commit coalescing factor — see Scaling.
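The txid comparison mentioned in the list above can be sketched client-side as follows. This assumes txids are monotonically increasing integers that are comparable between writer and replica; how the txid is carried on a response (header or body field) is not specified here, so the replica read is passed in as a plain callable. Both function names are hypothetical.

```python
import time
from typing import Callable, Optional


def txid_lag(writer_txid: int, replica_txid: int) -> int:
    """Lag in transactions; 0 when the replica has caught up (or is ahead of a stale writer read)."""
    return max(0, writer_txid - replica_txid)


def time_to_catch_up(
    target_txid: int,
    read_replica_txid: Callable[[], int],  # assumed: returns the txid from one replica read
    timeout_s: float = 1.0,
    poll_s: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll the replica until it reaches target_txid; return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    while True:
        if read_replica_txid() >= target_txid:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        sleep(poll_s)


# Stand-in replica that catches up on its third read (no real node involved).
replica_txids = iter([98, 99, 100])
elapsed = time_to_catch_up(100, lambda: next(replica_txids), sleep=lambda _s: None)
print(txid_lag(100, 98), elapsed is not None)
```

Running `time_to_catch_up` right after a write, with the write's txid as the target, gives a per-write lag sample in seconds that can be compared against the replication-lag SLO. The measured value includes the polling interval, so it is an upper bound rather than an exact lag.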

Measured, not promised — reference points from the working prototype (in-process object store, so these are engine costs without network):

| Metric | Target | p50 |
| --- | --- | --- |
| memory ingest (typed fact, 256-dim embedding, supersession) | <10 ms | 3.9 ms |
| hybrid recall over 10k memories (FTS5 + topic + ANN, rank-fused) | <25 ms | 11.7 ms |
| provision database | <100 ms | 17 µs |
| hot KV read / SQL write / doc insert | <1 ms / <5 ms / <5 ms | 3 µs / 16 µs / 15 µs |
| segment ship (write + WAL capture + PUT) | <10 ms | 61 µs |
| branch create (copy-on-write) | <50 ms | 47 µs |
| cold wake (restore + open + query) | <200 ms | 0.7 ms (+object-store RTT in prod) |
| 10k databases provisioned | | 93 ms (107k/s), hot pool flat |

The same operations through a real cell — kind, Helm chart, auth on, MinIO as the object store, measured through kubectl port-forward (which sets a ~1.6 ms network floor):

| Metric | p50 | p99 |
| --- | --- | --- |
| memory ingest (typed fact, namespace token) | 2.81 ms | 7.53 ms |
| hybrid recall @1k memories | 4.08 ms | 5.97 ms |
| provision database | 1.61 ms | 3.43 ms |
| hot SQL write | 1.59 ms | 2.83 ms |
| hot KV write / read | 1.66 / 1.63 ms | 3.53 / 2.94 ms |
| branch create (copy-on-write) | 3.01 ms | 3.69 ms |
| write + segment ship (to MinIO) | 6.54 ms | 8.67 ms |

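The segment-ship row reflects a real object-storage PUT round-trip. On cloud object storage, expect cold wake and segment ship each to gain a same-region RTT (~10–40 ms). Cold wake stays well inside its <200 ms target; the <10 ms segment-ship target in the prototype table is an engine cost measured without network, so it does not include that RTT.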

# the benchmark harness behind the prototype table
cargo run --release -p memoturn-bench
# the end-to-end agent-story walkthrough against a running node
scripts/demo.sh

For the Kubernetes numbers, deploy the chart per Deployment and run the HTTP benchmark against the forwarded service:

python3 scripts/bench-http.py http://127.0.0.1:8080 --platform-key ... --n 200
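When reading p50/p99 figures from a run of this size, it helps to know which samples they correspond to. The sketch below uses the nearest-rank definition of a percentile; this is one common convention, not necessarily the one scripts/bench-http.py uses, and the latency values are synthetic.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Synthetic latencies: 200 samples of 1.0, 2.0, ..., 200.0 ms (matches --n 200 only in count).
latencies_ms = [float(i) for i in range(1, 201)]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(p50, p99)
```

Under this definition, a 200-sample run puts p99 at the 198th-smallest sample, so only two samples sit above it. The p99 column is therefore sensitive to one or two slow requests, and a larger `--n` gives a steadier tail estimate.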

Sources: the README and docs/deployment-proof.md in the repository.