Release history
What shipped in each BaseVault release, ordered newest first.
# BaseVault v0.2.0
Built from b4fb753.
Compared to v0.1.49, this is a substantial release. The minor-version bump reflects two landmark changes: the ReAct chat loop is now fully wired, and local chat now works as a first-class mode alongside TEE (Private Cloud). The trust surface was also materially hardened, and a long list of chat-quality wedges shipped.
Chat is meaningfully better
This release lands the multi-hop ReAct chatbot loop alongside several direct fixes. The chat can now:
- Run in LOCAL mode — local chat is now functional. Embeddings + the sidecar's per-turn loop dispatch through LOCAL when you pick that mode.
- Do multi-step retrieval (ReAct loop) — ask, look at what came back, refine the query, look again — instead of one shot per question.
- Stream answers as they generate, instead of waiting for the full response.
- Refuse cleanly when bound to an empty corpus, instead of silently grounding on an empty store.
- Actually use the model you picked per stage (extract / entities / chat) — previous releases were silently using the same model across stages in some configs.
- Cite cleanly — [N] bracket citations only when grounded retrieval backs them; integer-bracket refs are clickable in the answer and resolve to the actual source.
- Not parrot itself — refused-turn assistant text is excluded from history so the bot doesn't loop on its own "I don't have a corpus" output across follow-up turns.
- Not leak tool-call JSON into the chat bubble when the model emits prose+JSON+prose (mixed-shape wipe + onset re-detection).
- Dedupe facts at retrieval time, so the same fact isn't returned multiple times.
The vault dropdown also stays fresh after a pipeline run completes — newly-completed runs show up without needing manual refresh.
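The multi-step retrieval described above can be pictured as a bounded loop. This is a minimal sketch under stated assumptions, not BaseVault's implementation: `react_retrieve`, `search`, and `refine` are hypothetical names, and the caller supplies both callbacks. It also shows one way to dedupe facts by id across hops.

```python
from typing import Callable, Optional

Hit = tuple[str, str]  # (fact_id, text); a hypothetical result shape


def react_retrieve(
    question: str,
    search: Callable[[str], list[Hit]],
    refine: Callable[[str, dict[str, str]], Optional[str]],
    max_hops: int = 3,
) -> list[str]:
    """Search, look at what came back, refine the query, search again."""
    query = question
    seen: dict[str, str] = {}  # fact_id -> text, so repeated facts are kept once
    for _ in range(max_hops):
        new = {fid: text for fid, text in search(query) if fid not in seen}
        seen.update(new)
        next_query = refine(question, seen)
        # Stop when the hop added nothing or the refiner has no follow-up.
        if not new or next_query is None:
            break
        query = next_query
    return list(seen.values())
```

In a real loop the `refine` step would be a model call that reads the facts gathered so far; here it is just a callback, so the control flow is easy to follow. The `max_hops` bound keeps a confused refiner from looping forever.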
Trust + privacy posture hardened
- Fireworks and Chutes are gone from the production app — no longer reachable via a runtime mode switch. The production code routes through Tinfoil (Private Cloud) or LOCAL only. Eval-side testing still has them under app/testing/ (never bundled into the .app).
- Bundle gate at release time asserts the test/eval surface never ships in the .app.
- Fail-closed chat send in Private Cloud mode when attestation is failing or in-flight (mirroring the existing run-gate). No messages leave the app while the trust contract is unproven.
- Per-step attestation logging to app.log — when attestation hangs, you can now see which step is stuck.
- Crash-on-unknown mode at run start — a typo in your config errors out cleanly instead of silently falling back to LOCAL.
- Tinfoil HTTP wire-capture toggle in Settings → Development for trust-chain investigation (off by default).
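Two of the items above, fail-closed chat send and crash-on-unknown mode, follow the same pattern: refuse rather than guess. The sketch below illustrates that pattern only. The `Mode` values, the attestation state strings, and both function names are assumptions made for the example, not BaseVault's actual identifiers.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical config values; the real spelling may differ.
    LOCAL = "local"
    PRIVATE_CLOUD = "private_cloud"


def parse_mode(raw: str) -> Mode:
    """Reject unknown modes instead of silently falling back to LOCAL."""
    try:
        return Mode(raw.strip().lower())
    except ValueError:
        valid = ", ".join(m.value for m in Mode)
        raise ValueError(f"unknown mode {raw!r}; expected one of: {valid}") from None


def may_send(mode: Mode, attestation_state: str) -> bool:
    """Fail closed: in Private Cloud mode, send only once attestation is verified."""
    if mode is Mode.LOCAL:
        return True  # nothing leaves the machine, so no attestation is required
    # "pending" (in-flight) and "failed" both block the send.
    return attestation_state == "verified"
```

The key design choice is that the default answer is "no": any attestation state other than an explicit success blocks the message, so a new or unexpected state cannot open the gate by accident.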
Pipeline correctness
- Insight references in actions are now bundled at write time — Insight [N] in action why text now resolves to the insight's title (with a clickable link in the UI), instead of the dangling positional reference.
- Vision stage in LOCAL mode resolves the right model from your config instead of using a hardcoded fallback.
- Graph edges are always stamped at embedding time; dangling edges dropped.
- Display vs embedding text in the vector store — facts/entities now carry a bare display layer alongside the enriched embedding layer (cleaner rendering in chat citations, no canonical-id slug leakage).
UI fixes
- Click a fact citation in chat → the fact view scrolls to and highlights the right fact, including for in-flight (consolidating) entities where the click used to silently no-op.
- WebKit paint-debt on fact-click navigation fixed (was making the click target appear blank momentarily on certain layouts).
Under the hood
- New per-stage diagnostics rollup with sampled high-volume call detail.
- Release-history page at basevault.ai/releases, sourced from GitHub Releases.
- Eval framework reorganized + unified across the engines (query, phase, pipeline, chat); all eval outputs now consolidated under ~/.basevault/evals/<modality>/<run_id>/.
- Pre-release smoke checklist + workhorse pipeline test rig — workers now have a standard pre-PR behavioral test.
- Chat conversation exports from Claude.ai, Claude Code, and Codex can now be ingested as journal-equivalent corpus.
Upgrading
Auto-update should pick this up within a few minutes of opening the app. You can also download it manually from basevault.ai.
Maintenance release — re-signed under BaseVault, Inc.'s Apple Developer ID. No functional changes from 0.1.48. The auto-updater will offer it as the latest signed build.
Fixes
- Journal dates — Day One entries now use each entry's own timezone (timeZoneName) instead of UTC, fixing dates that were off by a day for entries written near midnight.
- Entities reliability — a schema-shaped empty extraction ({"entities": []}) is now retried instead of being recorded as a false "empty success," so real results aren't silently dropped.
- Sources panel — reference labels use the full panel width and the · separators no longer wrap to the start of a line.
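The journal-date fix is easiest to see with a concrete entry. The sketch below assumes an entry carries a UTC timestamp in a `creationDate` field with the format shown; only `timeZoneName` comes from the note above, and the function name and entry shape are illustrative.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo


def entry_local_date(entry: dict) -> date:
    """Return the calendar date in the entry's own timezone, not in UTC."""
    # Assumed timestamp format: UTC with a trailing "Z".
    utc = datetime.strptime(entry["creationDate"], "%Y-%m-%dT%H:%M:%SZ")
    utc = utc.replace(tzinfo=timezone.utc)
    return utc.astimezone(ZoneInfo(entry["timeZoneName"])).date()


# Written at 8:30 pm on March 1 in Los Angeles, which is already March 2 in UTC.
entry = {"creationDate": "2024-03-02T04:30:00Z", "timeZoneName": "America/Los_Angeles"}
print(entry_local_date(entry))  # 2024-03-01
```

Reading the same timestamp as a UTC date would give March 2, which is the off-by-one-day symptom described in the fix.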
Facts & entities
- Facts now sort newest-first, each prefixed with a formatted date.
- Duplicate facts consolidated — a fact appearing under multiple categories now shows as a single entry in the entity and facts views.
Run details
- Local-mode fix — runs now show the actual local model in use (previously surfaced cloud models).
- Embeddings observability — embeddings calls show a prompt-token estimate, grouped under a collapsible parent row.
Reliability
- Empty-extraction cache fix — a retried empty extraction is re-queried instead of being served a stale "empty success" from cache, so a real failure is no longer masked.
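The retry and cache fixes fit together: an empty result is retried, and it is never written to the cache, so a later attempt cannot be served a stale "empty success." A minimal sketch of that idea follows. `extract_entities`, the `call_model` callback, and the dict cache are hypothetical, and raising after the last attempt is an assumption about what happens once retries run out.

```python
import json
from typing import Callable


def extract_entities(
    call_model: Callable[[], str],
    cache: dict,
    key: str,
    max_attempts: int = 2,
) -> list:
    """Retry schema-shaped empty results and cache only non-empty ones."""
    if key in cache:
        return cache[key]
    for _ in range(max_attempts):
        entities = json.loads(call_model()).get("entities", [])
        if entities:
            cache[key] = entities  # empty results never reach the cache
            return entities
    # Assumption: surface the failure rather than record an empty success.
    raise RuntimeError(f"extraction for {key!r} stayed empty after {max_attempts} attempts")
```

Because the cache is written only on the success path, a failed extraction leaves no trace that could mask the failure on the next run.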
Under the hood
complete() cross-cutting plumbing consolidated into one chokepoint; internal eval tooling + test-fixture repairs.
What's new since v0.1.45
Local mode
- Downloaded MLX model now counts as setup-done — the Local mode picker no longer stays disabled after you've downloaded a local model.
- Mixed-mode presets removed — simpler, clearer per-step model selection.
Pipeline
- Multi-model scheduling — generic per-stage dispatch with an optional parallel multi-model option (e.g. kimi+glm), spreading high-fan-out stages across models.
- Insight numbering is consistent across the UI, the underlying records, and chat citations.
Eval tooling (internal)
- Agent-drivable perf tool: run → judge → report, --models, parallel judging, boxed reports.
- Chatbot eval counts a key match in the retrieved grounding block as a pass, not just the answer text.
Under the hood
- Fully-green, deterministic test baseline — hermetic test config, stale-test cleanup, and fail-loud unification of per-stage dispatch.
What's new since v0.1.44
Local mode
- MLX crash on older macOS fixed — the bundled MLX runtime is now pinned to the minimum-supported macOS floor, so the app no longer crashes at launch on macOS < 15.0.
- Local picker only offered when usable — the Easy Wizard and Settings no longer present Local mode unless a local model + MLX are actually available, and Settings/Wizard now share the same readiness check.
Eval tooling (internal)
- Agent-drivable eval runner — scheduler-paced fixture groups, pluggable providers, per-cell outputs, and per-group tables.
- 4-score judge — grounding / quality / schema / combined scoring, with per-fixture custom judge instructions.
Reliability
- Large vaults no longer bog down: the entities stage no longer over-fragments into thousands of tiny model calls (~100× fewer on big corpora).
- Pipeline runs are sturdier under load + slow/failing calls — centralized retry, two-tier per-call timeouts (no indefinite hangs), reasoning auto-off after a load retry, and a reliable backup model tried before any data is dropped.
Quality
- Better default synthesis routing (dedupe → gemma, patterns → kimi) for quality + speed.
- Extraction now captures emotional/affect content, not just dry facts; empty extractions retry instead of silently passing.
Models & trust
- Removed the unusable DeepSeek model; added GLM-5.1 as a selectable model.
- When a model's secure enclave is unavailable, the app names exactly which one is down (e.g. embeddings) instead of a generic failure.
- The image-transcription reasoning toggle now actually takes effect.
Onboarding & diagnostics
- Streamlined Easy Wizard first-run for new users (name-only, no key setup).
- Run diagnostics are now saved whenever a run ends: completed, paused, cancelled, or crashed.
Under the hood: eval tooling + test-suite hardening.
Pipeline quality
- New default model routing — extract + entities on gpt-oss-120b, dedupe on gemma, patterns on kimi, all with reasoning on — for better extraction coverage and synthesis quality. A restored Reset to defaults button in Settings snaps the per-stage config back to these defaults after experimenting.
- Heavily-mentioned people no longer get walls of repeated text as their description; dedupe compresses them to one clean summary.
- Insights/actions that come back empty now retry once instead of silently producing nothing.
- More reliable handling of oversized requests (cap-hit fallback routes cleanly to a larger-context model).
Chat
- Conversations show readable names ("Conversation 3 · May 21") in the picker while staying stable on disk.
- New Open Chats button in Settings opens your chat folder in Finder.
Run controls & progress
- Skipping a pending request stops it immediately instead of hanging for tens of seconds, with no skip-state flicker.
- The progress bar no longer freezes at "N/N" after pause/resume; the elapsed timer ticks smoothly.
Diagnostics
- Attestation failures now show the full trace instead of a one-line message.
Onboarding
- A bundled key + name-only Easy Wizard lets new users start without manual key entry.
What's new since v0.1.41
Pipeline
- Faster first results — extraction emits a small first batch early instead of waiting for the full corpus split, so the first facts and entities surface sooner on big vaults.
- Deterministic dedupe — same vault, same model, same outputs: dedupe now uses a total-order alias key with a pinned PYTHONHASHSEED, removing run-to-run shuffle in entity merges.
- WhatsApp / per-doc token brake — the shared token brake is enforced as a true ceiling on WhatsApp-style per-doc splits, so chat-log vaults no longer overshoot it.
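To see why a total-order key matters for the deterministic dedupe item: in Python, the iteration order of a set of strings can change between processes unless PYTHONHASHSEED is pinned, so any merge that picks "the first alias" can differ from run to run. The sketch below shows one possible total-order key; `alias_key` and `merge_aliases` are illustrative names, and the actual key BaseVault uses is not described in these notes.

```python
def alias_key(alias: str) -> tuple[str, str]:
    """Order aliases case-insensitively, breaking ties on the raw string.

    Two distinct aliases never compare equal, so the order is total.
    """
    return (alias.casefold(), alias)


def merge_aliases(aliases: set[str]) -> tuple[str, list[str]]:
    """Pick a canonical alias and a stable alias list, independent of set order."""
    ordered = sorted(aliases, key=alias_key)
    return ordered[0], ordered


canonical, ordered = merge_aliases({"ana", "Ana Silva", "Ana"})
print(canonical, ordered)  # Ana ['Ana', 'ana', 'Ana Silva']
```

Sorting with such a key gives the same result under any hash seed. Pinning the seed as well guards any remaining code paths that still iterate over sets directly.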
Chatbot
- Chat bar redesign — shared two-line dropdown for model/mode, clickable [N] markers in message bodies that jump to the cited reference, and references display with dates.
- LOOKUP protocol scoped to the decision turn — fresh lookups happen only when the chatbot is deciding what to fetch, not on every follow-up; the no-reuse rule was hardened so the same chunk isn't pulled twice.
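As an illustration of the clickable [N] markers, the sketch below finds integer-bracket markers in a message and keeps only those that map to a known reference. The function name and the shape of the `references` mapping are assumptions for the example; the app's real data model is not described here.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")  # matches markers such as [1] or [12]


def grounded_citations(answer: str, references: dict[int, str]) -> list[tuple[int, str]]:
    """Return (number, source) pairs for markers that have a backing reference."""
    found = []
    for match in CITATION.finditer(answer):
        n = int(match.group(1))
        if n in references:  # markers without a backing reference are skipped
            found.append((n, references[n]))
    return found
```

A UI layer could turn each returned pair into a link, while markers with no matching reference stay plain text or are stripped.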
UI
- Product icon in the chatbot and header — the BaseVault mark (black rounded square + emerald dot) now sits in the chatbot bar and the landing-page header.
- Entities grouped by type — in the run-details tree, entities are bucketed by type (person, place, org, …) so the list is scannable on dense runs.
- Progress bar — embeddings as one unit — the embeddings stage is a single collective progress unit rather than per-call rows; aborted calls are excluded from the in-flight count.
- Progress bar — live running time — the "running" timer ticks live and matches the "elapsed" formatter (h/m/s) instead of freezing at snapshot time.
- Live wait time on in-flight calls — the run-details view shows each pending call's wait time live, while the call is still waiting on the model.
Internals
- Launch trace coalescing — per-line trace emission is gated, runs polling is merged into the coalescer, and per-tick trace markers are demoted; launch traces are quieter and cheaper on large vaults.
What's new since v0.1.40
Privacy
- Content-free diagnostics — shareable diagnostic exports are routed through a single guarded emitter that is, by construction, incapable of including file contents or prompt text. Earlier exports already avoided content; this release proves it structurally rather than relying on per-call discipline.
What's new since v0.1.39
Chatbot
- Conversation picker — chats now live in per-conversation directories with rename support and last-activity ordering, so you can keep multiple threads side by side.
- Less-resistant persona — questions "about the user" trigger a fresh lookup instead of being deflected; LOOKUP is the default when the answer isn't already in-context.
- Reasoning toggle now wired through — the chatbot's reasoning switch actually controls inference (it didn't before); the dead rerank toggle was removed.
- Per-message copy button — each message has its own copy action, and each resources block is labeled with the source it came from.
- Citations pinned to the run — a chat message's citations always reference the run that produced them, even after later runs change the underlying data.
- Citation parity across hops — clicking a citation highlights the entity (with fade) on any hop, and highlights the whole chunk in the source view; chunker tuned to 512/64.
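The "512/64" in the chunker note is not spelled out. A common reading, assumed here, is a chunk size of 512 units with a 64-unit overlap between neighbouring chunks. The sliding-window sketch below shows that reading only; the unit (tokens or characters) and the function name are assumptions.

```python
def chunk(tokens: list, size: int = 512, overlap: int = 64) -> list[list]:
    """Split tokens into windows of `size` that share `overlap` items with the next."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```

With these defaults a 1,000-item input yields three chunks starting at offsets 0, 448, and 896. The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which helps when a citation needs to highlight the full chunk.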
RAG
- Fail-closed retriever — chunkless and resumed runs no longer silently retrieve zero chunks; the retriever now fails loudly instead of returning empty.
Reliability
- Startup "Attestation Failed" fixed — an intermittent sigstore TUF symlink race on first launch is serialized away.
Trust chain
- Attestation call sites consolidated — verification now happens at exactly three sanctioned points in the inference path; scattered ad-hoc checks were removed.
Identifiers
- Stable 4-letter IDs — run IDs are now stable across restarts, and the same scheme was extended to chats so threads have durable IDs too.