vault_graph()’s control-file filter now works on
Windows. The previous implementation compared dirname(file)
against normalizePath(vault), but dirname() on
Windows uses forward slashes in its output while
normalizePath() defaults to backslashes, so the comparison
was always false on Windows. Control files (schema.md,
index.md, log.md) leaked into the graph, and
the “No pages in vault” guard never fired on an empty vault. The filter now
normalises vault with winslash = "/" so the
comparison matches dirname()’s output on both
platforms.

inst/tinytest/test_vault_graph.R’s empty-vault
regression test is now between the saber-installed and
saber-graph_svg guards so it runs on machines whose saber
is still at the current CRAN release (0.3.0). Without this, the
win-builder farm was the only place exercising the bug.

vault_graph() no longer crashes on Windows.
category_from_path() built a regex from the vault path; on
Windows the backslashes were interpreted as regex backreferences,
halting R CMD check on the CRAN win-builder farm with
“Invalid back reference”. The helper now strips the vault prefix by
substring and handles both / and \
separators.

inst/tinytest/test_vault_graph.R had a top-of-file
exit_file("installed saber lacks graph_svg(); skipping")
that skipped the entire file when saber was at the current
CRAN release (0.3.0). The category_from_path() regression
tests are now above that guard so they run regardless of which saber is
installed.

autoresearch(topic, vault, ...) runs a bounded,
package-owned research workflow into a pensar vault. R controls the
loop, source ingestion, wiki writes, indexing, and logging; model calls
are limited to structured decisions returned as JSON
(plan_queries, select_sources,
extract_claims, analyze_gaps,
plan_pages, revise_page). Multi-round gap
analysis is driven by program$max_rounds. Prompt-injection
guards flag fetched source bodies. update = TRUE (default)
preserves user prose on re-runs via a revise_page model
task; the heuristic fallback appends new findings under a dated section
so prose is never lost. Default search backend uses Tavily via
TAVILY_API_KEY; default model backend uses
llm.api when any of ANTHROPIC_API_KEY,
OPENAI_API_KEY, or MOONSHOT_API_KEY is set,
with a deterministic heuristic backend as the fallback so the full pipeline runs without
LLM access.

init_vault(path, adopt = TRUE) adoption is now verified
against six real Obsidian vaults via
inst/tinytest/test_adopt_real.R (gated by
tinytest::at_home() and the PENSAR_TEST_VAULTS
env var). The mechanism was already covered by test_adopt.R
against synthetic directories; the new test adds real-world coverage for
bramses-highly-opinionated-vault-2023,
claude-obsidian, dusk-obsidian-vault,
kepano-obsidian, Obsidian-Vault-Structure, and
obsidian-wiki.

adopt-obsidian.md documents adopt-mode
semantics and walks through the six-vault sweep.

ingest_url() is now layered on
fetch_url_content() and ingest_url_content()
(both internal), so the autoresearch loop can fetch once and keep the
body in memory for evidence extraction.

write_wiki_page() (internal) merges frontmatter on
update instead of clobbering: existing id,
aliases, status, related, and any
custom keys survive an update; tags are set-unioned;
caller-supplied fields replace existing values; body is always replaced.
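
The merge rules above can be sketched in a few lines of base R. This is a minimal illustration, not pensar's internal code: merge_frontmatter() is a hypothetical helper, and frontmatter is assumed to be an ordinary named list.

```r
# Hypothetical helper (not part of pensar): merge caller-supplied
# frontmatter into the frontmatter already on disk.
merge_frontmatter <- function(existing, supplied) {
  merged <- existing
  for (key in names(supplied)) {
    if (identical(key, "tags")) {
      # Tags are set-unioned rather than replaced.
      merged$tags <- union(existing$tags, supplied$tags)
    } else {
      # Any other caller-supplied field replaces the existing value.
      merged[[key]] <- supplied[[key]]
    }
  }
  # Keys the caller did not mention (id, aliases, custom keys) survive.
  merged
}

existing <- list(id = "p-001", status = "draft", tags = c("r", "wiki"),
                 custom = "keep me")
supplied <- list(status = "reviewed", tags = c("wiki", "graph"))
merged <- merge_frontmatter(existing, supplied)
```

In this sketch merged$tags holds the union of both tag sets, status takes the supplied value, and id and custom are untouched.
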
Refuses writes into adopted vaults unless force = TRUE.
Refuses to overwrite an existing wiki file when
overwrite = FALSE.

autoresearch() calls vault_commit() after
writes, matching ingest()’s pattern, so git-backed vaults
stay clean.

extract_html_title() handles multi-line HTML titles via
PCRE.

inst/skills/pensar/autoresearch/SKILL.md rewritten to
route research requests through autoresearch() rather than
reproducing a manual WebSearch/WebFetch/file-edit loop. The runtime
program ships as machine-readable YAML at
inst/autoresearch/program.yml, overridable by
<vault>/_research/program.yml. Architecture note at
inst/autoresearch/architecture.md.

autoresearch(), the
'Claude Code' skill bundle, the seeded
'CLAUDE.md' / 'AGENTS.md' files for
'Codex' compatibility, and adopt mode for existing
'Obsidian' vaults.

jsonlite, llm.api,
simplermarkdown. Vignette builder:
simplermarkdown.

ingest_agent_context() now resolves
saber::agent_context() dynamically via
getExportedValue() instead of a static reference. Older
saber versions (pre-0.4, including CRAN’s current 0.3.0) that don’t
export agent_context() get a clean error message instead of
tripping R CMD check’s “Missing or unexported object” static analysis
and failing the test suite. The test gates symmetrically so it exercises
either the success path or the missing-export path depending on which
saber is installed.

test_vault_graph test similarly gates on
"graph_svg" %in% getNamespaceExports("saber") so the suite
passes cleanly against CRAN saber 0.3.0 (which lacks
graph_svg). vault_graph() itself already gated
its saber::graph_svg() call; only the test needed the
matching guard.

A foundation release that fixes a destructive bug in
init_vault(), introduces an adopt mode for existing
Obsidian vaults, adds a registry-based identity layer, ships per-source
manifest bookkeeping, exposes retrieval primitives, brings in URL
ingest, dedup / tag audits, an agent-context snapshot wrapper, and a
markdown skill bundle for autonomous web research.
init_vault() refuses to scaffold into directories that
already contain non-pensar files or a foreign git history. Pass
adopt = TRUE for read-only adoption (below) or
force = TRUE to scaffold anyway. The auto-commit step is
gated separately via a new commit parameter (default
NULL): commits only when the directory was pensar-owned
before scaffolding, never as a side effect of force = TRUE.
Fixes a destructive default where pointing init_vault() at
someone else’s git repo would write scaffolding and an auto-commit into
their history.

schema.md as the
load-bearing marker. Top-level raw/ or wiki/
directories without schema.md are treated as foreign.

vault_registry(vault, cache, refresh) builds a
data.frame with one row per page: path,
node_id (current link-resolution identity),
page_uid (stable identity from frontmatter id:
/ address:; NA otherwise), title,
aliases, type, category,
tags, sources, links_out,
system_file. Caches in a session env by default;
cache = "user" persists to
tools::R_user_dir("pensar", "cache"). Never writes inside
the vault. Cache invalidates on rename via per-file path+mtime+size
signature.

find_page() and all its consumers
(outlinks(), backlinks(), lint())
resolve through the registry: exact path → page_uid →
unique node_id → ambiguous-basename warning → frontmatter
alias. Path-style wikilinks ([[Notes/Foo]]),
.md-suffix links, and #section /
#^block-id anchors all resolve correctly. System files
(schema.md, index.md, log.md,
_proposals/*) are skipped in fuzzy resolution so user pages
always win shadow conflicts.

init_vault(adopt = TRUE) for opt-in read-only
adoption of existing Obsidian vaults. Writes only a minimal adopted
schema.md (adopted: true frontmatter),
log.md, and index.md if missing. No
raw/ / wiki/ scaffolding, no auto-commit, leaves
user content untouched. Pre-existing log.md is preserved.
update_index() and status() switch to
registry-driven enumeration for adopted vaults, grouping by frontmatter
type (falling back to category).
ingest() refuses to write into adopted vaults unless
force = TRUE. Path-disambiguated index links
([[A/Foo]] / [[B/Foo]]) when basenames
collide.

manifest_path(), read_manifest(),
update_manifest(). Per-source ingest provenance plus an
opt-in address_map. Lives at
.pensar/manifest.yml. ingest() and
ingest_repo() hook into the manifest after successful
writes with a sha1: content hash. Read-only ops never touch
it. Malformed sub-fields and per-entry records degrade safely instead of
crashing.

.manifest.json and
.raw/.manifest.json is deferred; would require
jsonlite in Imports.

search_pages(query, vault, type, in_body)
substring-matches over title / tags / aliases by default;
in_body = TRUE also scans page bodies. Returns a
matched_in column. Excludes system control files.

page_context(name, vault, body_chars) returns a
structured view of one page: frontmatter, body_head, outlinks,
backlinks.

related_pages(name, vault, k) ranks top-k by shared
tags + shared outlinks (canonical-path co-citation).

recent_activity(vault, days) parses
log.md, newest first.

ingest_url(url, vault, type, title, tags) fetches
via curl::curl_fetch_memory() (10s timeout,
follow-redirects, TLS verify on). Refuses non-2xx and content types
outside text/html, text/plain,
text/markdown, application/json,
application/xml, text/xml. HTML responses use
<title> as the page title when none is supplied.
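
The status and content-type gate described above might look roughly like the following. This is a sketch under stated assumptions: is_acceptable() is a hypothetical helper, and stripping parameters such as "; charset=utf-8" before matching is an assumption about how the header is compared, not something this entry confirms.

```r
# Content types listed in this entry.
allowed_types <- c("text/html", "text/plain", "text/markdown",
                   "application/json", "application/xml", "text/xml")

# Hypothetical helper (not part of pensar): accept only 2xx responses
# whose media type is on the allowlist.
is_acceptable <- function(status_code, content_type) {
  # Assumption: drop parameters like "; charset=utf-8" and ignore case.
  media_type <- tolower(trimws(sub(";.*$", "", content_type)))
  status_code >= 200 && status_code < 300 && media_type %in% allowed_types
}
```

With curl::curl_fetch_memory(), the two inputs would typically come from the response's status_code and type fields.
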
Dedup against the manifest: ingesting the same URL twice doesn’t re-fetch. The dedup
check is skipped and the URL re-fetched when the recorded file has been deleted or the entry is
malformed.

dedup(vault, threshold) proposes candidate
duplicate pages by combining Jaro-Winkler title similarity (60%) and
tag-set Jaccard overlap (40%). Writes to
_proposals/dedup.md. Never auto-merges.

tags(vault, taxonomy) audits used tags against an
optional controlled vocabulary at _meta/taxonomy.md
(markdown bullet list). Unknown tags get near-miss suggestions via
Jaro-Winkler. Writes to _proposals/tags.md. Never
auto-renames. Explicit missing taxonomy path errors instead of silently
degrading.

ingest_agent_context(agent, vault, ...) wraps
saber::agent_context() to snapshot the live agent context
(memory, project / global instructions, identity files) into the vault
as a raw/chats/ page. Saber stays in Suggests; missing
saber errors with an install hint.

inst/skills/pensar/autoresearch/: SKILL.md
driving a bounded 3-round web-research loop (decompose → search/fetch →
gap fill → synthesize) and a configurable
references/program.md. The loop files results through
ingest_url(), dedups concepts with
search_pages(), suggests cross-links via
related_pages(), and refreshes the index plus log on
completion.

pensar_skill_path(skill = NULL) returns the
absolute path to the bundle root or a specific skill. Symlink it into an
agent’s skill directory:

    ln -s "$(Rscript -e 'cat(pensar::pensar_skill_path())')" \
      ~/.claude/skills/pensar

vault_registry(),
update_index(), status(),
backlinks(), outlinks(), lint(),
search_pages(), page_context(),
related_pages(), recent_activity()) never
write vault state. .pensar/ is reserved for vault-owned
bookkeeping; derived caches live in
tools::R_user_dir("pensar", "cache").

lint() now reads tag and link data from the registry
instead of re-parsing files, and keys tag-cluster raw pages by relative
path so duplicate basenames in different folders no longer collide and
undercount clusters.

outlinks() surfaces ambiguous-target warnings to
interactive callers; backlinks() and lint()
continue to use a muffling resolver helper since they iterate.

<dir>/vault/schema.md at each rung, so running pensar
from a project root whose vault lives one level down (e.g.,
cornelius/vault/) resolves correctly.

status() records the resolver source on the returned
pensar_status object ($source is one of
"env", "walkup", "walkup-subdir",
"option", "explicit") and surfaces it in the
print method.

curl (URL ingest), digest
(registry cache key),
stringdist (Jaro-Winkler for dedup and
near-miss tags). saber remains in Suggests.

ingest_repo(path) writes per-repo provenance under
raw/repos/<repo>/: briefing.md (saber
digest), ast.md (saber::symbols() output), and
snapshot.md (commit-pinned metadata: SHA, origin URL,
branch, tracked file listing). Wiki pages cite them with path-style
wikilinks like [[corteza/briefing]].

name_from_path() is now path-aware: files under
raw/repos/<repo>/ resolve to
<repo>/<basename>, so artifacts named
briefing.md across different repos do not collide. Files
outside raw/repos/ are unchanged.

update_index() reports a new Raw: Repos
category.

ingest_briefing() is deprecated; calls now warn and
delegate to ingest_repo(path, artifacts = "briefing").

migrate_briefings_to_repos(vault, dry_run = TRUE)
moves legacy raw/briefings/*.md content into
raw/repos/<repo>/. Keeps the newest file per
(repo, artifact) pair, drops superseded duplicates by
default, rewrites wikilinks across wiki/*.md. The built-in
rename map handles llamaR -> corteza; pass an extended
map for other renames. Defaults to dry-run; review the plan first.

raw/repos/<repo>/<artifact> layout and mark
briefings/ deprecated.

vault_export() returns a canonicalized
out_dir so the path is stable across calls. On macOS
tempdir() lives under /var/... which is a
symlink to /private/var/...; normalizePath()
only resolves symlinks for paths that exist, so the first call returned
the unresolved form and the second returned the resolved form, breaking
idempotency. Re-normalizing after dir.create() fixes the
M1mac CRAN check failure.

default_vault() and default_site_dir() no
longer fall back to tools::R_user_dir(). Per CRAN policy
pensar will not silently write to the user’s home filespace; if no vault
is configured via PENSAR_VAULT, walk-up
schema.md, or options("pensar.vault"), the
call errors with a setup hint. Pass vault = (or
path = for init_vault()) explicitly to write
to a one-off path. Breaking for users who relied on the implicit
~/.local/share/R/pensar/ fallback – run
use_vault('/path/to/vault') once or set
PENSAR_VAULT to restore the previous behavior.

vault_export() now requires either
PENSAR_SITE_DIR or an explicit out_dir =; the
cache fallback is gone for the same reason.

LLM Wiki Engine. Description tidied.
Added SystemRequirements: pandoc, git. Dropped unused
jsonlite from Suggests.

@examples
block, using tempdir() / tempfile() so nothing
leaks into the user’s home filespace at example time.

vault_graph() and ingest_briefing() error
messages reworded to drop the GitHub install URL.

default_vault() resolution order changed so
project-local vaults beat a global .Rprofile default. New
order: PENSAR_VAULT env var > walk-up from
getwd() for a schema.md marker > the
options("pensar.vault") value set by
use_vault() > the R_user_dir() fallback.
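
The lookup order in this entry can be sketched as below. It is an illustration only: resolve_vault() and find_vault_walkup() are hypothetical names, and the real default_vault() may differ in detail. The "data" argument to tools::R_user_dir() is taken from the older entry that first introduced that fallback.

```r
# Hypothetical helper: climb from `start` until a directory containing
# schema.md is found; NULL if the filesystem root is reached first.
find_vault_walkup <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = FALSE)
  repeat {
    if (file.exists(file.path(dir, "schema.md"))) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) return(NULL)  # reached the root
    dir <- parent
  }
}

# Hypothetical helper mirroring the order listed in this entry.
resolve_vault <- function(start = getwd()) {
  env <- Sys.getenv("PENSAR_VAULT", unset = "")
  if (nzchar(env)) return(env)               # 1. environment variable
  walked <- find_vault_walkup(start)
  if (!is.null(walked)) return(walked)       # 2. walk-up schema.md marker
  opt <- getOption("pensar.vault")
  if (!is.null(opt)) return(opt)             # 3. option set by use_vault()
  tools::R_user_dir("pensar", "data")        # 4. fallback
}
```

Because the walk-up step sits above the option, a project-local vault wins over a global default set in .Rprofile, while an explicit PENSAR_VAULT still overrides both.
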
Previously the option won over the env var, which made
PENSAR_VAULT=... ineffective once use_vault()
ran in .Rprofile. Walk-up is new: cd into a
project vault and the CLI Just Works without unsetting your global
default.

vault_graph() renders the vault’s wikilink graph as
static SVG via saber::graph_svg(). Tooltips carry title,
type, date, tags, and a lede from the first meaningful body line. Broken
wikilinks appear as separate nodes. Default viewport 1600x1200 for
denser vaults.

default_vault() now honors
options("pensar.vault") and the PENSAR_VAULT
environment variable before falling back to
tools::R_user_dir("pensar", "data"). Previously, the vault
path was hardcoded to the R_user_dir() path with no escape
hatch, so a nicer path like ~/wiki required passing
vault = to every call.

use_vault() sets
options("pensar.vault") for the session, mirroring
hacer::use_repo().

ingest_briefing() generates a saber briefing via
saber::briefing() and ingests it into the vault. Replaces
the direct cache-file read in inst/scripts/session-start.R
with a real function call, so briefings refresh on ingest instead of
depending on saber’s hook having run first.

saber added to Suggests (previously coupled only via
filesystem).

init_vault(),
ingest(), update_index(),
log_entry(), status(),
backlinks(), outlinks(),
show_page(), lint(), and
vault_export().