obsidian-knowledge-diff
Diff a document against your Obsidian vault. Feed it a book, article, paper, or report and get a prioritized reading plan showing what’s novel, what deepens your existing knowledge, and what you can skim.
How it works
- Embeds your Obsidian vault notes using a local sentence-transformer model
- Extracts and embeds chunks from a document
- Compares each chunk against your vault via cosine similarity
- Classifies chunks as novel, depth gap, or review
- Optionally suggests Obsidian note titles for novel content using an LLM
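The comparison and classification steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the `cosine` and `classify` helpers are hypothetical, the vectors are toy stand-ins for real embeddings, and the rule itself is an assumption based on the default thresholds (best vault match below `novel_threshold` is novel, at or above `review_threshold` is review, anything in between is a depth gap).

```python
import math

NOVEL_THRESHOLD = 0.50   # default novel_threshold from the config
REVIEW_THRESHOLD = 0.65  # default review_threshold from the config


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(chunk_vec, vault_vecs, novel=NOVEL_THRESHOLD, review=REVIEW_THRESHOLD):
    """Label a chunk by its best match in the vault (assumed rule, see lead-in)."""
    best = max((cosine(chunk_vec, v) for v in vault_vecs), default=0.0)
    if best < novel:
        return "novel", best
    if best >= review:
        return "review", best
    return "depth gap", best


# Toy 3-D vectors standing in for real sentence embeddings.
vault = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
chunks = [(1.0, 0.1, 0.0), (0.6, 0.0, 0.8), (0.0, 0.0, 1.0)]
labels = [classify(vec, vault)[0] for vec in chunks]
```

Taking the maximum similarity over all vault notes means a chunk counts as known as soon as one note covers it. Other aggregations (for example, averaging the top few matches) are equally plausible.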
Install
Requires uv:
# That's it. uv handles all dependencies automatically.
uv run obsidian-knowledge-diff.py --help
Quick start
# 1. Generate a config file
uv run obsidian-knowledge-diff.py init --vault ~/obsidian-vault
# 2. Run a diff
uv run obsidian-knowledge-diff.py diff ~/books/some-book.pdf
# 3. Open the report
# => ./some-book-diff.md
Configuration
Config lives at ~/.config/obsidian-knowledge-diff/config.toml (respects XDG_CONFIG_HOME).
Generate one with uv run obsidian-knowledge-diff.py init --vault /path/to/vault, or create it manually:
# Path to your Obsidian vault (required for `diff` command)
vault = "/path/to/your/obsidian-vault"
# Embedding model — runs locally, no API key needed
model = "sentence-transformers/all-MiniLM-L6-v2"
# Chat model for suggesting Obsidian note titles (requires API key)
# Set to "" or use --no-titles to disable
chat_model = "claude-3.5-haiku"
# Similarity thresholds (tuned for MiniLM-L6-v2)
novel_threshold = 0.50
review_threshold = 0.65
# Directories to skip when scanning the vault
skip_dirs = [".obsidian", ".trash", ".git"]
All config values can be overridden with CLI flags (e.g. --vault, --model, --novel-threshold).
Commands
diff
Full pipeline: extract document, embed everything, compute diff, generate report.
uv run obsidian-knowledge-diff.py diff document.pdf
uv run obsidian-knowledge-diff.py diff document.pdf --vault ~/other-vault
uv run obsidian-knowledge-diff.py diff document.pdf --no-titles # skip LLM title suggestions
uv run obsidian-knowledge-diff.py diff document.pdf --keep-backmatter # don't filter index/endnotes
Output is written to ./document-name-diff.md in the current directory.
info
Preview document extraction without embedding. Shows title, author, ISBN, TOC, chunking, and back-matter filtering.
uv run obsidian-knowledge-diff.py info document.pdf
clear-cache
Wipe the embedding cache at ~/.cache/obsidian-knowledge-diff/.
uv run obsidian-knowledge-diff.py clear-cache
init
Generate a config file.
uv run obsidian-knowledge-diff.py init --vault ~/obsidian-vault
uv run obsidian-knowledge-diff.py init --force # overwrite existing
Models
Embeddings run locally via sentence-transformers — no API key needed. The default all-MiniLM-L6-v2 is small and fast. You can swap in any model supported by llm-sentence-transformers.
Title suggestions use a chat model via the llm library. This requires an API key for whichever model you choose. To set up:
# For Claude (default)
llm install llm-anthropic
llm keys set anthropic
# For OpenAI
llm keys set openai
# then: --chat-model gpt-4.1-nano
Or skip title suggestions entirely with --no-titles.
Thresholds
The default thresholds (0.50 / 0.65) are tuned for all-MiniLM-L6-v2. If you switch to a different embedding model, you’ll likely need to adjust them. Run diff once and check the score histogram at the bottom of the report to calibrate.
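To make the calibration idea concrete, the sketch below bins a handful of similarity scores into 5-point buckets. It is purely illustrative: the scores are made up, the `score_histogram` and `render` helpers are hypothetical, and the output is not the format the report actually uses.

```python
from collections import Counter


def score_histogram(scores, bin_pct=5):
    """Count scores per bin, keyed by the bin's lower edge in percent."""
    counts = Counter()
    for s in scores:
        pct = min(max(round(s * 100), 0), 99)  # clamp into [0, 99]
        counts[(pct // bin_pct) * bin_pct] += 1
    return dict(sorted(counts.items()))


def render(hist, bin_pct=5):
    """Draw one text bar per non-empty bin."""
    return "\n".join(
        f"{lo / 100:.2f}-{(lo + bin_pct) / 100:.2f} {'#' * n} {n}"
        for lo, n in hist.items()
    )


# Made-up best-match scores, one per document chunk.
scores = [0.31, 0.42, 0.44, 0.52, 0.58, 0.61, 0.66, 0.71, 0.72]
hist = score_histogram(scores)
print(render(hist))
```

If a different embedding model shifts most scores upward, say into the 0.70 to 0.90 range, the 0.50 / 0.65 defaults would label nearly everything as review, which is the signal to raise both thresholds.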
License
MIT