Long words (>500 characters) provide less useful context to models.
Dropping very long words allow models to create better embeddings by
passing more of the useful context from the entry to the model
- Previously `model_type' was set in the setup of each `search_type'
- All encoders were of type `SentenceTransformer'
- All cross_encoders were of type `CrossEncoder'
- Now `encoder-type' can be configured via the new `encoder_type' field
in `TextSearchConfig' under `search-type` in `khoj.yml`.
- All the specified `encoder-type' class needs is an `encode' method
that takes entries and returns embedding vectors
- Ensure all tensors are on MPS device before doing operations across them
- Background
- GPU is used by default for Khoj on MacOS now
- Needed PyTorch > 1.13.0 on Macs to use GPU, which we do now
- MPS should speed up search and indexing on MacOS
Fix usage warning for unescaped single quote in `khoj.el' docstring.
Converts usage of '<text>' into `<text>' to use the correct quote forms in generated docs
⛔ Warning (comp): khoj.el:119:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:120:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:121:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:168:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
### Plugin Features
- Search Obsidian notes using Khoj
*Provide Natural language search on your (markdown) notes in Obsidian Vault*
- Show search results as rendered Markdown
*Improve legibility of the results*
- Jump to selected note from search result in Khoj search modal
*Simplify seeing result within its original note context*
- Automatically configure khoj to index markdown files in current vault
*Reduce khoj setup steps for plugin users by using reasonable defaults*
- Code updates the markdown config in `khoj.yml` and triggers index update
- It can be configured by user in khoj plugin settings, if required
- Add Demo and detailed Readme for the Obsidian plugin
*Ease setup and usage. Give context about capabilities*
### Miscellaneous
- (Try) Keep a mono repo until the Khoj project is mature enough
to reduce maintainance burden
### Commits Details
- 0e39e0f Add details about the Khoj Obsidian plugin to the main Readme
- cd8b918 Add `manifest.json`, `versions.json` of Obsidian plugin to project root
- 66ccd0c Create Obsidian plugin for Khoj
- Add Khoj in Obsidian Demo
- Update Interfaces Screenshot to include Obsidian Plugin Screenshot
- Update .gitignore to ignore obsidian plugin ignorelist
Section the .gitignore for better readability
- Update the Setup, Usage instructions to include information about
the Obsidian plugin
- Obsidian provides limited support for plugins in larger repositories.
Currently, it does not have a way to specify the directory of a plugin
So it expects the plugins `manifest.json' and `versions.json' to be at
project root
- While this unnecessarily litters the codebase. It is the (current)
required tradeoff for keeping the core plugins in a mono repo
- Features
- Search using Khoj from within the Obsidian app
Allow Natural language search on your (markdown) notes in Obsidian Vault
- Show search results as rendered (instead of raw) Markdown
Improve legibility of the results
- Jump to selected note from search result in Khoj search modal
Simplify seeing result within its original note context
- Automatically configure khoj to index markdown files in current vault
Reduce khoj setup steps for plugin users by using reasonable defaults
- Code updates the markdown config in khoj.yml and triggers index update
- It can be configured by user in khoj plugin settings, if required
- Add Demo and detailed Readme for the Obsidian plugin
Ease setup and usage. Give context about capabilities
- Miscellaneous
- Trying keep a mono repo until the Khoj project is mature enough
to reduce maintainance burden
This can ease configuring khoj from the different interfaces
- Don't need to know all the (default) config used by khoj.
- Just get default config by calling the above API endpoint.
- Then modify desired portions and call POST /api/config/data to
configure khoj.
- Start khoj server (in non-GUI mode) without needing config file
already instantiated.
- But throw warning to configure khoj to use it
- This allows plugins to configure the app via the /config/data APIs
- To be used by the Khoj obsidian plugin to configure markdown content
in khoj
- c535953 Update index automatically in non GUI mode too
- 701d92e Lock the index before updating it via API or Scheduler
- 3b0783a Automate updating embeddings, search index on a hourly schedule
Resolves#106
- Poll scheduler every minute using threading.Timer
- Use 60 seconds polling interval to avoid fork bombing
- Schedule next via the same poll scheduler
- Allow clean program interrupt by running scheduler in daemon mode
- There are 3 paths to updating/setting the index (stored in state.model)
- App start
- API
- Scheduler
- Put all updates to the index behind a lock. As multiple updates path
that could (potentially) run at the same time (via API or Scheduler)
- Remove property drawer from test entry for max_words splitting test
- Property drawer is not required for the test
- Keep minimal test case to reduce chance for confusion
- Required because entries are now split by the max_word count supported
by the ML models
- This would now result in potentially duplicate hits, entries being
returned to user
- Do deduplication after ranking to get the top ranked deduplicated
results
- The instructions suggest installing khoj-assistant via pip install.
This installs the latest tagged/release version of khoj
- To match that version user should install khoj.el from MELPA stable
instead of MELPA
- Issue
ML Models truncate entries exceeding some max token limit.
This lowers the quality of search results
- Fix
Split entries by max tokens before indexing.
This should improve searching for content in longer entries.
- Miscellaneous
- Test method to split entries by max tokens
Update readme to ask user to install khoj.el from MELPA when a
pre-release version of the main khoj app is installed. Else install
khoj.el from MELPA Stable