- Move green server connected dot to the bottom. Show status when
disconnected from server
- Move "New conversation" button to right of the "Conversation" title
- Center alignment of the new conversation and connection status buttons
- Overview
- Extract more structured date variants (e.g with dot(.) & slash(/) separators, 2-digit year)
- Extract some natural, partial dates as well from entries
- Capability
Add ability to extract the following additional date forms:
- Natural Dates: 21st April 2000, February 29 2024
- Partial Natural Dates: March 24, Mar 2024
- Structured Dates: 20/12/24, 20.12.2024, 2024/12/20
Note: Previously only YYYY-MM-DD ISO-8601 structured date form was extracted for date filters
- Performance
Using regexes is MUCH faster than using the `dateparser' python library
It's a little crude but gives acceptable performance for large datasets
## Benefits
- Support all GGUF format chat models
- Support more GPUs like AMD, Nvidia, Mac, Vulcan (previously just Vulcan, Mac)
- Support more capabilities like larger context window, schema enforcement, speculative decoding etc.
## Changes
### Major
- Use llama.cpp for offline chat models
- Support larger context window
- Automatically apply appropriate chat template. So offline chat models not using llama2 format are now supported
- Use better default offline chat model, NousResearch/Hermes-2-Pro-Mistral-7B
- Enable extract queries actor to improve notes search with offline chat
- Update documentation to use llama.cpp for offline chat in Khoj
### Minor
- Migrate to use NouseResearch's Hermes-2-Pro 7B as default offline chat model in khoj.yml
- Rename GPT4AllChatProcessor to OfflineChatProcessor Config, Model
- Only add location to image prompt generator when location known
- Much faster than using dateparser
- It took 2x-4x for improved regex to extracts 1-15% more dates
- Whereas It took 33x to 100x for dateparser to extract 65% - 400% more dates
- Improve date extractor tests to test deduping dates, natural,
structured date extraction from content
- Extract some natural, partial dates and more structured dates
Using regex is much faster than using dateparser. It's a little
crude but should pay off in performance.
Supports dates of form:
- (Day-of-Month) Month|AbbreviatedMonth Year|2DigitYear
- Month|AbbreviatedMonth (Day-of-Month) Year|2DigitYear
Previously we just extracted dates in YYYY-MM-DD format from content
for date filterings during search.
Use dateparser to extract dates across locales and natural language
This should improve notes returned as context when chat searches
knowledge base with date filters
Fallback to regex for date parsing from content if dateparser fails
- Limit natural date extractor capabilities to improve performance
- Assume language is english
Language detection otherwise takes a REALLY long time
- Do not extract unix timestamps, timezone
- This isn't required, as just using date and approximating dates as UTC
- When setting up the default agent, configure every conversation that doesn't have an agent to use the Khoj agent
- Fix reverse migration for the locale removal migration
Previously we were skipping the extract questions step for offline
chat as default offline chat model wasn't good enough to output proper
json given the time it took to extract questions.
The new default offline chat models gives json much more regularly and
with date filters, so the extract questions step becomes useful given
the impact on latency
- How to pip install khoj to run offline chat on GPU
After migration to llama-cpp-python more GPU types are supported but
require build step so mention how
- New default offline chat model
- Where to get supported chat models from on HuggingFace
- Benefits of moving to llama-cpp-python from gpt4all:
- Support for all GGUF format chat models
- Support for AMD, Nvidia, Mac, Vulcan GPU machines (instead of just Vulcan, Mac)
- Supports models with more capabilities like tools, schema
enforcement, speculative ddecoding, image gen etc.
- Upgrade default chat model, prompt size, tokenizer for new supported
chat models
- Load offline chat model when present on disk without requiring internet
- Load model onto GPU if not disabled and device has GPU
- Load model onto CPU if loading model onto GPU fails
- Create helper function to check and load model from disk, when model
glob is present on disk.
`Llama.from_pretrained' needs internet to get repo info from
HuggingFace. This isn't required, if the model is already downloaded
Didn't find any existing HF or llama.cpp method that looked for model
glob on disk without internet