* Working example with Llama v2 running locally on my machine
- Download the model from Hugging Face
- Plug it into GPT4All
- Update prompts to fit the Llama chat format (see the sketch below)
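For reference, a minimal sketch of the Llama 2 chat prompt format the prompts were adapted to; the system and user strings here are illustrative placeholders, not the actual prompts used.

```python
# Llama 2 chat format: system prompt wrapped in <<SYS>> tags inside an
# [INST] block. The strings below are illustrative placeholders.
system_prompt = "You are Khoj, a helpful personal assistant."
user_message = "What did I write about last week?"

llama_prompt = (
    "<s>[INST] <<SYS>>\n"
    f"{system_prompt}\n"
    "<</SYS>>\n\n"
    f"{user_message} [/INST]"
)
```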
* Add appropriate prompts for extracting questions from a query, following the Llama prompt format
* Rename Falcon to Llama and make some improvements to the extract_questions flow
* Further tune the extract-questions prompts and unit tests
* Disable extracting questions dynamically from Llama, as results are still unreliable
The OpenAI conversation processor schema had been updated, but conftest
hadn't been updated to reflect the change.
Update the conversation processor setup in conftest to fix this
* Add support for GPT4All's Falcon model as an additional conversation processor (loading sketched below)
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4All models and OpenAI
- Add unit tests benchmarking Falcon's performance
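A minimal sketch of loading a Falcon model through the gpt4all Python bindings; the exact model filename is an assumption and may differ from the binary Khoj ships with.

```python
from gpt4all import GPT4All

# Model filename is an assumption; GPT4All distributes Falcon as a
# quantized GGML binary along these lines
model = GPT4All("ggml-model-gpt4all-falcon-q4_0.bin")

response = model.generate("Summarize my notes on project planning", max_tokens=256)
print(response)
```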
* Add exc_info to include stack trace in error logs for text processors
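As a rough illustration of the change: error logs now pass exc_info so the full stack trace is captured. process_text here is a hypothetical stand-in for an actual text processor call.

```python
import logging

logger = logging.getLogger(__name__)

try:
    process_text(entry)  # hypothetical stand-in for a text processor call
except Exception as e:
    # exc_info=True attaches the full stack trace to the error log
    logger.error(f"Failed to process entry: {e}", exc_info=True)
```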
* Pull shared functions into utils.py to be used across the gpt4all and gpt modules
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
Ensure the order in which new embeddings are inserted on incremental
update does not affect the order or values of existing embeddings when
normalization is turned off (sketched below)
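A minimal sketch of the invariant, assuming embeddings are held in a torch tensor: new embeddings are appended after the existing rows, so prior rows keep their position and, with normalization off, their values.

```python
import torch
import torch.nn.functional as F

def update_embeddings(existing: torch.Tensor, new: torch.Tensor, normalize: bool = False) -> torch.Tensor:
    # Append new embeddings after the existing rows; existing rows keep
    # their order, and their values are untouched unless normalization is on
    corpus = torch.cat([existing, new], dim=0)
    if normalize:
        corpus = F.normalize(corpus, p=2, dim=1)
    return corpus
```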
Asymmetric was the older name, used to differentiate between symmetric
and asymmetric search.
Now that text search only uses asymmetric search, stick to the simpler name
- Current incorrect behavior:
  On regenerate, all entries with a duplicate compiled form are kept,
  but on update only the last of the duplicated entries is kept.
  This divergent behavior risks index corruption across reconfigure
  and update (deduplication sketched below)
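A minimal sketch of consistent deduplication, assuming each entry exposes its compiled form as a compiled attribute: keep the first entry per compiled form in both the regenerate and update paths.

```python
def deduplicate_entries(entries):
    # Keep the first entry per compiled form so regenerate and update
    # converge on the same index contents; `compiled` is an assumed attribute
    seen = set()
    deduped = []
    for entry in entries:
        if entry.compiled not in seen:
            seen.add(entry.compiled)
            deduped.append(entry)
    return deduped
```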
* Update the /chat endpoint to conditionally support streaming (see the sketch after this list)
- If streaming is enabled, return the thread generator as it does currently
- If streaming is disabled, return a JSON response with the response and compiled references separated out
- Correspondingly, update the chat.html UI and the Obsidian plugin to use the streamed API
- Rename chat/init/ to chat/history
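A minimal sketch of the conditional streaming behavior, assuming a FastAPI endpoint; generate_chat_response is a hypothetical stand-in for the actual chat logic, stubbed here so the example runs.

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def generate_chat_response(q: str):
    # Hypothetical stand-in for the actual chat logic: yields response
    # chunks and returns the compiled references used as context
    chunks = iter(["Hello ", "from ", "Khoj"])
    references = ["compiled reference snippet"]
    return chunks, references

@app.get("/api/chat")
def chat(q: str, stream: bool = False):
    response_iterator, references = generate_chat_response(q)
    if stream:
        # Stream chunks back as they are produced
        return StreamingResponse(response_iterator, media_type="text/plain")
    # Otherwise collect the full response and return references separately
    return {"response": "".join(response_iterator), "context": references}
```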
* Update khoj.el to use the /history endpoint
- Update corresponding unit tests to use stream=true
* Remove & from call to /chat for obsidian
* Abstract functions out into a helpers.py file and clean up some of the error-catching
- Fix tests of the gpt converse method after it started streaming responses
- Pass stop in the model_kwargs dictionary and the API key in the
  openai_api_key parameter to chat completion methods. This should resolve
  the argument warning thrown by the OpenAI module (see the sketch below)
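A minimal sketch of the parameter passing, assuming langchain's ChatOpenAI wrapper is the chat completion method in question; the model name and stop sequence are illustrative.

```python
import os
from langchain.chat_models import ChatOpenAI

# Passing stop inside model_kwargs and the key via openai_api_key avoids
# the "not a default parameter" warning the wrapper emits when these are
# passed as bare keyword arguments
chat = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.2,
    model_kwargs={"stop": ["\n\n"]},
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)
```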
The previous JSON parsing failed to handle questions with date
filters
Fix the chat actor tests to run without freezegun throwing an error
complaining about importing the transformers.local_llama model
Remove quote escapes from date filter examples provided to
extract_questions actor
Khoj will soon get a generic text indexing content type. That, along
with a file filter, should suffice for searching through Ledger
transactions, if required.
Having a specific content type for a niche use-case like Ledger isn't
useful. Removing unused content types reduces the code Khoj has to manage.
Org-music was just a custom content type that worked with org-music.
It was mostly only useful to me.
Cleaning up that code reduces the number of content types Khoj has to
manage.
- Khoj chat will now respond to general queries if:
  1. no relevant reference notes are available, or
  2. explicitly induced by prefixing the chat message with "@general"
- Previously Khoj Chat would often refuse to respond to general queries
  not answerable from reference notes or chat history
- Make chat quality tests more robust
- Add more equivalent chat response options for refusing to answer
- Force haiku writing to not give any preamble, just the haiku
Previously the filename was appended to the end of the compiled entry.
This didn't provide appropriate structured context.
Test that the filename gets prepended as a heading to the compiled entry
(sketched below)
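A minimal sketch of the compile step under this change; compile_entry and its arguments are hypothetical names for illustration.

```python
# Hypothetical compile step: prepend the source filename as a heading so
# the compiled entry carries structured context about where it came from
def compile_entry(filename: str, entry_body: str) -> str:
    return f"{filename}\n{entry_body}"

print(compile_entry("notes/projects.md", "## Khoj\nIdeas to improve search quality"))
```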
All compiled snippets split by max tokens (apart from the first) do not
get the heading as context.
This limits the search context available to retrieve these continuation
entries
- Explicitly split the entry string by space during the split by max_tokens (see the sketch below)
- Prevent the formatting of the compiled entry from being lost
- The formatting itself contains useful information.
  No point in dropping the formatting unnecessarily,
  even if (say) the current search models don't account for it (yet)
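A minimal sketch of the splitting behavior, with a simplified word-count notion of tokens: splitting on " " alone keeps newlines and other formatting inside the joined words, and only the first chunk starts with the entry heading.

```python
def split_by_max_tokens(compiled: str, max_tokens: int = 256) -> list[str]:
    # Split only on spaces so newlines and other formatting survive
    # inside the tokens; continuation chunks lack the leading heading
    tokens = compiled.split(" ")
    return [
        " ".join(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```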
Append the originating filename to the compiled string of each entry
for better search quality by providing more context to the model.
Update the markdown_to_jsonl tests to ensure the filename is added.
Resolves #142
- Use tiktoken to count tokens for chat models (see the sketch below)
- Make the number of conversation turns added to the prompt configurable
  via a method argument to generate_chatml_messages_with_context
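A minimal sketch of token counting with tiktoken; the model name is illustrative, and the counting here is per string rather than Khoj's exact accounting of chat turns.

```python
import tiktoken

# Count tokens for a chat model so older conversation turns can be
# dropped once the prompt nears the model's context window
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text: str) -> int:
    return len(encoding.encode(text))

print(count_tokens("Hello, Khoj!"))
```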