- GPT4All integration had ceased working with 0.1.7 specification. Update to use 1.0.12. At a later date, we should also use first party support for llama v2 via gpt4all
- Update the system prompt for the extract_questions flow to add start and end date to the yesterday date filter example.
- Update all setup data in conftest.py to use new client-server indexing pattern
* Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions
* Move configure_search method into indexer
* Add conditional installation for gpt4all
* Add hint to go to localhost:42110 in the docs. Addresses #477
* Remove PySide, gui option from code
* Remove pyside 6 dependency from code
* Remove workflows which build desktop applications
* Update unit tests and update line in documentation
* Remove additional references to pyinstaller, gui
* Add uninstall steps to normal uninstall instructions
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Use state.host and state.port for configuring the URL for the indexer
* Fix parsing of PDF files
* Read markdown files from streamed data and update unit tests
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
* Init: refactor indexer/batch endpoint to support a generic file ingestion format
* Add features to better support indexing from files sent by the desktop client
* Initial commit with Electron application
- Adds electron app
* Add import for pymupdf, remove import for pypdf
* Allow user to configure khoj host URL
* Remove search type configuration from index.html
* Use v1 path for current indexer routes
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Update unit tests to fix with new application design
* Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request
* Use state.host and state.port for configuring the URL for the indexer
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
- Make Khoj ask clarifying questions when answer not in provided context
- Add default conversation command to auto switch b/w general, notes modes
- Show filtered list of commands available with the currently input text
- Use general prompt when no references found and not in Notes mode
- Test general and notes slash commands in offline chat director tests
* Store conversation command options in an Enum
* Move to slash commands instead of using @ to specify general commands
* Calculate conversation command once & pass it as arg to child funcs
* Add /notes command to respond using only knowledge base as context
This prevents the chat model to try respond using it's general world
knowledge only without any references pulled from the indexed
knowledge base
* Test general and notes slash commands in openai chat director tests
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
* Store conversation command options in an Enum
* Move to slash commands instead of using @ to specify general commands
* Calculate conversation command once & pass it as arg to child funcs
* Add /notes command to respond using only knowledge base as context
This prevents the chat model to try respond using it's general world
knowledge only without any references pulled from the indexed
knowledge base
* Test general and notes slash commands in openai chat director tests
* Update gpt4all tests to use md configuration
* Add a /help tooltip
* Add dynamic support for describing slash commands. Remove default and treat notes as the default type
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
* Allow indexing to continue even if there's an issue parsing a particular org file
* Use approximation in pytorch comparison in text_search UT, skip additional file parser errors for org files
* Change error of expected failure
* Add support for indexing plaintext files
- Adds backend support for parsing plaintext files generically (.html, .txt, .xml, .csv, .md)
- Add equivalent frontend views for setting up plaintext file indexing
- Update config, rawconfig, default config, search API, setup endpoints
* Add a nifty plaintext file icon to configure plaintext files in the Web UI
* Use generic glob path for plaintext files. Skip indexing files that aren't in whitelist
* Add support for starting a new line with shift-enter
* Remove useless comments. Set font-size: medium.
* Update src/khoj/interface/web/chat.html
Update the styling to have the padding, margin and line-height like before.
Co-authored-by: Debanjum <debanjum@gmail.com>
* Update src/khoj/interface/web/chat.html
Make the chat-body scroll to the bottom after resizing
Co-authored-by: Debanjum <debanjum@gmail.com>
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
Previously the GUI mode (with khoj --gui or using the desktop app) would open the web interface in the users default web browser. Now the web interface is just rendered within the app itself using PyQT's Webview. This gives it a more proper app like feel
- Opens settings page on first run and landing page after in GUI mode
Previously was only opening the GUI on linux after first run as it
doesn't have a system tray
- Both the views are from the web interface but are rendered within
the app instead of the browser
* Add checksums to verify the correct model is downloaded as expected
- This should help debug issues related to corrupted model download
- If download fails, let the application continue
* If the model is not download as expected, add some indicators in the settings UI
* Add exc_info to error log if/when download fails for llamav2 model
* Simplify checksum checking logic, update key name in model state for web client
# Incoming
## Major
### Fix Prompt Size Exceeded Issue
- Fix issues related to prompt size, Closes#386. Use the correct tokenizer to calculate whether the input needs to be truncated or not.
### Improve Llama 2 Model Download
- Use the correct download link for LlamaV2 -- should have been using the small model, but was using the medium
- Add better downloading logic to retry download if it failed, Closes#379
### Fix Segmentation Fault due to Race
- Add a lock around generating chat responses from the offline model to avoid segmentation faults. Closes#367.
- Add a loading symbol to the web chat UI when the model is thinking. Closes#392
### Improve Chat Response Latency
- Improve performance of offline chat by increasing batch size (via `n_batch`) to automatically engage more cores/GPU, using smaller model and fixing prompt vs response token generation numbers. Closes#363
### Fix Fake Dialogue Continuation
- Fix formatting of user query with offline chat, this was contributing to #398
- Stop Llama 2 from Creating Fake Dialogue Continuations. Closes#398
## Minor
- Improve default message for Chat window on web when it's not configured. Include hint to use offline chat.
- Add null check in `perform_chat_checks` method
- Add offline chat director unit tests
## Performance Analysis (Time to First Token)
| | v0.10.0 | this branch |
|-|-|-|
| Query 1 | 52s | 28s |
| Query 2 | 33s| 42s |
| Query 3 | 67s| 38s|
It would previously some times start generating fake dialogue with
it's internal prompt patterns of <s>[INST] in responses.
This is a jarring experience. Stop generation response when hit <s>
Resolves#398
- Use same batch_size in extract question actor as the chat actor
- Log final location the chat model is to be stored in, instead of
it's temp filename while it is being downloaded
- Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S
- Use a proper Llama Tokenizer for counting tokens for truncation with Llama
- Add additional null checks when running
Previously the system message was getting dropped when the context
size with chat history would be more than the max prompt size
supported by the cat model
Now only the previous chat messages are dropped or the current
message is truncated but the system message is kept to provide
guidance to the chat model
* Add support for configuring/using offline chat from within Obsidian
* Fix type checking for search type
* If Github is not configured, /update call should fail
* Fix regenerate tests same as the update ones
* Update help text for offline chat in obsidian
* Update relevant description for Khoj settings in Obsidian
* Simplify configuration logic and use smarter defaults
- Configure using Offline Chat from Emacs:
- Enable, Disable Offline Chat from Emacs
- Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup
- Benefits: Offline chat models are better for privacy but not great at answering questions
* Let Offline chat override OpenAI API settings
* Download the offline model whenever offline chat is enabled
* Add progressbar for download for llamav2 model to track progress
* Change ordering of n due to switch of default processor
* Flip ordering of offline/openai checks when extracting questions from query
* Working example with LlamaV2 running locally on my machine
- Download from huggingface
- Plug in to GPT4All
- Update prompts to fit the llama format
* Add appropriate prompts for extracting questions based on a query based on llama format
* Rename Falcon to Llama and make some improvements to the extract_questions flow
* Do further tuning to extract question prompts and unit tests
* Disable extracting questions dynamically from Llama, as results are still unreliable
* Add support for gpt4all's falcon model as an additional conversation processor
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4 models and OpenAI
- Add unit tests benchmarking some of the Falcon performance
* Add exc_info to include stack trace in error logs for text processors
* Pull shared functions into utils.py to be used across gpt4 and gpt
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
* Add docs for more organized, accessible information detailing Khoj setup
* Delete duplicated files
* Add a coverpage without enabling it. Add logo and theme
* Remove obsidian README.md
* Add plausible script to index.html via docsify
## Stabilize and Simplify Content Indexing
### Major Updates
- 9bcca43 Unify logic to update entries when indexing from scratch or incrementally
- 89c7819 Unify logic to update embeddings when indexing from scratch or incrementally
- 6a0297c Stable sort new entries when marking entries for update
- 58d86d7 Unify logic to configure server from API or on server start
- Create tests to ensure old entries, embeddings in index are unaffected on adding new entries
- Refer: 1482fd4, 7669b85, 88d1a29
- ad41ef3 Make normalization of embeddings configurable to test this in c73feeb
### Minor Updates
- 1673bb5 Add todo state to compiled form of each entry
- 6e70b91 Remove unused `dump_jsonl` helper method
- 7ad9603 Improve naming of lock
- b02323a Improve naming text search test methods
Resolves#190
Previous regenerate mechanism did not deduplicate entries with same key
So entries looked different between regenerate and update
Having single func, mark_entries_for_update, to handle both scenarios
will avoid this divergence
Update all text_to_jsonl methods to use the above method for
generating index from scratch
Reuse Search Models across Content Types to reduce Memory Consumption
- Memory consumption now only scales with search models used, not with content types.
Previously each content type had it's own copy of the search ML models.
That'd result in 300+ Mb per enabled text content type
- Split model state into 2 separate state objects, `search_models` and `content_index`.
This allows loading text_search and image_search models first
and then reusing them across all content_types in content_index
- The change should cut down memory utilization quite a bit for most users.
I see a >50% drop in memory utilization on my Khoj instance.
But this will vary for each user based on the amount of content indexed vs number of plugins enabled.
- This change does not solve the RAM utilization scaling with size of the index,
as the whole content index is still kept in RAM while Khoj is running
Should help with #195, #301 and #303
Wrap acquire/release locks in try/catch/finally when updating content
index and search models to prevent lock not being released on error
and causing a deadlock
* Add additional telemetry in order to understand which data sources are the most useful
* Make actions side by side in the configuration page
* Restore main run command
* Update links to point to wiki pages for Github, Notion integrations
* Stanardize nomenclature of the api_type to use _config suffix
Remove header fields that aren't actually helpful for understanding config usage
- Memory consumption now only scales with search models used, not with
content types as well. Previously each content type had it's own
copy of the search ML models. That'd result in 300+ Mb per enabled
content type
- Split model state into 2 separate state objects, `search_models' and
`content_index'.
This allows loading text_search and image_search models first and then
reusing them across all content_types in content_index
- This should cut down memory utilization quite a bit for most users.
I see a ~50% drop in memory utilization.
This will, of course, vary for each user based on the amount of
content indexed vs number of plugins enabled
- This does not solve the RAM utilization scaling with size of the index.
As the whole content index is still kept in RAM while Khoj is running
Should help with #195, #301 and #303
My account doesn't have gpt-4 enabled and it wouldn't work as the default value was always used from extract_questions, where the caller could use the configured model.
- Provide more details on what clicking configure, initialize buttons
or changing the results count slider does
- This shows up on user hovering over those buttons
* For the demo instance, re-instate the scheduler, but infrequently for api updates
- In constants, determine the cadence based on whether it's a demo instance or not
- This allow us to collect telemetry again. This will also allow us to save the chat session
* Conditionally skip updating the index altogether if it's a demo isntance
* Add backend support for Notion data parsing
- Add a NotionToJsonl class which parses the text of Notion documents made accessible to the API token
- Make corresponding updates to the default config, raw config to support the new notion addition
* Add corresponding views to support configuring Notion from the web-based settings page
- Support backend APIs for deleting/configuring notion setup as well
- Streamline some of the index updating code
* Use defaults for search and chat queries results count
* Update pagination of retrieving pages from Notion
* Update state conversation processor when update is hit
* frequency_penalty should be passed to gpt through kwargs
* Add check for notion in render_multiple method
* Add headings to Notion render
* Revert results count slider and split Notion files by blocks
* Clean/fix misc things in the function to update index
- Use the successText and errorText variables appropriately
- Name parameters in function calls
- Add emojis, woohoo
* Clean up and further modularize code for processing data in Notion
* Add langchain static files and pytorch metadata to Khoj native app
* Add pillow static files, metadata & hidden imports to Khoj native app
* Fix path to web interface static files on Khoj native app
* Add tiktoken hidden imports to make chat work from Khoj native app
* Fix Khoj native app to run with GUI mode enabled
This got broken when we moved from using the --no-gui flag to using
--gui in https://github.com/khoj-ai/khoj/pull/263
* Update the /chat endpoint to conditionally support streaming
- If streams are enabled, return the threadgenerator as it does currently
- If stream is disabled, return a JSON response with the response/compiled references separated out
- Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian
- Rename chat/init/ to chat/history
* Update khoj.el to use the /history endpoint
- Update corresponding unit tests to use stream=true
* Remove & from call to /chat for obsidian
* Abstract functions out into a helpers.py file and clean up some of the error-catching
Deprecate usage of the older gpt3 models in-place of the newer chat
based models
- text-davinci-003 is only 50% cheaper than gpt4 and less reliable for
question extraction
- Using gpt-3.50turbo for summarization should reduce cost of chat
- Keep conversation.chat_session as a list instead of a string
- Update completion_with_backoff func to use ChatML format
- Fix testing gpt converse method after it started streaming responses
- Pass stop in model_kwargs dictionary and api key in openai_api_key
parameter to chat completion methods. This should resolve the arg
warning thrown by OpenAI module
The previous json parsing was failing to handle questions with date
filters
Fix the chat actor tests to run without throwing error with freezegun
complaining about importing transformers.local_llama model
Remove quote escapes from date filter examples provided to
extract_questions actor
- Before
Only the search interface had the results count configuration option
- After
- The results count is set on the settings page instead of the
search page
- Both search and chat can use the configured results count instead
of just search
* For the demo instance, re-instate the scheduler, but infrequently for api updates
- In constants, determine the cadence based on whether it's a demo instance or not
- This allow us to collect telemetry again. This will also allow us to save the chat session
* Conditionally skip updating the index altogether if it's a demo isntance
- What
- Stream chat responses from OpenAI API to Web, Obsidian clients
- Implement using a callback function which manages a queue where new tokens can be placed as they come on. As the thread is read from, tokens are removed.
- When the final token has been processed, add the `compiled_references` to the queue to be rendered by the `chat` client
- When the thread has been closed, save the accumulated conversation log in the user's history using a `partial func`
- Incrementally decode tokens on the front end and add them as they appear from the streamed response
- Why
This significantly reduces perceived latency and OpenAI API request timeouts for Chat
Closes https://github.com/khoj-ai/khoj/issues/257
- I needed to installed node-fetch to accomplish this, as the built-in request object from Obsidian doesn't seem to support streaming and the built-in fetch object is very sensitive to any and all cross origin requests
Removing unused content types will reduce khoj code to manage
- 0f993b3 Drop support for Ledger as a separate content type
Khoj will soon get a generic text indexing content type in Index plain text files #237.
This along with a file filter should suffice for searching through Ledger transactions
- c9db532 Remove unused org-music as an indexable content type from Khoj
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.
Having a specific content type for niche use-case like ledger isn't
useful. Removing unused content types will reduce khoj code to manage.
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.
Cleaning up that code will reduce number of content types for khoj to
manage.
- Add one-click disablement
- Remove fields that probably don't need to be edited (our implementation details)
- Add a green tick if a given field is configured
- In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network)
- Prevent re-indexing for Github data if this is a demo instance
- Fix up some issues with the CSS which made settings page small in mobile
- In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page
- Break out of rendering list if at end of org block in org.js
- This would previous hang rendering results in web interface
Should try fix this upstream in org.js as well
- Previously Khoj could only support Python upto 3.10 due to pytorch.
But lots of folks had python 3.11 installed by default on their machines.
This required installing python 3.10 and dealing with virtual envs.
With Torch >= 2.0.1 now able to support python 3.11, at least one
class of installation troubles for Khoj should drop. See
https://github.com/pytorch/pytorch/issues/86566 for reference
- Preliminary testing indicates using the new torch 2.x may reduce
search time by 25% (from 80ms to 60ms on Mac M1)
- Update Docs to not require mentioning python <=3.10 required
- Update Github test workflow to run khoj tests with python 3.11 too
- Use a request session to reduce the overhead of setting up a new connection with the Github URL each request
- Use the streaming feature for the REST api to reduce some of the memory footprint
- Set image_search.query to async to use it with multi-threading
This is same as text_search.query being set to an async method
- Exit search early if no search_model is defined in state.model