Reuse Search Models across Content Types to reduce Memory Consumption
- Memory consumption now only scales with search models used, not with content types.
Previously each content type had it's own copy of the search ML models.
That'd result in 300+ Mb per enabled text content type
- Split model state into 2 separate state objects, `search_models` and `content_index`.
This allows loading text_search and image_search models first
and then reusing them across all content_types in content_index
- The change should cut down memory utilization quite a bit for most users.
I see a >50% drop in memory utilization on my Khoj instance.
But this will vary for each user based on the amount of content indexed vs number of plugins enabled.
- This change does not solve the RAM utilization scaling with size of the index,
as the whole content index is still kept in RAM while Khoj is running
Should help with #195, #301 and #303
Wrap acquire/release locks in try/catch/finally when updating content
index and search models to prevent lock not being released on error
and causing a deadlock
* Add additional telemetry in order to understand which data sources are the most useful
* Make actions side by side in the configuration page
* Restore main run command
* Update links to point to wiki pages for Github, Notion integrations
* Stanardize nomenclature of the api_type to use _config suffix
Remove header fields that aren't actually helpful for understanding config usage
- Memory consumption now only scales with search models used, not with
content types as well. Previously each content type had it's own
copy of the search ML models. That'd result in 300+ Mb per enabled
content type
- Split model state into 2 separate state objects, `search_models' and
`content_index'.
This allows loading text_search and image_search models first and then
reusing them across all content_types in content_index
- This should cut down memory utilization quite a bit for most users.
I see a ~50% drop in memory utilization.
This will, of course, vary for each user based on the amount of
content indexed vs number of plugins enabled
- This does not solve the RAM utilization scaling with size of the index.
As the whole content index is still kept in RAM while Khoj is running
Should help with #195, #301 and #303
My account doesn't have gpt-4 enabled and it wouldn't work as the default value was always used from extract_questions, where the caller could use the configured model.
- Provide more details on what clicking configure, initialize buttons
or changing the results count slider does
- This shows up on user hovering over those buttons
* For the demo instance, re-instate the scheduler, but infrequently for api updates
- In constants, determine the cadence based on whether it's a demo instance or not
- This allow us to collect telemetry again. This will also allow us to save the chat session
* Conditionally skip updating the index altogether if it's a demo isntance
* Add backend support for Notion data parsing
- Add a NotionToJsonl class which parses the text of Notion documents made accessible to the API token
- Make corresponding updates to the default config, raw config to support the new notion addition
* Add corresponding views to support configuring Notion from the web-based settings page
- Support backend APIs for deleting/configuring notion setup as well
- Streamline some of the index updating code
* Use defaults for search and chat queries results count
* Update pagination of retrieving pages from Notion
* Update state conversation processor when update is hit
* frequency_penalty should be passed to gpt through kwargs
* Add check for notion in render_multiple method
* Add headings to Notion render
* Revert results count slider and split Notion files by blocks
* Clean/fix misc things in the function to update index
- Use the successText and errorText variables appropriately
- Name parameters in function calls
- Add emojis, woohoo
* Clean up and further modularize code for processing data in Notion
* Add langchain static files and pytorch metadata to Khoj native app
* Add pillow static files, metadata & hidden imports to Khoj native app
* Fix path to web interface static files on Khoj native app
* Add tiktoken hidden imports to make chat work from Khoj native app
* Fix Khoj native app to run with GUI mode enabled
This got broken when we moved from using the --no-gui flag to using
--gui in https://github.com/khoj-ai/khoj/pull/263
* Update the /chat endpoint to conditionally support streaming
- If streams are enabled, return the threadgenerator as it does currently
- If stream is disabled, return a JSON response with the response/compiled references separated out
- Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian
- Rename chat/init/ to chat/history
* Update khoj.el to use the /history endpoint
- Update corresponding unit tests to use stream=true
* Remove & from call to /chat for obsidian
* Abstract functions out into a helpers.py file and clean up some of the error-catching
Deprecate usage of the older gpt3 models in-place of the newer chat
based models
- text-davinci-003 is only 50% cheaper than gpt4 and less reliable for
question extraction
- Using gpt-3.50turbo for summarization should reduce cost of chat
- Keep conversation.chat_session as a list instead of a string
- Update completion_with_backoff func to use ChatML format
- Fix testing gpt converse method after it started streaming responses
- Pass stop in model_kwargs dictionary and api key in openai_api_key
parameter to chat completion methods. This should resolve the arg
warning thrown by OpenAI module
The previous json parsing was failing to handle questions with date
filters
Fix the chat actor tests to run without throwing error with freezegun
complaining about importing transformers.local_llama model
Remove quote escapes from date filter examples provided to
extract_questions actor
- Before
Only the search interface had the results count configuration option
- After
- The results count is set on the settings page instead of the
search page
- Both search and chat can use the configured results count instead
of just search
* For the demo instance, re-instate the scheduler, but infrequently for api updates
- In constants, determine the cadence based on whether it's a demo instance or not
- This allow us to collect telemetry again. This will also allow us to save the chat session
* Conditionally skip updating the index altogether if it's a demo isntance
- What
- Stream chat responses from OpenAI API to Web, Obsidian clients
- Implement using a callback function which manages a queue where new tokens can be placed as they come on. As the thread is read from, tokens are removed.
- When the final token has been processed, add the `compiled_references` to the queue to be rendered by the `chat` client
- When the thread has been closed, save the accumulated conversation log in the user's history using a `partial func`
- Incrementally decode tokens on the front end and add them as they appear from the streamed response
- Why
This significantly reduces perceived latency and OpenAI API request timeouts for Chat
Closes https://github.com/khoj-ai/khoj/issues/257
- I needed to installed node-fetch to accomplish this, as the built-in request object from Obsidian doesn't seem to support streaming and the built-in fetch object is very sensitive to any and all cross origin requests
Removing unused content types will reduce khoj code to manage
- 0f993b3 Drop support for Ledger as a separate content type
Khoj will soon get a generic text indexing content type in Index plain text files #237.
This along with a file filter should suffice for searching through Ledger transactions
- c9db532 Remove unused org-music as an indexable content type from Khoj
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.
Having a specific content type for niche use-case like ledger isn't
useful. Removing unused content types will reduce khoj code to manage.
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.
Cleaning up that code will reduce number of content types for khoj to
manage.
- Add one-click disablement
- Remove fields that probably don't need to be edited (our implementation details)
- Add a green tick if a given field is configured
- In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network)
- Prevent re-indexing for Github data if this is a demo instance
- Fix up some issues with the CSS which made settings page small in mobile
- In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page
- Break out of rendering list if at end of org block in org.js
- This would previous hang rendering results in web interface
Should try fix this upstream in org.js as well
- Previously Khoj could only support Python upto 3.10 due to pytorch.
But lots of folks had python 3.11 installed by default on their machines.
This required installing python 3.10 and dealing with virtual envs.
With Torch >= 2.0.1 now able to support python 3.11, at least one
class of installation troubles for Khoj should drop. See
https://github.com/pytorch/pytorch/issues/86566 for reference
- Preliminary testing indicates using the new torch 2.x may reduce
search time by 25% (from 80ms to 60ms on Mac M1)
- Update Docs to not require mentioning python <=3.10 required
- Update Github test workflow to run khoj tests with python 3.11 too
- Use a request session to reduce the overhead of setting up a new connection with the Github URL each request
- Use the streaming feature for the REST api to reduce some of the memory footprint
- Set image_search.query to async to use it with multi-threading
This is same as text_search.query being set to an async method
- Exit search early if no search_model is defined in state.model
- So when searching across content types (with content-type = "all")
org-mode results get rendered differently than markdown, PDF etc. results
- Set div class for each result separately instead of a single uber div
for styling. This allows styling div of each result based on the
content-type of that result
- No need to create placeholder "all" content type on web interface as
server is passing an all content type by itself
- Add cards to configure each of the Github repositories
- Fix a bug in the API which caused all other settings to be wiped when updating one of the content types
- Provide an error message to the user if they have a misconfiguration in their chat settings
- Add support for indexing org files as well as markdown files from the Github repository and update corresponding search view
- Support indexing a list of repositories
- Show success/failure status message much closer to the save button
Previously status message was shown on top of the page, which wasn't
always in view and wasn't easily seen
- Improve the status message to more clearly show next steps on success