Limit file types to sync with Khoj from Obsidian to:
- Avoid hitting per-user indexable data limits, especially for folks on the Khoj cloud free tier. E.g. by excluding images in the Obsidian vault from being synced
- Improve context used by Khoj to generate responses
When a user exceeds data sync limits, show an error notice with:
- Link to web app settings page to upgrade subscription
- Link to Khoj plugin settings in Obsidian to configure file types to
sync from vault to Khoj
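
A minimal sketch of the file type filter described above, written in
Python for illustration only (the Obsidian plugin itself is not
Python). The setting names, the extension mapping and `files_to_sync`
are assumptions, not the plugin's actual identifiers.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from a sync setting to the file extensions it covers.
FILE_TYPE_EXTENSIONS = {
    "markdown": {".md", ".markdown"},
    "images": {".png", ".jpg", ".jpeg", ".webp"},
    "pdf": {".pdf"},
}


def files_to_sync(vault_paths, enabled_types):
    """Keep only the vault paths whose extension belongs to an enabled file type."""
    allowed = set()
    for file_type in enabled_types:
        allowed |= FILE_TYPE_EXTENSIONS.get(file_type, set())
    # Compare extensions case-insensitively so "photo.PNG" is treated like "photo.png".
    return [p for p in vault_paths if PurePosixPath(p).suffix.lower() in allowed]
```

With images disabled, only notes and PDFs would be sent for indexing,
which keeps large binary files from counting against the data limit.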
Previously, the chat stream iterator wasn't closed when response
streaming for an offline chat model threw an exception.
The application would hang and had to be restarted. Now the
application doesn't hang even if the current response generation
fails with an exception
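
The fix above can be sketched roughly as follows. `StreamIterator` and
`stream_response` are hypothetical stand-ins for the chat stream
iterator and its producer thread, not the actual code. The point is
the `finally` clause: the iterator is closed whether or not generation
raises, so the consumer never blocks forever.

```python
import logging
from queue import Queue
from threading import Thread

logger = logging.getLogger(__name__)


class StreamIterator:
    """Iterator fed by a producer thread (hypothetical stand-in)."""

    _DONE = object()  # sentinel marking the end of the stream

    def __init__(self):
        self._queue = Queue()

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the producer sends or closes
        if item is self._DONE:
            raise StopIteration
        return item

    def send(self, chunk):
        self._queue.put(chunk)

    def close(self):
        self._queue.put(self._DONE)


def stream_response(generate_chunks):
    """Run generate_chunks() in a thread and stream its output."""
    stream = StreamIterator()

    def worker():
        try:
            for chunk in generate_chunks():
                stream.send(chunk)
        except Exception:
            logger.exception("Response generation failed")
        finally:
            # Always close, otherwise the consumer waits on the queue forever.
            stream.close()

    Thread(target=worker, daemon=True).start()
    return stream
```

A consumer iterating over the returned stream receives whatever chunks
were produced before the failure and then stops cleanly.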
GPT-4o-mini is cheaper, smarter and can hold more context than
GPT-3.5-turbo. In production, we also default to gpt-4o-mini, so it
makes sense to upgrade defaults and tests to work with it
- Background
Llama.cpp allows enforcing response as json object similar to OpenAI
API. Pass expected response format to offline chat models as well.
- Overview
Enforce json output to improve how offline chat models perform
intermediate steps. This is especially helpful when working with
smaller models like Phi-3.5-mini and Gemma-2 2B, which do not
consistently respond with structured output, even when requested
- Details
Enforce json response from the extract questions and infer output
offline chat actors
- Convert prompts to output json objects when offline chat models
extract document search questions or infer output mode
- Make llama.cpp enforce response as json object
- Result
- Improve all intermediate steps by offline chat actors via json
response enforcement
- Avoid the manual, ad-hoc and flaky output schema enforcement and
simplify the code
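
A rough sketch of passing the expected response format to an offline
model. It assumes the llama-cpp-python bindings, whose
`create_chat_completion` accepts an OpenAI-style `response_format`
argument. The helper name `extract_json_response` is hypothetical and
`model` is any object exposing that method.

```python
import json


def extract_json_response(model, messages, enforce_json=True):
    """Ask the model for a chat completion and parse its reply as json.

    `model` is assumed to behave like llama_cpp.Llama. This helper is
    illustrative and not the actual chat actor code.
    """
    # Same shape as the OpenAI API: {"type": "json_object"} constrains the output.
    response_format = {"type": "json_object"} if enforce_json else None
    response = model.create_chat_completion(
        messages=messages,
        response_format=response_format,
    )
    return json.loads(response["choices"][0]["message"]["content"])
```

With the format enforced at the decoding layer, the prompt only needs
to describe the expected keys, and the caller can parse the reply
directly instead of repairing malformed output by hand.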
This is a more robust way to extract json output requested from
gemma-2 (2B, 9B) models which tend to return json in md codeblocks.
Other models should remain unaffected by this change.
Also removed the request to not wrap json in codeblocks from prompts,
as the code now unwraps codeblocks automatically when present
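
The unwrapping described above could look roughly like this. It is a
sketch under assumptions, not the actual implementation:
`load_json_response` is a hypothetical name, and only an optional
`json` language tag on the opening fence is handled.

```python
import json
import re

# Three backticks, built programmatically to keep this example easy to embed.
FENCE_MARK = "`" * 3

# Matches a reply that is entirely wrapped in one md codeblock,
# with an optional "json" language tag after the opening fence.
FENCED = re.compile(
    r"^" + FENCE_MARK + r"(?:json)?\s*(.*?)\s*" + FENCE_MARK + r"$",
    re.DOTALL,
)


def load_json_response(raw):
    """Parse a model reply as json, unwrapping a surrounding codeblock if present."""
    text = raw.strip()
    match = FENCED.match(text)
    if match:
        text = match.group(1)
    return json.loads(text)
```

Replies without a codeblock pass straight through to `json.loads`,
which is why other models should be unaffected.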
- Allow free tier users to have unlimited chats with the default chat model. Their chats are only rate-limited, at the same rate as subscribed users
- In the server chat settings, replace the concept of default/summarizer models with default/advanced chat models. Use the advanced models as a default for subscribed users.
- For each `ChatModelOption` configuration, allow the admin to specify a separate `max_tokens` value for subscribed users. This lets server admins configure different max token limits for unsubscribed and subscribed users
- Show an error message in the web app when the rate limit is hit or other server errors occur
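
A small sketch of how a per-subscription token limit could be
resolved. The `ChatModelConfig` class, the `subscribed_max_tokens`
field name and the fallback value are all assumptions made for
illustration. They are not the actual `ChatModelOption` schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatModelConfig:
    """Hypothetical stand-in for a chat model configuration."""

    name: str
    max_tokens: Optional[int] = None
    subscribed_max_tokens: Optional[int] = None  # assumed field name


def max_tokens_for(config, is_subscribed, fallback=2000):
    """Pick the token limit for a user, preferring the subscribed limit when set."""
    if is_subscribed and config.subscribed_max_tokens:
        return config.subscribed_max_tokens
    # Unsubscribed users, or models without a separate subscribed limit.
    return config.max_tokens or fallback
```

Falling back to the regular limit when no subscribed limit is set
means existing configurations keep working without changes.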