# Summary of Changes
* New UI to show preview of image uploads
* ChatML message changes to support gpt-4o vision based responses on images
* AWS S3 image uploads for persistent image context in conversations
* Database changes to have `vision_enabled` option in server admin panel while configuring models
* Render previously uploaded images in the chat history, show uploaded images for pending msgs
* Pass the uploaded_image_url through to subqueries
* Allow image to render upon first message from the homepage
* Add rendering support for images to shared chat as well
* Fix some UI/functionality bugs in the share page
* Convert user attached images for chat to webp format before upload
* Use placeholder to attached image for data source, response mode actors
* Update all clients to call /api/chat as a POST instead of GET request
* Fix copying chat messages with images to clipboard
TLDR; Add vision support for openai models on Khoj via the web UI!
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
Limit file types to sync with Khoj from Obsidian to:
- Avoid hitting per user index-able data limits, especially for folks on the Khoj cloud free tier. E.g by excluding images in Obsidian vault from being synced
- Improve context used by Khoj to generate responses
When user exceeds data sync limits. Show error notice with
- Link to web app settings page to upgrade subscription
- Link to Khoj plugin settings in Obsidian to configure file types to
sync from vault to Khoj
Previously chat stream iterator wasn't closed when response streaming
for offline chat model threw an exception.
This would require restarting the application. Now application doesn't
hang even if current response generation fails with exception
GPT-4o-mini is cheaper, smarter and can hold more context than
GPT-3.5-turbo. In production, we also default to gpt-4o-mini, so makes
sense to upgrade defaults and tests to work with it
- Background
Llama.cpp allows enforcing response as json object similar to OpenAI
API. Pass expected response format to offline chat models as well.
- Overview
Enforce json output to improve intermediate step performance by
offline chat models. This is especially helpful when working with
smaller models like Phi-3.5-mini and Gemma-2 2B, that do not
consistently respond with structured output, even when requested
- Details
Enforce json response by extract questions, infer output offline
chat actors
- Convert prompts to output json objects when offline chat models
extract document search questions or infer output mode
- Make llama.cpp enforce response as json object
- Result
- Improve all intermediate steps by offline chat actors via json
response enforcement
- Avoid the manual, ad-hoc and flaky output schema enforcement and
simplify the code