Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of an agent.
The script uses Gemini as an LLM Judge to evaluate Khoj responses to
the FRAMES benchmark prompts against the ground truth provided by it.
Overview
---
- Put context into separate user message before sending to chat model.
This should improve model response quality and truncation logic in code
- Pass online context from chat history to chat model for response.
This should improve response speed when previous online context can be reused
- Improve format of notes, online context passed to chat models in prompt.
This should improve model response quality
Details
---
The document, online search context are now passed as separate user
messages to chat model, instead of being added to the final user message.
This will improve
- Models ability to differentiate data from user query.
That should improve response quality and reduce prompt injection
probability
- Make truncation logic simpler and more robust
When context window hit, can simply pop messages to auto truncate
context in order of context, user, assistant message for each
conversation turn in history until reach current user query
The complex, brittle logic to extract user query from context in
last user message isn't required.
* Create explicit flow to enable the free trial
The current design is confusing. It obfuscates the fact that the user is on a free trial. This design will make the opt-in explicit and more intuitive.
* Use the Subscription Type enum instead of hardcoded strings everywhere
* Use length of free trial in the frontend code as well
Given the LLM landscape is rapidly changing, providing a good default
set of options should help reduce decision fatigue to get started
Improve initialization flow during first run
- Set Google, Anthropic Chat models too
Previously only Offline, Openai chat models could be set during init
- Add multiple chat models for each LLM provider
Interactively set a comma separated list of models for each provider
- Auto add default chat models for each provider in non-interactive
model if the {OPENAI,GEMINI,ANTHROPIC}_API_KEY env var is set
- Do not ask for max_tokens, tokenizer for offline models during
initialization. Use better defaults inferred in code instead
- Explicitly set default chat model to use
If unset, it implicitly defaults to using the first chat model.
Make it explicit to reduce this confusion
Resolves#882
# Summary of Changes
* New UI to show preview of image uploads
* ChatML message changes to support gpt-4o vision based responses on images
* AWS S3 image uploads for persistent image context in conversations
* Database changes to have `vision_enabled` option in server admin panel while configuring models
* Render previously uploaded images in the chat history, show uploaded images for pending msgs
* Pass the uploaded_image_url through to subqueries
* Allow image to render upon first message from the homepage
* Add rendering support for images to shared chat as well
* Fix some UI/functionality bugs in the share page
* Convert user attached images for chat to webp format before upload
* Use placeholder to attached image for data source, response mode actors
* Update all clients to call /api/chat as a POST instead of GET request
* Fix copying chat messages with images to clipboard
TLDR; Add vision support for openai models on Khoj via the web UI!
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
GPT-4o-mini is cheaper, smarter and can hold more context than
GPT-3.5-turbo. In production, we also default to gpt-4o-mini, so makes
sense to upgrade defaults and tests to work with it
- Major
- Improve doc search actor performance on vague, random or meta questions
- Pass user's name to document and online search actors prompts
- Minor
- Fix and improve openai chat actor tests
- Remove unused max tokns arg to extract qs func of doc search actor
### Overview
Support exclude file filter in user search queries
### Details
- All of the exclude file filter terms need to be satisfied
- Any one of the include file filter terms should be satisfied
### Example
- **Search Query**: *what happened yesterday? -file:"tasks.org" -file:"work.md" file:"diary.org" file:"journal.org*
- **Behavior**: Query will try find relevant notes in any of `journal.org` or `diary.org` and not in `tasks.org` and not in `work.md`
### Details
* Add support for exclusion file filters
* Translate file filter to valid Django DB entry filter regex
* Exclude all files when multiple exclude file filter in query
Previously we were applying an "Or" filter, which would exclude any
file mentioned in a query with multiple exclude file filter.
This is not what we naturally mean when we ask excluding a file in a query
* Rename, rearrange, deduplicate and add file filter tests
Closes#728
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
- Use new form of passing doc references to now passing chat actor
test
- Fix message list generation from conversation logs provided
Strangely the parent conversation_log gets passed down to
message_to_log func when the kwarg is not explicitly specified
Pull out /api/configure/content API endpoints into /api/content to
allow for more logical organization of API path hierarchy
This should make the url more succinct and API request intent more
understandable by using existing HTTP method semantics along with the
path.
The /configure URL path segment was either
- redundant (e.g POST /configure/notion) or
- incorrect (e.g GET /configure/files)
Some example of naming improvements:
- GET /configure/types -> GET /content/types
- GET /configure/files -> GET /content/files
- DELETE /configure/files -> DELETE /content/files
This should also align, merge better the the content indexing API
triggered via PUT, PATCH /content
Refactor Flow
1. Rename /api/configure/types -> /api/content/types
2. Rename /api/configure -> /api
3. Move /api/content to api_content from under api_config
- Remove unused full_corpus boolean. The full_corpus=False code path
wasn't being used (accept for in a test)
- The full_corpus=True code path used was ignoring file deletion
requests sent by clients during sync. Unclear why this was done
- Added unit test to prevent regression and show file deletion by
clients during sync not ignored now
- This utilizes PUT, PATCH HTTP method semantics to remove need for
the "regenerate" query param and "/update" url suffix
- This should make the url more succinct and API request intent more
understandable by using existing HTTP method semantics
- Split once by heading (=first_non_empty) to extract current section body
Otherwise child headings with same prefix as current heading will
cause the section split to go into infinite loop
- Also add check to prevent getting into recursive loop while trying
to split entry into sub sections
Update offline, openai chat actor, director tests to not require
Serper to run the online command tests
Update documentation for self-hosted online search to mention no setup
is required by default. But improvements can be made by using
Serper.dev or Olostep
- Added support for uploading .jpeg, .jpg, and .png files to Khoj from Web, Desktop app
- Updating indexer to generate raw text and entries using RapidOCR
- Details
* added support for indexing images via ocr
* fixed pyproject.toml
* Update src/khoj/processor/content/images/image_to_entries.py
Co-authored-by: Debanjum <debanjum@gmail.com>
* Update src/khoj/processor/content/images/image_to_entries.py
Co-authored-by: Debanjum <debanjum@gmail.com>
* removed redudant try except blocks
* updated desktop js file to support image formats
* added tests for jpg and png
* Fix processing for image to entries files
* Update unit tests with working image indexer
* Change png test from version verificaition to open-cv verification
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
Co-authored-by: sabaimran <narmiabas@gmail.com>
* Uses entire file text and summarizer model to generate document summary.
* Uses the contents of the user's query to create a tailored summary.
* Integrates with File Filters #788 for a better UX.
# Major
- Disambiguate Text output mode to disambiguate from Default data source lookup
- Fix showing headings in intermediate step in generating chat response
- Remove "Path" prefix from org ancestor heading in compiled entry
# Minor
- Fix OpenAI chat actor, director unit tests
- The task scheduling actor was having trouble calculating the
timezone. Giving the actor a scratchpad to improve correctness by
thinking step by step
- Add more examples to reduce chances of the inferred query looping to
create another reminder instead of running the query and sharing
results with user
- Improve task scheduling chat actor test with more tests and
by ensuring unexpected words not present in response
There's a difference between running a scheduled task and notifying
the user about the results of running the scheduled task.
Decide to notify the user only when the results of running the
scheduled task satisfy the user's requirements.
Use sync version of send_message_to_model_wrapper for scheduled tasks