Currently, the personality of the agent is only included in the final response that it returns to the user. Historically, this was because models were quite bad at navigating the additional context of personality, and there was a bias towards having more control over certain operations (e.g., tool selection, question extraction).
Going forward, it should be more approachable to have prompts included in the sub tasks that Khoj runs in order to response to a given query. Make this possible in this PR. This also sets us up for agent creation becoming available soon.
Create custom agents in #928
Agents are useful insofar as you can personalize them to fulfill specific subtasks you need to accomplish. In this PR, we add support for using custom agents that can be configured with a custom system prompt (aka persona) and knowledge base (from your own indexed documents). Once created, private agents can be accessible only to the creator, and protected agents can be accessible via a direct link.
Custom tool selection for agents in #930
Expose the functionality to select which tools a given agent has access to. By default, they have all. Can limit both information sources and output modes.
Add new tools to the agent modification form
## Overview
Add user country code as context for doing online search with serper.dev API.
This should find more user relevant results from online searches by Khoj
## Details
### Major
- Default to using system clock to infer user timezone on js clients
- Infer country from timezone when only timezone received by chat API
- Localize online search results to user country when location available
### Minor
- Add `__str__` func to `LocationData` class to deduplicate location string generation
Make all the scroll actions just use requestAnimationFrame instead of
setTimeout. It better aligns with browser rendering loop, so better
for UX changes than setTimeout
Using system clock to infer user timezone on clients makes Khoj
more robust to provide location aware responses.
Previously only ip based location was used to infer timezone via API.
This didn't provide any decent fallback when calls to ipapi failed or
Khoj was being run in offline mode
Timezone is easier to infer using clients system clock. This can be
used to infer user country name, country code, even if ip based
location cannot be inferred.
This makes using location data to contextualize Khoj's responses more
robust. For example, online search results are retrieved for user's
country, even if call to ipapi.co for ip based location fails
Get country code to server chat api from i.p location check on clients.
Use country code to get country specific online search results via Serper.dev API
Previously the location string from location data was being generated
wherever it was being used.
By adding a __str__ representation to LocationData class, we can
dedupe and simplify the code to get the location string
- Use tabs for GPU/CPU type khoj being install on
- Update CMAKE flags to use to install Khoj with correct GPU support
Previous flags used DLLAMA, this has been updated to use DGGML now
in llama.cpp
Remove unnecessary "Inferred Query" heading prefix to image generation prompt
used by Khoj. The inferred query in chat message has a heading of it's
own, so avoid two headings for the image prompt
The problem was the tool tip was visible on hover, but it was slow, so before the tool tip popped up, the user would click on the button and this stopped the tool tip from popping up.
So i reduced the popup delay to 10ms. now as soon as user hovers over the button, they will see that its a feature coming soon!
Improve Scrolling on Chat page of Web app
- Details
1. Only auto scroll Khoj's streamed response when scroll is near bottom of page
Allows scrolling to other messages in conversation while Khoj is formulating and streaming its response
2. Add button to scroll to bottom of the chat page
3. Scroll to most recent conversation turn on conversation first load
It's a better default to anchor to most recent conversation turn (i.e most recent user message)
4. Smooth scroll when Khoj's chat response is streamed
Previously the scroll would jitter during response streaming
5. Anchor scroll position when fetch and render older messages in conversation
Allow users to keep their scroll position when older messages are fetched from server and rendered
Resolves#758
* Update the conversation_id primary key field to be a uuid
- update associated API endpoints
- this is to improve the overall application health, by obfuscating some information about the internal database
- conversation_id type is now implicitly a string, rather than an int
- ensure automations are also migrated in place, such that the conversation_ids they're pointing to are now mapped to the new IDs
* Update client-side API calls to correctly query with a string field
* Allow modifying of conversation properties from the chat title
* Improve drag and drop file experience for chat input area
* Use a phosphor icon for the copy to clipboard experience for code snippets
* Update conversation_id parameter to be a str type
* If django_apscheduler is not in the environment, skip the migration script
* Fix create automation flow by storing conversation id as string
The new UUID used for conversation id can't be directly serialized.
Convert to string for serializing it for later execution
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
## Improve
- Intelligently initialize a decent default set of chat model options
- Create non-interactive mode. Auto set default server configuration on first run via Docker
## Fix
- Make RapidOCR dependency optional as flaky requirements causing docker build failures
- Set default openai text to image model correctly during initialization
## Details
Improve initialization flow during first run to remove need to configure Khoj:
- Set Google, Anthropic Chat models too
Previously only Offline, Openai chat models could be set during init
- Add multiple chat models for each LLM provider
Interactively set a comma separated list of models for each provider
- Auto add default chat models for each provider in non-interactive
model if the `{OPENAI,GEMINI,ANTHROPIC}_API_KEY' env var is set
- Used when server run via Docker as user input cannot be processed to configure server during first run
- Do not ask for `max_tokens', `tokenizer' for offline models during
initialization. Use better defaults inferred in code instead
- Explicitly set default chat model to use
If unset, it implicitly defaults to using the first chat model.
Make it explicit to reduce this confusion
Resolves#882
The chat model initialize interaction flow is fairly similar across
the chat model providers.
This should simplify adding new chat model providers and reduce
chances of bugs in the interactive chat model initialization flow.
This is an initial pass to add documentation for all the knobs
available on the Khoj Admin panel.
It should shed some light onto what each admin setting is for and how
they can be customized when self hosting.
Resolves#831
- Improve Self Hosting Docker Instructions
- Ask to Install Docker Desktop to not require separate
docker-compose install and unify the instruction across OS
- To Self Host on Windows, ask to use Docker Desktop with WSL2 backend
- Use nested Tab grouping to split Docker vs Pip Self Host Instructions
- Reduce Self Host Setup Steps in Documentation after code simplification
- First run now avoids need to configure Khoj via admin panel
- So move the chat model config steps into optional post setup
config section
- Improve Instructions to Configure chat models on First Run
- Compress configuring chat model providers into a Tab Group
- Add Documentation for Remote Access under Advanced Self Hosting
Given the LLM landscape is rapidly changing, providing a good default
set of options should help reduce decision fatigue to get started
Improve initialization flow during first run
- Set Google, Anthropic Chat models too
Previously only Offline, Openai chat models could be set during init
- Add multiple chat models for each LLM provider
Interactively set a comma separated list of models for each provider
- Auto add default chat models for each provider in non-interactive
model if the {OPENAI,GEMINI,ANTHROPIC}_API_KEY env var is set
- Do not ask for max_tokens, tokenizer for offline models during
initialization. Use better defaults inferred in code instead
- Explicitly set default chat model to use
If unset, it implicitly defaults to using the first chat model.
Make it explicit to reduce this confusion
Resolves#882
This should configure Khoj with decent default configurations via
Docker and avoid needing to configure Khoj via admin page to start
using dockerized Khoj
Update default max prompt size set during khoj initialization
as online chat model are cheaper and offline chat models have larger
context now
RapidOCR depends on OpenCV which by default requires a bunch of GUI
paramters. This system package dependency set (like libgl1) is flaky
Making the RapidOCR dependency optional should allow khoj to be more
resilient to setup/dependency failures
Trade-off is that OCR for documents may not always be available and
it'll require looking at server logs to find out when this happens