- Store scheduled job state in Postgres so job schedules persist
across app restarts
- Use Process Locks to only allow single worker to process a given job
type. This prevents duplicating job runs across all workers
- Detect when user intends to schedule a task, aka reminder
Add new output mode: reminder. Add example of selecting the reminder
output mode
- Extract schedule time (as cron timestring) and inferred query to run
from user message
- Use APScheduler to call chat with inferred query at scheduled time
- Handle reminder scheduling from both websocket and http chat requests
- Support constructing scheduled task using chat history as context
Pass chat history to scheduled query generator for improved context
for scheduled task generation
Previously the make delete API response failed, after deleting token.
Required a page refresh to see that the API token was actually gone.
This was happening because the response type of the delete token API
endpoint isn't a string, so it failed FastAPI response validation
checks.
- Allow self-hosted users to customize their open ai base url. This allows you to easily use a proxy service and extend support for other models.
- This also includes a migration that associates any existing openai chat model configuration with an openai processor configuration
- Make changing model a paid/subscriber feature
- Removes usage of langchain's OpenAI wrapper for better control over parsing input/output
- Allow passing completion args through completion_with_backoff
- Pass model_kwargs in a separate arg to simplify this
- Pass model in `model_name' kwarg from the send_message_to_model func
`model_name' kwarg is used by langchain, not `model' kwarg
- Make valid file extension checking case insensitive on Desktop app
- Skip indexing non-existent folders on Desktop app
- Pass auth headers to fix lazy load of chat messages on Desktop app
- Set chat-message height to height of content in web, desktop
Previous cross-encoder model was a few years old, newer models should
have improved in quality. Model size increases by 50% compared to
previous for better performance, at least on benchmarks
Most newer, better embeddings models add a query, docs prefix when
encoding. Previously Khoj admins couldn't configure these, so it
wasn't possible to use these newer models.
This change allows configuring the kwargs passed to the query, docs
encoders by updating the search config in the database.
Improve tool, online search, webpage links, docs search chat actor
prompts. Ensure works with hermes-2-pro and llama-3.
Be more specific about generating JSON and not saying anything else.
- Improve extract question prompts to explicitly request JSON list
- Use llama-3 chat format if HF repo_id mentions llama-3. The
llama-cpp-python logic for detecting when to use llama-3 chat format
isn't robust enough currently
* Changed the styling of the link that takes a user to the settings page into a button
* added an indicator that shows if a user is connected to the server or not
* made a class name more descriptive and also made the text in first run message more intuitive
* changed the command to install dependencies in the README.md
* changed the class name of the first run message text to be more descriptive
* added icons in the desktop UI that shows if a file is synced successfully or not
* made the link class name in the homepage more descriptive
* fixed the hover issue on status box in the chat header pane
* fixed hovering issue on status box on macOS
- User configured max tokens limits weren't being passed to
`send_message_to_model_wrapper'
- One of the load offline model code paths wasn't reachable. Remove it
to simplify code
- When max prompt size isn't set infer max tokens based on free VRAM
on machine
- Use min of app configured max tokens, vram based max tokens and
model context window
- User configured max tokens limits weren't being passed to
`send_message_to_model_wrapper'
- One of the load offline model code paths wasn't reachable. Remove it
to simplify code
- When max prompt size isn't set infer max tokens based on free VRAM
on machine
- Use min of app configured max tokens, vram based max tokens and
model context window