- Previously online chat actors, director tests only worked with openai.
This change allows running them for any supported onlnie provider
including Google, Anthropic and Openai.
- Enable online/offline chat actor, director in two ways:
1. Explicitly setting KHOJ_TEST_CHAT_PROVIDER environment variable to
google, anthropic, openai, offline
2. Implicitly by the first API key found from openai, google or anthropic.
- Default offline chat provider to use Llama 3.1 3B for faster, lower
compute test runs
GPT-4o-mini is cheaper, smarter and can hold more context than
GPT-3.5-turbo. In production, we also default to gpt-4o-mini, so makes
sense to upgrade defaults and tests to work with it
- This utilizes PUT, PATCH HTTP method semantics to remove need for
the "regenerate" query param and "/update" url suffix
- This should make the url more succinct and API request intent more
understandable by using existing HTTP method semantics
* Uses entire file text and summarizer model to generate document summary.
* Uses the contents of the user's query to create a tailored summary.
* Integrates with File Filters #788 for a better UX.
* Add support for using OAuth2.0 in the Notion integration
* Add notion to the admin page
* Remove unnecessary content_index and image search/setup references
* Trigger background job to start indexing Notion after user configures it
* Add a log line when a new Notion integration is setup
* Fix references to the configure_content methods
* Initial pass at backend changes to support agents
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications
* Customize default behaviors for conversations without agents or with default agents
* Use agent_id for getting correct agent
* Merge migrations
* Simplify some variable definitions, add additional security checks for agents
* Rename agent.tuning -> agent.personality
Calls by unauthenticated users were failing at API rate limiter as it
failed to access user info object. This is a bug.
API rate limiter should short-circuit for unauthenicated users so a
proper Forbidden response can be returned by API
Add regression test to verify that unauthenticated users get 403
response when calling the /chat API endpoint
- All search models are loaded into memory, and stored in a dictionary indexed by name
- Still need to add database migrations and create a UI for user to select their choice. Presently, it uses the default option
- Our pypi package currently does not work because the django app and associated database is not included. To remedy this issue, move the app into the src/khoj folder. This has the added benefit of improved organization of the codebase, as all server related code is now in a single folder
- Update associated file paths and system references
### Overview
The parent hierarchy of org-mode entries can store important context.
This change updates OrgNode to track parent headings for each org entry and adds the parent outline for each entry to the index
### Details
- Test search uses ancestor headings as context for improved results
- Add ancestor headings of each org-mode entry to their compiled form
- Track ancestor headings for each org-mode entry in org-node parser
Resolves#85
- Upgrade FastAPI to >= latest version. Required upgrade of FastAPI.
Earlier version didn't support wrapping common query params in class
- Use per fixture app instead of a global FastAPI app in conftest
- Upgrade minimum required Django version
- Fix no notes chat director test with updated no notes message
No notes message was updated in commit 118f1143
- Update test data to add deeper outline hierarchy for testing
hierarchy as context
- Update collateral tests that need count of entries updated, deleted
asserts to be updated
- Expose ability to modify search model via Django admin interface
- Previously the bi_encoder and cross_encoder models to use were set
in code
- Now it's user configurable but with a default config generated by
default
- Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all the users re-use that code.
- To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter in the details. For simplicity, the email and username should match.
- Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings
### ✨ New
- Use API keys to authenticate from Desktop, Obsidian, Emacs clients
- Create API, UI on web app config page to CRUD API Keys
- Create user API keys table and functions to CRUD them in Database
### 🧪 Improve
- Default to better search model, [gte-small](https://huggingface.co/thenlper/gte-small), to improve search quality
- Only load chat model to GPU if enough space, throw error on load failure
- Show encoding progress, truncate headings to max chars supported
- Add instruction to create db in Django DB setup Readme
### ⚙️ Fix
- Fix error handling when configure offline chat via Web UI
- Do not warn in anon mode about Google OAuth env vars not being set
- Fix path to load static files when server started from project root
- Add a data model which allows us to store Conversations with users. This does a minimal lift over the current setup, where the underlying data is stored in a JSON file. This maintains parity with that configuration.
- There does _seem_ to be some regression in chat quality, which is most likely attributable to search results.
This will help us with #275. It should become much easier to maintain multiple Conversations in a given table in the backend now. We will have to do some thinking on the UI.
- Make most routes conditional on authentication *if anonymous mode is not enabled*. If anonymous mode is enabled, it scaffolds a default user and uses that for all application interactions.
- Add a basic login page and add routes for redirecting the user if logged in
- Partition configuration for indexing local data based on user accounts
- Store indexed data in an underlying postgres db using the `pgvector` extension
- Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time
- Apply filters using SQL queries
- Start removing many server-level configuration settings
- Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB.
- Update the Docker image and docker-compose.yml to work with the new application design
GPT4all now supports gguf llama.cpp chat models. Latest
GPT4All (+mistral) performs much at least 3x faster.
On Macbook Pro at ~10s response start time vs 30s-120s earlier.
Mistral is also a better chat model, although it hallucinates more
than llama-2
This provides flexibility to use non 1st party supported chat models
- Create migration script to update khoj.yml config
- Put `enable_offline_chat' under new `offline-chat' section
Referring code needs to be updated to accomodate this change
- Move `offline_chat_model' to `chat-model' under new `offline-chat' section
- Put chat `tokenizer` under new `offline-chat' section
- Put `max_prompt' under existing `conversation' section
As `max_prompt' size effects both openai and offline chat models