Given the LLM landscape is rapidly changing, providing a good default
set of options should help reduce decision fatigue to get started
Improve initialization flow during first run
- Set Google, Anthropic Chat models too
Previously only Offline, Openai chat models could be set during init
- Add multiple chat models for each LLM provider
Interactively set a comma separated list of models for each provider
- Auto add default chat models for each provider in non-interactive
model if the {OPENAI,GEMINI,ANTHROPIC}_API_KEY env var is set
- Do not ask for max_tokens, tokenizer for offline models during
initialization. Use better defaults inferred in code instead
- Explicitly set default chat model to use
If unset, it implicitly defaults to using the first chat model.
Make it explicit to reduce this confusion
Resolves#882
This should configure Khoj with decent default configurations via
Docker and avoid needing to configure Khoj via admin page to start
using dockerized Khoj
Update default max prompt size set during khoj initialization
as online chat model are cheaper and offline chat models have larger
context now
- Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests
- Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same
- Partition configuration for indexing local data based on user accounts
- Store indexed data in an underlying postgres db using the `pgvector` extension
- Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time
- Apply filters using SQL queries
- Start removing many server-level configuration settings
- Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB.
- Update the Docker image and docker-compose.yml to work with the new application design
* Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions
* Move configure_search method into indexer
* Add conditional installation for gpt4all
* Add hint to go to localhost:42110 in the docs. Addresses #477
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.
Having a specific content type for niche use-case like ledger isn't
useful. Removing unused content types will reduce khoj code to manage.
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.
Cleaning up that code will reduce number of content types for khoj to
manage.
- That is, sample_config.yml is renamed to khoj_sample.yml
- This makes the application config filename less generic,
more easily identifiable with the application
- Update docs, app accordingly
- Keeps directory paths consistent between host and container volumes
- Consistency simplifies documentation and updates required to setup
sample_config.yml for local installation
- Mount the local directory to /app
- Reformat the file paths to generically indicate what their purpose is
- Add comments to assist users who wasnt to modify properties themselves
- Add a Dockerfile which uses an Ubuntu image to install relevant dependencies (exif) and uses a Miniconda image for setting up/reusing the conda environment
- Add a dummy docker-compose file