- Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests
- Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Use state.host and state.port for configuring the URL for the indexer
* Fix parsing of PDF files
* Read markdown files from streamed data and update unit tests
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
* Init: refactor indexer/batch endpoint to support a generic file ingestion format
* Add features to better support indexing from files sent by the desktop client
* Initial commit with Electron application
- Adds electron app
* Add import for pymupdf, remove import for pypdf
* Allow user to configure khoj host URL
* Remove search type configuration from index.html
* Use v1 path for current indexer routes
- Chat actors are narrow agents (prompt + ML model)
Chat actors are different from the Chat director. who orchestrates
the narrow actor agents to synthesize final response to the user
- Test Chat Actor Capabilities
1. Answer from retrieved notes
2. Answer from chat history
3. Answer general questions
4. Carry out multi-turn conversation
5. Say don't know when answer not in provided context
6. Answers that require current date awareness
7. Date-aware aggregation across multiple different notes
8. Ask clarification questions if no unambiguous answer in provided context
This test is expected to fail as the chat is not capable of doing
this consistently yet. But having the test allows assessing chat quality
- Use Openai API Key from OPENAI_API_KEY environment variable
- Gitignore .env file, python virtualenv directory
Put OpenAI API Key in .env file to run chatbot tests via vscode
The .env file is default location for importing env vars
- Why
The khoj pypi packages should be installed in `khoj' directory.
Previously it was being installed into `src' directory, which is a
generic top level directory name that is discouraged from being used
- Changes
- move src/* to src/khoj/*
- update `setup.py' to `find_packages' in `src' instead of project root
- rename imports to form `from khoj.*' in complete project
- update `constants.web_directory' path to use `khoj' directory
- rename root logger to `khoj' in `main.py'
- fix image_search tests to use the newly rename `khoj' logger
- update config, docs, workflows to reference new path `src/khoj'
- Add Khoj in Obsidian Demo
- Update Interfaces Screenshot to include Obsidian Plugin Screenshot
- Update .gitignore to ignore obsidian plugin ignorelist
Section the .gitignore for better readability
- Update the Setup, Usage instructions to include information about
the Obsidian plugin
- That is, sample_config.yml is renamed to khoj_sample.yml
- This makes the application config filename less generic,
more easily identifiable with the application
- Update docs, app accordingly
- Allow viewing image results returned by Semantic Search.
Until now there wasn't any interface within the app to view image
search results. For text results, we at least had the emacs interface
- This should help with debugging issues with image search too
For text the Swagger interface was good enough
- Put test data for each content type into separate directories
- Makes config.yml for docker and local host consistent
- Prepending tests to /data in sample_config.yml makes application
run on local host using test data
- Allows mounting separate volume for each content type in docker-compose
- Ignore gitignore to only add tests content, not generated models or embeddings