* Add support for gpt4all's falcon model as an additional conversation processor
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4 models and OpenAI
- Add unit tests benchmarking some of the Falcon performance
* Add exc_info to include stack trace in error logs for text processors
* Pull shared functions into utils.py to be used across gpt4 and gpt
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.
Having a specific content type for niche use-case like ledger isn't
useful. Removing unused content types will reduce khoj code to manage.
Combine hand-written custom notes and PG essays with personal
content to bulk up notes count
Delete old documentation markdown as not a representative dataset for
application (which is more tuned for personal notes)
- What
- The Emacs and Obsidian interfaces stay in their original
directories under src/
- src/khoj now only contains code meant for pypi packaging
- Benefits
- This avoids having to update khoj MELPA, Obsidian plugin config as
the Emacs, Obsidian code is under their original directories
- It separates the code in src/khoj meant for python packaging from
code for external interfaces like Emacs and Obsidian
- Why
The khoj pypi packages should be installed in `khoj' directory.
Previously it was being installed into `src' directory, which is a
generic top level directory name that is discouraged from being used
- Changes
- move src/* to src/khoj/*
- update `setup.py' to `find_packages' in `src' instead of project root
- rename imports to form `from khoj.*' in complete project
- update `constants.web_directory' path to use `khoj' directory
- rename root logger to `khoj' in `main.py'
- fix image_search tests to use the newly rename `khoj' logger
- update config, docs, workflows to reference new path `src/khoj'
- Reason
- All clients that currently consume the API are part of Khoj
- Any breaking API changes will be fixed in clients immediately
- So decoupling client from API is not required
- This removes the burden of maintaining muliple versions of the API
- Split router.py into v1.0, beta and frontend (no-prefix) api modules
under new router package. Version tag in main.py via prefix
- Update frontends to use the versioned api endpoints
- Update tests to work with versioned api endpoints
- Update docs to mentioned, reference only versioned api endpoints
- Pillow already supports reading XMP metadata from Images
- Removes need to maintain my fork of unmaintained PyExiftool
- This also removes dependency on system Exiftool package for
XMP metadata extraction
- Add test to verify XMP metadata extracted from test images
- Remove references to Exiftool from Documentation
- Update existings code, tests to process input-filters as list
instead of str
- Test `text_to_jsonl' get files methods to work with combination of
`input-files' and `input-filters'
Resolves#84
- That is, sample_config.yml is renamed to khoj_sample.yml
- This makes the application config filename less generic,
more easily identifiable with the application
- Update docs, app accordingly
- The multi-qa-MiniLM-L6-cos-v1 is more extensively benchmarked[1]
- It has the right mix of model query speed, size and performance on benchmarks
- On hugging face it has way more downloads and likes than the msmarco model[2]
- On very preliminary evaluation of the model
- It doubles the encoding speed of all entries (down from ~8min to 4mins)
- It gave more entries that stay relevant to the query (3/5 vs 1/5 earlier)
[1]: https://www.sbert.net/docs/pretrained_models.html
[2]: https://huggingface.co/sentence-transformers
- Put test data for each content type into separate directories
- Makes config.yml for docker and local host consistent
- Prepending tests to /data in sample_config.yml makes application
run on local host using test data
- Allows mounting separate volume for each content type in docker-compose
- Ignore gitignore to only add tests content, not generated models or embeddings