[[https://github.com/debanjum/khoj/actions/workflows/test.yml/badge.svg]]
[[https://github.com/debanjum/khoj/actions/workflows/build.yml/badge.svg]]

* Khoj
/Allow natural language search on user content like notes, images and transactions using transformer ML models/

You can interact with Khoj via the API or [[./src/interface/emacs/khoj.el][Emacs]]. All search is done locally[[https://github.com/debanjum/khoj#miscellaneous][*]]

** Demo
https://user-images.githubusercontent.com/6413477/168417719-8a8bc4e5-8404-42b2-89a7-4493e3d2582c.mp4

** Setup
*** 1. Clone
#+begin_src shell
git clone https://github.com/debanjum/khoj && cd khoj
#+end_src

*** 2. Configure
- [Required] Update [[./docker-compose.yml][docker-compose.yml]] to mount your image, note (org-mode or markdown) and beancount directories
- [Optional] Edit the application configuration in [[./config/sample_config.yml][sample_config.yml]]

*** 3. Run
#+begin_src shell
docker-compose up -d
#+end_src

/Note: The first run will take time. Let it run; it is most likely not hung, just generating embeddings./

** Use
- *Khoj via API*
  - See [[http://localhost:8000/docs][Khoj API Docs]]
  - [[http://localhost:8000/search?q=%22what%20is%20the%20meaning%20of%20life%22][Query]]
  - [[http://localhost:8000/regenerate?t=ledger][Regenerate Embeddings]]
  - [[http://localhost:8000/ui][Configure Application]]
- *Khoj via Emacs*
  - [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#installation][Install]] [[./src/interface/emacs/khoj.el][khoj.el]]
  - Run ~M-x khoj~

** Run Unit Tests
#+begin_src shell
pytest
#+end_src

** Upgrade
#+begin_src shell
docker-compose build --pull
#+end_src

** Troubleshooting
- Symptom: Errors out with "Killed" in the error message
  - Fix: Increase the RAM available to Docker containers in Docker settings
  - Refer: [[https://stackoverflow.com/a/50770267][StackOverflow Solution]], [[https://docs.docker.com/desktop/mac/#resources][Configure Resources on Docker for Mac]]
- Symptom: Errors out complaining about tensor mismatches, nulls, etc.
  - Mitigation: Delete the ~content-type > image~ section from ~docker_sample_config.yml~

** Miscellaneous
- The experimental [[http://localhost:8000/chat][chat]] API endpoint uses the [[https://openai.com/api/][OpenAI API]]
  - It is disabled by default
  - To use it, add your ~openai-api-key~ to ~config.yml~

** Development Setup
*** Setup on Local Machine
**** 1. Install Dependencies
1. Install Python3 [Required]
2. [[https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html][Install Conda]] [Required]
3. Install Exiftool [Optional]
   #+begin_src shell
   sudo apt-get -y install libimage-exiftool-perl
   #+end_src

**** 2. Install Khoj
#+begin_src shell
git clone https://github.com/debanjum/khoj && cd khoj
conda env create -f config/environment.yml
conda activate khoj
#+end_src

**** 3. Configure
- Configure the files/directories to search in the ~content-type~ section of ~sample_config.yml~
- To run the application on test data, update file paths containing ~/data/~ to ~tests/data/~ in ~sample_config.yml~
  - For example, replace ~/data/notes/*.org~ with ~tests/data/notes/*.org~

**** 4. Run
Load the ML models, generate embeddings and expose an API to query the notes, images, transactions etc. specified in the config YAML
#+begin_src shell
python3 -m src.main -c=config/sample_config.yml -vv
#+end_src

*** Upgrade On Local Machine
#+begin_src shell
cd khoj
git pull origin master
conda deactivate
conda env update -f config/environment.yml
conda activate khoj
#+end_src

** Acknowledgments
- [[https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1][Multi-QA MiniLM Model]] for Asymmetric Text Search. See [[https://www.sbert.net/examples/applications/retrieve_rerank/README.html][SBert Documentation]]
- [[https://github.com/openai/CLIP][OpenAI CLIP Model]] for Image Search. See [[https://www.sbert.net/examples/applications/image-search/README.html][SBert Documentation]]
- Charles Cave for [[http://members.optusnet.com.au/~charles57/GTD/orgnode.html][OrgNode Parser]]
- Sven Marnach for [[https://github.com/smarnach/pyexiftool/blob/master/exiftool.py][PyExifTool]]
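
** Query the API from a Shell
The API links listed under /Use/ can be opened in a browser, so they can also be requested from a shell. This is a minimal sketch, assuming ~curl~ is installed and Khoj is listening on the default ~http://localhost:8000~. The ~khoj_search~ and ~khoj_regenerate~ helpers and the ~KHOJ_URL~ variable are hypothetical names and are not part of Khoj.

#+begin_src shell
# Hypothetical helpers, not part of Khoj.
# Base URL of a running Khoj server; override by exporting KHOJ_URL.
KHOJ_URL="${KHOJ_URL:-http://localhost:8000}"

# Search indexed content with a natural language query.
# --get with --data-urlencode appends the percent-encoded ?q=... to the URL.
khoj_search() {
  curl --silent --show-error --get --data-urlencode "q=$1" "$KHOJ_URL/search"
}

# Regenerate embeddings for one content type, e.g. ledger.
khoj_regenerate() {
  curl --silent --show-error --get --data-urlencode "t=$1" "$KHOJ_URL/regenerate"
}
#+end_src

For example, ~khoj_search "what is the meaning of life"~ sends the same query as the /Query/ link, and ~khoj_regenerate ledger~ matches the /Regenerate Embeddings/ link. Letting ~curl --data-urlencode~ handle the percent-encoding avoids writing escapes like ~%20~ by hand.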
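
** Wait for the First Run
The first run can take a while, so a script may need to wait for the server before it sends queries. This is a minimal sketch. It assumes the API (here its ~/docs~ page) only starts answering after start-up, including embedding generation, has finished. The ~wait_for_khoj~ helper, its 600 second default timeout and its 5 second poll interval are hypothetical choices.

#+begin_src shell
# Hypothetical helper, not part of Khoj.
# Polls the given URL until it answers successfully or the timeout expires.
# Assumes Khoj only answers once start-up (embedding generation) is done.
wait_for_khoj() {
  local url="${1:-http://localhost:8000/docs}" timeout="${2:-600}" waited=0
  until curl --silent --fail --output /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Khoj not reachable at $url after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "Khoj is up at $url"
}
#+end_src

For example, run ~docker-compose up -d && wait_for_khoj~. If it times out, ~docker-compose logs -f~ shows what the container is doing.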
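
** Run on Test Data
Step 3 of the /Development Setup/ points the ~/data/~ paths in ~sample_config.yml~ at ~tests/data/~. This is a minimal sketch that uses ~sed~ to write the result to a separate file, leaving ~sample_config.yml~ untouched. The names ~use_test_data~ and ~config/test_config.yml~ are hypothetical. The sketch assumes every path to rewrite begins with ~/data/~, either unquoted or inside double quotes.

#+begin_src shell
# Hypothetical helper, not part of Khoj.
# Prints a copy of a Khoj config with absolute /data/ paths rewritten to the
# repo's tests/data/ fixtures, e.g. /data/notes/*.org -> tests/data/notes/*.org.
# Only a /data/ at line start or after whitespace or a double quote is
# rewritten, so paths like /home/me/data/ are left alone.
use_test_data() {
  sed -E 's#(^|[[:space:]"])/data/#\1tests/data/#g' "$1"
}
#+end_src

Then run ~use_test_data config/sample_config.yml > config/test_config.yml~ and start Khoj with ~python3 -m src.main -c=config/test_config.yml -vv~. Writing a copy keeps the original config intact, but it leaves a second file to keep in sync.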