Khoj
Khoj enables natural language search over your content, such as notes, images and transactions, using transformer ML models
You can interface with Khoj via the web, Emacs or the API. All search is done locally*
Demo
https://user-images.githubusercontent.com/6413477/168417719-8a8bc4e5-8404-42b2-89a7-4493e3d2582c.mp4
Setup
1. Clone
git clone https://github.com/debanjum/khoj && cd khoj
2. Configure
- [Required] Update docker-compose.yml to mount your images, (org-mode or markdown) notes and beancount directories
- [Optional] Edit application configuration in sample_config.yml
3. Run
docker-compose up -d
Note: The first run will take time. Let it run; it is most likely not hung, just generating embeddings
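Since the first run can sit quietly for a while, a small polling helper can tell you when the server starts answering. This is a minimal sketch, not part of Khoj: `http_responds` and `wait_until_ready` are hypothetical helper names, and the URL you pass in (host and port) is an assumption that must match whatever your docker-compose.yml publishes.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_responds(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if the server at `url` sends back any HTTP response."""
    try:
        urllib.request.urlopen(url, timeout=timeout_s).close()
        return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, reset, timed out, or otherwise not ready yet.
        return False


def wait_until_ready(probe, attempts: int = 60, interval_s: float = 10.0,
                     sleep=time.sleep) -> bool:
    """Call probe() up to `attempts` times, sleeping between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

For example, `wait_until_ready(lambda: http_responds("http://localhost:8000/"))` would poll for up to roughly ten minutes, assuming the service is published on port 8000. An HTTP response only shows that the server is up; it does not prove that every embedding has been generated.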
Use
- Khoj via API
- Khoj via Emacs
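As a rough illustration of the API route, the sketch below builds a search URL with the Python standard library. The base URL `http://localhost:8000`, the `/search` path, and the `q` and `t` parameter names are all assumptions that this README does not state; check the API documentation of your running instance before relying on them. `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_search_url(query: str, content_type: str = "notes",
                     base_url: str = "http://localhost:8000") -> str:
    """Build a search URL; endpoint and parameter names are assumed, not confirmed."""
    # urlencode escapes spaces and special characters in the query text.
    params = urlencode({"q": query, "t": content_type})
    return f"{base_url}/search?{params}"


print(build_search_url("what did I write about budgeting"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client once the server is running.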
Run Unit tests
pytest
Upgrade
docker-compose build --pull
Troubleshooting
- Symptom: Errors out with "Killed" in error message
- Fix: Increase RAM available to Docker Containers in Docker Settings
- Refer: StackOverflow Solution, Configure Resources on Docker for Mac
- Symptom: Errors out complaining about Tensors mismatch, null etc
- Mitigation: Delete content-type > image section from docker_sample_config.yml
Miscellaneous
- The experimental chat API endpoint uses the OpenAI API
- It is disabled by default
- To use it, add your openai-api-key to config.yml
Development Setup
Setup on Local Machine
1. Install Dependencies
- Install Python3 [Required]
- Install Conda [Required]
- Install Exiftool [Optional]
sudo apt-get -y install libimage-exiftool-perl
2. Install Khoj
git clone https://github.com/debanjum/khoj && cd khoj
conda env create -f config/environment.yml
conda activate khoj
3. Configure
- Configure files/directories to search in the content-type section of sample_config.yml
- To run the application on test data, update file paths containing /data/ to tests/data/ in sample_config.yml
- Example: replace /data/notes/*.org with tests/data/notes/*.org
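The path change for test data can also be applied programmatically. This is a minimal sketch under stated assumptions: `use_test_data` is a hypothetical helper, and it treats the config as plain text rather than parsing YAML. The lookbehind keeps it from rewriting a path that already starts with tests/data/, so running it twice is harmless.

```python
import re

# Match "/data/" only where it begins a path, i.e. it is not preceded by
# another path character (so "tests/data/" is left untouched).
_DATA_PREFIX = re.compile(r"(?<![\w./-])/data/")


def use_test_data(config_text: str) -> str:
    """Point paths that start with /data/ at tests/data/ instead."""
    return _DATA_PREFIX.sub("tests/data/", config_text)


print(use_test_data("/data/notes/*.org"))
```

One way to use it is to read config/sample_config.yml, pass its text through `use_test_data`, and write the result to a separate file, leaving the original sample config untouched.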
4. Run
Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
python3 -m src.main -c=config/sample_config.yml -vv
Upgrade On Local Machine
cd khoj
git pull origin master
conda deactivate
conda env update -f config/environment.yml
conda activate khoj
Acknowledgments
- Multi-QA MiniLM Model for Asymmetric Text Search. See SBert Documentation
- OpenAI CLIP Model for Image Search. See SBert Documentation
- Charles Cave for OrgNode Parser
- Sven Marnach for PyExifTool