Semantic Search
Search your own content (notes, images, transactions) in natural language using transformer-based models.
All search is done locally. You can use the semantic-search app via Emacs, the API or the command line.
Setup
Setup using Docker
1. Clone Repository
git clone https://github.com/debanjum/semantic-search && cd semantic-search
2. Configure
- Add Content Directories for Semantic Search to Docker-Compose
- Update docker-compose.yml to mount your images, org-mode notes, ledger/beancount directories
- If required, edit config settings in docker_sample_config.yml.
3. Run
docker-compose up -d
Troubleshooting
- The first run takes a while. Let it run; it is most likely not hung
- Symptom: Errors out with "Killed" in error message
- Fix: Increase RAM available to Docker Containers in Docker Settings
- Refer: StackOverflow Solution, Configure Resources on Docker for Mac
- Symptom: Errors out complaining about Tensors mismatch, null etc
- Mitigation: Delete content-type > image section from docker_sample_config.yml
Setup on Local Machine
1. Install Dependencies
- Install Python3 [Required]
- Install Conda [Required]
- Install Exiftool [Optional]
sudo apt-get -y install libimage-exiftool-perl
2. Install Semantic Search
git clone https://github.com/debanjum/semantic-search && cd semantic-search
conda env create -f environment.yml
conda activate semantic-search
3. Configure
- Configure files/directories to search in the content-type section of sample_config.yml
- To run the application on test data, update file paths containing /data/ to tests/data/ in sample_config.yml
  - Example: replace /data/notes/*.org with tests/data/notes/*.org
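The path substitution for test data can also be scripted. The following is a minimal sketch and not part of the project: the helper names `use_test_data` and `rewrite_config` are hypothetical, and the code treats the config as plain text rather than parsing YAML, so it assumes the paths appear literally as `/data/...` in sample_config.yml.

```python
import re
from pathlib import Path

# Match "/data/" only where it starts a path, so an already rewritten
# "tests/data/" is left untouched and the rewrite is safe to repeat.
_DATA_PREFIX = re.compile(r"(?<![\w.~/-])/data/")


def use_test_data(config_text: str) -> str:
    """Return config text with leading /data/ paths pointed at tests/data/."""
    return _DATA_PREFIX.sub("tests/data/", config_text)


def rewrite_config(path: str = "sample_config.yml") -> None:
    """Rewrite the config file in place (hypothetical helper)."""
    config_file = Path(path)
    config_file.write_text(use_test_data(config_file.read_text()))
```

Calling `rewrite_config()` from the repository root would update sample_config.yml in place. Keep a copy of the original file if you want to switch back to your own data later.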
4. Run
Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
python3 -m src.main -c=sample_config.yml -vv
Use
- Semantic Search via Emacs
  - Install semantic-search.el
  - Run M-x semantic-search <user-query>
- Semantic Search via API
  - Query: GET http://localhost:8000/search?q="What is the meaning of life"&t=notes
  - Regenerate Embeddings: GET http://localhost:8000/regenerate
  - Semantic Search API Docs
- UI to Edit Config
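The two API calls can also be made from Python using only the standard library. This is a rough sketch under stated assumptions: the function names are hypothetical, the server is assumed to be running at the default address shown in this README, and the response is assumed to be JSON, which this README does not state. The query is sent without the surrounding double quotes used in the example URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address used in this README


def search_url(query: str, content_type: str = "notes") -> str:
    """Build the /search URL with the query safely percent-encoded."""
    params = urlencode({"q": query, "t": content_type})
    return f"{BASE_URL}/search?{params}"


def search(query: str, content_type: str = "notes"):
    """Send the query to a running server and decode the response.

    Assumption: the server replies with JSON; adjust if it does not.
    """
    with urlopen(search_url(query, content_type)) as response:
        return json.load(response)


def regenerate() -> int:
    """Ask the server to regenerate embeddings; return the HTTP status."""
    with urlopen(f"{BASE_URL}/regenerate") as response:
        return response.status
```

With the server running, `search("What is the meaning of life")` would query the notes content type, and `regenerate()` would trigger the same request as the GET shown in the list.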
Upgrade
On Docker
docker-compose build
On Local Machine
cd semantic-search
git pull origin master
conda deactivate
conda env update -f environment.yml
conda activate semantic-search
Miscellaneous
- The experimental /chat API endpoint uses the OpenAI API
  - It is disabled by default
  - To use it, add your openai-api-key to config.yml
Acknowledgments
- MiniLM Model for Asymmetric Text Search. See SBert Documentation
- OpenAI CLIP Model for Image Search. See SBert Documentation
- Charles Cave for OrgNode Parser
- Sven Marnach for PyExifTool