[[https://github.com/debanjum/semantic-search/actions/workflows/build.yml/badge.svg]]
* Semantic Search
/Natural language search over user content such as notes, images and transactions, using transformer-based models/
All data is processed locally. You can interact with the semantic-search app via [[./src/interface/emacs/semantic-search.el][Emacs]], the API or the command line.
** Dependencies
- Python3
- [[https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links][Miniconda]]
** Install
#+begin_src shell
git clone https://github.com/debanjum/semantic-search && cd semantic-search
conda env create -f environment.yml
conda activate semantic-search
#+end_src
*** Install Environmental Dependencies
#+begin_src shell
sudo apt-get -y install libimage-exiftool-perl
#+end_src
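Before moving on, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of semantic-search: ~check_dependencies~ is a hypothetical helper name, and it assumes the ~libimage-exiftool-perl~ package puts an ~exiftool~ command on your ~PATH~.
#+begin_src shell
# Hypothetical helper (not part of semantic-search):
# report which of the given commands are missing from PATH
check_dependencies() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every requested command was found
    [ "$missing" -eq 0 ]
}
#+end_src
For example, ~check_dependencies python3 conda exiftool~ exits with a non-zero status if any of the three is not installed.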
** Configure
Configure the application's search types and their underlying data sources (files) in ~sample_config.yml~, or in a copy that uses ~sample_config.yml~ as a reference.
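If you would rather not edit the sample in place, one option is to work from a personal copy. This is only a sketch under assumptions: ~init_config~ is a hypothetical helper and ~config.yml~ is an illustrative file name; neither is defined by semantic-search.
#+begin_src shell
# Hypothetical helper (not part of semantic-search):
# copy the sample config to a personal config file unless one already exists
init_config() {
    sample="${1:-sample_config.yml}"
    target="${2:-config.yml}"    # illustrative name, pick any path you like
    if [ -e "$target" ]; then
        echo "Keeping existing $target"
    else
        cp "$sample" "$target" && echo "Created $target from $sample"
    fi
}
#+end_src
After running ~init_config~ and editing the copy, pass it to the application with the same ~-c~ flag used for the sample, for example ~-c=config.yml~.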
** Run
Load the ML models, generate embeddings and expose an API to query the notes, images, transactions, etc. specified in the config YAML.
#+begin_src shell
python3 -m src.main -c=sample_config.yml -vv
#+end_src
** Use
- *Semantic Search via Emacs*
  - [[https://github.com/debanjum/semantic-search/tree/master/src/interface/emacs#installation][Install]] [[./src/interface/emacs/semantic-search.el][semantic-search.el]]
  - Run ~M-x semantic-search <user-query>~
- *Semantic Search via API*
  - Query: ~GET~ [[http://localhost:8000/search?q=%22what%20is%20the%20meaning%20of%20life%22&t=notes][http://localhost:8000/search?q="what is the meaning of life"&t=notes]]
  - Regenerate Embeddings: ~GET~ [[http://localhost:8000/regenerate?t=image][http://localhost:8000/regenerate?t=image]]
  - [[http://localhost:8000/docs][Semantic Search API Docs]]
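The API endpoints above can also be called from the command line. The following is a minimal sketch using ~curl~, assuming the server is running on ~localhost:8000~ as in the links above. The function names and the ~SEMANTIC_SEARCH_URL~ variable are hypothetical conveniences, not part of semantic-search. The query text is sent exactly as given; whether the API needs it wrapped in double quotes, as the example URL does, is not stated here.
#+begin_src shell
# Base URL of the locally running server (host and port taken from the links above)
SEMANTIC_SEARCH_URL="${SEMANTIC_SEARCH_URL:-http://localhost:8000}"

# Hypothetical helper: GET /search with the query (q) and search type (t).
# --get appends the --data-urlencode values to the URL as a query string,
# so spaces and quotes in the query are percent-encoded by curl.
semantic_search() {
    curl --silent --get \
         --data-urlencode "q=$1" \
         --data-urlencode "t=$2" \
         "${SEMANTIC_SEARCH_URL}/search"
}

# Hypothetical helper: GET /regenerate for the given search type (t)
regenerate_embeddings() {
    curl --silent --get \
         --data-urlencode "t=$1" \
         "${SEMANTIC_SEARCH_URL}/regenerate"
}
#+end_src
For example, ~semantic_search "what is the meaning of life" notes~ queries notes, and ~regenerate_embeddings image~ mirrors the regenerate link above. The shape of the response is not described here; see the API docs served at ~/docs~.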
** Upgrade
#+begin_src shell
cd semantic-search
git pull origin master
conda env update -f environment.yml
conda activate semantic-search
#+end_src
** Acknowledgments
- [[https://huggingface.co/sentence-transformers/msmarco-MiniLM-L-6-v3][MiniLM Model]] for Asymmetric Text Search. See [[https://www.sbert.net/examples/applications/retrieve_rerank/README.html][SBert Documentation]]
- [[https://github.com/openai/CLIP][OpenAI CLIP Model]] for Image Search. See [[https://www.sbert.net/examples/applications/image-search/README.html][SBert Documentation]]
- Charles Cave for [[http://members.optusnet.com.au/~charles57/GTD/orgnode.html][OrgNode Parser]]
- Sven Marnach for [[https://github.com/smarnach/pyexiftool/blob/master/exiftool.py][PyExifTool]]