Mirror of khoj from GitHub
Debanjum Singh Solanky dcf7b2d04f Remove requirements.txt for now as virtualenv setup doesn't work
Haven't gotten it to work on Mac or Ubuntu. Remove to avoid confusion
for now. Application depends on miniconda for now
2021-08-16 00:15:10 -07:00
processor/org-mode Add org processor to generate compressed jsonl from org-mode files 2021-08-15 22:52:31 -07:00
search_types Use updated path to MiniLM bi-encoder model on hugging-face 2021-08-15 23:57:22 -07:00
utils Change default install directory to current, fix open file code 2021-08-15 23:01:55 -07:00
.gitignore Add Readme, License. Update .gitignore 2021-08-15 22:52:37 -07:00
environment.yml Create API interface for Semantic Search 2021-08-15 18:11:48 -07:00
LICENSE Add Readme, License. Update .gitignore 2021-08-15 22:52:37 -07:00
main.py Move different search types into search_types directory 2021-08-15 19:09:50 -07:00
README.md Acknowledge ML models used for search. Simplify path used in commands 2021-08-15 23:56:18 -07:00

Semantic Search

Provides natural language search over a user's personal content, such as notes and images, using ML models.

All data is processed locally. Users can interact with the semantic-search app via Emacs, the API, or the command line.

Dependencies

Install

git clone https://github.com/debanjum/semantic-search && cd semantic-search
conda env create -f environment.yml
conda activate semantic-search

Setup

Generate compressed JSONL from specified org-mode files

python3 processor/org-mode/org-to-jsonl.py \
--org-files "Schedule.org" "Incoming.org" \
--org-directory "~/Notes" \
--jsonl-file ".notes.jsonl" \
--compress \
--verbose
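
The result of this step is a gzip-compressed JSON Lines file, i.e. one JSON object per line. The following is a minimal sketch of writing and reading such a file with Python's standard library. The entry fields (`title`, `body`) and the helper names `write_jsonl_gz` and `read_jsonl_gz` are assumptions for illustration only; the schema that `org-to-jsonl.py` actually emits may differ.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(entries, path):
    """Write one JSON object per line to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_jsonl_gz(path):
    """Read a gzip-compressed JSONL file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        # Skip blank lines so a trailing newline does not break parsing
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical entries; the real field names are not documented here
sample_entries = [
    {"title": "Schedule", "body": "Dentist appointment on Friday"},
    {"title": "Incoming", "body": "Read paper on sentence embeddings"},
]

# Write to a temporary directory to avoid touching real notes
demo_path = os.path.join(tempfile.mkdtemp(), "demo.notes.jsonl.gz")
write_jsonl_gz(sample_entries, demo_path)
loaded_entries = read_jsonl_gz(demo_path)
print(f"Loaded {len(loaded_entries)} entries from {demo_path}")
```

Opening the file in text mode (`"wt"` / `"rt"`) lets `gzip` handle compression transparently, so the rest of the code treats it like an ordinary line-oriented text file.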

Run

Load the ML model, generate embeddings, and expose an API to run user queries on the org-mode files processed above

python3 main.py -j .notes.jsonl.gz -e .notes_embeddings.pt
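
Conceptually, a bi-encoder search embeds every entry once, embeds the query at search time, and ranks entries by vector similarity. The sketch below illustrates only that ranking step, using cosine similarity in plain Python. The three-dimensional vectors are hand-made stand-ins for model output, `cosine_similarity` and `top_n` are hypothetical helpers rather than functions from this repository, and the application may well use a different scoring function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_n(query_embedding, entry_embeddings, n=5):
    """Return (entry_index, score) pairs for the n most similar entries."""
    scored = [
        (idx, cosine_similarity(query_embedding, emb))
        for idx, emb in enumerate(entry_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy data: real embeddings would come from the ML model, not be typed by hand
toy_entries = [
    "Dentist appointment on Friday",
    "Read paper on sentence embeddings",
    "Buy groceries",
]
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.0, 0.2, 0.9],
]
toy_query = [0.2, 0.8, 0.1]

toy_results = top_n(toy_query, toy_embeddings, n=2)
for idx, score in toy_results:
    print(f"{score:.3f}  {toy_entries[idx]}")
```

Because entry embeddings do not depend on the query, they can be computed once and cached on disk, which is presumably the role of the `.notes_embeddings.pt` file passed via `-e`.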

Use

  • Call Semantic Search via Emacs

    • M-x semantic-search "<user-query>"
    • C-c C-s
  • Call Semantic Search via API

  • Call Semantic Search via Python Script Directly

    python3 search_types/asymmetric.py \
    -j .notes.jsonl.gz \
    -e .notes_embeddings.pt \
    -n 5 \
    --verbose \
    --interactive
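
For readers unfamiliar with how such flags are typically handled, here is a rough sketch of parsing the options shown above with `argparse`. It is not the actual code in `search_types/asymmetric.py`: the destination names, the help texts, the reading of `-n` as a result count, and its default of 5 are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the flags used in the command above (hypothetical)."""
    parser = argparse.ArgumentParser(description="Sketch of a semantic search CLI")
    parser.add_argument("-j", dest="jsonl_file", required=True,
                        help="Path to the compressed JSONL file of entries")
    parser.add_argument("-e", dest="embeddings_file", required=True,
                        help="Path to the file holding entry embeddings")
    # Assumed meaning: number of results to show; the default is a guess
    parser.add_argument("-n", dest="results_count", type=int, default=5,
                        help="Number of results to return")
    parser.add_argument("--verbose", action="store_true",
                        help="Print extra progress information")
    parser.add_argument("--interactive", action="store_true",
                        help="Prompt for queries in a loop")
    return parser


# Parse the same arguments as the example command, passed as a list
example_args = build_parser().parse_args([
    "-j", ".notes.jsonl.gz",
    "-e", ".notes_embeddings.pt",
    "-n", "5",
    "--verbose",
    "--interactive",
])
print(example_args)
```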
    

Acknowledgments