Mirror of khoj from GitHub
Debanjum Singh Solanky 04a9a6d62f Expose API endpoint to (re-)generate embeddings from latest notes
- Provides mechanism to update notes from within application
  - Instead of having to pass the same arguments multiple times
    Pass it once (or rely on defaults when possible) and let app keep
    state and location of intermediary files

- Allows user to not have to deal with the internals of the application
  - E.g user doesn't have to specify the jsonl.gz or embeddings file path
    The app will still put those files in a default location
  - The user doesn't have to run the generation from the commandline
    as a separate step
2021-08-16 18:52:38 -07:00
interface/emacs Minor doc updates after merging emacs package with main repository 2021-08-16 02:02:26 -07:00
processor Improve debug output from org_to_jsonl.py script 2021-08-16 18:50:29 -07:00
search_type Use verbosity level instead of bool across application 2021-08-16 17:15:41 -07:00
utils Allow reuse of get_absolute_path, is_none_or_empty methods 2021-08-16 16:33:43 -07:00
.gitignore Add Readme, License. Update .gitignore 2021-08-15 22:52:37 -07:00
environment.yml Create API interface for Semantic Search 2021-08-15 18:11:48 -07:00
LICENSE Add Readme, License. Update .gitignore 2021-08-15 22:52:37 -07:00
main.py Expose API endpoint to (re-)generate embeddings from latest notes 2021-08-16 18:52:38 -07:00
README.md Use better cmdline argument names. Drop unneeded no-compress argument 2021-08-16 13:49:39 -07:00

Semantic Search

Provides natural language search over a user's personal content, such as notes and images, using ML models.

All data is processed locally. Users can interact with the semantic-search app via Emacs, the API, or the command line.

Dependencies

Install

git clone https://github.com/debanjum/semantic-search && cd semantic-search
conda env create -f environment.yml
conda activate semantic-search

Setup

Generate a compressed JSONL file from the specified org-mode files:

python3 processor/org-mode/org-to-jsonl.py \
--input-files ~/Notes/Schedule.org ~/Notes/Incoming.org \
--output-file .notes.jsonl.gz \
--verbose
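
For readers unfamiliar with the intermediate format, here is a minimal sketch of what a gzip-compressed JSONL file looks like in practice: one JSON object per line, written through `gzip`. The helper names (`write_jsonl_gz`, `read_jsonl_gz`) and the entry fields (`title`, `body`) are assumptions made for illustration; the actual script may structure its entries differently.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(entries, path):
    """Write one JSON object per line to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_jsonl_gz(path):
    """Read a gzip-compressed JSONL file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical entries; the real script extracts its entries from org-mode files.
entries = [
    {"title": "Schedule", "body": "Dentist appointment on Friday"},
    {"title": "Incoming", "body": "Read the paper on sentence embeddings"},
]

# Write to a temporary location and read the file back to check the round trip.
path = os.path.join(tempfile.mkdtemp(), "notes.jsonl.gz")
write_jsonl_gz(entries, path)
roundtrip = read_jsonl_gz(path)
print(roundtrip == entries)
```

Keeping one entry per line means the file can be streamed and decompressed line by line without loading everything into memory at once.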

Run

Load the ML model, generate embeddings, and expose an API for running user queries against the org-mode files above:

python3 main.py \
--compressed-jsonl .notes.jsonl.gz \
--embeddings .notes_embeddings.pt \
--verbose
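
To give a rough idea of how a query can be matched against note embeddings, the sketch below ranks entries by cosine similarity to a query vector. This is an illustration under assumptions, not the application's code: the README does not state which similarity measure or model is used, the toy three-dimensional vectors stand in for real model output, and `cosine_similarity` and `rank_entries` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_entries(query_embedding, entry_embeddings, top_k=3):
    """Return indices of the top_k entries most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, emb), idx)
        for idx, emb in enumerate(entry_embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy embeddings (assumed values); a real model would produce
# vectors with hundreds of dimensions.
entry_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_embedding = [0.9, 0.1, 0.0]

ranking = rank_entries(query_embedding, entry_embeddings, top_k=2)
print(ranking)
```

Here the query points mostly along the first axis, so the first entry ranks highest and the mixed third entry comes second.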

Use

Acknowledgments