Mirror of khoj from GitHub
Debanjum Singh Solanky 41c328dae0 Batch encode images to keep memory consumption manageable
- Issue:
  Process would get killed while encoding images
  for consuming too much memory

- Fix:
  - Encode images in batches and append to image_embeddings
  - No need to use copy or deep_copy anymore with batch processing.
    Earlier, the process would throw a "too many open files" error without them

Other Changes:
  - Use tqdm to see progress even when using batch
  - See progress bar of encoding independent of verbosity (for now)
2021-09-16 10:15:54 -07:00
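
The batching approach described in the commit message above can be sketched as follows. This is a minimal illustration, not the repository's actual code: `encode_in_batches`, `load_fn`, `encode_fn` and the default `batch_size` are hypothetical names and values chosen for the example. Only the `image_embeddings` name and the use of tqdm come from the commit message. The idea is that only one batch of images is held in memory at a time, while the much smaller embeddings accumulate in a list.

```python
from typing import Callable, List, Sequence

try:
    from tqdm import tqdm  # progress bar, as mentioned in the commit message
except ImportError:
    # Fall back to a no-op wrapper so the sketch runs without tqdm installed
    def tqdm(iterable, **kwargs):
        return iterable


def encode_in_batches(
    paths: Sequence[str],
    load_fn: Callable[[str], object],                   # hypothetical: opens one image
    encode_fn: Callable[[List[object]], List[object]],  # hypothetical: encodes a batch
    batch_size: int = 50,                               # assumed value, tune to available memory
) -> List[object]:
    """Encode images batch by batch and collect the embeddings in one list."""
    image_embeddings: List[object] = []
    for start in tqdm(range(0, len(paths), batch_size), desc="Encoding images"):
        batch_paths = paths[start:start + batch_size]
        # Only the images of the current batch are loaded at any one time
        images = [load_fn(path) for path in batch_paths]
        # Append this batch's embeddings to the running list
        image_embeddings += list(encode_fn(images))
    return image_embeddings
```

With a real model, `load_fn` would open an image file and `encode_fn` would wrap the model's encode call. Both are passed in as parameters here so the batching logic can be read and tested on its own.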
src                Batch encode images to keep memory consumption manageable            2021-09-16 10:15:54 -07:00
.gitignore         Add Readme, License. Update .gitignore                               2021-08-15 22:52:37 -07:00
environment.yml    Enable Semantic Search on Images                                     2021-08-22 21:42:37 -07:00
LICENSE            Add Readme, License. Update .gitignore                               2021-08-15 22:52:37 -07:00
README.org         Update Readme to state can now query beancount transactions, images  2021-08-22 21:50:27 -07:00
sample_config.yml  Update sample config to add minimal config for ledger, image search  2021-08-22 21:54:49 -07:00

Semantic Search

Allow natural language search on user content like notes, images and transactions using transformer-based models.

All data is processed locally. Users can interact with the semantic-search app via Emacs, the API or the command line.

Dependencies

Install

git clone https://github.com/debanjum/semantic-search && cd semantic-search
conda env create -f environment.yml
conda activate semantic-search

Run

Load the ML model, generate embeddings and expose an API to query the specified org-mode files:

python3 src/main.py -c=sample_config.yml --verbose
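
To illustrate what querying by embeddings involves, here is a minimal sketch of ranking entries by cosine similarity to a query embedding. It assumes the general embedding-search technique and is not taken from this repository: `cosine_similarity`, `rank_entries` and the toy two-dimensional vectors are hypothetical. In the real app the embeddings would come from the transformer model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_entries(
    query_embedding: Sequence[float],
    entry_embeddings: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (entry index, score) pairs for the top_k most similar entries."""
    scores = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(entry_embeddings)
    ]
    # Highest similarity first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy example: three entries embedded in two dimensions
entries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = rank_entries([1.0, 0.0], entries)
```

Here the first entry points in the same direction as the query and ranks highest, the third entry is partially aligned and ranks second, and the orthogonal second entry ranks last.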

Use

Upgrade

cd semantic-search
git pull origin master
conda env update -f environment.yml
conda activate semantic-search

Acknowledgments