Mirror of khoj from Github


Semantic Search

Allows natural language search over user content such as notes, images, and transactions, using transformer ML models

Users can interact with semantic-search via the API or Emacs. All search is done locally*

Setup

1. Clone

  git clone https://github.com/debanjum/semantic-search && cd semantic-search

2. Configure

3. Run

docker-compose up -d

Note: The first run will take time. Let it run; it is most likely not hung, just generating embeddings

Use

Upgrade

  docker-compose build

Troubleshooting

  • Symptom: Errors out with "Killed" in error message

  • Symptom: Errors out complaining about tensor mismatch, null values, etc.

    • Mitigation: Delete content-type > image section from docker_sample_config.yml
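
For anyone who prefers to script this mitigation rather than edit the file by hand, here is a minimal sketch. It assumes the YAML has already been parsed into a Python dict (for example with a YAML library of your choice); the function name drop_image_section and the keys inside the sample config are hypothetical, only content-type and image come from the mitigation above.

```python
import copy


def drop_image_section(config: dict) -> dict:
    """Return a copy of the parsed config without content-type > image."""
    trimmed = copy.deepcopy(config)  # leave the caller's dict untouched
    content_types = trimmed.get("content-type") or {}
    content_types.pop("image", None)  # no error if the section is absent
    return trimmed


# Hypothetical parsed config, for illustration only
sample = {
    "content-type": {
        "org": {"input-filter": "/data/notes/*.org"},
        "image": {"input-directory": "/data/images/"},
    }
}
print(sorted(drop_image_section(sample)["content-type"]))
```

The function returns a new dict so the original config can still be written back unchanged if the mitigation does not help.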

Miscellaneous

  • The experimental chat API endpoint uses the OpenAI API

    • It is disabled by default
    • To use it, add your openai-api-key to config.yml

Development Setup

Setup on Local Machine

1. Install Dependencies
  1. Install Python3 [Required]
  2. Install Conda [Required]
  3. Install Exiftool [Optional]

    sudo apt-get -y install libimage-exiftool-perl
2. Install Semantic Search

    git clone https://github.com/debanjum/semantic-search && cd semantic-search
    conda env create -f config/environment.yml
    conda activate semantic-search
3. Configure
  • Configure the files/directories to search in the content-type section of sample_config.yml
  • To run the application on test data, update file paths containing /data/ to tests/data/ in sample_config.yml

    • Example: replace /data/notes/*.org with tests/data/notes/*.org
4. Run

Load the ML model, generate embeddings, and expose the API to query the notes, images, transactions, etc. specified in the config YAML

python3 -m src.main -c=config/sample_config.yml -vv
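
The path substitution from the Configure step (/data/ to tests/data/) can also be scripted. The following is a rough sketch that treats the config as plain text; the function name use_test_data and the input-filter key in the sample line are hypothetical and not taken from the project.

```python
import re

# Match "/data/" only at the start of a path, so an already rewritten
# "tests/data/" is left alone and the function is safe to apply twice.
_DATA_PREFIX = re.compile(r"(?<![\w.])/data/")


def use_test_data(config_text: str) -> str:
    """Point paths that start with /data/ at tests/data/ instead."""
    return _DATA_PREFIX.sub("tests/data/", config_text)


# Hypothetical config line, for illustration only
print(use_test_data("input-filter: /data/notes/*.org"))
```

To apply it, read config/sample_config.yml as text, pass it through use_test_data, and write the result to a copy of the file, then point the -c flag at that copy.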

Upgrade On Local Machine

  cd semantic-search
  git pull origin master
  conda deactivate
  conda env update -f config/environment.yml
  conda activate semantic-search

Acknowledgments