Commit graph

5 commits

Author SHA1 Message Date
Debanjum Singh Solanky
989526ae54 Use a more accurate model for symmetric semantic search
- The all-MiniLM-L6-v2 is more accurate
  - The exact previous model isn't benchmarked but based on the
    performance of the closest model to it. Seems like the new model
    maybe similar in speed and size

- On very preliminary evaluation of the model, the new model seems
  faster, with pretty decent results
2022-07-18 20:27:26 +04:00
Debanjum Singh Solanky
4a90972e38 Use a better model for asymmetric semantic search
- The multi-qa-MiniLM-L6-cos-v1 is more extensively benchmarked[1]
- It has the right mix of model query speed, size and performance on benchmarks
- On hugging face it has way more downloads and likes than the msmarco model[2]
- On very preliminary evaluation of the model
  - It doubles the encoding speed of all entries (down from ~8min to 4mins)
  - It gave more entries that stay relevant to the query (3/5 vs 1/5 earlier)

[1]: https://www.sbert.net/docs/pretrained_models.html
[2]: https://huggingface.co/sentence-transformers
2022-07-18 20:27:26 +04:00
Debanjum Singh Solanky
78b76d65a0 Minor fix to notes jsonl file extension in sample_config.yml 2022-01-29 04:13:36 -05:00
Debanjum Singh Solanky
c31abad0a6 Mount embeddings to /data/embeddings for directory naming consistency
- Keeps directory paths consistent between host and container volumes
- Consistency simplifies documentation and updates required to setup
  sample_config.yml for local installation
2022-01-29 03:24:02 -05:00
Debanjum Singh Solanky
b0067fc32e Store docker, conda, semantic-search configuration in a config directory
- Improves organization of config files required for application
- Declutters the application root directory from configs
2022-01-29 02:41:11 -05:00
Renamed from sample_config.yml (Browse further)