sij/khoj - 〄.ai


Author	SHA1	Message	Date
Debanjum Singh Solanky	989526ae54	Use a more accurate model for symmetric semantic search - The all-MiniLM-L6-v2 model is more accurate - The exact previous model isn't benchmarked, but judging by the performance of the closest benchmarked model, the new model may be similar in speed and size - On a very preliminary evaluation, the new model seems faster, with pretty decent results	2022-07-18 20:27:26 +04:00
Debanjum Singh Solanky	4a90972e38	Use a better model for asymmetric semantic search - The multi-qa-MiniLM-L6-cos-v1 model is more extensively benchmarked[1] - It has the right mix of query speed, model size and benchmark performance - On Hugging Face it has far more downloads and likes than the msmarco model[2] - On a very preliminary evaluation of the model - It doubles the encoding speed of all entries (down from ~8 min to 4 min) - It returned more entries that stay relevant to the query (3/5 vs 1/5 earlier) [1]: https://www.sbert.net/docs/pretrained_models.html [2]: https://huggingface.co/sentence-transformers	2022-07-18 20:27:26 +04:00
Debanjum Singh Solanky	78b76d65a0	Minor fix to notes jsonl file extension in sample_config.yml	2022-01-29 04:13:36 -05:00
Debanjum Singh Solanky	c31abad0a6	Mount embeddings to /data/embeddings for directory naming consistency - Keeps directory paths consistent between host and container volumes - Consistency simplifies the documentation and the updates required to set up sample_config.yml for local installation	2022-01-29 03:24:02 -05:00
Debanjum Singh Solanky	b0067fc32e	Store docker, conda, semantic-search configuration in a config directory - Improves organization of the config files required by the application - Removes config clutter from the application root directory	2022-01-29 02:41:11 -05:00

Renamed from sample_config.yml

5 commits