Commit graph

4 commits

Author SHA1 Message Date
Debanjum Singh Solanky
4a90972e38 Use a better model for asymmetric semantic search
- The multi-qa-MiniLM-L6-cos-v1 is more extensively benchmarked[1]
- It has the right mix of model query speed, size and performance on benchmarks
- On hugging face it has way more downloads and likes than the msmarco model[2]
- On very preliminary evaluation of the model
  - It doubles the encoding speed of all entries (down from ~8min to 4mins)
  - It gave more entries that stay relevant to the query (3/5 vs 1/5 earlier)

[1]: https://www.sbert.net/docs/pretrained_models.html
[2]: https://huggingface.co/sentence-transformers
2022-07-18 20:27:26 +04:00
Debanjum Singh Solanky
79c2224eaa Improve test data organization and update correspoding conftests
- Put test data for each content type into separate directories
- Makes config.yml for docker and local host consistent
  - Prepending tests to /data in sample_config.yml makes application
    run on local host using test data
  - Allows mounting separate volume for each content type in docker-compose
- Ignore gitignore to only add tests content, not generated models or embeddings
2022-01-29 02:03:17 -05:00
Debanjum Singh Solanky
866ccb5cd3 Add all configurables to sample_config. Add test music, ledger data
- Get ledger sample from github.com/debanjum/company-ledger
- Get music sample from github.com/debanjum/org-music
2021-10-02 16:11:27 -07:00
Debanjum Singh Solanky
d2905c4be6 Move tests out to project root. Use absolute import in project
tests/ directory in project root is more standard.
Just had to use absolute path for internal module imports to get it to
work
2021-09-30 04:12:14 -07:00