- Reason:
Allow natural search on markdown based notes, documentation,
websites etc
- Details:
- Create markdown processor to extract Markdown entries (identified by
Heading) into standard jsonl format required by text_search
- Update API, Configs to support interfacing with new markdown type
- Update Emacs, Web clients to support interfacing with new markdown
type via API
- Update Readme to mentiond markdown is also supported
Closes#35
- The multi-qa-MiniLM-L6-cos-v1 is more extensively benchmarked[1]
- It has the right mix of model query speed, size and performance on benchmarks
- On hugging face it has way more downloads and likes than the msmarco model[2]
- On very preliminary evaluation of the model
- It doubles the encoding speed of all entries (down from ~8min to 4mins)
- It gave more entries that stay relevant to the query (3/5 vs 1/5 earlier)
[1]: https://www.sbert.net/docs/pretrained_models.html
[2]: https://huggingface.co/sentence-transformers
- YAML Config
- Can specify all params[1] earlier being passed via cmd args in config YAML
- Can now also configure sentence-transformer models to use etc for search
- [1] Config params
- org files
- compressed entries file config path
- embeddings file config path
- Include sample_config.yaml
- Include sample .org file from this repos readmes
- CLI
- Configuration Priority: Config via cmd > Config via YAML > Default Config
- Test CLI, include test config.yml for the tests
- Set default type to None unless set via query param to API
Run notes search if search_enabled, also if type is None (default)
Prepares for running queries on all search types unless type
specified in API query param
- Update Readme