- For queries with only filters in them short-circuit and return
filtered results. No need to run semantic search, re-ranking.
- Add client test for filter only query and quote query in client tests
- It's more of a hassle to not let word filter go stale on entry
updates
- Generating index on 120K lines of notes takes 1s. Loading from file
takes 0.2s. For less content load time difference will be even smaller
- Let go of startup time improvement for simplicity for now
- Do not run the more expensive explicit filter until the word to be
filtered is completed by user. This requires an end sequence marker
to identify end of explicit word filter to trigger filtering
- Space isn't a good enough delimiter as the explicit filter could be
at the end of the query in which case no space
- Improve search speed by ~10x
Tested on corpus of 125K lines, 12.5K entries
- Allow cross-encoder to re-rank results by settings &?r=true when querying /search API
- It's an optional param that default to False
- Earlier all results were re-ranked by cross-encoder
- Making this configurable allows for much faster results, if desired
but for lower accuracy
- The code for both the text search types were mostly the same
It was earlier done this way for expedience while experimenting
- The minor differences were reconciled and merged into a single
text_search type
- This simplifies the app and making it easier to process other
text types
- Had already made some progress on this earlier by updating the image
search responses. But needed to update the text search responses to
use lowercase entry and score
- Update khoj.el to consume the updated json response keys for text
search
- Rename pytest fixture search_config to more appropriate
content_config
- Create search_config pytest fixture
- Use search_config where search being setup, used in tests