Having org-mode result headings change size based on their depth in
the source document makes is a confusing UI experience.
Improve font-size, line-spacing and margins of results to make
delineation between entries, and differntiating between entry heading
and it's body easier to visually infer.
Do not white-space: pre-line. Improves rendering of Markdown results
## Details
- We were previously just wrapping results from /search API into a pre formatted div field. This was not easy to read
- Use [markdown-it](https://github.com/markdown-it/markdown-it) to render markdown results from Khoj `/search` API as proper HTML
Closes#43
## Support Incremental Search on Khoj Web Interface
- Use default, fast path to query /search API while user is typing
- Upgrade to cross-encoder re-ranked results once user hits enter on search box
## Improve Render of Org Results on Web Interface
- We were previously just wrapping results from /search API into a pre formatted div field. This was not easy to read
- Use [org.js](https://mooz.github.io/org-js/) to render results from Khoj `/search` API as proper HTML
- Improve org.js to render all task states, stylize task tags and make org-mode results look more like original content
Closes#42#41
# Details
## Improve Search API Latency
- Improve Search API Latency by ~50-100x to <100ms
- Trade-off speed for accuracy in default, fast path of /search API by not re-ranking results using cross-encoder
- Make re-ranking of results via cross-encoder configurable via new `?&r=<false|true>` query param to /search API
- Only deep-copy entries, embeddings to apply filters if query has any filter keywords
## Support Incremental Update via Khoj Emacs Frontend
- Use default, fast path to query /search API while user is typing
- Upgrade to cross-encoder re-ranked results once user goes idle (or ends search normally)
Closes#37
In current state:
- Rerank results:
- If user idles while entering query OR
- exits normally
- Do not rerank results:
- If user exits abnormally, e.g via C-g from query
- Rename functions to more standard, descriptive names
- Keep known, required code for incremental search
- E.g Do not set buffer local flag in hooks on minibuffer setup
- Only query when user in khoj minibuffer
- Use active-minibuffer-window and track khoj minibuffer
- (minibuffer-prompt) is not useful for our use-case here
- (For now) Run re-rank only if user idle while querying
- Do not run rerank on teardown/completion
- The reranking lag (~2s) is annoying; hit enter,
wait to see results
- Also triggered when user exits abnormally,
so C-g also results in rerank which is even more annoying
- Emacs will still hang if re-ranking gets triggered on idle but
that's better than always getting triggered. And better than not
having mechanism to get results re-ranked via cross-encoder at all
- Update khoj-simple to work cross-encoder re-ranked results like before
- Increment major version as incremental search considered a breaking
change and a major update to search capability
- Improve search speed by ~10x
Tested on corpus of 125K lines, 12.5K entries
- Allow cross-encoder to re-rank results by settings &?r=true when querying /search API
- It's an optional param that default to False
- Earlier all results were re-ranked by cross-encoder
- Making this configurable allows for much faster results, if desired
but for lower accuracy
- Formalize filters into class with can_filter() and filter() methods
- Use can_filter() method to decide whether to apply filter and
create deep copies of entries and embeddings for it
- Improve search speed for queries with no filters
as deep copying entries, embeddings takes the most time
after cross-encodes scoring when calling the /search API
Earlier we would create deep copies of entries, embeddings
even if the query did not contain any filter keywords
- Reason:
Allow natural search on markdown based notes, documentation,
websites etc
- Details:
- Create markdown processor to extract Markdown entries (identified by
Heading) into standard jsonl format required by text_search
- Update API, Configs to support interfacing with new markdown type
- Update Emacs, Web clients to support interfacing with new markdown
type via API
- Update Readme to mentiond markdown is also supported
Closes#35
- The code for both the text search types were mostly the same
It was earlier done this way for expedience while experimenting
- The minor differences were reconciled and merged into a single
text_search type
- This simplifies the app and making it easier to process other
text types
Now that the logic to compile entries is in the processor layer, the
extract_entries method is standard across (text) search_types
Extract the load_jsonl method as a utility helper method.
Use it in (a)symmetric search types