Last updated 2023-07-21 by Debanjum Singh Solanky (commit c28755ccd2).

Performance

Search performance

Semantic search using the bi-encoder is fairly fast at <100 ms across all content types
Reranking using the cross-encoder is slower, at <2 s for 15 results. Tweak top_k to trade off speed against accuracy of results
Filters in query (e.g. by file, word or date) usually add <20 ms to query latency
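The speed/accuracy tradeoff behind top_k can be pictured as a two-stage pipeline. A cheap bi-encoder similarity ranks every entry, and only the top_k survivors are rescored by the expensive cross-encoder, so reranking cost grows linearly with top_k. This is a minimal pure-Python sketch, not khoj's implementation. `search`, `cosine`, the `keep` filter predicate and the toy word-overlap scorer that stands in for a real cross-encoder are all hypothetical. Entry vectors are assumed to be precomputed at indexing time, and applying filters before ranking is an assumption about ordering.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: str,
    query_vec: Vector,
    entries: List[str],
    entry_vecs: List[Vector],
    cross_score: Callable[[str, str], float],
    top_k: int = 15,
    keep: Callable[[str], bool] = lambda entry: True,
) -> List[Tuple[str, float]]:
    """Filter, rank with a bi-encoder, then rerank the top_k with a cross-encoder."""
    # Hypothetical filter step: a cheap per-entry check (e.g. by file, word or date).
    candidates = [(e, v) for e, v in zip(entries, entry_vecs) if keep(e)]
    # Bi-encoder stage: entry vectors already exist, so only a similarity is computed.
    candidates.sort(key=lambda ev: cosine(query_vec, ev[1]), reverse=True)
    shortlist = [e for e, _ in candidates[:top_k]]
    # Cross-encoder stage: one expensive call per (query, entry) pair,
    # so this cost grows linearly with top_k.
    rescored = [(e, cross_score(query, e)) for e in shortlist]
    rescored.sort(key=lambda es: es[1], reverse=True)
    return rescored


# Usage with a toy word-overlap "cross-encoder" that records each call.
calls: List[str] = []


def toy_cross_score(query: str, entry: str) -> float:
    calls.append(entry)
    return float(len(set(query.split()) & set(entry.split())))


entries = ["buy milk", "call mom", "milk prices rising", "gym at 6"]
entry_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
results = search("milk prices", [1.0, 0.0], entries, entry_vecs, toy_cross_score, top_k=2)
print(results)  # [('milk prices rising', 2.0), ('buy milk', 1.0)]
```

With top_k=2, the cross-encoder runs only twice even though four entries were ranked. Raising top_k gives the reranker more candidates to reorder at the cost of more calls.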

Indexing performance

Indexing time depends more strongly on the size of the source data than search latency does
Indexing a 100K+ line corpus of notes takes about 10 minutes
Indexing 4000+ images takes about 15 minutes and more than 8 GB of RAM
Note: Indexing should only take this long on the first run, as the index is updated incrementally after that
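One way incremental updates can avoid re-embedding everything is to key each entry by a content hash and compute embeddings only for hashes missing from the existing index. The sketch below assumes that scheme. `update_index`, `entry_key` and the `embed` callback are hypothetical names, and khoj may detect changed content differently (for example per file rather than per entry).

```python
import hashlib
from typing import Callable, Dict, List

Embedding = List[float]


def entry_key(text: str) -> str:
    """Content hash that identifies an entry independently of its position."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def update_index(
    entries: List[str],
    index: Dict[str, Embedding],
    embed: Callable[[List[str]], List[Embedding]],
) -> Dict[str, Embedding]:
    """Return an index for `entries`, embedding only content not already in `index`."""
    keys = [entry_key(e) for e in entries]
    # New or edited entries: their hash is absent from the old index (dict also dedupes).
    pending = {k: e for k, e in zip(keys, entries) if k not in index}
    vectors = embed(list(pending.values())) if pending else []
    # Unchanged entries reuse stored vectors; entries removed from the source are dropped.
    updated = {k: index[k] for k in keys if k in index}
    updated.update(zip(pending.keys(), vectors))
    return updated


# Usage with a toy embedder that records which texts it was asked to embed.
embedded: List[str] = []


def toy_embed(texts: List[str]) -> List[Embedding]:
    embedded.extend(texts)
    return [[float(len(t))] for t in texts]


index = update_index(["note a", "note b"], {}, toy_embed)               # first run: embeds both
index = update_index(["note a", "note b", "note c"], index, toy_embed)  # embeds only "note c"
print(embedded)  # ['note a', 'note b', 'note c']
```

The second call embeds only the new note. That is why later runs are much cheaper than the first, where every entry must be embedded.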

Miscellaneous

Testing was done on an M1 Mac with a 100K+ line corpus of notes
Search and indexing on a GPU have not been tested yet
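The figures above come from a single machine, so it can help to re-measure latency on your own hardware, such as a GPU machine. This is a minimal standard-library sketch. `measure_latency_ms` is a hypothetical helper, and `fn` is assumed to be a zero-argument wrapper around whatever search or indexing call you want to time.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency_ms(
    fn: Callable[[], object], runs: int = 20, warmup: int = 3
) -> Dict[str, float]:
    """Call `fn` repeatedly and summarise its wall-clock latency in milliseconds."""
    # Warm-up calls are not timed, so one-off costs such as lazily loading a
    # model do not skew the numbers.
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}


# Usage: time a placeholder workload; swap in a real query for actual numbers.
stats = measure_latency_ms(lambda: sorted(range(100_000), reverse=True), runs=10)
print(f"median {stats['median_ms']:.2f} ms, max {stats['max_ms']:.2f} ms")
```

The median is reported alongside the maximum because a single slow run (for example, garbage collection) can distort an average.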