Mirror of https://github.com/khoj-ai/khoj.git (synced 2025-02-17 08:04:21 +00:00)
These changes improve the context available to the search model. Specifically, this should improve entry context from short knowledge trees, that is, knowledge bases with sparse, short heading/entry trees.

Previously we'd always split markdown files by headings, even if a parent entry was small enough to fit entirely within the max token limit of the search model. This reduced the context available to the search model for selecting appropriate entries for a query, especially from short entry trees.

Revert to using regex to parse markdown files instead of using MarkdownHeaderTextSplitter. It was easier to implement the logical split using regexes than to bend MarkdownHeaderTextSplitter to implement it.

- DFS traverse the markdown knowledge tree, prefix ancestry to each entry
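The splitting strategy described above (keep a subtree whole if it fits, otherwise split at the next heading level and walk the children depth-first with their ancestor headings prefixed) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual khoj implementation: the function names `count_tokens` and `split_markdown_entries` are made up for this example, and counting whitespace-separated words is an assumed stand-in for the search model's real tokenizer.

```python
import re
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count stands in for the search model's tokenizer.
    return len(text.split())


def split_markdown_entries(
    markdown: str, max_tokens: int, level: int = 1, ancestry: str = ""
) -> List[str]:
    """Hypothetical sketch: split markdown into entries of at most max_tokens.

    A subtree that fits (together with its ancestor headings) is kept whole.
    Otherwise it is split at the next heading level and each child is visited
    depth-first, with its ancestor headings prefixed for context.
    """
    if not markdown.strip():
        return []
    # Keep the whole subtree as one entry if it fits, or if no heading levels remain.
    if count_tokens(ancestry + markdown) <= max_tokens or level > 6:
        return [ancestry + markdown]

    # Zero-width split just before each heading of exactly this level, e.g. "## ".
    heading_prefix = "#" * level + " "
    boundary = re.compile(rf"^(?={re.escape(heading_prefix)})", re.MULTILINE)

    entries: List[str] = []
    for section in boundary.split(markdown):
        if not section.strip():
            continue
        if section.startswith(heading_prefix):
            heading, _, body = section.partition("\n")
            if count_tokens(ancestry + section) <= max_tokens or not body.strip():
                # Small enough (or heading only): keep this subtree as a single entry.
                entries.append(ancestry + section)
            else:
                # Too large: descend into its children, adding this heading to the ancestry.
                entries.extend(
                    split_markdown_entries(body, max_tokens, level + 1, ancestry + heading + "\n")
                )
        else:
            # Text before the first heading at this level: retry at the next level down.
            entries.extend(split_markdown_entries(section, max_tokens, level + 1, ancestry))
    return entries
```

With a generous limit, a short tree such as `"# Projects\n## Khoj\nSearch notes.\n## Garden\nPlant tomatoes in spring.\n"` stays a single entry; with `max_tokens=8` it splits into two entries, each prefixed with `# Projects`. Splitting top-down rather than always splitting at every heading is what keeps small trees intact. Note that this simple regex would also treat `#` comment lines inside fenced code blocks as headings.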
- data
- __init__.py
- conftest.py
- helpers.py
- test_cli.py
- test_client.py
- test_conversation_utils.py
- test_date_filter.py
- test_file_filter.py
- test_helpers.py
- test_image_search.py
- test_markdown_to_entries.py
- test_multiple_users.py
- test_offline_chat_actors.py
- test_offline_chat_director.py
- test_openai_chat_actors.py
- test_openai_chat_director.py
- test_org_to_entries.py
- test_orgnode.py
- test_pdf_to_entries.py
- test_plaintext_to_entries.py
- test_rawconfig.py
- test_text_search.py
- test_word_filter.py