khoj/tests
Debanjum 11ce3e2268
Update Text Chunking Strategy to Improve Search Context (#645)
## Major
- Parse markdown, org parent entries as a single entry if they fit within max tokens
- Parse a file as a single entry if it fits within max token limits
- Add parent heading ancestry to extracted markdown entries for context
- Chunk text in preference order of paragraph, sentence, word, character
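
The last item above can be illustrated with a minimal sketch. This is not the project's implementation: the names `chunk_text` and `SEPARATORS` are hypothetical, and size is measured in characters rather than tokens to keep the example self-contained. The idea is to split on the coarsest boundary first and fall back to finer ones only for pieces that are still too large.

```python
from typing import List

# Separators in preference order: paragraph, sentence, word, then character.
# (Hypothetical defaults chosen for illustration.)
SEPARATORS = ["\n\n", ". ", " ", ""]


def chunk_text(text: str, max_chars: int, separators: List[str] = SEPARATORS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring coarse boundaries."""
    if len(text) <= max_chars:
        return [text] if text else []

    # No separators left, or character level reached: hard-cut the text.
    if not separators or separators[0] == "":
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Greedily pack adjacent pieces into the current chunk while they fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # Piece is too large on its own: retry with the next finer separator.
            chunks.extend(chunk_text(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Alpha beta. Gamma delta.\n\nEpsilon"
    for chunk in chunk_text(sample, max_chars=12):
        print(repr(chunk))
```

In this sketch the separator at a chunk boundary is dropped, and a word longer than the limit is cut at the character level instead of being discarded. A token-based variant would replace `len` with a tokenizer's length function.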

## Minor
- Create wrapper function to get entries from org, md, pdf & text files
- Remove unused Entry to Jsonl converter from text to entry class, tests
- Dedupe code by using single func to process an org file into entries

Resolves #620
2024-04-08 13:56:38 +05:30
data Update the default configuration for the AppConfig 2023-11-17 19:26:31 -08:00
__init__.py Move tests out to project root. Use absolute import in project 2021-09-30 04:12:14 -07:00
conftest.py Address Notion, Image tech debt in indexing code path (#687) 2024-04-05 12:10:03 +05:30
helpers.py Use llama.cpp for offline chat models 2024-03-26 22:33:01 +05:30
test_cli.py Add isort to the pre-commit configuration and apply it to the whole project (#595) 2023-12-28 18:04:02 +05:30
test_client.py Update Text Chunking Strategy to Improve Search Context (#645) 2024-04-08 13:56:38 +05:30
test_conversation_utils.py Handle msg truncation when question is larger than max prompt size 2024-03-31 15:50:06 +05:30
test_date_filter.py Improve date filter regexes to extract structured, natural, partial dates 2024-03-30 00:07:19 +05:30
test_file_filter.py [Multi-User Part 1]: Enable storage of settings for plaintext files based on user account (#498) 2023-10-26 09:42:29 -07:00
test_helpers.py Part 2: Add web UI updates for basic agent interactions (#675) 2024-03-26 18:13:24 +05:30
test_markdown_to_entries.py Fix adding file path instead of stem to markdown entries 2024-04-04 02:41:55 +05:30
test_multiple_users.py Increase search distance to get relevant content for chat post indexer update 2024-04-04 02:41:55 +05:30
test_offline_chat_actors.py Merge branch 'master' into migrate-to-llama-cpp-for-offline-chat 2024-03-31 00:59:20 +05:30
test_offline_chat_director.py Merge branch 'master' into migrate-to-llama-cpp-for-offline-chat 2024-03-31 00:59:20 +05:30
test_openai_chat_actors.py Increase search distance to get relevant content for chat post indexer update 2024-04-04 02:41:55 +05:30
test_openai_chat_director.py Part 1: Server-side changes to support agents integrated with Conversations (#671) 2024-03-23 22:09:38 +05:30
test_org_to_entries.py Update drop large words test to ensure newlines considered word boundary 2024-04-08 13:38:08 +05:30
test_orgnode.py Add isort to the pre-commit configuration and apply it to the whole project (#595) 2023-12-28 18:04:02 +05:30
test_pdf_to_entries.py Remove unused Entry to Jsonl converter from text to entry class, tests 2024-04-04 02:41:55 +05:30
test_plaintext_to_entries.py Remove unused Entry to Jsonl converter from text to entry class, tests 2024-04-04 02:41:55 +05:30
test_rawconfig.py Add isort to the pre-commit configuration and apply it to the whole project (#595) 2023-12-28 18:04:02 +05:30
test_text_search.py Fix all unit tests for test_text_search 2024-04-04 02:41:55 +05:30
test_word_filter.py Fix test word filter 2023-11-19 13:14:58 -08:00