* Working example with Llama v2 running locally on my machine
- Download the model from Hugging Face
- Plug it into GPT4All
- Update prompts to fit the Llama chat format (see the sketch below)
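For reference, a minimal sketch of the Llama 2 chat prompt format the prompts were adapted to; the system and user strings here are illustrative placeholders, not the actual prompts used.

```python
# Llama 2 chat format: system prompt wrapped in <<SYS>> tags inside an
# [INST] block. The strings below are illustrative placeholders.
system_prompt = "You are Khoj, a helpful personal assistant."
user_message = "What did I write about last week?"

llama_prompt = (
    "<s>[INST] <<SYS>>\n"
    f"{system_prompt}\n"
    "<</SYS>>\n\n"
    f"{user_message} [/INST]"
)
```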
* Add appropriate prompts for extracting questions from a query, following the Llama prompt format
* Rename Falcon to Llama and make some improvements to the extract_questions flow
* Further tune the extract-questions prompts and unit tests
* Disable extracting questions dynamically from Llama, as results are still unreliable
The OpenAI conversation processor schema had been updated, but conftest
hadn't been updated to reflect the change.
Update the conversation processor setup in conftest to fix this
* Add support for GPT4All's Falcon model as an additional conversation processor (loading sketched below)
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4All models and OpenAI
- Add unit tests benchmarking Falcon's performance
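A minimal sketch of loading a Falcon model through the gpt4all Python bindings; the exact model filename is an assumption and may differ from the binary Khoj ships with.

```python
from gpt4all import GPT4All

# Model filename is an assumption; GPT4All distributes Falcon as a
# quantized GGML binary along these lines
model = GPT4All("ggml-model-gpt4all-falcon-q4_0.bin")

response = model.generate("Summarize my notes on project planning", max_tokens=256)
print(response)
```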
* Add exc_info to include stack trace in error logs for text processors
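As a rough illustration of the change: error logs now pass exc_info so the full stack trace is captured. process_text here is a hypothetical stand-in for an actual text processor call.

```python
import logging

logger = logging.getLogger(__name__)

try:
    process_text(entry)  # hypothetical stand-in for a text processor call
except Exception as e:
    # exc_info=True attaches the full stack trace to the error log
    logger.error(f"Failed to process entry: {e}", exc_info=True)
```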
* Pull shared functions into utils.py to be used across the gpt4all and gpt modules
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
Ensure the order in which new embeddings are inserted on incremental
update does not affect the order or values of existing embeddings when
normalization is turned off (sketched below)
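A minimal sketch of the invariant, assuming embeddings are held in a torch tensor: new embeddings are appended after the existing rows, so prior rows keep their position and, with normalization off, their values.

```python
import torch
import torch.nn.functional as F

def update_embeddings(existing: torch.Tensor, new: torch.Tensor, normalize: bool = False) -> torch.Tensor:
    # Append new embeddings after the existing rows; existing rows keep
    # their order, and their values are untouched unless normalization is on
    corpus = torch.cat([existing, new], dim=0)
    if normalize:
        corpus = F.normalize(corpus, p=2, dim=1)
    return corpus
```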
Asymmetric was the older name, used to differentiate between symmetric
and asymmetric search.
Now that text search only uses asymmetric search, stick to the simpler name
- Current incorrect behavior:
  On regenerate, all entries with a duplicate compiled form are kept,
  but on update only the last of the duplicated entries is kept.
  This divergent behavior risks index corruption across reconfigure
  and update (deduplication sketched below)
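A minimal sketch of consistent deduplication, assuming each entry exposes its compiled form as a compiled attribute: keep the first entry per compiled form in both the regenerate and update paths.

```python
def deduplicate_entries(entries):
    # Keep the first entry per compiled form so regenerate and update
    # converge on the same index contents; `compiled` is an assumed attribute
    seen = set()
    deduped = []
    for entry in entries:
        if entry.compiled not in seen:
            seen.add(entry.compiled)
            deduped.append(entry)
    return deduped
```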
* Update the /chat endpoint to conditionally support streaming (see the sketch after this list)
- If streaming is enabled, return the thread generator as it does currently
- If streaming is disabled, return a JSON response with the response and compiled references separated out
- Correspondingly, update the chat.html UI and the Obsidian plugin to use the streamed API
- Rename chat/init/ to chat/history
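A minimal sketch of the conditional streaming behavior, assuming a FastAPI endpoint; generate_chat_response is a hypothetical stand-in for the actual chat logic, stubbed here so the example runs.

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def generate_chat_response(q: str):
    # Hypothetical stand-in for the actual chat logic: yields response
    # chunks and returns the compiled references used as context
    chunks = iter(["Hello ", "from ", "Khoj"])
    references = ["compiled reference snippet"]
    return chunks, references

@app.get("/api/chat")
def chat(q: str, stream: bool = False):
    response_iterator, references = generate_chat_response(q)
    if stream:
        # Stream chunks back as they are produced
        return StreamingResponse(response_iterator, media_type="text/plain")
    # Otherwise collect the full response and return references separately
    return {"response": "".join(response_iterator), "context": references}
```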
* Update khoj.el to use the /history endpoint
- Update corresponding unit tests to use stream=true
* Remove & from call to /chat for obsidian
* Abstract functions out into a helpers.py file and clean up some of the error-catching
- Fix tests of the gpt converse method after it started streaming responses
- Pass stop in the model_kwargs dictionary and the API key in the
  openai_api_key parameter to chat completion methods. This should resolve
  the argument warning thrown by the OpenAI module (see the sketch below)
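A minimal sketch of the parameter passing, assuming langchain's ChatOpenAI wrapper is the chat completion method in question; the model name and stop sequence are illustrative.

```python
import os
from langchain.chat_models import ChatOpenAI

# Passing stop inside model_kwargs and the key via openai_api_key avoids
# the "not a default parameter" warning the wrapper emits when these are
# passed as bare keyword arguments
chat = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.2,
    model_kwargs={"stop": ["\n\n"]},
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)
```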
The previous JSON parsing failed to handle questions with date
filters
Fix the chat actor tests to run without freezegun throwing an error
complaining about importing the transformers.local_llama model
Remove quote escapes from date filter examples provided to
extract_questions actor
Khoj will soon get a generic text indexing content type. That, along
with a file filter, should suffice for searching through Ledger
transactions, if required.
Having a specific content type for a niche use-case like Ledger isn't
useful. Removing unused content types reduces the code Khoj has to manage.
Org-music was just a custom content type that worked with org-music.
It was mostly only useful to me.
Cleaning up that code reduces the number of content types Khoj has to
manage.
- Khoj chat will now respond to general queries if:
  1. no relevant reference notes are available, or
  2. explicitly induced by prefixing the chat message with "@general"
- Previously Khoj Chat would often refuse to respond to general queries
  not answerable from reference notes or chat history
- Make chat quality tests more robust
- Add more equivalent chat response options for refusing to answer
- Force haiku writing to not give any preamble, just the haiku
Previously the filename was appended to the end of the compiled entry.
This didn't provide appropriate structured context.
Test that the filename gets prepended as a heading to the compiled entry
(sketched below)
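A minimal sketch of the compile step under this change; compile_entry and its arguments are hypothetical names for illustration.

```python
# Hypothetical compile step: prepend the source filename as a heading so
# the compiled entry carries structured context about where it came from
def compile_entry(filename: str, entry_body: str) -> str:
    return f"{filename}\n{entry_body}"

print(compile_entry("notes/projects.md", "## Khoj\nIdeas to improve search quality"))
```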
All compiled snippets split by max tokens (apart from the first) do not
get the heading as context.
This limits the search context available to retrieve these continuation
entries
- Explicitly split the entry string by space during the split by max_tokens (see the sketch below)
- Prevent the formatting of the compiled entry from being lost
- The formatting itself contains useful information.
  No point in dropping the formatting unnecessarily,
  even if (say) the current search models don't account for it (yet)
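A minimal sketch of the splitting behavior, with a simplified word-count notion of tokens: splitting on " " alone keeps newlines and other formatting inside the joined words, and only the first chunk starts with the entry heading.

```python
def split_by_max_tokens(compiled: str, max_tokens: int = 256) -> list[str]:
    # Split only on spaces so newlines and other formatting survive
    # inside the tokens; continuation chunks lack the leading heading
    tokens = compiled.split(" ")
    return [
        " ".join(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```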
Append the originating filename to the compiled string of each entry
for better search quality by providing more context to the model.
Update the markdown_to_jsonl tests to ensure the filename is added.
Resolves #142
- Use tiktoken to count tokens for chat models (see the sketch below)
- Make the number of conversation turns added to the prompt configurable
  via a method argument to generate_chatml_messages_with_context
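A minimal sketch of token counting with tiktoken; the model name is illustrative, and the counting here is per string rather than Khoj's exact accounting of chat turns.

```python
import tiktoken

# Count tokens for a chat model so older conversation turns can be
# dropped once the prompt nears the model's context window
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text: str) -> int:
    return len(encoding.encode(text))

print(count_tokens("Hello, Khoj!"))
```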