sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-20 11:27:46 +00:00

Author	SHA1	Message	Date
sabaimran	f7e03f6d63	Switch spinner snake case -> camel case	2023-08-01 08:52:25 -07:00
sabaimran	1c52a6993f	add a lock around chat operations to prevent the offline model from getting bombarded and stealing a bunch of compute resources - This also solves #367	2023-08-01 00:23:17 -07:00
sabaimran	6c3074061b	Disable the input bar when chat response is in flight	2023-08-01 00:21:39 -07:00
sabaimran	c14cbe926a	Add a loading symbol to web chat. Closes #392	2023-07-31 23:35:48 -07:00
sabaimran	8054bdc896	Use n_batch parameter to increase resource consumption on host machine (and implicitly engage GPU)	2023-07-31 23:25:08 -07:00
sabaimran	e55e9a7b67	Fix unit tests and truncation logic	2023-07-31 21:37:59 -07:00
sabaimran	2335f11b00	Add better error handling for download processes incase of failure	2023-07-31 21:07:38 -07:00
sabaimran	209975e065	Resolve merge conflicts: let Khoj fail if the model tokenizer is not found	2023-07-31 19:12:26 -07:00
sabaimran	2d6c3cd4fa	Misc. quality improvements for Llama V2 - Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S - Use a proper Llama Tokenizer for counting tokens for truncation with Llama - Add additional null checks when running	2023-07-31 19:11:20 -07:00
sabaimran	ca195097d7	Update chat hint message at first run	2023-07-31 17:46:09 -07:00
Debanjum Singh Solanky	ded606c7cb	Fix format of user query during general conversation with Llama 2	2023-07-31 17:21:14 -07:00
Debanjum Singh Solanky	48e5ac0169	Do not drop system message when truncating context to max prompt size Previously the system message was getting dropped when the context size with chat history would be more than the max prompt size supported by the cat model Now only the previous chat messages are dropped or the current message is truncated but the system message is kept to provide guidance to the chat model	2023-07-31 17:21:14 -07:00
sabaimran	88ef86ad5c	Fix typing issues for mypy (#372 )	2023-07-30 19:27:48 -07:00
sabaimran	ca2c942b65	Add typing to compiled_references and inferred_queries	2023-07-30 19:10:30 -07:00
sabaimran	3646fd1449	Add a warning to indicate that Khoj is not configured to work with personal data sources	2023-07-30 18:52:10 -07:00
sabaimran	996832dc72	Allow user to chat even if content types aren't configured - use empty references	2023-07-30 18:47:45 -07:00
Debanjum Singh Solanky	53810a0ff7	Create khoj config dir if non-existant, before writing to khoj env file	2023-07-30 01:35:36 -07:00
sabaimran	f65d157244	Release Khoj version 0.10.0	2023-07-28 19:27:47 -07:00
Debanjum Singh Solanky	f76af869f1	Do not log the gpt4all chat response stream in khoj backend Stream floods stdout and does not provide useful info to user	2023-07-28 19:14:04 -07:00
sabaimran	5ccb01343e	Add Offline chat to Obsidian (#359 ) * Add support for configuring/using offline chat from within Obsidian * Fix type checking for search type * If Github is not configured, /update call should fail * Fix regenerate tests same as the update ones * Update help text for offline chat in obsidian * Update relevant description for Khoj settings in Obsidian * Simplify configuration logic and use smarter defaults	2023-07-28 18:47:56 -07:00
Debanjum	b3c1507708	Merge pull request #361 from khoj-ai/configure-offline-chat-from-emacs - Configure using Offline Chat from Emacs: - Enable, Disable Offline Chat from Emacs - Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup - Benefits: Offline chat models are better for privacy but not great at answering questions	2023-07-28 18:06:58 -07:00
sabaimran	9f78db0579	Let Offline chat override OpenAI API settings (#362 ) * Let Offline chat override OpenAI API settings * Download the offline model whenever offline chat is enabled * Add progressbar for download for llamav2 model to track progress * Change ordering of n due to switch of default processor * Flip ordering of offline/openai checks when extracting questions from query	2023-07-28 17:26:20 -07:00
Debanjum Singh Solanky	ebfbef1f68	Configure using offline chat from Emacs Closes #358	2023-07-28 16:07:33 -07:00
sabaimran	29081f4429	Adjust parameters for offline chat	2023-07-27 22:22:09 -07:00
sabaimran	124d97c26d	Replace Falcon 🦅 model with Llama V2 🦙 for offline chat (#352 ) * Working example with LlamaV2 running locally on my machine - Download from huggingface - Plug in to GPT4All - Update prompts to fit the llama format * Add appropriate prompts for extracting questions based on a query based on llama format * Rename Falcon to Llama and make some improvements to the extract_questions flow * Do further tuning to extract question prompts and unit tests * Disable extracting questions dynamically from Llama, as results are still unreliable	2023-07-27 20:51:20 -07:00
Debanjum Singh Solanky	715d56d4f0	Use new schema to update khoj.yml config from khoj.el	2023-07-26 17:34:16 -07:00
sabaimran	8b2af0b5ef	Add support for our first Local LLM 🤖🏠 (#330 ) * Add support for gpt4all's falcon model as an additional conversation processor - Update the UI pages to allow the user to point to the new endpoints for GPT - Update the internal schemas to support both GPT4 models and OpenAI - Add unit tests benchmarking some of the Falcon performance * Add exc_info to include stack trace in error logs for text processors * Pull shared functions into utils.py to be used across gpt4 and gpt * Add migration for new processor conversation schema * Skip GPT4All actor tests due to typing issues * Fix Obsidian processor configuration in auto-configure flow * Rename enable_local_llm to enable_offline_chat	2023-07-26 16:27:08 -07:00
sabaimran	23d77ee338	Fix import issues in desktop image builds (#343 )	2023-07-26 15:45:52 -07:00
Debanjum Singh Solanky	7722a9c347	Default to using the gpt-3.5-turbo model for chat from khoj.el	2023-07-22 00:29:26 -07:00
Debanjum Singh Solanky	f0d4a4cf9a	Revert "Make configure_content functional. Do not pass content index state to it." This reverts commit `2ddee7e745` as it broke partial updates of the content index for just the specified content types	2023-07-21 13:59:09 -07:00
sabaimran	82c725817e	Merge branch 'master' of github.com:khoj-ai/khoj	2023-07-21 13:24:05 -07:00
sabaimran	596e11ec6d	Use the same function for computing entries for IDs regardless of whether it has prev entries	2023-07-21 13:23:56 -07:00
Debanjum Singh Solanky	2ddee7e745	Make configure_content functional. Do not pass content index state to it.	2023-07-20 23:24:08 -07:00
sabaimran	1610d2ebd9	📝 Add a documentation base for Khoj! (#333 ) * Add docs for more organized, accessible information detailing Khoj setup * Delete duplicated files * Add a coverpage without enabling it. Add logo and theme * Remove obsidian README.md * Add plausible script to index.html via docsify	2023-07-20 22:34:25 -07:00
Debanjum Singh Solanky	3e59be7f1d	Release Khoj version 0.9.0	2023-07-18 19:59:27 -07:00
Debanjum Singh Solanky	d078e7b1f6	Clean up search type usage in khoj server, tests and Readme	2023-07-18 19:57:55 -07:00
Debanjum Singh Solanky	4d910936b7	Fix triggering index update on khoj server from khoj.el	2023-07-18 19:57:54 -07:00
Debanjum Singh Solanky	5c7d7f558d	Make AI model used for Khoj chat configurable from khoj.el - Fix bug. Set the unused model-name to a standad default value	2023-07-18 19:57:54 -07:00
Debanjum	5f2be2a9bb	Merge pull request #298 from HyunggyuJang/patch-1 Encode config as utf-8 during setup in khoj.el. This will allow utf-8 encoded files etc to be passed in config	2023-07-18 17:54:11 -07:00
Debanjum Singh Solanky	429e1b4b48	Regenerate index to apply corruption fixes on first run of new khoj	2023-07-18 16:10:47 -07:00
Debanjum Singh Solanky	83e1088d42	Manage khoj.yml config migrations on app start. Version the schema - Add version to khoj.yml schema Versioning the khoj.yml config schema will simplify future migrations	2023-07-18 16:10:10 -07:00
Debanjum Singh Solanky	71e8ddd9a2	Check if PDF is configured before showing it as an option in khoj.el	2023-07-17 15:49:20 -07:00
Debanjum	d00c5da8b7	Merge pull request #325 from khoj-ai/stablize-simplify-content-indexing ## Stabilize and Simplify Content Indexing ### Major Updates - `9bcca43` Unify logic to update entries when indexing from scratch or incrementally - `89c7819` Unify logic to update embeddings when indexing from scratch or incrementally - `6a0297c` Stable sort new entries when marking entries for update - `58d86d7` Unify logic to configure server from API or on server start - Create tests to ensure old entries, embeddings in index are unaffected on adding new entries - Refer: `1482fd4`, `7669b85`, `88d1a29` - `ad41ef3` Make normalization of embeddings configurable to test this in `c73feeb` ### Minor Updates - `1673bb5` Add todo state to compiled form of each entry - `6e70b91` Remove unused `dump_jsonl` helper method - `7ad9603` Improve naming of lock - `b02323a` Improve naming text search test methods Resolves #190	2023-07-17 14:51:10 -07:00
Debanjum Singh Solanky	3e3a1ecbc8	Start app even if server init fails to let user fix it Show stacktrace on error to help debugging	2023-07-17 14:33:02 -07:00
Debanjum Singh Solanky	ef6a0044f4	Drop embeddings of deleted text entries from index Previously the deleted embeddings would continue to be in the index, even after the entry was deleted	2023-07-16 03:47:05 -07:00
Debanjum Singh Solanky	ad41ef3991	Make normalizing embeddings configurable	2023-07-16 02:16:33 -07:00
Debanjum Singh Solanky	89c7819cb7	Unify logic to generate embeddings from scratch and incrementally This simplifies the `compute_embeddings' method and avoids potential later divergence in handling the index regenerate vs update scenarios	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6a0297cc86	Stable sort new entries when marking entries for update	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6e70b914c2	Remove unused dump_jsonl method The entries index is stored ingzipped jsonl files for each content type	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	9bcca43299	Use single func to handle indexing from scratch and incrementally Previous regenerate mechanism did not deduplicate entries with same key So entries looked different between regenerate and update Having single func, mark_entries_for_update, to handle both scenarios will avoid this divergence Update all text_to_jsonl methods to use the above method for generating index from scratch	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	1673bb5558	Add todo state to compiled form of each org-mode entry	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	7ad96036b0	Improve lock name to config_lock instead of search_index_lock It is used to lock updates to all app config state, including processor	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	58d86d7876	Use single func to configure server via API and on server start Improve error messages on failure to configure server components	2023-07-16 01:45:53 -07:00
sabaimran	a15711e635	Fix null type checks in get /config	2023-07-15 15:53:56 -07:00
sabaimran	e590d75b20	Start Khoj even when config is not valid (#320 ) * Add icon to indicate bad config, start Khoj even if there was an issue setting up the index	2023-07-15 14:11:54 -07:00
sabaimran	49ab201c30	Fix issues importing PySide in Docker container (#322 ) * Rather than installing PyQT dependencies, remove codepaths that require pyqt files in no-gui mode	2023-07-15 13:33:13 -07:00
sabaimran	ba47f2ab39	Merge branch 'master' of github.com:debanjum/khoj	2023-07-14 22:28:05 -07:00
sabaimran	874cffd256	Add additional support for parsing notion workspaces	2023-07-14 22:27:56 -07:00
Debanjum	52f68167ce	Merge pull request #317 from khoj-ai/reduce-memory-consumption-by-search-model-duplication Reuse Search Models across Content Types to reduce Memory Consumption - Memory consumption now only scales with search models used, not with content types. Previously each content type had it's own copy of the search ML models. That'd result in 300+ Mb per enabled text content type - Split model state into 2 separate state objects, `search_models` and `content_index`. This allows loading text_search and image_search models first and then reusing them across all content_types in content_index - The change should cut down memory utilization quite a bit for most users. I see a >50% drop in memory utilization on my Khoj instance. But this will vary for each user based on the amount of content indexed vs number of plugins enabled. - This change does not solve the RAM utilization scaling with size of the index, as the whole content index is still kept in RAM while Khoj is running Should help with #195, #301 and #303	2023-07-14 19:54:12 -07:00
Debanjum Singh Solanky	f08e9539f1	Release lock after updating index even if update fails to prevent deadlock Wrap acquire/release locks in try/catch/finally when updating content index and search models to prevent lock not being released on error and causing a deadlock	2023-07-14 16:57:27 -07:00
sabaimran	37f7f9fd1d	Add additional telemetry for system understanding (#316 ) * Add additional telemetry in order to understand which data sources are the most useful * Make actions side by side in the configuration page * Restore main run command * Update links to point to wiki pages for Github, Notion integrations * Stanardize nomenclature of the api_type to use _config suffix Remove header fields that aren't actually helpful for understanding config usage	2023-07-14 10:14:07 -07:00
Debanjum Singh Solanky	86e2bec9a0	Reuse Search Models across Content Types to Reduce Memory Consumption - Memory consumption now only scales with search models used, not with content types as well. Previously each content type had it's own copy of the search ML models. That'd result in 300+ Mb per enabled content type - Split model state into 2 separate state objects, `search_models' and `content_index'. This allows loading text_search and image_search models first and then reusing them across all content_types in content_index - This should cut down memory utilization quite a bit for most users. I see a ~50% drop in memory utilization. This will, of course, vary for each user based on the amount of content indexed vs number of plugins enabled - This does not solve the RAM utilization scaling with size of the index. As the whole content index is still kept in RAM while Khoj is running Should help with #195, #301 and #303	2023-07-14 01:27:22 -07:00
Debanjum	b2718d330c	Merge pull request #304 from migrate-from-pyqt-to-pyside Migrate from PyQT6 to PySide6	2023-07-13 11:54:47 -07:00
sabaimran	31e933207f	Set default values for sys.stdout if they're unavailable	2023-07-12 22:22:49 -07:00
Debanjum Singh Solanky	9c76150895	Migrate from PyQT6 to PySide6	2023-07-11 18:43:44 -07:00
HyunggyuJang	88c42b3043	Encode data as utf-8 otherwise it will complain, see `1c85531090`	2023-07-11 17:06:05 +09:00
Debanjum Singh Solanky	f664a74e77	Update Khoj server to run on non standard port, 42110 instead of 8000 Resolves #295	2023-07-10 21:27:58 -07:00
sabaimran	effb52f859	Fix demo rendering with the new header	2023-07-10 21:16:19 -07:00
sabaimran	55f5be7b03	Release Khoj version 0.8.2	2023-07-10 14:39:32 -07:00
sabaimran	9a63f89f33	Merge branch 'master' of github.com:debanjum/khoj	2023-07-10 14:31:19 -07:00
sabaimran	53809298c0	Release Khoj version 0.8.1	2023-07-10 14:30:04 -07:00
tjsousa	5b37e988e6	Allow using configured GPT chat model (#292 ) My account doesn't have gpt-4 enabled and it wouldn't work as the default value was always used from extract_questions, where the caller could use the configured model.	2023-07-10 14:24:40 -07:00
Debanjum Singh Solanky	75ff871217	Release Khoj version 0.8.0	2023-07-10 13:37:51 -07:00
Debanjum Singh Solanky	979088b3dc	Add tooltip helper text on web settings page buttons - Provide more details on what clicking configure, initialize buttons or changing the results count slider does - This shows up on user hovering over those buttons	2023-07-10 13:32:41 -07:00
Debanjum Singh Solanky	255781e135	Use relative link on logo to jump to correct page on local and cloud	2023-07-10 13:22:20 -07:00
Debanjum Singh Solanky	b2d229c116	Move header pane style to base khoj.css for reuse. Fix logo size	2023-07-10 13:10:17 -07:00
Debanjum Singh Solanky	20cb314171	Open the Khoj config page in the browser on first run	2023-07-10 12:10:20 -07:00
sabaimran	07cf5a214a	Check if PDF files are present in the Obsidian vault before initializing the Khoj configuration (#293 )	2023-07-10 10:33:04 -07:00
sabaimran	7364bac8ae	Make the header take up less space - Use a single row for the header - Needed custom styling for each page because each of them are different in subtle ways, unfortunately	2023-07-09 22:31:37 -07:00
sabaimran	62704cac09	Add a plugin which allows users to index their Notion pages (#284 ) * For the demo instance, re-instate the scheduler, but infrequently for api updates - In constants, determine the cadence based on whether it's a demo instance or not - This allow us to collect telemetry again. This will also allow us to save the chat session * Conditionally skip updating the index altogether if it's a demo isntance * Add backend support for Notion data parsing - Add a NotionToJsonl class which parses the text of Notion documents made accessible to the API token - Make corresponding updates to the default config, raw config to support the new notion addition * Add corresponding views to support configuring Notion from the web-based settings page - Support backend APIs for deleting/configuring notion setup as well - Streamline some of the index updating code * Use defaults for search and chat queries results count * Update pagination of retrieving pages from Notion * Update state conversation processor when update is hit * frequency_penalty should be passed to gpt through kwargs * Add check for notion in render_multiple method * Add headings to Notion render * Revert results count slider and split Notion files by blocks * Clean/fix misc things in the function to update index - Use the successText and errorText variables appropriately - Name parameters in function calls - Add emojis, woohoo * Clean up and further modularize code for processing data in Notion	2023-07-09 15:29:26 -07:00
Debanjum	77755c0284	Fix Packaging the Khoj Desktop Apps (#289 ) * Add langchain static files and pytorch metadata to Khoj native app * Add pillow static files, metadata & hidden imports to Khoj native app * Fix path to web interface static files on Khoj native app * Add tiktoken hidden imports to make chat work from Khoj native app * Fix Khoj native app to run with GUI mode enabled This got broken when we moved from using the --no-gui flag to using --gui in https://github.com/khoj-ai/khoj/pull/263	2023-07-09 10:21:16 -07:00
sabaimran	4c135ea316	Make streaming optional for the /chat endpoint (#287 ) * Update the /chat endpoint to conditionally support streaming - If streams are enabled, return the threadgenerator as it does currently - If stream is disabled, return a JSON response with the response/compiled references separated out - Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian - Rename chat/init/ to chat/history * Update khoj.el to use the /history endpoint - Update corresponding unit tests to use stream=true * Remove & from call to /chat for obsidian * Abstract functions out into a helpers.py file and clean up some of the error-catching	2023-07-09 10:12:09 -07:00
Debanjum Singh Solanky	0a86220d42	Use default values, delete content config on disable and update state	2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky	362063f5fe	By default, connect to Khoj server over IPv4 from Obsidian plugin	2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky	571e8c2548	Add rerank, index corruption hint on search page of web interface Similar to the hint alrady in the Obsidian search modal Closes #272	2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky	61e131f95c	Hide unused model field from chat settings on web interface	2023-07-07 18:43:53 -07:00
Debanjum Singh Solanky	af30d01e85	Move to newer chat models to extract questions & summarize chats Deprecate usage of the older gpt3 models in-place of the newer chat based models - text-davinci-003 is only 50% cheaper than gpt4 and less reliable for question extraction - Using gpt-3.50turbo for summarization should reduce cost of chat - Keep conversation.chat_session as a list instead of a string - Update completion_with_backoff func to use ChatML format	2023-07-07 17:32:27 -07:00
Debanjum Singh Solanky	171ce19e1f	Update date filter to allow quoting values in single quotes	2023-07-07 17:13:47 -07:00
Debanjum Singh Solanky	e588f7c528	Deprecate unused beta search and answer API endpoints	2023-07-07 16:38:07 -07:00
Debanjum Singh Solanky	c9fc4d1296	Revert to using cross-encoder to improve search results used by chat	2023-07-07 15:31:34 -07:00
Debanjum Singh Solanky	11f0a9f196	Fix chat tests since streaming. Pass args correctly to chat methods - Fix testing gpt converse method after it started streaming responses - Pass stop in model_kwargs dictionary and api key in openai_api_key parameter to chat completion methods. This should resolve the arg warning thrown by OpenAI module	2023-07-07 15:23:44 -07:00
Debanjum Singh Solanky	48870d9170	Fix parsing questions generated by extract_questions actor into list The previous json parsing was failing to handle questions with date filters Fix the chat actor tests to run without throwing error with freezegun complaining about importing transformers.local_llama model Remove quote escapes from date filter examples provided to extract_questions actor	2023-07-07 15:18:55 -07:00
Debanjum Singh Solanky	279662620b	Move results count to settings page on web. Use it for search & chat - Before Only the search interface had the results count configuration option - After - The results count is set on the settings page instead of the search page - Both search and chat can use the configured results count instead of just search	2023-07-07 14:08:08 -07:00
Debanjum Singh Solanky	2ec8da89e8	Remove Update button from Khoj Search page on the Web interface The settings page on the Khoj web interface already has a configure button. Don't need the Update button on the search page as well	2023-07-07 12:49:58 -07:00
Debanjum Singh Solanky	bf427cd8dd	Set no. of results used to generate chat response from Khoj Emacs	2023-07-07 12:34:50 -07:00
Debanjum Singh Solanky	1d77fe712c	Set no. of results used to generate chat response from Khoj Obsidian	2023-07-07 12:32:32 -07:00
Debanjum Singh Solanky	2f31de5ed5	Set no. of references to use for chat configurable in Chat API	2023-07-07 12:29:36 -07:00
Debanjum Singh Solanky	d97682fdac	Use tooltip, placeholders to guide Khoj setup via web settings page	2023-07-06 21:37:48 -07:00
Debanjum Singh Solanky	f5cf09424b	Use more descriptive field names for content type settings on Khoj web Resolves #281	2023-07-06 20:47:39 -07:00
Debanjum Singh Solanky	a2c668268f	Use node-fetch >=3.1.0 in khoj obsidian plugin to avoid security vulnerability	2023-07-06 13:05:52 -07:00

1 2 3 4 5 ...

1085 commits