sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-03 20:33:00 +01:00

Author	SHA1	Message	Date
sabaimran	8b2af0b5ef	Add support for our first Local LLM 🤖🏠 (#330) * Add support for gpt4all's falcon model as an additional conversation processor - Update the UI pages to allow the user to point to the new endpoints for GPT - Update the internal schemas to support both GPT4 models and OpenAI - Add unit tests benchmarking some of the Falcon performance * Add exc_info to include stack trace in error logs for text processors * Pull shared functions into utils.py to be used across gpt4 and gpt * Add migration for new processor conversation schema * Skip GPT4All actor tests due to typing issues * Fix Obsidian processor configuration in auto-configure flow * Rename enable_local_llm to enable_offline_chat	2023-07-26 16:27:08 -07:00
sabaimran	23d77ee338	Fix import issues in desktop image builds (#343)	2023-07-26 15:45:52 -07:00
Justin Bassett-Green	8dcc21052f	Add chat-model param in sample config yml and document (#341) * add chat-model config param to docs * add chat-model param to sample config yml	2023-07-22 16:53:08 -07:00
Debanjum Singh Solanky	5bb42e56a8	Fix formatting of khoj test config and unused references in conftests	2023-07-22 00:29:26 -07:00
Debanjum Singh Solanky	7722a9c347	Default to using the gpt-3.5-turbo model for chat from khoj.el	2023-07-22 00:29:26 -07:00
Saba	36d25c4f1d	Center the title, add table headers	2023-07-21 23:36:38 -07:00
Saba	01b6a10cd1	Simplify readme	2023-07-21 23:30:44 -07:00
sabaimran	4ce072c4b3	Make the README on our Github minimal (#334) * Make the README on our Github minimal * Add a bit of formatting and more background	2023-07-21 23:29:04 -07:00
Debanjum Singh Solanky	4089e38283	Fix links to demos and screenshots in docs	2023-07-21 20:01:19 -07:00
Debanjum Singh Solanky	89ad362758	Update Screenshots and Demos in Docs	2023-07-21 15:22:35 -07:00
Debanjum Singh Solanky	f0d4a4cf9a	Revert "Make configure_content functional. Do not pass content index state to it." This reverts commit `2ddee7e745` as it broke partial updates of the content index for just the specified content types	2023-07-21 13:59:09 -07:00
sabaimran	82c725817e	Merge branch 'master' of github.com:khoj-ai/khoj	2023-07-21 13:24:05 -07:00
sabaimran	596e11ec6d	Use the same function for computing entries for IDs regardless of whether it has prev entries	2023-07-21 13:23:56 -07:00
Saba	634f0b4cc4	Fix docs indexing issue	2023-07-21 08:30:00 -07:00
Debanjum Singh Solanky	c28755ccd2	Fix diff blocks, links, remove footnotes & rearrange sections in docs Extract performance into separate section instead of shoving it under search Create page for web interface	2023-07-21 00:58:30 -07:00
Debanjum Singh Solanky	2ddee7e745	Make configure_content functional. Do not pass content index state to it.	2023-07-20 23:24:08 -07:00
Debanjum	e92bc0e2e6	Create CNAME to make Docs accessible at docs.khoj.dev	2023-07-20 23:24:08 -07:00
sabaimran	1610d2ebd9	📝 Add a documentation base for Khoj! (#333) * Add docs for more organized, accessible information detailing Khoj setup * Delete duplicated files * Add a coverpage without enabling it. Add logo and theme * Remove obsidian README.md * Add plausible script to index.html via docsify	2023-07-20 22:34:25 -07:00
Debanjum Singh Solanky	3e59be7f1d	Release Khoj version 0.9.0	2023-07-18 19:59:27 -07:00
Debanjum Singh Solanky	d078e7b1f6	Clean up search type usage in khoj server, tests and Readme	2023-07-18 19:57:55 -07:00
Debanjum Singh Solanky	4d910936b7	Fix triggering index update on khoj server from khoj.el	2023-07-18 19:57:54 -07:00
Debanjum Singh Solanky	5c7d7f558d	Make AI model used for Khoj chat configurable from khoj.el - Fix bug. Set the unused model-name to a standard default value	2023-07-18 19:57:54 -07:00
Debanjum	5f2be2a9bb	Merge pull request #298 from HyunggyuJang/patch-1 Encode config as utf-8 during setup in khoj.el. This will allow utf-8 encoded files etc to be passed in config	2023-07-18 17:54:11 -07:00
Debanjum	3a1c5a6dab	Merge pull request #329 from khoj-ai/create-schema-migration-func-and-reindex-to-fix-corruption Create Schema Migrator and Reindex to Apply Index Corruption Fixes - `83e1088` Manage `khoj.yml' config migrations on app start. Version the `khoj.yml' schema - `429e1b4` Regenerate index to apply corruption fixes on first run of this khoj version Otherwise users would need to manually re-index their contents with khoj	2023-07-18 16:43:17 -07:00
Debanjum Singh Solanky	429e1b4b48	Regenerate index to apply corruption fixes on first run of new khoj	2023-07-18 16:10:47 -07:00
Debanjum Singh Solanky	83e1088d42	Manage khoj.yml config migrations on app start. Version the schema - Add version to khoj.yml schema Versioning the khoj.yml config schema will simplify future migrations	2023-07-18 16:10:10 -07:00
Debanjum Singh Solanky	71e8ddd9a2	Check if PDF is configured before showing it as an option in khoj.el	2023-07-17 15:49:20 -07:00
Debanjum	d00c5da8b7	Merge pull request #325 from khoj-ai/stablize-simplify-content-indexing ## Stabilize and Simplify Content Indexing ### Major Updates - `9bcca43` Unify logic to update entries when indexing from scratch or incrementally - `89c7819` Unify logic to update embeddings when indexing from scratch or incrementally - `6a0297c` Stable sort new entries when marking entries for update - `58d86d7` Unify logic to configure server from API or on server start - Create tests to ensure old entries, embeddings in index are unaffected on adding new entries - Refer: `1482fd4`, `7669b85`, `88d1a29` - `ad41ef3` Make normalization of embeddings configurable to test this in `c73feeb` ### Minor Updates - `1673bb5` Add todo state to compiled form of each entry - `6e70b91` Remove unused `dump_jsonl` helper method - `7ad9603` Improve naming of lock - `b02323a` Improve naming text search test methods Resolves #190	2023-07-17 14:51:10 -07:00
Debanjum Singh Solanky	3e3a1ecbc8	Start app even if server init fails to let user fix it Show stacktrace on error to help debugging	2023-07-17 14:33:02 -07:00
Debanjum Singh Solanky	ef6a0044f4	Drop embeddings of deleted text entries from index Previously the deleted embeddings would continue to be in the index, even after the entry was deleted	2023-07-16 03:47:05 -07:00
Debanjum Singh Solanky	c73feebf25	Test index embeddings are stable on incremental update & no norm Ensure order of new embedding insertion on incremental update does not affect the order and value of existing embeddings when normalization is turned off	2023-07-16 02:22:28 -07:00
Debanjum Singh Solanky	ad41ef3991	Make normalizing embeddings configurable	2023-07-16 02:16:33 -07:00
Debanjum Singh Solanky	1482fd4d4d	Test index is stable sorted on incremental update with new entry Ensure order of new embedding, entry insertion on incremental update is stable	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	b02323ade6	Improve name of text search test functions Asymmetric was older name used to differentiate between symmetric, asymmetric search. Now that text search just uses asymmetric search stick to simpler name	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	89c7819cb7	Unify logic to generate embeddings from scratch and incrementally This simplifies the `compute_embeddings' method and avoids potential later divergence in handling the index regenerate vs update scenarios	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6a0297cc86	Stable sort new entries when marking entries for update	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	7669b85da6	Test index is stable sorted on regenerate with new entry	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6e70b914c2	Remove unused dump_jsonl method The entries index is stored in gzipped jsonl files for each content type	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	9bcca43299	Use single func to handle indexing from scratch and incrementally Previous regenerate mechanism did not deduplicate entries with same key So entries looked different between regenerate and update Having single func, mark_entries_for_update, to handle both scenarios will avoid this divergence Update all text_to_jsonl methods to use the above method for generating index from scratch	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	1673bb5558	Add todo state to compiled form of each org-mode entry	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	88d1a29a84	Test index is stable for duplicate entries across regenerate, update - Current incorrect behavior: All entries with duplicate compiled form are kept on regenerate but on update only the last of the duplicated entries is kept This divergent behavior is not ideal to prevent index corruption across reconfigure and update	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	da98b92dd4	Create helper function to test value, order of entries & embeddings This helper should be used to observe if the current embeddings are stable sorted on regenerate and incremental update of index in text search tests	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	7ad96036b0	Improve lock name to config_lock instead of search_index_lock It is used to lock updates to all app config state, including processor	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	58d86d7876	Use single func to configure server via API and on server start Improve error messages on failure to configure server components	2023-07-16 01:45:53 -07:00
sabaimran	a15711e635	Fix null type checks in get /config	2023-07-15 15:53:56 -07:00
sabaimran	e590d75b20	Start Khoj even when config is not valid (#320) * Add icon to indicate bad config, start Khoj even if there was an issue setting up the index	2023-07-15 14:11:54 -07:00
sabaimran	49ab201c30	Fix issues importing PySide in Docker container (#322) * Rather than installing PyQT dependencies, remove codepaths that require pyqt files in no-gui mode	2023-07-15 13:33:13 -07:00
sabaimran	ba47f2ab39	Merge branch 'master' of github.com:debanjum/khoj	2023-07-14 22:28:05 -07:00
sabaimran	874cffd256	Add additional support for parsing notion workspaces	2023-07-14 22:27:56 -07:00
Debanjum	52f68167ce	Merge pull request #317 from khoj-ai/reduce-memory-consumption-by-search-model-duplication Reuse Search Models across Content Types to reduce Memory Consumption - Memory consumption now only scales with search models used, not with content types. Previously each content type had its own copy of the search ML models. That'd result in 300+ MB per enabled text content type - Split model state into 2 separate state objects, `search_models` and `content_index`. This allows loading text_search and image_search models first and then reusing them across all content_types in content_index - The change should cut down memory utilization quite a bit for most users. I see a >50% drop in memory utilization on my Khoj instance. But this will vary for each user based on the amount of content indexed vs number of plugins enabled. - This change does not solve the RAM utilization scaling with size of the index, as the whole content index is still kept in RAM while Khoj is running Should help with #195, #301 and #303	2023-07-14 19:54:12 -07:00

1618 commits