Commit graph

2870 commits

Author SHA1 Message Date
Debanjum Singh Solanky
da3f4dc7e4 Fix test config to run OpenAI Chat Actor, Director tests
OpenAI conversation processor schema had updated but conftest hadn't
been updated to reflect the same.

Update conftest setup of conversation processor to fix this
2023-07-27 11:30:04 -07:00
Debanjum Singh Solanky
715d56d4f0 Use new schema to update khoj.yml config from khoj.el 2023-07-26 17:34:16 -07:00
sabaimran
8b2af0b5ef
Add support for our first Local LLM 🤖🏠 (#330)
* Add support for gpt4all's falcon model as an additional conversation processor
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4 models and OpenAI
- Add unit tests benchmarking some of the Falcon performance
* Add exc_info to include stack trace in error logs for text processors
* Pull shared functions into utils.py to be used across gpt4 and gpt
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
2023-07-26 16:27:08 -07:00
sabaimran
23d77ee338
Fix import issues in desktop image builds (#343) 2023-07-26 15:45:52 -07:00
Justin Bassett-Green
8dcc21052f
Add chat-model param in sample config yml and document (#341)
* add chat-model config param to docs

* add chat-model param to sample config yml
2023-07-22 16:53:08 -07:00
Debanjum Singh Solanky
5bb42e56a8 Fix formatting of khoj test config and unused references in conftests 2023-07-22 00:29:26 -07:00
Debanjum Singh Solanky
7722a9c347 Default to using the gpt-3.5-turbo model for chat from khoj.el 2023-07-22 00:29:26 -07:00
Saba
36d25c4f1d Center the title, add table headers 2023-07-21 23:36:38 -07:00
Saba
01b6a10cd1 Simplify readme 2023-07-21 23:30:44 -07:00
sabaimran
4ce072c4b3
Make the README on our Github minimal (#334)
* Make the README on our Github minimal
* Add a bit of formatting and more background
2023-07-21 23:29:04 -07:00
Debanjum Singh Solanky
4089e38283 Fix links to demos and screenshots in docs 2023-07-21 20:01:19 -07:00
Debanjum Singh Solanky
89ad362758 Update Screenshots and Demos in Docs 2023-07-21 15:22:35 -07:00
Debanjum Singh Solanky
f0d4a4cf9a Revert "Make configure_content functional. Do not pass content index state to it."
This reverts commit 2ddee7e745 as it
broke partial updates of the content index for just the specified
content types
2023-07-21 13:59:09 -07:00
sabaimran
82c725817e Merge branch 'master' of github.com:khoj-ai/khoj 2023-07-21 13:24:05 -07:00
sabaimran
596e11ec6d Use the same function for computing entries for IDs regardless of whether it has prev entries 2023-07-21 13:23:56 -07:00
Saba
634f0b4cc4 Fix docs indexing issue 2023-07-21 08:30:00 -07:00
Debanjum Singh Solanky
c28755ccd2 Fix diff blocks, links, remove footnotes & rearrange sections in docs
Extract performance into separate sectin into shoving it under search
Create page for web interface
2023-07-21 00:58:30 -07:00
Debanjum Singh Solanky
2ddee7e745 Make configure_content functional. Do not pass content index state to it. 2023-07-20 23:24:08 -07:00
Debanjum
e92bc0e2e6 Create CNAME to make Docs accessible at docs.khoj.dev 2023-07-20 23:24:08 -07:00
sabaimran
1610d2ebd9
📝 Add a documentation base for Khoj! (#333)
* Add docs for more organized, accessible information detailing Khoj setup
* Delete duplicated files
* Add a coverpage without enabling it. Add logo and theme
* Remove obsidian README.md
* Add plausible script to index.html via docsify
2023-07-20 22:34:25 -07:00
Debanjum Singh Solanky
3e59be7f1d Release Khoj version 0.9.0 2023-07-18 19:59:27 -07:00
Debanjum Singh Solanky
d078e7b1f6 Clean up search type usage in khoj server, tests and Readme 2023-07-18 19:57:55 -07:00
Debanjum Singh Solanky
4d910936b7 Fix triggering index update on khoj server from khoj.el 2023-07-18 19:57:54 -07:00
Debanjum Singh Solanky
5c7d7f558d Make AI model used for Khoj chat configurable from khoj.el
- Fix bug. Set the unused model-name to a standad default value
2023-07-18 19:57:54 -07:00
Debanjum
5f2be2a9bb
Merge pull request #298 from HyunggyuJang/patch-1
Encode config as utf-8 during setup in khoj.el. This will allow utf-8 encoded files etc to be passed in config
2023-07-18 17:54:11 -07:00
Debanjum
3a1c5a6dab
Merge pull request #329 from khoj-ai/create-schema-migration-func-and-reindex-to-fix-corruption
Create Schema Migrator and Reindex to Apply Index Corruption Fixes

- 83e1088 Manage `khoj.yml' config migrations on app start. Version the `khoj.yml' schema
- 429e1b4 Regenerate index to apply corruption fixes on first run of this khoj version
   Otherwise users would need to manually re-index their contents with khoj
2023-07-18 16:43:17 -07:00
Debanjum Singh Solanky
429e1b4b48 Regenerate index to apply corruption fixes on first run of new khoj 2023-07-18 16:10:47 -07:00
Debanjum Singh Solanky
83e1088d42 Manage khoj.yml config migrations on app start. Version the schema
- Add version to khoj.yml schema
  Versioning the khoj.yml config schema will simplify future migrations
2023-07-18 16:10:10 -07:00
Debanjum Singh Solanky
71e8ddd9a2 Check if PDF is configured before showing it as an option in khoj.el 2023-07-17 15:49:20 -07:00
Debanjum
d00c5da8b7
Merge pull request #325 from khoj-ai/stablize-simplify-content-indexing
## Stabilize and Simplify Content Indexing

### Major Updates
- 9bcca43 Unify logic to update entries when indexing from scratch or incrementally
- 89c7819 Unify logic to update embeddings when indexing from scratch or incrementally
- 6a0297c Stable sort new entries when marking entries for update
- 58d86d7 Unify logic to configure server from API or on server start
- Create tests to ensure old entries, embeddings in index are unaffected on adding new entries
  - Refer: 1482fd4, 7669b85, 88d1a29 
  - ad41ef3 Make normalization of embeddings configurable to test this in c73feeb

### Minor Updates
- 1673bb5 Add todo state to compiled form of each entry
- 6e70b91 Remove unused `dump_jsonl` helper method 
- 7ad9603 Improve naming of lock
- b02323a Improve naming text search test methods

Resolves #190
2023-07-17 14:51:10 -07:00
Debanjum Singh Solanky
3e3a1ecbc8 Start app even if server init fails to let user fix it
Show stacktrace on error to help debugging
2023-07-17 14:33:02 -07:00
Debanjum Singh Solanky
ef6a0044f4 Drop embeddings of deleted text entries from index
Previously the deleted embeddings would continue to be in the index,
even after the entry was deleted
2023-07-16 03:47:05 -07:00
Debanjum Singh Solanky
c73feebf25 Test index embeddings are stable on incremental update & no norm
Ensure order of new embedding insertion on incremental update
does not affect the order and value of existing embeddings when
normalization is turned off
2023-07-16 02:22:28 -07:00
Debanjum Singh Solanky
ad41ef3991 Make normalizing embeddings configurable 2023-07-16 02:16:33 -07:00
Debanjum Singh Solanky
1482fd4d4d Test index is stable sorted on incremental update with new entry
Ensure order of new embedding, entry insertion on incremental update
is stable
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
b02323ade6 Improve name of text search test functions
Asymmetric was older name used to differentiate between symmetric,
asymmetric search.

Now that text search just uses asymmetric search stick to simpler name
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
89c7819cb7 Unify logic to generate embeddings from scratch and incrementally
This simplifies the `compute_embeddings' method and avoids potential
later divergence in handling the index regenerate vs update scenarios
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
6a0297cc86 Stable sort new entries when marking entries for update 2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
7669b85da6 Test index is stable sorted on regenerate with new entry 2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
6e70b914c2 Remove unused dump_jsonl method
The entries index is stored ingzipped jsonl files for each content type
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
9bcca43299 Use single func to handle indexing from scratch and incrementally
Previous regenerate mechanism did not deduplicate entries with same key
So entries looked different between regenerate and update
Having single func, mark_entries_for_update, to handle both scenarios
will avoid this divergence

Update all text_to_jsonl methods to use the above method for
generating index from scratch
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
1673bb5558 Add todo state to compiled form of each org-mode entry 2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
88d1a29a84 Test index is stable for duplicate entries across regenerate, update
- Current incorrect behavior:
  All entries with duplicate compiled form are kept on regenerate
  but on update only the last of the duplicated entries is kept

This divergent behavior is not ideal to prevent index corruption
across reconfigure and update
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
da98b92dd4 Create helper function to test value, order of entries & embeddings
This helper should be used to observe if the current embeddings are
stable sorted on regenerate and incremental update of index in text
search tests
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
7ad96036b0 Improve lock name to config_lock instead of search_index_lock
It is used to lock updates to all app config state, including processor
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
58d86d7876 Use single func to configure server via API and on server start
Improve error messages on failure to configure server components
2023-07-16 01:45:53 -07:00
sabaimran
a15711e635 Fix null type checks in get /config 2023-07-15 15:53:56 -07:00
sabaimran
e590d75b20
Start Khoj even when config is not valid (#320)
* Add icon to indicate bad config, start Khoj even if there was an issue setting up the index
2023-07-15 14:11:54 -07:00
sabaimran
49ab201c30
Fix issues importing PySide in Docker container (#322)
* Rather than installing PyQT dependencies, remove codepaths that require pyqt files in no-gui mode
2023-07-15 13:33:13 -07:00
sabaimran
ba47f2ab39 Merge branch 'master' of github.com:debanjum/khoj 2023-07-14 22:28:05 -07:00