* Working example with LlamaV2 running locally on my machine
- Download from huggingface
- Plug in to GPT4All
- Update prompts to fit the llama format
* Add appropriate prompts for extracting questions based on a query based on llama format
* Rename Falcon to Llama and make some improvements to the extract_questions flow
* Do further tuning to extract question prompts and unit tests
* Disable extracting questions dynamically from Llama, as results are still unreliable
OpenAI conversation processor schema had updated but conftest hadn't
been updated to reflect the same.
Update conftest setup of conversation processor to fix this
* Add support for gpt4all's falcon model as an additional conversation processor
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4 models and OpenAI
- Add unit tests benchmarking some of the Falcon performance
* Add exc_info to include stack trace in error logs for text processors
* Pull shared functions into utils.py to be used across gpt4 and gpt
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
* Add docs for more organized, accessible information detailing Khoj setup
* Delete duplicated files
* Add a coverpage without enabling it. Add logo and theme
* Remove obsidian README.md
* Add plausible script to index.html via docsify
Create Schema Migrator and Reindex to Apply Index Corruption Fixes
- 83e1088 Manage `khoj.yml' config migrations on app start. Version the `khoj.yml' schema
- 429e1b4 Regenerate index to apply corruption fixes on first run of this khoj version
Otherwise users would need to manually re-index their contents with khoj
## Stabilize and Simplify Content Indexing
### Major Updates
- 9bcca43 Unify logic to update entries when indexing from scratch or incrementally
- 89c7819 Unify logic to update embeddings when indexing from scratch or incrementally
- 6a0297c Stable sort new entries when marking entries for update
- 58d86d7 Unify logic to configure server from API or on server start
- Create tests to ensure old entries, embeddings in index are unaffected on adding new entries
- Refer: 1482fd4, 7669b85, 88d1a29
- ad41ef3 Make normalization of embeddings configurable to test this in c73feeb
### Minor Updates
- 1673bb5 Add todo state to compiled form of each entry
- 6e70b91 Remove unused `dump_jsonl` helper method
- 7ad9603 Improve naming of lock
- b02323a Improve naming text search test methods
Resolves#190
Ensure order of new embedding insertion on incremental update
does not affect the order and value of existing embeddings when
normalization is turned off
Asymmetric was older name used to differentiate between symmetric,
asymmetric search.
Now that text search just uses asymmetric search stick to simpler name
Previous regenerate mechanism did not deduplicate entries with same key
So entries looked different between regenerate and update
Having single func, mark_entries_for_update, to handle both scenarios
will avoid this divergence
Update all text_to_jsonl methods to use the above method for
generating index from scratch
- Current incorrect behavior:
All entries with duplicate compiled form are kept on regenerate
but on update only the last of the duplicated entries is kept
This divergent behavior is not ideal to prevent index corruption
across reconfigure and update