Commit graph

1635 commits

Author SHA1 Message Date
Debanjum Singh Solanky
1a9023d396 Update Chat Actor test to not incept with prior world knowledge 2023-10-15 17:22:44 -07:00
Debanjum Singh Solanky
df1d74a879 Use max_prompt_size, tokenizer from config for chat model context stuffing 2023-10-15 16:52:53 -07:00
Debanjum Singh Solanky
116595b351 Use chat_model specified in new offline_chat section of config
- Dedupe offline_chat_model variable. Only reference offline chat
  model stored under offline_chat. Delete the previous chat_model
  field under GPT4AllProcessorConfig

- Set offline chat model to use via config/offline_chat API endpoint
2023-10-15 16:37:49 -07:00
Debanjum Singh Solanky
feb4f17e3d Update chat config schema. Make max_prompt, chat tokenizer configurable
This provides flexibility to use non 1st party supported chat models

- Create migration script to update khoj.yml config
  - Put `enable_offline_chat' under new `offline-chat' section
    Referring code needs to be updated to accomodate this change
  - Move `offline_chat_model' to `chat-model' under new `offline-chat' section
  - Put chat `tokenizer` under new `offline-chat' section
  - Put `max_prompt' under existing `conversation' section
    As `max_prompt' size effects both openai and offline chat models
2023-10-15 16:35:11 -07:00
Debanjum Singh Solanky
247e75595c Use AutoTokenizer to support more tokenizers 2023-10-14 16:54:52 -07:00
Debanjum Singh Solanky
1ad8b150e8 Add default tokenizer, max_prompt as fallback for non-default offline chat models
Pass user configured chat model as argument to use by converse_offline

The proper fix for this would allow users to configure the max_prompt
and tokenizer to use (while supplying default ones, if none provided)
For now, this is a reasonable start.
2023-10-13 22:48:56 -07:00
Debanjum Singh Solanky
56bd69d5af Improve Llama v2 extract questions actor and associated prompt
- Format extract questions prompt format with newlines and whitespaces
- Make llama v2 extract questions prompt consistent

- Remove empty questions extracted by offline extract_questions actor
- Update implicit qs extraction unit test for offline search actor
2023-10-13 22:48:56 -07:00
Debanjum Singh Solanky
a85ff941ca Make offline chat model user configurable
Only GPT4All supported Llama v2 models will work given the prompt
structure is not currently configurable
2023-10-04 20:41:14 -07:00
Debanjum Singh Solanky
d1ff812021 Run GPT4All Chat Model on GPU, when available
GPT4All now supports running models on GPU via Vulkan
2023-10-04 18:42:12 -07:00
Debanjum Singh Solanky
13b16a4364 Use default Llama 2 supported by GPT4All
Remove custom logic to download custom Llama 2 model.
This was added as GPT4All didn't support Llama 2 when it was added to Khoj
2023-10-03 19:01:54 -07:00
sabaimran
4a5ed7f06c
Update Khoj package version for Electron, Desktop app (#492)
* Address package upgrade for Electron application
* Update package version for Electron desktop application
2023-10-03 12:21:32 -07:00
sabaimran
3f962a55c3
Fix Linux Desktop Application (#491)
* Use separate functions for adding files and folders to configuration for indexing
* Add a loading bar while data is syncing
* Bump the minor version for the application
2023-10-03 11:43:19 -07:00
sabaimran
63b3696af0 Release Khoj version 0.12.3 2023-09-26 22:41:11 -07:00
sabaimran
d2f9bca1cf Fix null ref issue in query method and update logic for determining whether khoj is already configured in obsidian 2023-09-26 22:33:44 -07:00
sabaimran
2f18383349 Release Khoj version 0.12.2 2023-09-26 11:59:47 -07:00
sabaimran
588f35b6e9 Add max prompt size for gpt-3.5-turbo-16k 2023-09-26 10:57:35 -07:00
sabaimran
99f9c3f8e2 Update setup instructions 2023-09-26 09:40:36 -07:00
sabaimran
4e370d7a18 Release Khoj version 0.12.1 2023-09-26 09:24:53 -07:00
sabaimran
3675aa348a Update naming of Khoj in manifest.json for Obsidian 2023-09-26 09:24:36 -07:00
sabaimran
4b6d8af218 Update metadata in manifest.json 2023-09-26 09:19:56 -07:00
sabaimran
a82d1becc3 Release Khoj version 0.12.0 2023-09-26 09:17:56 -07:00
sabaimran
38f0df3d53 Remove unused icons from electron app folder 2023-09-26 07:56:29 -07:00
sabaimran
29a64be939 Deprecate desktop build instructions from old setup 2023-09-25 22:02:02 -07:00
sabaimran
99995b2497 Add basic instructions for setting up the Khoj desktop interface 2023-09-25 21:08:14 -07:00
sabaimran
5e16074b92 Fix comparison for search type in plugins mode 2023-09-25 10:57:17 -07:00
sabaimran
efe5e09c3a Use jammy for docker base image due to dependency issue with arm64 image 2023-09-18 15:38:18 -07:00
sabaimran
6df728c445 Move bash command in Dockerfile into single line 2023-09-18 15:13:11 -07:00
sabaimran
96a9fa07f0 Fix conf test setup for offline chat 2023-09-18 15:05:15 -07:00
sabaimran
2dd15e9f63
Resolve issues with GPT4All and fix prompt for yesterday extract questions date filter (#483)
- GPT4All integration had ceased working with 0.1.7 specification. Update to use 1.0.12. At a later date, we should also use first party support for llama v2 via gpt4all
- Update the system prompt for the extract_questions flow to add start and end date to the yesterday date filter example.
- Update all setup data in conftest.py to use new client-server indexing pattern
2023-09-18 14:41:26 -07:00
sabaimran
8141be97f6 Update date filter test to use compiled rather than raw key 2023-09-18 11:24:56 -07:00
sabaimran
b225d1188c Fix formatting of gpt.py 2023-09-18 11:09:02 -07:00
Jonny-GM
34b202b868
More lenient date searching (#481)
* Modify DateFilter to use compiled entry key
* Instruct search to include date in query
* Minor prompt change
* Prompt fix
2023-09-18 10:46:00 -07:00
sabaimran
16874e1953 Provide force fallback for regeneration 2023-09-12 16:35:07 -07:00
sabaimran
9f42a1a036 Propagate flags to configure index command 2023-09-11 10:33:44 -07:00
sabaimran
343854752c
Improve docker builds for local hosting (#476)
* Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions
* Move configure_search method into indexer
* Add conditional installation for gpt4all
* Add hint to go to localhost:42110 in the docs. Addresses #477
2023-09-08 17:07:26 -07:00
sabaimran
dccfae3853
Remove PySide dependency and deprecate desktop builds (#475)
* Remove PySide, gui option from code
* Remove pyside 6 dependency from code
* Remove workflows which build desktop applications
* Update unit tests and update line in documentation
* Remove additional references to pyinstaller, gui
* Add uninstall steps to normal uninstall instructions
2023-09-07 11:36:27 -07:00
sabaimran
76562f4250
Add front-end Electron application for Khoj local file syncing (#473)
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Use state.host and state.port for configuring the URL for the indexer
* Fix parsing of PDF files
* Read markdown files from streamed data and update unit tests
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
* Init: refactor indexer/batch endpoint to support a generic file ingestion format
* Add features to better support indexing from files sent by the desktop client
* Initial commit with Electron application
- Adds electron app
* Add import for pymupdf, remove import for pypdf
* Allow user to configure khoj host URL
* Remove search type configuration from index.html
* Use v1 path for current indexer routes
2023-09-06 12:04:18 -07:00
bholagabbar
205dc90746
Fix notion title bug (#474)
* Update notion_to_jsonl.py
* Fix try-catch block
2023-09-05 10:47:42 -07:00
sabaimran
922222a813 Fix anyio package version to avoid backwards compatibility issue with start_blocking_portal method 2023-08-31 14:14:13 -07:00
sabaimran
4854258047
Move to a push-first model for retrieving embeddings from local files (#457)
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Update unit tests to fix with new application design
* Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request
* Use state.host and state.port for configuring the URL for the indexer
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
2023-08-31 12:55:17 -07:00
sabaimran
92cbfef7ab Skip plaintext file indexing if there's a parsing issue and log the file 2023-08-29 14:34:08 -07:00
sabaimran
74409c2c64 Release Khoj version 0.11.4 2023-08-29 11:44:35 -07:00
sabaimran
1b85958bcc trim chat input start 2023-08-28 19:18:10 -07:00
sabaimran
e592f6eac8 Release Khoj version 0.11.3 2023-08-28 14:46:03 -07:00
sabaimran
7c35da9fc4 Fix bug in /chat endpoint for general and update depdendencies 2023-08-28 14:12:11 -07:00
Debanjum Singh Solanky
c93dcc948a Exclude tests data file from programming stats on Github
Git tag tests/data files with the linguist-vendored attribute to
prevent github from including them in stats.

Otherwise Khoj is getting marked as an HTML project due to the
tardigrades html page in tests data, when it's primarily a python
project currently
2023-08-28 11:00:52 -07:00
Debanjum Singh Solanky
59ffd1dc94 Document slash command and query filter in docs for chat and search 2023-08-28 11:00:52 -07:00
sabaimran
bc09143856 Release Khoj version 0.11.2 2023-08-28 10:16:13 -07:00
Debanjum
bc5e60defb
Filter knowledge base used by chat to respond (#469)
- Overview
  - Allow applying word, file or date filters on your knowledge base from the chat interface
  - This will limit the portion of the knowledge base Khoj chat can use to respond to your query
2023-08-28 09:32:33 -07:00
Debanjum Singh Solanky
01b310635e Enable passing search query filters via chat and test it 2023-08-28 09:24:32 -07:00