Commit graph

1414 commits

Author SHA1 Message Date
Debanjum Singh Solanky
d1ff812021 Run GPT4All Chat Model on GPU, when available
GPT4All now supports running models on GPU via Vulkan
2023-10-04 18:42:12 -07:00
Debanjum Singh Solanky
13b16a4364 Use default Llama 2 supported by GPT4All
Remove custom logic to download custom Llama 2 model.
This was added as GPT4All didn't support Llama 2 when it was added to Khoj
2023-10-03 19:01:54 -07:00
sabaimran
4a5ed7f06c
Update Khoj package version for Electron, Desktop app (#492)
* Address package upgrade for Electron application
* Update package version for Electron desktop application
2023-10-03 12:21:32 -07:00
sabaimran
3f962a55c3
Fix Linux Desktop Application (#491)
* Use separate functions for adding files and folders to configuration for indexing
* Add a loading bar while data is syncing
* Bump the minor version for the application
2023-10-03 11:43:19 -07:00
sabaimran
63b3696af0 Release Khoj version 0.12.3 2023-09-26 22:41:11 -07:00
sabaimran
d2f9bca1cf Fix null ref issue in query method and update logic for determining whether khoj is already configured in obsidian 2023-09-26 22:33:44 -07:00
sabaimran
2f18383349 Release Khoj version 0.12.2 2023-09-26 11:59:47 -07:00
sabaimran
588f35b6e9 Add max prompt size for gpt-3.5-turbo-16k 2023-09-26 10:57:35 -07:00
sabaimran
4e370d7a18 Release Khoj version 0.12.1 2023-09-26 09:24:53 -07:00
sabaimran
3675aa348a Update naming of Khoj in manifest.json for Obsidian 2023-09-26 09:24:36 -07:00
sabaimran
a82d1becc3 Release Khoj version 0.12.0 2023-09-26 09:17:56 -07:00
sabaimran
38f0df3d53 Remove unused icons from electron app folder 2023-09-26 07:56:29 -07:00
sabaimran
5e16074b92 Fix comparison for search type in plugins mode 2023-09-25 10:57:17 -07:00
sabaimran
2dd15e9f63
Resolve issues with GPT4All and fix prompt for yesterday extract questions date filter (#483)
- GPT4All integration had ceased working with 0.1.7 specification. Update to use 1.0.12. At a later date, we should also use first party support for llama v2 via gpt4all
- Update the system prompt for the extract_questions flow to add start and end date to the yesterday date filter example.
- Update all setup data in conftest.py to use new client-server indexing pattern
2023-09-18 14:41:26 -07:00
sabaimran
b225d1188c Fix formatting of gpt.py 2023-09-18 11:09:02 -07:00
Jonny-GM
34b202b868
More lenient date searching (#481)
* Modify DateFilter to use compiled entry key
* Instruct search to include date in query
* Minor prompt change
* Prompt fix
2023-09-18 10:46:00 -07:00
sabaimran
16874e1953 Provide force fallback for regeneration 2023-09-12 16:35:07 -07:00
sabaimran
9f42a1a036 Propagate flags to configure index command 2023-09-11 10:33:44 -07:00
sabaimran
343854752c
Improve docker builds for local hosting (#476)
* Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions
* Move configure_search method into indexer
* Add conditional installation for gpt4all
* Add hint to go to localhost:42110 in the docs. Addresses #477
2023-09-08 17:07:26 -07:00
sabaimran
dccfae3853
Remove PySide dependency and deprecate desktop builds (#475)
* Remove PySide, gui option from code
* Remove pyside 6 dependency from code
* Remove workflows which build desktop applications
* Update unit tests and update line in documentation
* Remove additional references to pyinstaller, gui
* Add uninstall steps to normal uninstall instructions
2023-09-07 11:36:27 -07:00
sabaimran
76562f4250
Add front-end Electron application for Khoj local file syncing (#473)
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Use state.host and state.port for configuring the URL for the indexer
* Fix parsing of PDF files
* Read markdown files from streamed data and update unit tests
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
* Init: refactor indexer/batch endpoint to support a generic file ingestion format
* Add features to better support indexing from files sent by the desktop client
* Initial commit with Electron application
- Adds electron app
* Add import for pymupdf, remove import for pypdf
* Allow user to configure khoj host URL
* Remove search type configuration from index.html
* Use v1 path for current indexer routes
2023-09-06 12:04:18 -07:00
bholagabbar
205dc90746
Fix notion title bug (#474)
* Update notion_to_jsonl.py
* Fix try-catch block
2023-09-05 10:47:42 -07:00
sabaimran
4854258047
Move to a push-first model for retrieving embeddings from local files (#457)
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Update unit tests to fix with new application design
* Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request
* Use state.host and state.port for configuring the URL for the indexer
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
2023-08-31 12:55:17 -07:00
sabaimran
92cbfef7ab Skip plaintext file indexing if there's a parsing issue and log the file 2023-08-29 14:34:08 -07:00
sabaimran
74409c2c64 Release Khoj version 0.11.4 2023-08-29 11:44:35 -07:00
sabaimran
1b85958bcc trim chat input start 2023-08-28 19:18:10 -07:00
sabaimran
e592f6eac8 Release Khoj version 0.11.3 2023-08-28 14:46:03 -07:00
sabaimran
7c35da9fc4 Fix bug in /chat endpoint for general and update dependencies 2023-08-28 14:12:11 -07:00
sabaimran
bc09143856 Release Khoj version 0.11.2 2023-08-28 10:16:13 -07:00
Debanjum Singh Solanky
01b310635e Enable passing search query filters via chat and test it 2023-08-28 09:24:32 -07:00
Debanjum Singh Solanky
794bad8bcb Make date_filter.extract_date_range method always return a list type 2023-08-28 00:55:28 -07:00
Debanjum Singh Solanky
d5a2de6222 Add method to extract filter terms from query to all filters
- Test the get_filter_term method in all 3 word, file, date filters
- Make the existing can_filter method by default in base filter abstract class
2023-08-28 00:55:28 -07:00
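A minimal sketch of the pattern this commit describes: an abstract base filter where each filter extracts its own terms and `can_filter` is provided by default. The class names, the method name `get_filter_terms`, and the `+"word"` query syntax are assumptions made for illustration, not the project's confirmed code.

```python
import re
from abc import ABC, abstractmethod
from typing import List


class BaseFilter(ABC):
    """Hypothetical base class: subclasses only implement term extraction."""

    @abstractmethod
    def get_filter_terms(self, query: str) -> List[str]:
        """Return the filter terms found in the query."""

    def can_filter(self, query: str) -> bool:
        # Default implementation: a filter applies if it finds any term
        return len(self.get_filter_terms(query)) > 0


class WordFilter(BaseFilter):
    # Assumed syntax for a required word: +"word"
    required_word = re.compile(r'\+"(\w+)"')

    def get_filter_terms(self, query: str) -> List[str]:
        return self.required_word.findall(query)


word_filter = WordFilter()
terms = word_filter.get_filter_terms('meeting notes +"khoj" +"llama"')
```

Keeping `can_filter` in the base class means a new filter only needs to define how its terms are extracted.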
Debanjum
150105505b
Add Default chat command. Make Khoj ask clarifying questions (#468)
- Make Khoj ask clarifying questions when answer not in provided context
- Add default conversation command to auto switch b/w general, notes modes
- Show filtered list of commands available with the currently input text
- Use general prompt when no references found and not in Notes mode
- Test general and notes slash commands in offline chat director tests
2023-08-28 00:52:57 -07:00
Debanjum Singh Solanky
eb6cd4f8d0 Use general prompt when no references found and not in Notes mode 2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky
edffbad837 Make Khoj ask clarifying questions when answer not in provided context
Previously it would just refuse rather than ask for clarification. This improves
the chat quality score for the existing director tests
2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky
75c1016ec0 Show filtered list of commands available with the currently input text 2023-08-28 00:46:10 -07:00
Debanjum Singh Solanky
74605f6159 Add default conversation command to auto switch b/w general, notes modes
This was the default behavior but behavior regressed when adding slash
commands in PR #463
2023-08-28 00:46:10 -07:00
sabaimran
cbc978ea08 Update help links for notion, github to point to the main docs 2023-08-27 15:02:55 -07:00
sabaimran
b45e1d8c0d
Fix plaintext HTML parsing and rendering (#464)
* Store conversation command options in an Enum
* Move to slash commands instead of using @ to specify general commands
* Calculate conversation command once & pass it as arg to child funcs
* Add /notes command to respond using only knowledge base as context
This prevents the chat model from trying to respond using its general world
knowledge only without any references pulled from the indexed
knowledge base
* Test general and notes slash commands in openai chat director tests
---------

Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
2023-08-27 11:24:30 -07:00
Debanjum
7919787fb7
Use Slash Commands and Add Notes Slash Command (#463)
* Store conversation command options in an Enum

* Move to slash commands instead of using @ to specify general commands

* Calculate conversation command once & pass it as arg to child funcs

* Add /notes command to respond using only knowledge base as context

This prevents the chat model from trying to respond using its general world
knowledge only without any references pulled from the indexed
knowledge base

* Test general and notes slash commands in openai chat director tests

* Update gpt4all tests to use md configuration

* Add a /help tooltip

* Add dynamic support for describing slash commands. Remove default and treat notes as the default type

---------

Co-authored-by: sabaimran <narmiabas@gmail.com>
2023-08-26 18:11:18 -07:00
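A minimal sketch of storing conversation commands in an Enum and resolving a slash command once per query, as this change describes. The enum members and the helper name `get_conversation_command` are assumptions for illustration; notes is treated as the default, per the last bullet above.

```python
from enum import Enum


class ConversationCommand(str, Enum):
    # Hypothetical command set
    General = "general"
    Notes = "notes"
    Help = "help"


def get_conversation_command(query, default=ConversationCommand.Notes):
    """Map a leading /command in the query to an enum member."""
    words = query.split(maxsplit=1)
    first_word = words[0] if words else ""
    for command in ConversationCommand:
        if first_word == f"/{command.value}":
            return command
    # No slash command given: fall back to the default type
    return default


command = get_conversation_command("/general what is the capital of France?")
```

Computing the command once and passing it to child functions avoids re-parsing the query in several places.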
sabaimran
e64357698d
Skip indexing single bad markdown, plaintext file (#460) 2023-08-23 15:34:56 -07:00
sabaimran
84bd579077 Format the chat outputted message with code, bolding, or italics. Add a copy button for code. Closes #445. 2023-08-19 20:02:57 -07:00
sabaimran
f9e09ba490 Do not try downloading model from GPT4All if the user is not connected to the internet 2023-08-19 19:09:21 -07:00
Debanjum Singh Solanky
3ff4e19dd2 Release Khoj version 0.11.1 2023-08-16 22:53:29 -07:00
sabaimran
4fb8c2c5e1 Pass a SIGTERM to tell the uvicorn server to exit and gracefully kill the thread 2023-08-16 21:27:05 -07:00
sabaimran
4e03dfea43
Attach the parent to the server thread, allowing the kill signal to trigger a graceful exit (#446) 2023-08-16 19:36:10 -07:00
Debanjum Singh Solanky
26c3977fb9 Remove info hint to reindex khoj on unexpected search results
The index corruption issue was resolved a while ago in #325 and
hasn't cropped up again
2023-08-16 00:58:59 -07:00
sabaimran
def909a913
Revert "Open Web interface within Desktop app in GUI mode" (#444) 2023-08-15 23:26:28 -07:00
sabaimran
6562ec6531 Release Khoj version 0.11.0 2023-08-14 19:25:03 -07:00
sabaimran
0ea901c7c1
Allow indexing to continue even if there's an issue parsing a particular org file (#430)
* Allow indexing to continue even if there's an issue parsing a particular org file
* Use approximation in pytorch comparison in text_search UT, skip additional file parser errors for org files
* Change error of expected failure
2023-08-14 07:56:33 -07:00
sabaimran
7b907add77
Add support for indexing plaintext files (#420)
* Add support for indexing plaintext files
- Adds backend support for parsing plaintext files generically (.html, .txt, .xml, .csv, .md)
- Add equivalent frontend views for setting up plaintext file indexing
- Update config, rawconfig, default config, search API, setup endpoints
* Add a nifty plaintext file icon to configure plaintext files in the Web UI
* Use generic glob path for plaintext files. Skip indexing files that aren't in whitelist
2023-08-09 15:44:40 -07:00
Ellen7ions
26bddcb65c
Add support for starting a new line with shift-enter (#412)
* Add support for starting a new line with shift-enter
* Remove useless comments. Set font-size: medium.
* Update src/khoj/interface/web/chat.html
Update the styling to have the padding, margin and line-height like before.
Co-authored-by: Debanjum <debanjum@gmail.com>
* Update src/khoj/interface/web/chat.html
Make the chat-body scroll to the bottom after resizing
Co-authored-by: Debanjum <debanjum@gmail.com>
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
2023-08-07 19:49:07 -07:00
Debanjum Singh Solanky
97609e4995 Use 500px png of khoj logo instead of svg for much smaller asset size
The khoj logo svg was 1.3 MB. The 500px png of it is 38 KB.
Given all usage of khoj-logo are below 230px this should work fine
2023-08-07 18:27:11 -07:00
Debanjum
14a816d173
Open Web interface within Desktop app in GUI mode (#429)
Previously the GUI mode (with khoj --gui or using the desktop app) would open the web interface in the user's default web browser. Now the web interface is rendered within the app itself using PyQT's Webview. This gives it a more app-like feel.
2023-08-07 17:48:30 -07:00
Debanjum Singh Solanky
378b96ec1b Open the khoj app window maximized on startup 2023-08-07 15:39:05 -07:00
Debanjum Singh Solanky
ea734ba1c8 Open app in native view on starting it in GUI mode instead of on web browser
- Opens settings page on first run and landing page after in GUI mode
  Previously was only opening the GUI on linux after first run as it
  doesn't have a system tray
- Both the views are from the web interface but are rendered within
  the app instead of the browser
2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky
9c494705a8 Open the search, chat or config view in app from the system tray menu 2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky
cc36b87345 Render the web interface directly within the desktop app as a webview 2023-08-07 13:41:12 -07:00
Jason Qin
3ef1b7073d
Update obsidian/manifest.json
Closes #426
2023-08-07 10:41:39 -07:00
sabaimran
738cf650b3
Explicitly set Khoj to use the default locale of the user (#425)
- Explicitly set locale using `locale.setlocale(locale.LC_ALL, '')` for localization. Relevant for datetime libraries. See [Python 3 documentation](https://docs.python.org/3/library/locale.html#locale.setlocale).
2023-08-07 09:23:24 -07:00
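For reference, a short sketch of the standard-library call this commit refers to. The fallback to the "C" locale is an assumption added for illustration and is not stated in the commit.

```python
import locale

# An empty string asks Python to use the user's default locale,
# as configured in the environment (e.g. LANG or LC_ALL).
try:
    current_locale = locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # Hypothetical fallback: keep the portable "C" locale if the
    # environment names a locale the system does not support.
    current_locale = locale.setlocale(locale.LC_ALL, "C")

print(current_locale)
```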
Muftawo
c8ef619090
fixed reference link to landing page (#417)
* Fixed zsh error no matches found
* Fixed home page 404 error
2023-08-04 10:38:14 -07:00
sabaimran
78012b8111 Avoid null ref issue when setting model state for web UI. Closes #410 2023-08-03 00:39:06 -07:00
sabaimran
0baed742e4
Add checksums to verify the correct model is downloaded as expected (#405)
* Add checksums to verify the correct model is downloaded as expected
- This should help debug issues related to corrupted model download
- If download fails, let the application continue
* If the model is not downloaded as expected, add some indicators in the settings UI
* Add exc_info to error log if/when download fails for llamav2 model
* Simplify checksum checking logic, update key name in model state for web client
2023-08-02 23:26:52 -07:00
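A minimal sketch of checksum verification for a downloaded file, in the spirit of this change. The hash algorithm (SHA-256 here), the function names, and the demo file are assumptions; the commit does not state which checksum is used.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()  # assumption: algorithm is not stated in the commit
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_model_valid(path, expected_checksum):
    """Return False instead of raising, so the application can continue."""
    try:
        return file_checksum(path) == expected_checksum
    except OSError:
        return False


# Demo: a small temporary file stands in for a downloaded model
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"model weights")
model_path = tmp.name
expected = hashlib.sha256(b"model weights").hexdigest()

ok = is_model_valid(model_path, expected)
corrupted = is_model_valid(model_path, "0" * 64)
missing = is_model_valid(model_path + ".missing", expected)
os.remove(model_path)
```

Returning a boolean rather than raising lets the caller surface the problem in the settings UI while the rest of the application keeps running.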
Debanjum Singh Solanky
e6e3acdbe4 Release Khoj version 0.10.1 2023-08-01 23:55:13 -07:00
Debanjum Singh Solanky
7c1d70aa17 Bump GPT4All response generation batch size to 512 from 256
A batch size of 512 performs ~20% better on an XPS with no GPU and 16 GB
RAM. Seems worth the tradeoff for now
2023-08-01 23:34:02 -07:00
Debanjum
16c6bfce8e
Improve Quality and Reliability of Offline Chat (#393)
# Incoming
## Major
### Fix Prompt Size Exceeded Issue
- Fix issues related to prompt size, Closes #386. Use the correct tokenizer to calculate whether the input needs to be truncated or not.

### Improve Llama 2 Model Download
- Use the correct download link for LlamaV2 -- should have been using the small model, but was using the medium
- Add better downloading logic to retry download if it failed, Closes #379 

### Fix Segmentation Fault due to Race
- Add a lock around generating chat responses from the offline model to avoid segmentation faults. Closes #367.
- Add a loading symbol to the web chat UI when the model is thinking. Closes #392

### Improve Chat Response Latency
- Improve performance of offline chat by increasing batch size (via `n_batch`) to automatically engage more cores/GPU, using smaller model and fixing prompt vs response token generation numbers. Closes #363

### Fix Fake Dialogue Continuation
- Fix formatting of user query with offline chat, this was contributing to #398
- Stop Llama 2 from Creating Fake Dialogue Continuations. Closes #398

## Minor
- Improve default message for Chat window on web when it's not configured. Include hint to use offline chat.
- Add null check in `perform_chat_checks` method
- Add offline chat director unit tests

## Performance Analysis (Time to First Token)
|         | v0.10.0 | this branch |
|---------|---------|-------------|
| Query 1 | 52s     | 28s         |
| Query 2 | 33s     | 42s         |
| Query 3 | 67s     | 38s         |
2023-08-01 22:07:27 -07:00
Debanjum Singh Solanky
44292afff2 Put offline model response generation behind the chat lock as well
Not just the chat response streaming
2023-08-01 21:53:52 -07:00
Debanjum Singh Solanky
1812473d27 Extract new schema version for each migration script into a variable
This should ease readability, indicates which version this
migration script will update the schema to once applied
2023-08-01 21:41:08 -07:00
Debanjum Singh Solanky
b9937549aa Simplify migration scripts management. Make them use static version
- Only make them update config when their run conditions are satisfied
- Use static schema version to simplify reasoning about run conditions
2023-08-01 21:28:20 -07:00
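A minimal sketch of a migration script keyed on a static schema version, as described here. The function names, the config layout, and the `offline-chat-model` key are hypothetical; only the idea of comparing against a fixed target version comes from the commit.

```python
def parse_version(version):
    """Turn '0.10.1' into (0, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def migrate_offline_model(config):
    # Static version this migration upgrades the schema to
    schema_version = "0.10.1"
    previous_version = config.get("version", "0.0.0")

    # Run condition: only touch configs older than the target version
    if parse_version(previous_version) >= parse_version(schema_version):
        return config

    migrated = dict(config)
    migrated["offline-chat-model"] = "llama-2"  # hypothetical key and value
    migrated["version"] = schema_version
    return migrated


old_config = migrate_offline_model({"version": "0.10.0"})
new_config = migrate_offline_model({"version": "0.11.0"})
```

Comparing parsed tuples avoids the string-comparison pitfall where "0.9.0" would sort after "0.10.0".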
Debanjum Singh Solanky
185a1fbed7 Remove old chat setup timer. It is mislabelled, irrelevant since streaming 2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky
c2b7a14ed5 Fix context, response size for Llama 2 to stay within max token limits
Create regression test to ensure it does not throw the prompt size
exceeded context window error
2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky
6e4050fa81 Make Llama 2 stop generating response on hitting specified stop words
It would previously sometimes start generating fake dialogue with
its internal prompt patterns of <s>[INST] in responses.

This is a jarring experience. Stop generating the response on hitting <s>

Resolves #398
2023-08-01 20:52:00 -07:00
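A minimal sketch of cutting a streamed response at a stop word. This is a generic illustration, not the project's implementation: the function name is hypothetical and the token stream is simulated with a list.

```python
def stream_until_stop(tokens, stop_words):
    """Yield streamed text, halting at the first stop word."""
    # Hold back enough characters that a stop word split across
    # tokens is never emitted before it can be recognized.
    holdback = max((len(stop) for stop in stop_words), default=1) - 1
    buffer = ""
    for token in tokens:
        buffer += token
        hits = [buffer.find(stop) for stop in stop_words if stop in buffer]
        if hits:
            text = buffer[: min(hits)]
            if text:
                yield text
            return
        if holdback:
            emit, buffer = buffer[:-holdback], buffer[-holdback:]
        else:
            emit, buffer = buffer, ""
        if emit:
            yield emit
    if buffer:
        yield buffer


simulated_tokens = ["Hello", " wor", "ld", "<", "s", ">", "[INST] fake question"]
response = "".join(stream_until_stop(simulated_tokens, ["<s>"]))
```

The hold-back buffer matters because a model may emit `<s>` as several separate tokens.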
Debanjum Singh Solanky
aa6846395d Fix offline model migration script to run for version < 0.10.1
- Use same batch_size in extract question actor as the chat actor
- Log final location the chat model is to be stored in, instead of
  its temp filename while it is being downloaded
2023-08-01 20:51:53 -07:00
Ikko Eltociear Ashimine
49abb9df9c
Fix typo in orgnode.py (#397)
Fix spelling of Ouput in org parser property drawer comment to Output.
2023-08-01 19:54:57 -07:00
sabaimran
f409e16137 Update some of the extract question prompts for llamav2 2023-08-01 12:23:36 -07:00
sabaimran
b11b00a9ff Add log line for time to first response 2023-08-01 10:57:38 -07:00
sabaimran
778df6be71 Add a logline when the offline model migration script runs 2023-08-01 09:27:42 -07:00
sabaimran
3a5d93d673 Add migration script for getting the new offline model 2023-08-01 09:25:05 -07:00
sabaimran
90efc2ea7a Update comments and add explanations 2023-08-01 09:24:03 -07:00
sabaimran
f7e03f6d63 Switch spinner snake case -> camel case 2023-08-01 08:52:25 -07:00
sabaimran
1c52a6993f add a lock around chat operations to prevent the offline model from getting bombarded and stealing a bunch of compute resources
- This also solves #367
2023-08-01 00:23:17 -07:00
sabaimran
6c3074061b Disable the input bar when chat response is in flight 2023-08-01 00:21:39 -07:00
sabaimran
c14cbe926a Add a loading symbol to web chat. Closes #392 2023-07-31 23:35:48 -07:00
sabaimran
8054bdc896 Use n_batch parameter to increase resource consumption on host machine (and implicitly engage GPU) 2023-07-31 23:25:08 -07:00
sabaimran
e55e9a7b67 Fix unit tests and truncation logic 2023-07-31 21:37:59 -07:00
sabaimran
2335f11b00 Add better error handling for download processes in case of failure 2023-07-31 21:07:38 -07:00
sabaimran
209975e065 Resolve merge conflicts: let Khoj fail if the model tokenizer is not found 2023-07-31 19:12:26 -07:00
sabaimran
2d6c3cd4fa Misc. quality improvements for Llama V2
- Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S
- Use a proper Llama Tokenizer for counting tokens for truncation with Llama
- Add additional null checks when running
2023-07-31 19:11:20 -07:00
sabaimran
ca195097d7 Update chat hint message at first run 2023-07-31 17:46:09 -07:00
Debanjum Singh Solanky
ded606c7cb Fix format of user query during general conversation with Llama 2 2023-07-31 17:21:14 -07:00
Debanjum Singh Solanky
48e5ac0169 Do not drop system message when truncating context to max prompt size
Previously the system message was getting dropped when the context
size with chat history would be more than the max prompt size
supported by the chat model

Now only the previous chat messages are dropped or the current
message is truncated but the system message is kept to provide
guidance to the chat model
2023-07-31 17:21:14 -07:00
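A minimal sketch of the truncation policy described above: drop older chat messages first, then truncate the current message, and never drop the system message. The message format, the function name, and the whitespace word count used as a stand-in tokenizer are all assumptions.

```python
def count_tokens(text):
    # Stand-in tokenizer: one token per whitespace-separated word
    return len(text.split())


def truncate_messages(messages, max_prompt_size):
    """Fit messages into the prompt budget while keeping the system message."""
    system = [m for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    # Drop the oldest chat messages first, but keep the current (last) one
    while len(chat) > 1 and total(system + chat) > max_prompt_size:
        chat.pop(0)

    # If still too large, truncate the current message to the remaining budget
    if chat and total(system + chat) > max_prompt_size:
        budget = max(max_prompt_size - total(system), 0)
        words = chat[-1]["content"].split()
        chat[-1] = {**chat[-1], "content": " ".join(words[:budget])}

    return system + chat


history = [
    {"role": "system", "content": "You are Khoj"},
    {"role": "user", "content": "old question one"},
    {"role": "assistant", "content": "old answer one"},
    {"role": "user", "content": "what is the newest question here"},
]
fits = truncate_messages(history, max_prompt_size=10)
tight = truncate_messages(history, max_prompt_size=5)
```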
sabaimran
88ef86ad5c
Fix typing issues for mypy (#372) 2023-07-30 19:27:48 -07:00
sabaimran
ca2c942b65 Add typing to compiled_references and inferred_queries 2023-07-30 19:10:30 -07:00
sabaimran
3646fd1449 Add a warning to indicate that Khoj is not configured to work with personal data sources 2023-07-30 18:52:10 -07:00
sabaimran
996832dc72 Allow user to chat even if content types aren't configured - use empty references 2023-07-30 18:47:45 -07:00
Debanjum Singh Solanky
53810a0ff7 Create khoj config dir if non-existent, before writing to khoj env file 2023-07-30 01:35:36 -07:00
sabaimran
f65d157244 Release Khoj version 0.10.0 2023-07-28 19:27:47 -07:00
Debanjum Singh Solanky
f76af869f1 Do not log the gpt4all chat response stream in khoj backend
Stream floods stdout and does not provide useful info to user
2023-07-28 19:14:04 -07:00
sabaimran
5ccb01343e
Add Offline chat to Obsidian (#359)
* Add support for configuring/using offline chat from within Obsidian
* Fix type checking for search type
* If Github is not configured, /update call should fail
* Fix regenerate tests same as the update ones
* Update help text for offline chat in obsidian
* Update relevant description for Khoj settings in Obsidian
* Simplify configuration logic and use smarter defaults
2023-07-28 18:47:56 -07:00
Debanjum
b3c1507708
Merge pull request #361 from khoj-ai/configure-offline-chat-from-emacs
- Configure using Offline Chat from Emacs: 
- Enable, Disable Offline Chat from Emacs

- Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup
- Benefits: Offline chat models are better for privacy but not great at answering questions
2023-07-28 18:06:58 -07:00
sabaimran
9f78db0579
Let Offline chat override OpenAI API settings (#362)
* Let Offline chat override OpenAI API settings
* Download the offline model whenever offline chat is enabled
* Add progressbar for download for llamav2 model to track progress
* Change ordering of n due to switch of default processor
* Flip ordering of offline/openai checks when extracting questions from query
2023-07-28 17:26:20 -07:00
Debanjum Singh Solanky
ebfbef1f68 Configure using offline chat from Emacs
Closes #358
2023-07-28 16:07:33 -07:00
sabaimran
29081f4429 Adjust parameters for offline chat 2023-07-27 22:22:09 -07:00
sabaimran
124d97c26d
Replace Falcon 🦅 model with Llama V2 🦙 for offline chat (#352)
* Working example with LlamaV2 running locally on my machine

- Download from huggingface
- Plug in to GPT4All
- Update prompts to fit the llama format

* Add appropriate prompts for extracting questions based on a query based on llama format

* Rename Falcon to Llama and make some improvements to the extract_questions flow

* Do further tuning to extract question prompts and unit tests

* Disable extracting questions dynamically from Llama, as results are still unreliable
2023-07-27 20:51:20 -07:00
Debanjum Singh Solanky
715d56d4f0 Use new schema to update khoj.yml config from khoj.el 2023-07-26 17:34:16 -07:00
sabaimran
8b2af0b5ef
Add support for our first Local LLM 🤖🏠 (#330)
* Add support for gpt4all's falcon model as an additional conversation processor
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4 models and OpenAI
- Add unit tests benchmarking some of the Falcon performance
* Add exc_info to include stack trace in error logs for text processors
* Pull shared functions into utils.py to be used across gpt4 and gpt
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
2023-07-26 16:27:08 -07:00
sabaimran
23d77ee338
Fix import issues in desktop image builds (#343) 2023-07-26 15:45:52 -07:00
Debanjum Singh Solanky
7722a9c347 Default to using the gpt-3.5-turbo model for chat from khoj.el 2023-07-22 00:29:26 -07:00
Debanjum Singh Solanky
f0d4a4cf9a Revert "Make configure_content functional. Do not pass content index state to it."
This reverts commit 2ddee7e745 as it
broke partial updates of the content index for just the specified
content types
2023-07-21 13:59:09 -07:00
sabaimran
82c725817e Merge branch 'master' of github.com:khoj-ai/khoj 2023-07-21 13:24:05 -07:00
sabaimran
596e11ec6d Use the same function for computing entries for IDs regardless of whether it has prev entries 2023-07-21 13:23:56 -07:00
Debanjum Singh Solanky
2ddee7e745 Make configure_content functional. Do not pass content index state to it. 2023-07-20 23:24:08 -07:00
sabaimran
1610d2ebd9
📝 Add a documentation base for Khoj! (#333)
* Add docs for more organized, accessible information detailing Khoj setup
* Delete duplicated files
* Add a coverpage without enabling it. Add logo and theme
* Remove obsidian README.md
* Add plausible script to index.html via docsify
2023-07-20 22:34:25 -07:00
Debanjum Singh Solanky
3e59be7f1d Release Khoj version 0.9.0 2023-07-18 19:59:27 -07:00
Debanjum Singh Solanky
d078e7b1f6 Clean up search type usage in khoj server, tests and Readme 2023-07-18 19:57:55 -07:00
Debanjum Singh Solanky
4d910936b7 Fix triggering index update on khoj server from khoj.el 2023-07-18 19:57:54 -07:00
Debanjum Singh Solanky
5c7d7f558d Make AI model used for Khoj chat configurable from khoj.el
- Fix bug. Set the unused model-name to a standard default value
2023-07-18 19:57:54 -07:00
Debanjum
5f2be2a9bb
Merge pull request #298 from HyunggyuJang/patch-1
Encode config as utf-8 during setup in khoj.el. This will allow utf-8 encoded files etc to be passed in config
2023-07-18 17:54:11 -07:00
Debanjum Singh Solanky
429e1b4b48 Regenerate index to apply corruption fixes on first run of new khoj 2023-07-18 16:10:47 -07:00
Debanjum Singh Solanky
83e1088d42 Manage khoj.yml config migrations on app start. Version the schema
- Add version to khoj.yml schema
  Versioning the khoj.yml config schema will simplify future migrations
2023-07-18 16:10:10 -07:00
Debanjum Singh Solanky
71e8ddd9a2 Check if PDF is configured before showing it as an option in khoj.el 2023-07-17 15:49:20 -07:00
Debanjum
d00c5da8b7
Merge pull request #325 from khoj-ai/stablize-simplify-content-indexing
## Stabilize and Simplify Content Indexing

### Major Updates
- 9bcca43 Unify logic to update entries when indexing from scratch or incrementally
- 89c7819 Unify logic to update embeddings when indexing from scratch or incrementally
- 6a0297c Stable sort new entries when marking entries for update
- 58d86d7 Unify logic to configure server from API or on server start
- Create tests to ensure old entries, embeddings in index are unaffected on adding new entries
  - Refer: 1482fd4, 7669b85, 88d1a29 
  - ad41ef3 Make normalization of embeddings configurable to test this in c73feeb

### Minor Updates
- 1673bb5 Add todo state to compiled form of each entry
- 6e70b91 Remove unused `dump_jsonl` helper method 
- 7ad9603 Improve naming of lock
- b02323a Improve naming text search test methods

Resolves #190
2023-07-17 14:51:10 -07:00
Debanjum Singh Solanky
3e3a1ecbc8 Start app even if server init fails to let user fix it
Show stacktrace on error to help debugging
2023-07-17 14:33:02 -07:00
Debanjum Singh Solanky
ef6a0044f4 Drop embeddings of deleted text entries from index
Previously the deleted embeddings would continue to be in the index,
even after the entry was deleted
2023-07-16 03:47:05 -07:00
Debanjum Singh Solanky
ad41ef3991 Make normalizing embeddings configurable 2023-07-16 02:16:33 -07:00
Debanjum Singh Solanky
89c7819cb7 Unify logic to generate embeddings from scratch and incrementally
This simplifies the `compute_embeddings' method and avoids potential
later divergence in handling the index regenerate vs update scenarios
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
6a0297cc86 Stable sort new entries when marking entries for update 2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
6e70b914c2 Remove unused dump_jsonl method
The entries index is stored in gzipped jsonl files for each content type
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
9bcca43299 Use single func to handle indexing from scratch and incrementally
Previous regenerate mechanism did not deduplicate entries with same key
So entries looked different between regenerate and update
Having single func, mark_entries_for_update, to handle both scenarios
will avoid this divergence

Update all text_to_jsonl methods to use the above method for
generating index from scratch
2023-07-16 01:45:53 -07:00
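A minimal sketch of a single function that serves both indexing from scratch and incremental updates, deduplicating entries by key. The name `mark_entries_for_update` comes from the commit message, but the signature, the hashing of entries into keys, and the return shape are assumptions for illustration.

```python
import hashlib


def entry_key(entry):
    """Key an entry by a hash of its text (assumed keying scheme)."""
    return hashlib.md5(entry.encode("utf-8")).hexdigest()


def mark_entries_for_update(current_entries, previous_entries):
    """Return (entry, is_new) pairs, deduplicated and in a stable order.

    Existing entries keep their previous order, new entries follow in
    input order, and entries missing from current_entries are dropped.
    """
    current = {}
    for entry in current_entries:
        current.setdefault(entry_key(entry), entry)  # keep first duplicate
    previous = {}
    for entry in previous_entries:
        previous.setdefault(entry_key(entry), entry)

    existing = [(e, False) for key, e in previous.items() if key in current]
    new = [(e, True) for key, e in current.items() if key not in previous]
    return existing + new


# Indexing from scratch is the same call with no previous entries
from_scratch = mark_entries_for_update(["a", "b", "a"], [])
incremental = mark_entries_for_update(["c", "b", "a"], ["a", "b"])
```

Because both paths go through the same deduplication, a regenerated index and an incrementally updated one end up with the same entries.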
Debanjum Singh Solanky
1673bb5558 Add todo state to compiled form of each org-mode entry 2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
7ad96036b0 Improve lock name to config_lock instead of search_index_lock
It is used to lock updates to all app config state, including processor
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
58d86d7876 Use single func to configure server via API and on server start
Improve error messages on failure to configure server components
2023-07-16 01:45:53 -07:00
sabaimran
a15711e635 Fix null type checks in get /config 2023-07-15 15:53:56 -07:00
sabaimran
e590d75b20
Start Khoj even when config is not valid (#320)
* Add icon to indicate bad config, start Khoj even if there was an issue setting up the index
2023-07-15 14:11:54 -07:00
sabaimran
49ab201c30
Fix issues importing PySide in Docker container (#322)
* Rather than installing PyQT dependencies, remove codepaths that require pyqt files in no-gui mode
2023-07-15 13:33:13 -07:00
sabaimran
ba47f2ab39 Merge branch 'master' of github.com:debanjum/khoj 2023-07-14 22:28:05 -07:00
sabaimran
874cffd256 Add additional support for parsing notion workspaces 2023-07-14 22:27:56 -07:00
Debanjum
52f68167ce
Merge pull request #317 from khoj-ai/reduce-memory-consumption-by-search-model-duplication
Reuse Search Models across Content Types to reduce Memory Consumption

- Memory consumption now only scales with search models used, not with content types. 
  Previously each content type had its own copy of the search ML models.
  That'd result in 300+ Mb per enabled text content type

- Split model state into 2 separate state objects, `search_models` and `content_index`. 
  This allows loading text_search and image_search models first
  and then reusing them across all content_types in content_index

- The change should cut down memory utilization quite a bit for most users.
  I see a >50% drop in memory utilization on my Khoj instance. 
  But this will vary for each user based on the amount of content indexed vs number of plugins enabled.

- This change does not solve the RAM utilization scaling with size of the index,
  as the whole content index is still kept in RAM while Khoj is running

Should help with #195, #301 and #303
2023-07-14 19:54:12 -07:00
Debanjum Singh Solanky
f08e9539f1 Release lock after updating index even if update fails to prevent deadlock
Wrap acquire/release locks in try/catch/finally when updating content
index and search models to prevent lock not being released on error
and causing a deadlock
2023-07-14 16:57:27 -07:00
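A minimal sketch of the acquire/release pattern this commit describes, using `try`/`finally` so the lock is released even when the update raises. The lock name `config_lock` follows commit 7ad96036b0; the function names are hypothetical.

```python
import threading

config_lock = threading.Lock()


def update_index(update_fn):
    """Run update_fn while holding the lock; always release it afterwards."""
    config_lock.acquire()
    try:
        return update_fn()
    finally:
        # Runs on success and on error, so a failed update cannot
        # leave the lock held and deadlock later updates
        config_lock.release()


def failing_update():
    raise RuntimeError("index update failed")


try:
    update_index(failing_update)
except RuntimeError:
    pass

# The lock is free again, so a non-blocking acquire succeeds
lock_was_released = config_lock.acquire(blocking=False)
if lock_was_released:
    config_lock.release()
```

In Python the same guarantee is available more compactly with `with config_lock:`.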
sabaimran
37f7f9fd1d
Add additional telemetry for system understanding (#316)
* Add additional telemetry in order to understand which data sources are the most useful
* Make actions side by side in the configuration page
* Restore main run command
* Update links to point to wiki pages for Github, Notion integrations
* Standardize nomenclature of the api_type to use _config suffix

Remove header fields that aren't actually helpful for understanding config usage
2023-07-14 10:14:07 -07:00
Debanjum Singh Solanky
86e2bec9a0 Reuse Search Models across Content Types to Reduce Memory Consumption
- Memory consumption now only scales with search models used, not with
  content types as well. Previously each content type had its own
  copy of the search ML models. That'd result in 300+ MB per enabled
  content type

- Split model state into 2 separate state objects, `search_models' and
  `content_index'.
  This allows loading text_search and image_search models first and then
  reusing them across all content_types in content_index

- This should cut down memory utilization quite a bit for most users.
  I see a ~50% drop in memory utilization.

  This will, of course, vary for each user based on the amount of
  content indexed vs number of plugins enabled

- This does not solve the RAM utilization scaling with size of the index,
  as the whole content index is still kept in RAM while Khoj is running

Should help with #195, #301 and #303
2023-07-14 01:27:22 -07:00
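
A rough sketch of the split described in this commit, under stated assumptions: the class shapes, `load_model` and `model_for` are hypothetical and only illustrate the idea of loading a model once and sharing it across content types. Only the names `search_models` and `content_index` come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchModels:
    # One entry per kind of search model, not one per content type
    models: Dict[str, object] = field(default_factory=dict)


@dataclass
class ContentIndex:
    # Per content type data only; no model copies live here
    entries: Dict[str, List[str]] = field(default_factory=dict)


def load_model(name: str) -> object:
    """Stand-in for loading an expensive ML model (hypothetical)."""
    return object()


# Load the text search model once, then index several content types
search_models = SearchModels(models={"text_search": load_model("text_search")})
content_index = ContentIndex(
    entries={"org": ["note a"], "markdown": ["note b"], "pdf": ["note c"]}
)


def model_for(content_type: str) -> object:
    """Every indexed text content type resolves to the same shared model."""
    if content_type not in content_index.entries:
        raise KeyError(content_type)
    return search_models.models["text_search"]
```

With this layout, enabling another text content type adds entries to the index but does not add another model copy, which is why memory then scales with the models used rather than with the content types.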
Debanjum
b2718d330c
Merge pull request #304 from migrate-from-pyqt-to-pyside
Migrate from PyQT6 to PySide6
2023-07-13 11:54:47 -07:00
sabaimran
31e933207f Set default values for sys.stdout if they're unavailable 2023-07-12 22:22:49 -07:00
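
For context, `sys.stdout` can be `None` when Python runs without an attached console, for example in some packaged GUI apps. A minimal sketch of a fallback, assuming a hypothetical helper `stream_or_devnull` (the commit itself may set the default differently):

```python
import os
import sys


def stream_or_devnull(stream):
    """Return the stream unchanged, or a writable null sink if it is missing."""
    if stream is None:
        return open(os.devnull, "w")
    return stream


# Safe to call at startup: a no-op when stdout already exists
sys.stdout = stream_or_devnull(sys.stdout)
```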
Debanjum Singh Solanky
9c76150895 Migrate from PyQT6 to PySide6 2023-07-11 18:43:44 -07:00
HyunggyuJang
88c42b3043
Encode data as utf-8
otherwise it will complain, see 1c85531090
2023-07-11 17:06:05 +09:00
Debanjum Singh Solanky
f664a74e77 Update Khoj server to run on non standard port, 42110 instead of 8000
Resolves #295
2023-07-10 21:27:58 -07:00
sabaimran
effb52f859 Fix demo rendering with the new header 2023-07-10 21:16:19 -07:00
sabaimran
55f5be7b03 Release Khoj version 0.8.2 2023-07-10 14:39:32 -07:00
sabaimran
9a63f89f33 Merge branch 'master' of github.com:debanjum/khoj 2023-07-10 14:31:19 -07:00
sabaimran
53809298c0 Release Khoj version 0.8.1 2023-07-10 14:30:04 -07:00
tjsousa
5b37e988e6
Allow using configured GPT chat model (#292)
My account doesn't have gpt-4 enabled and it wouldn't work as the default value was always used from extract_questions, where the caller could use the configured model.
2023-07-10 14:24:40 -07:00
Debanjum Singh Solanky
75ff871217 Release Khoj version 0.8.0 2023-07-10 13:37:51 -07:00
Debanjum Singh Solanky
979088b3dc Add tooltip helper text on web settings page buttons
- Provide more details on what clicking configure, initialize buttons
  or changing the results count slider does
- This shows up on user hovering over those buttons
2023-07-10 13:32:41 -07:00
Debanjum Singh Solanky
255781e135 Use relative link on logo to jump to correct page on local and cloud 2023-07-10 13:22:20 -07:00
Debanjum Singh Solanky
b2d229c116 Move header pane style to base khoj.css for reuse. Fix logo size 2023-07-10 13:10:17 -07:00
Debanjum Singh Solanky
20cb314171 Open the Khoj config page in the browser on first run 2023-07-10 12:10:20 -07:00
sabaimran
07cf5a214a
Check if PDF files are present in the Obsidian vault before initializing the Khoj configuration (#293) 2023-07-10 10:33:04 -07:00
sabaimran
7364bac8ae Make the header take up less space
- Use a single row for the header
- Needed custom styling for each page because each of them is different in subtle ways, unfortunately
2023-07-09 22:31:37 -07:00
sabaimran
62704cac09
Add a plugin which allows users to index their Notion pages (#284)
* For the demo instance, re-instate the scheduler, but infrequently for api updates
- In constants, determine the cadence based on whether it's a demo instance or not
- This allows us to collect telemetry again. This will also allow us to save the chat session
* Conditionally skip updating the index altogether if it's a demo instance
* Add backend support for Notion data parsing
- Add a NotionToJsonl class which parses the text of Notion documents made accessible to the API token
- Make corresponding updates to the default config, raw config to support the new notion addition
* Add corresponding views to support configuring Notion from the web-based settings page
- Support backend APIs for deleting/configuring notion setup as well
- Streamline some of the index updating code
* Use defaults for search and chat queries results count
* Update pagination of retrieving pages from Notion
* Update state conversation processor when update is hit
* frequency_penalty should be passed to gpt through kwargs
* Add check for notion in render_multiple method
* Add headings to Notion render
* Revert results count slider and split Notion files by blocks
* Clean/fix misc things in the function to update index
- Use the successText and errorText variables appropriately
- Name parameters in function calls
- Add emojis, woohoo
* Clean up and further modularize code for processing data in Notion
2023-07-09 15:29:26 -07:00
Debanjum
77755c0284
Fix Packaging the Khoj Desktop Apps (#289)
* Add langchain static files and pytorch metadata to Khoj native app

* Add pillow static files, metadata & hidden imports to Khoj native app

* Fix path to web interface static files on Khoj native app

* Add tiktoken hidden imports to make chat work from Khoj native app

* Fix Khoj native app to run with GUI mode enabled

This got broken when we moved from using the --no-gui flag to using
--gui in https://github.com/khoj-ai/khoj/pull/263
2023-07-09 10:21:16 -07:00
sabaimran
4c135ea316
Make streaming optional for the /chat endpoint (#287)
* Update the /chat endpoint to conditionally support streaming

- If streams are enabled, return the threadgenerator as it does currently
- If stream is disabled, return a JSON response with the response/compiled references separated out
- Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian
- Rename chat/init/ to chat/history

* Update khoj.el to use the /history endpoint

- Update corresponding unit tests to use stream=true

* Remove & from call to /chat for obsidian

* Abstract functions out into a helpers.py file and clean up some of the error-catching
2023-07-09 10:12:09 -07:00
Debanjum Singh Solanky
0a86220d42 Use default values, delete content config on disable and update state 2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky
362063f5fe By default, connect to Khoj server over IPv4 from Obsidian plugin 2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky
571e8c2548 Add rerank, index corruption hint on search page of web interface
Similar to the hint already in the Obsidian search modal
Closes #272
2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky
61e131f95c Hide unused model field from chat settings on web interface 2023-07-07 18:43:53 -07:00
Debanjum Singh Solanky
af30d01e85 Move to newer chat models to extract questions & summarize chats
Deprecate usage of the older gpt3 models in favor of the newer chat
based models
- text-davinci-003 is only 50% cheaper than gpt4 and less reliable for
  question extraction
- Using gpt-3.5-turbo for summarization should reduce cost of chat

- Keep conversation.chat_session as a list instead of a string
- Update completion_with_backoff func to use ChatML format
2023-07-07 17:32:27 -07:00
Debanjum Singh Solanky
171ce19e1f Update date filter to allow quoting values in single quotes 2023-07-07 17:13:47 -07:00
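
One way to accept either quote style is a regex with a backreference to the opening quote. This is a sketch only: the `dt>="..."` filter syntax and the names `DATE_FILTER` and `extract_date_filters` are assumptions for illustration, not taken from the commit.

```python
import re

# Group 2 captures the opening quote; \2 requires the same quote to close the value
DATE_FILTER = re.compile(r"""dt([:><=]{1,2})(["'])(.*?)\2""")


def extract_date_filters(query):
    """Return (operator, value) pairs for each date filter found in the query."""
    return [(m.group(1), m.group(3)) for m in DATE_FILTER.finditer(query)]


filters = extract_date_filters("""notes dt>='2023-07-01' dt<"2023-07-07" """)
```

The backreference means a value opened with a single quote must also close with one, so mismatched quotes are not treated as a filter.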
Debanjum Singh Solanky
e588f7c528 Deprecate unused beta search and answer API endpoints 2023-07-07 16:38:07 -07:00
Debanjum Singh Solanky
c9fc4d1296 Revert to using cross-encoder to improve search results used by chat 2023-07-07 15:31:34 -07:00
Debanjum Singh Solanky
11f0a9f196 Fix chat tests since streaming. Pass args correctly to chat methods
- Fix testing gpt converse method after it started streaming responses
- Pass stop in model_kwargs dictionary and api key in openai_api_key
  parameter to chat completion methods. This should resolve the arg
  warning thrown by OpenAI module
2023-07-07 15:23:44 -07:00
Debanjum Singh Solanky
48870d9170 Fix parsing questions generated by extract_questions actor into list
The previous json parsing was failing to handle questions with date
filters

Fix the chat actor tests to run without throwing error with freezegun
complaining about importing transformers.local_llama model

Remove quote escapes from date filter examples provided to
extract_questions actor
2023-07-07 15:18:55 -07:00
Debanjum Singh Solanky
279662620b Move results count to settings page on web. Use it for search & chat
- Before
  Only the search interface had the results count configuration option

- After
  - The results count is set on the settings page instead of the
    search page
  - Both search and chat can use the configured results count instead
    of just search
2023-07-07 14:08:08 -07:00
Debanjum Singh Solanky
2ec8da89e8 Remove Update button from Khoj Search page on the Web interface
The settings page on the Khoj web interface already has a configure
button. Don't need the Update button on the search page as well
2023-07-07 12:49:58 -07:00
Debanjum Singh Solanky
bf427cd8dd Set no. of results used to generate chat response from Khoj Emacs 2023-07-07 12:34:50 -07:00
Debanjum Singh Solanky
1d77fe712c Set no. of results used to generate chat response from Khoj Obsidian 2023-07-07 12:32:32 -07:00
Debanjum Singh Solanky
2f31de5ed5 Set no. of references to use for chat configurable in Chat API 2023-07-07 12:29:36 -07:00
Debanjum Singh Solanky
d97682fdac Use tooltip, placeholders to guide Khoj setup via web settings page 2023-07-06 21:37:48 -07:00
Debanjum Singh Solanky
f5cf09424b Use more descriptive field names for content type settings on Khoj web
Resolves #281
2023-07-06 20:47:39 -07:00
Debanjum Singh Solanky
a2c668268f Use node-fetch >=3.1.0 in khoj obsidian plugin to avoid security vulnerability 2023-07-06 13:05:52 -07:00
sabaimran
d688ddf92c
Re-instate the scheduler for the demo instances (#279)
* For the demo instance, re-instate the scheduler, but infrequently for api updates

- In constants, determine the cadence based on whether it's a demo instance or not
- This allows us to collect telemetry again. This will also allow us to save the chat session

* Conditionally skip updating the index altogether if it's a demo instance
2023-07-06 11:01:32 -07:00
Debanjum Singh Solanky
8f36572a9b Improve typing, null checks in controllers and gpt functions 2023-07-05 20:49:25 -07:00
Debanjum
6c2a8a5bce
Stream Responses by Khoj Chat on Web, Obsidian
- What
   - Stream chat responses from OpenAI API to Web, Obsidian clients
      - Implement using a callback function which manages a queue where new tokens can be placed as they come in. As the thread is read from, tokens are removed.
      - When the final token has been processed, add the `compiled_references` to the queue to be rendered by the `chat` client
      - When the thread has been closed, save the accumulated conversation log in the user's history using a `partial func`
      - Incrementally decode tokens on the front end and add them as they appear from the streamed response

- Why
This significantly reduces perceived latency and OpenAI API request timeouts for Chat

Closes https://github.com/khoj-ai/khoj/issues/257
2023-07-05 20:02:11 -07:00
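
The queue-based streaming described above can be sketched roughly as follows. All names here (`ThreadedGeneratorSketch`, `fake_llm`, `save_to_history`, the `### compiled references:` marker) are hypothetical stand-ins, and the real implementation may differ. A producer thread pushes tokens, the consumer iterates them, references are queued after the last token, and a partial function saves the conversation when the stream closes.

```python
import json
import queue
import threading
from functools import partial


class ThreadedGeneratorSketch:
    """Iterator fed by a producer thread through a queue (hypothetical stand-in)."""

    _DONE = object()  # sentinel marking the end of the stream

    def __init__(self, compiled_references, on_close):
        self.queue = queue.Queue()
        self.compiled_references = compiled_references
        self.on_close = on_close
        self.response = ""

    def __iter__(self):
        return self

    def __next__(self):
        item = self.queue.get()  # blocks until the producer adds something
        if item is self._DONE:
            self.on_close(self.response)  # save the accumulated conversation
            raise StopIteration
        return item

    def send(self, token):
        self.response += token
        self.queue.put(token)

    def close(self):
        # After the final token, queue the references for the client to render
        self.queue.put("### compiled references:" + json.dumps(self.compiled_references))
        self.queue.put(self._DONE)


history = []


def save_to_history(query, response):
    history.append((query, response))


def fake_llm(generator):
    """Simulates a model callback emitting tokens one at a time."""
    for token in ["Hello", ", ", "world"]:
        generator.send(token)
    generator.close()


stream = ThreadedGeneratorSketch(["notes.org"], on_close=partial(save_to_history, "greet me"))
worker = threading.Thread(target=fake_llm, args=(stream,))
worker.start()
chunks = list(stream)  # the client would render each chunk as it arrives
worker.join()
```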
Debanjum Singh Solanky
e111eda6ae Make client, app_config optional in telemetry logger for correct typing 2023-07-05 18:57:38 -07:00
Debanjum Singh Solanky
e562114f6b Improve comments, var names in js for chat streaming on web interface 2023-07-05 18:57:27 -07:00
Debanjum Singh Solanky
46269ddfd3 Fix chat logging messages to get context without flooding logs 2023-07-05 18:27:06 -07:00
Debanjum Singh Solanky
0ba838b53a Show temp status message in Khoj Obsidian chat while Khoj is thinking
- Scroll to bottom after adding temporary status message and
references too
2023-07-05 18:02:43 -07:00
Debanjum Singh Solanky
8271abe729 Use optional chaining operator to extract khojBannerSubmit from conditional 2023-07-05 18:02:43 -07:00
Debanjum Singh Solanky
c12ec1fd03 Show temp status message in Khoj web chat while Khoj is thinking
- Scroll to bottom after adding temporary status message and
references too
2023-07-05 18:02:30 -07:00
sabaimran
257a421e45 Bonus: add try-catch logic around telemetry upload in case of JSON serializability issues 2023-07-05 15:12:18 -07:00
sabaimran
4e6b66b139 Add support for streaming chat response from OpenAI to Obsidian
- I needed to install node-fetch to accomplish this, as the built-in request object from Obsidian doesn't seem to support streaming and the built-in fetch object is very sensitive to any and all cross origin requests
2023-07-05 15:01:22 -07:00
sabaimran
3ff5074cf5 Log the end-to-end time of generating a streamed response from OpenAI 2023-07-05 14:59:44 -07:00
sabaimran
68e635cc32 Remove additional comments and debug statements 2023-07-05 11:33:56 -07:00
sabaimran
67a8795b1f Clean-up commented out code 2023-07-05 11:24:40 -07:00
sabaimran
79b1b1d350 Save streamed chat conversations via partial function passed to the ThreadGenerator 2023-07-04 17:33:52 -07:00
sabaimran
afd162de01 Add reference notes to result response from GPT when streaming is completed
- NOTE: results are still not being saved to conversation history
2023-07-04 12:47:50 -07:00
sabaimran
8f491d72de Initial code with chat streaming working (warning: messy code) 2023-07-04 10:14:39 -07:00
Debanjum Singh Solanky
5889eceba4 Make text selectable in Khoj chat modal on Obsidian
Previously the text in the Khoj chat modal couldn't be copied as it
was not selectable

Resolves #206
2023-07-03 23:24:04 -07:00
sabaimran
89354def9b Update request timeout window to 20 seconds 2023-07-03 22:28:18 -07:00
sabaimran
b1940519c3 Log error if unable to decode chunk from Github 2023-07-03 16:29:32 -07:00
Debanjum Singh Solanky
ecf9730cd7 Disable Chat, Search on Web if Khoj not configured & show next steps 2023-07-03 16:04:32 -07:00
sabaimran
017e8c1aef
Skip indexing a PDF that has an indexing error (#274) 2023-07-03 15:55:11 -07:00
sabaimran
a6f313589e Release Khoj version 0.7.1 2023-07-03 12:26:41 -07:00
sabaimran
8bfd5828e6 Remove deprecation notice since we're opening the web UI by default 2023-07-03 12:01:09 -07:00
sabaimran
92d81d3b16
Initialize the search.model field to SearchModels() and fix Reinitialize API call (#273) 2023-07-03 11:32:44 -07:00
sabaimran
61403138d5
Merge pull request #269 from khoj-ai/features/simplify-configuration-steps
Simplify some common configuration steps
2023-07-03 00:16:51 -07:00
sabaimran
ea3dc2cfa3 Simplify rendering of content type pages and logic of selecting config 2023-07-03 00:15:29 -07:00
sabaimran
260272dca2 Check if state.config is populated before configuring via the update method 2023-07-03 00:10:56 -07:00
sabaimran
bf8914d0c8 Fix default config initialization for chat.html 2023-07-03 00:00:47 -07:00
Debanjum
faad1297f4
Drop Support for Org Music, Ledger Content Types
Removing unused content types will reduce khoj code to manage

- 0f993b3 Drop support for Ledger as a separate content type
   Khoj will soon get a generic text indexing content type in Index plain text files #237.
   This along with a file filter should suffice for searching through Ledger transactions

- c9db532 Remove unused org-music as an indexable content type from Khoj
   Org-music was just a custom content type that worked with org-music.
   It was mostly only useful for me.
2023-07-02 17:48:29 -07:00
Debanjum Singh Solanky
0f993b332e Drop support for Ledger as a separate content type
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.

Having a specific content type for niche use-case like ledger isn't
useful. Removing unused content types will reduce khoj code to manage.
2023-07-02 16:57:49 -07:00
sabaimran
fa218ff5aa Fix call to update for Reinitialize button 2023-07-02 16:31:30 -07:00
sabaimran
a8b83da872 Merge branch 'master' of github.com:debanjum/khoj into features/simplify-configuration-steps 2023-07-02 16:21:54 -07:00
Debanjum Singh Solanky
c9db5321e7 Remove unused org-music as an indexable content type from Khoj
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.

Cleaning up that code will reduce number of content types for khoj to
manage.
2023-07-02 16:21:21 -07:00
sabaimran
b86a3bb0c5 Merge branch 'master' of github.com:debanjum/khoj into fix/obsidian-setup-issues 2023-07-02 16:21:05 -07:00
sabaimran
a52c1c8380 Use built-in app.vault to determine whether there are any PDF files within 2023-07-02 16:20:43 -07:00
sabaimran
eff1436857 Overwrite existing PDFs in Obsidian as well, make if-block more legible 2023-07-02 16:17:25 -07:00
Debanjum Singh Solanky
30459ee4ba Fix Khoj subtitle in desktop entry, pyproject, cli and Obsidian Readme 2023-07-02 16:09:07 -07:00
sabaimran
1a1b044d12 Simplify settings pages for configuration
- Add one-click disablement
- Remove fields that probably don't need to be edited (our implementation details)
- Add a green tick if a given field is configured
2023-07-02 16:04:05 -07:00
sabaimran
e4c445f805 Add try-except-finally blocks around configure calls in /update 2023-07-02 13:35:02 -07:00
sabaimran
4b02a8c788 Fix PDF setup in Obsidian plugin and force Obsidian configuration for markdown 2023-07-02 12:37:24 -07:00
sabaimran
2a7e4f2b71 Escape special characters in the URL when adding a link to the remote file 2023-07-02 09:13:28 -07:00
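
A small sketch of this kind of escaping using the standard library. The helper name `remote_file_link` and the example URL are hypothetical; the commit may escape the link differently.

```python
from urllib.parse import quote


def remote_file_link(base_url, file_path):
    """Build a link to a remote file, percent-encoding unsafe path characters."""
    # quote() keeps "/" by default, so the directory structure is preserved
    return f"{base_url.rstrip('/')}/{quote(file_path)}"


link = remote_file_link("https://example.com/repo/", "docs/my notes #1.md")
```

Without the escaping, the `#` in the file name would be read as the start of a URL fragment and the link would point at the wrong file.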
sabaimran
c747562897 Update the GUI to just be a simple box with a button for the web UI 2023-07-01 20:37:21 -07:00
sabaimran
bab7f39d47 Move logic to open the web browser into the GUI section 2023-07-01 20:11:27 -07:00
sabaimran
36537606da Update unit test and preserve prior operational ordering in main.py 2023-07-01 20:02:35 -07:00
sabaimran
ea9ae4ae28 Configure Khoj to automatically open the browser to their web home page when Khoj is up 2023-07-01 19:46:31 -07:00
sabaimran
d2083dd395 Remove bespoke processing for GithubToJsonl file demo 2023-07-01 19:09:22 -07:00
sabaimran
a71440f62a Update the guidance in the error message if config is not set 2023-07-01 19:09:00 -07:00
sabaimran
7db97d8aa9 Fix: don't try to render the search_type.ALL 2023-07-01 19:08:19 -07:00
sabaimran
f0f6390366 Make --no-gui the default behavior of Khoj and update corresponding documentation 2023-07-01 19:07:59 -07:00
Debanjum Singh Solanky
d77e05c279 Release Khoj version 0.7.0 2023-07-01 05:44:22 -07:00
Debanjum Singh Solanky
30d87a9a01 Update color of Khoj chat in Obsidian plugin to Lantern theme 2023-07-01 02:18:47 -07:00
Debanjum Singh Solanky
51826d28d6 Ensure clicking Update in Khoj Obsidian indexes PDF files too 2023-07-01 02:18:47 -07:00
sabaimran
dac2d14380 Handle file names appropriately for md files and render commits in github results 2023-07-01 01:20:58 -07:00
sabaimran
dbe713604d Fix error in tests for markdown_to_jsonl 2023-07-01 00:49:40 -07:00
sabaimran
931aab4464 Handle case for when headers value is None 2023-07-01 00:37:30 -07:00
sabaimran
d01afb3ee4 Fix path issues for URL-based markdown files 2023-07-01 00:25:11 -07:00
sabaimran
31655447e7 Add the sign-up list to the chat page as well and update copy 2023-06-30 21:43:01 -07:00
sabaimran
796102c74e Add separate configuration if the given Khoj instance is meant for demo
- In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network)
- Prevent re-indexing for Github data if this is a demo instance
- Fix up some issues with the CSS which made settings page small in mobile
- In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page
2023-06-30 20:38:55 -07:00
sabaimran
db3026739d Resolve diffs in api.py to make /chat endpoint async with new request parameter 2023-06-30 00:25:37 -07:00
sabaimran
ef72508914 Try/catch around github file decoding, await call to search in chat API, fix img width 2023-06-30 00:23:21 -07:00
Debanjum Singh Solanky
b950889f47 Fix org-mode web renderer to handle results containing list in block
- Break out of rendering list if at end of org block in org.js
- This would previously hang rendering results in web interface

Should try to fix this upstream in org.js as well
2023-06-29 19:01:25 -07:00
sabaimran
780c769567 Add additional request headers to improve telemetry 2023-06-29 18:51:24 -07:00
sabaimran
6c10d68262
Merge pull request #253 from khoj-ai/features/github-issues-indexing
Support indexing Github issues as well as corresponding comments
2023-06-29 16:02:47 -07:00
sabaimran
b2dd946c6d Rename issue to entry method for accuracy 2023-06-29 15:23:50 -07:00
Debanjum Singh Solanky
51dfa48e2b Have Khoj support Python 3.11 as Pytorch supports it now
- Previously Khoj could only support Python up to 3.10 due to pytorch.
  But lots of folks had python 3.11 installed by default on their machines.

  This required installing python 3.10 and dealing with virtual envs.

  With Torch >= 2.0.1 now able to support python 3.11, at least one
  class of installation troubles for Khoj should drop. See
  https://github.com/pytorch/pytorch/issues/86566 for reference

- Preliminary testing indicates using the new torch 2.x may reduce
  search time by 25% (from 80ms to 60ms on Mac M1)

- Update Docs to not require mentioning python <=3.10 required
- Update Github test workflow to run khoj tests with python 3.11 too
2023-06-29 15:13:26 -07:00
sabaimran
65bf894302 Interpret org files as a list and put them in separate divs. Update styling of search results to separate into cards 2023-06-29 15:12:48 -07:00
Debanjum Singh Solanky
d212298573 Make Configure button on web interface incrementally update by default
We should add a way to force index everything.

But force indexing should not be the default when user is just trying
to update content to index
2023-06-29 14:52:51 -07:00
Debanjum Singh Solanky
da2de21339 Only return requested result count even if search in multiple content types
- Set results_count to default value at start so it is an int, never None
2023-06-29 14:49:05 -07:00
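
A sketch of the behavior described here, under assumptions: hits are dicts with a `score` field where higher means more relevant, and `DEFAULT_RESULTS_COUNT` and `top_results` are hypothetical names. Results from all content types are merged first and only then cut to the requested count, and a missing count falls back to an integer default.

```python
DEFAULT_RESULTS_COUNT = 5  # hypothetical default


def top_results(results_by_type, results_count=None):
    """Merge hits across content types and return at most `results_count` of them."""
    # Resolve the default up front so the count is always an int, never None
    count = results_count if results_count is not None else DEFAULT_RESULTS_COUNT
    merged = [hit for hits in results_by_type.values() for hit in hits]
    merged.sort(key=lambda hit: hit["score"], reverse=True)  # assumes higher is better
    return merged[:count]


sample = {
    "org": [{"entry": "a", "score": 0.9}, {"entry": "b", "score": 0.2}],
    "markdown": [{"entry": "c", "score": 0.5}],
}
best_two = [hit["entry"] for hit in top_results(sample, 2)]
```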
sabaimran
77672ac0ae Demarcate different results with a border box
- Add back support for searching by type Github
- Remove custom class name in markdown js file
2023-06-29 14:14:25 -07:00
sabaimran
6edc32f2f4 Accept current changes to include issues in rendering flow 2023-06-29 12:25:29 -07:00
sabaimran
ab7dabe74f Explicitly use Union type for function parameters for lint checks 2023-06-29 11:44:30 -07:00
sabaimran
fecf6700d2 Limit small image rendering to just the avatar images 2023-06-29 11:27:18 -07:00
sabaimran
70e550250a Add an additional data source for issues from Github repositories + quality of life updates
- Use a request session to reduce the overhead of setting up a new connection with the Github URL each request
- Use the streaming feature for the REST api to reduce some of the memory footprint
2023-06-29 10:59:54 -07:00
Debanjum Singh Solanky
5f2717cc4b Use logger.warning since logger.warn is deprecated 2023-06-28 22:15:27 -07:00
Debanjum Singh Solanky
56ce97ef9e Use async/await in tests for query method of text and image search
The text, image search query method has become async. So async/await
is required to get results correctly in tests etc
2023-06-28 22:07:02 -07:00
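
Why the tests needed changing can be shown with a tiny sketch. The `query` coroutine below is a hypothetical stand-in for the search method: calling it only creates a coroutine object, so the test has to run it in an event loop to get the results.

```python
import asyncio


async def query(raw_query):
    """Hypothetical async search method standing in for text or image search."""
    await asyncio.sleep(0)  # yield control, as a real search awaiting work would
    return [f"result for {raw_query}"]


def test_query_returns_results():
    # asyncio.run drives the coroutine to completion and returns its value
    results = asyncio.run(query("khoj"))
    assert results == ["result for khoj"]


test_query_returns_results()
```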
Debanjum Singh Solanky
b1767f93d6 Get any configured asymmetric search model to encode query for search
- Set image_search.query to async to use it with multi-threading
  This is same as text_search.query being set to an async method
- Exit search early if no search_model is defined in state.model
2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
8eae7c898c Put each result under org heading when query for "all" content type in khoj.el
- Add "all" as default content type when no content type retrieved
  from server
2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
630bf995f1 Style each result based on its content type in same view on Khoj web
- So when searching across content types (with content-type = "all")
  org-mode results get rendered differently than markdown, PDF etc. results

- Set div class for each result separately instead of a single uber div
  for styling. This allows styling div of each result based on the
  content-type of that result

- No need to create placeholder "all" content type on web interface as
  server is passing an all content type by itself
2023-06-28 22:07:01 -07:00
Debanjum Singh Solanky
1773a78339 Fix createRequestUrl method signature to fetch results from khoj web 2023-06-28 12:10:45 -07:00
Debanjum Singh Solanky
212b1a96c8 Create "all" search type for search across all content types on khoj server
Allows moving logic to handle search across all content types to
server from clients
2023-06-28 11:34:26 -07:00
Debanjum Singh Solanky
0636ceaf14 Merge branch 'master' of github.com:khoj-ai/khoj into parallelize-search-across-all-asymmetric-text-content-types
Conflicts:
- src/khoj/routers/api.py: Use theirs
2023-06-27 16:10:32 -07:00
Debanjum Singh Solanky
510bb7e684 Use typing union in text_search for python 3.8 compatible type hinting 2023-06-27 15:59:50 -07:00
Debanjum Singh Solanky
1b11d5723d Extract search request URL builder into js function in web interface 2023-06-27 15:50:41 -07:00
Debanjum Singh Solanky
09f739b8cc Null check config, log warning instead of error when configuring search 2023-06-27 15:48:48 -07:00
sabaimran
9d62d66a77 Simplify construction of repo shorthand in GithubToJsonl 2023-06-27 15:05:03 -07:00
sabaimran
227169ebde Support configuration of multiple Github repositories in the settings interface
- Add cards to configure each of the Github repositories
- Fix a bug in the API which caused all other settings to be wiped when updating one of the content types
- Provide an error message to the user if they have a misconfiguration in their chat settings
2023-06-27 14:10:09 -07:00
sabaimran
37a1f15c38 Add backend support for indexing multiple repositories
- Add support for indexing org files as well as markdown files from the Github repository and update corresponding search view
- Support indexing a list of repositories
2023-06-27 12:06:15 -07:00
sabaimran
ddd550e6f4 Add call to use X-CSRFToken in relevant POST methods 2023-06-26 12:38:00 -07:00
sabaimran
35e24d7851 Fix null checking in state for content config API and telemetry API 2023-06-26 11:37:34 -07:00
sabaimran
5e39421f56 Merge branch 'master' of github.com:debanjum/khoj 2023-06-25 11:41:47 -07:00
sabaimran
4410a3bb4b Limit max width of the pre tag to 100% of the screen width 2023-06-25 11:41:15 -07:00
sabaimran
ffe66b848a Use a single column template for config plugins when in mobile 2023-06-25 11:27:41 -07:00
Debanjum Singh Solanky
b1890aa050 Null check intermediary objects when config not fully initialized 2023-06-24 15:34:18 -07:00
Debanjum Singh Solanky
946af0889d Improve showing status message on saving config via web interface
- Show success/failure status message much closer to the save button
  Previously status message was shown on top of the page, which wasn't
  always in view and wasn't easily seen
- Improve the status message to more clearly show next steps on success
2023-06-24 00:49:57 -07:00
Debanjum Singh Solanky
40d1abfe50 Update the new /config APIs to configure Khoj for first time users
- Setup state.config and sub-components from unset state
- Setup search types with default settings
2023-06-24 00:45:30 -07:00
Debanjum Singh Solanky
edabede93a Fix post configuration state update on error or success on config html 2023-06-23 14:52:25 -07:00
Debanjum Singh Solanky
4744d69221 Resolve button name, anchor tag feedback. Add status message to settings page
- Use "Configure" name for settings config action
- Use more standard anchor tag instead of button
- Add configure status message
2023-06-23 09:48:38 -07:00
Debanjum Singh Solanky
26abafa658 Highlight currently active tab in web interface for orientation 2023-06-22 00:33:28 -07:00
Debanjum Singh Solanky
2728c714d7 Put pico.css in local assets. Move common css styling into khoj.css 2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky
20a37697de Add Khoj header with navigation pane to Search and Chat Interfaces 2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky
c467a0cbb0 Update UI of config sub pages to use khoj lantern theme styling 2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky
0ce2ec590a Update main config page on khoj server to match khoj lantern theme 2023-06-21 20:25:25 -07:00
Debanjum Singh Solanky
d30a9ddd33 Use Khoj Logo on Search, Chat pages of Web Interface 2023-06-21 12:34:53 -07:00
Debanjum Singh Solanky
6d4aad57e1 Use new Khoj Lantern Logo in Web, Emacs, Obsidian UIs and Docs 2023-06-21 01:57:22 -07:00
Debanjum Singh Solanky
69d4fa6525 Rename project links across repo from debanjum/khoj to khoj-ai/khoj 2023-06-21 00:13:21 -07:00
Debanjum Singh Solanky
5c4eb950d5 Search across all content types via khoj.el on Emacs
If no content-type selected in transient menu option, khoj.el queries
khoj server without content-type parameter (t) set.

This results in search across all enabled asymmetric search text
content types
2023-06-20 23:39:56 -07:00
Debanjum Singh Solanky
2cd3e799d3 Improve null and type checks 2023-06-20 23:30:59 -07:00
Debanjum Singh Solanky
d5fb4196de Update web interface to allow querying all content types at once 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
5c7c8d1f46 Use async/await to fix parallelization of search across content types 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
1192e49307 Pass default value matching argument types expected by text_search methods 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
0144e610d6 Only search across content types that work with asymmetric search 2023-06-20 22:21:46 -07:00
Debanjum Singh Solanky
f6a7aa6c96 Style Khoj chat on web interface with new lantern theme
- Color khoj chat message with new yellow theme color
- Update Khoj chat emoji to lantern
- Add page type to title of pages on web interface
2023-06-20 01:39:33 -07:00
Debanjum Singh Solanky
6d94d6e75a Encode the asymmetric, symmetric search queries in parallel for speed
Use timer to measure time to encode queries and total search time
2023-06-20 01:18:17 -07:00
Debanjum Singh Solanky
d292dc03b3 Use new Khoj Logotype in Web interface 2023-06-20 01:13:06 -07:00
Debanjum Singh Solanky
db07362ca3 Encode user query as same across search types to speed up query time
- Add new filter abstract method to remove filter terms from query
- Use the filter method to remove filter terms, encode this defiltered
  query and pass it to the query methods of each search types

TODO: Encoding query is still taking 100-200 ms unlike before. Need to
investigate why
2023-06-19 23:29:54 -07:00
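
The idea of stripping filter terms and encoding the query once can be sketched as follows. Everything here is illustrative: `BaseFilter`, `FileFilter`, the `file:"..."` syntax, and the stub `encode` are assumptions, with `encode` standing in for the expensive model call.

```python
import re
from abc import ABC, abstractmethod


class BaseFilter(ABC):
    @abstractmethod
    def defilter(self, query: str) -> str:
        """Return the query with this filter's terms removed."""


class FileFilter(BaseFilter):
    pattern = re.compile(r'file:"[^"]*"')  # hypothetical filter syntax

    def defilter(self, query: str) -> str:
        return self.pattern.sub("", query).strip()


encode_calls = 0


def encode(text):
    """Stub for the expensive embedding call; counts how often it runs."""
    global encode_calls
    encode_calls += 1
    return [float(len(text))]


def search_all_types(query, filters, search_types):
    defiltered = query
    for query_filter in filters:
        defiltered = query_filter.defilter(defiltered)
    embedding = encode(defiltered)  # encoded once, shared by every search type
    return {name: (defiltered, embedding) for name in search_types}


results = search_all_types('meeting notes file:"work.org"', [FileFilter()], ["org", "markdown"])
```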
Debanjum Singh Solanky
285d17af2a Search in parallel across all enabled content types requested via API
- Update API to return content from all enabled content types when type
  is not set to specific type in HTTP request param
- To do this efficiently run the search queries in parallel threads
2023-06-19 23:29:06 -07:00
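
A minimal sketch of searching several content types in parallel threads. The `corpus`, `search_one` and `search_all` names are hypothetical, and substring matching stands in for the real search. When no type is requested, every enabled type is searched.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for the indexed content of each enabled content type
corpus = {
    "org": ["buy milk", "khoj release notes"],
    "markdown": ["khoj setup guide"],
    "pdf": ["tax forms"],
}


def search_one(content_type, query):
    """Search a single content type; substring match stands in for real search."""
    return [(content_type, entry) for entry in corpus[content_type] if query in entry]


def search_all(query, content_types=None):
    """Search the requested types, or every enabled type when none is given."""
    selected = list(content_types) if content_types else list(corpus)
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(search_one, ct, query) for ct in selected]
        results = []
        for future in futures:  # collect in submission order for stable output
            results.extend(future.result())
    return results


hits = search_all("khoj")
```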
Debanjum Singh Solanky
79d325fbb6 Fix triggering @general queries in Khoj Chat 2023-06-19 23:05:33 -07:00
Debanjum Singh Solanky
e97a20d70c Set conversation type if query param set, else return chat history
Only initialize variables if query is not empty, to avoid unnecessary
compute, variable null checks etc.

Fixes #230
2023-06-19 19:59:16 -07:00
sabaimran
4722a2c16d Add Github configuration page and success notifications 2023-06-18 10:06:45 -07:00
sabaimran
668135c763 Merge branch 'master' of github.com:debanjum/khoj into features/pretty-config-page 2023-06-18 08:35:09 -07:00
sabaimran
81183a1fe1 Address misc PR comments and update logo in all clients
- Rename the new logo to reflect accuracy on size (e.g., 128x128)
- Update the icns file for Mac
- Update nomenclature in settings pages
2023-06-18 08:34:58 -07:00
Debanjum Singh Solanky
a44cde2865 Show hint to re-index vault if wonky results in Obsidian search modal
Remove spurious indentation in Obsidian styles.css

Resolves #207
2023-06-18 04:53:51 -07:00
Debanjum Singh Solanky
595cc5b0f5 Use printer icon for PDF logs. Only split lines if file at web link in web interface 2023-06-18 02:26:03 -07:00
Debanjum Singh Solanky
e31a540a5e Get all md files recursively in repository by passing recursive param
Previously the `get_markdown_files' method was only getting files at
root of the repository

Fix, improve logger messages in github to jsonl processor
2023-06-18 01:47:15 -07:00
Debanjum Singh Solanky
6fdac24416 Set page size to 100 to reduce requests required to Github API to 1/3
- Default is 30. So number of paginated requests required to get all
  items (commits, files) will reduce by 67%

- No need to increase page size for the get tree Github API request from
  `get_markdown_files'

  Get tree Github API doesn't support pagination and returns up to 100K
  items in response. This should be way more than enough for our
  current use-cases
2023-06-18 01:44:36 -07:00
Debanjum Singh Solanky
87975e589a Fix passing auth token to Github API to increase rate limits by ~83x
- Previously the "token" prefix wasn't added to the PAT in the Auth header
  This resulted in the request being considered unauthenticated

- Unauthenticated requests to Github API are limited to 60 requests/hour
  Authenticated requests to Github API are allowed 5000 requests/hour
2023-06-18 01:19:26 -07:00
Debanjum Singh Solanky
9c70af960c Extract logic to get file content from Github into a separate method 2023-06-18 01:19:13 -07:00
Debanjum Singh Solanky
10d4c38ce9 Extract Wait for rate limit reset logic into a function for reuse 2023-06-18 01:06:46 -07:00
sabaimran
aad7f825e0 Remove music configuration 2023-06-17 21:23:56 -07:00
sabaimran
5f97afbfac Ignore type checks from mypy in subindexed fields 2023-06-17 16:53:36 -07:00
sabaimran
c2d46de8bc Add endpoint for regenerating directly from the config page and add music content-type 2023-06-17 15:47:33 -07:00
sabaimran
ded3100caf Update the configuration page to make config management easier
- Add a central configuration management page to make management of config details easier
- Add relevant api endpoints both for client and server to update/request data as necessary
- Attempt to update the favicon
2023-06-17 15:21:28 -07:00
Debanjum Singh Solanky
3f24e53b6e Render URL as link in web interface if file param of result is a web link 2023-06-17 04:26:40 -07:00
Debanjum Singh Solanky
63ec84ad78 Store Github URL of Markdown files on Github in file jsonl param 2023-06-17 04:23:01 -07:00
Debanjum Singh Solanky
0c1c7583b5 Handle pagination, API rate limits. Get all commits from Github repo 2023-06-17 04:21:39 -07:00
Debanjum Singh Solanky
31d17d0b22 Index commit messages from repository with the github plugin 2023-06-17 02:59:54 -07:00
Debanjum Singh Solanky
c29c141a7e Use Github Rest API to index Markdown files in Github Repository
The Llama_Hub Github plugin is fairly limited.

The Github Rest API is well supported and can easily be extended to
index commit messages, issues, discussions, PRs etc.
2023-06-17 02:16:13 -07:00
Saba
ac96f43b1b Remove try-catch specific to Github plugin; consolidate GUI logic 2023-06-16 23:46:25 -07:00
Saba
019d3732de Rename orgmode_search to org_search 2023-06-13 16:06:54 -07:00
Saba
08d79f5ba4 Unify types used in Github and other text-based configs. Fix typing issues 2023-06-13 15:52:36 -07:00
Saba
a6cd96a6a9 Add a Github plugin which can be used to read from a Github repository 2023-06-13 14:40:06 -07:00
Debanjum
c68cde4803
Log clients calling API endpoints on Khoj server
- Make API endpoints on Khoj server accept `client` as request parameter
  - Khoj API endpoints: /chat, /search, /update
- Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server
  - Khoj clients: Emacs, Obsidian and Web
- Also log khoj server_version running to telemetry server
2023-06-09 18:36:49 +05:30
sabaimran
59fa48036f
Merge pull request #224 from debanjum/fix/message-exceeds-prompt-size
Pass truncated message as string in ChatMessage when exceeding max prompt size
2023-06-08 17:32:53 -07:00
Debanjum Singh Solanky
139a3ba060 Update server to log new server version field to telemetry db 2023-06-08 14:14:21 +05:30
Saba
5d5ebcbf7c Rename truncate messages method and update unit tests to simplify assertion logic 2023-06-06 23:25:43 -07:00
Saba
7119ed0849 Run pre-commit script 2023-06-05 19:29:23 -07:00
Saba
6212d7c2e8 Remove debug line 2023-06-05 19:00:25 -07:00
Saba
f65ff9815d Move message truncation logic into a separate function. Add unit tests with factory boy. 2023-06-05 18:58:29 -07:00
Debanjum Singh Solanky
eb6175e9b0 Update description field in webmanifest of Khoj, Khoj Chat PWA 2023-06-06 01:53:42 +05:30
Debanjum Singh Solanky
bb2363f324 Set client request param when calling khoj server APIs from Web 2023-06-06 00:05:00 +05:30
Debanjum Singh Solanky
caab55fbdd Set client request param when calling khoj server APIs from Obsidian 2023-06-06 00:04:46 +05:30
Debanjum Singh Solanky
de2494154f Set client request param when calling khoj server APIs from Emacs 2023-06-06 00:02:10 +05:30
Debanjum Singh Solanky
168c11cea7 Make server API endpoints accept client as query param
- The chat, search and update API will accept client as request param.
- This will allow logging the client from which these APIs were called.
2023-06-05 23:57:08 +05:30
Debanjum Singh Solanky
8617cf1389 Push telemetry to Posthog to grok Khoj usage 2023-06-05 22:47:49 +05:30
Debanjum Singh Solanky
d13db2e666 Make old telemetry server forward requests to new server 2023-06-05 13:06:45 +05:30
Saba
5f4223efb4 Increase timeout to OpenAI call 2023-06-04 20:49:47 -07:00
Saba
0e63a90377 Fix the mechanism to retrieve the message content 2023-06-04 20:25:37 -07:00
Saba
f0efe0177e Pass truncated message as string in ChatMessage when exceeding max prompt size 2023-06-04 19:33:46 -07:00
Saba
068ee0ac5e Swap elif with else, as usage of this method does not use openai_api_key 2023-06-04 02:25:08 -07:00
Saba
6508379d7b Use api_key keyword argument to set the openai_api_key parameter for GPT 2023-06-04 00:57:00 -07:00
Debanjum Singh Solanky
7af8a56434 Remove filename from reference before rendering references in khoj.el
Fixes bug where the actual reference heading on the next line jumps out
of the references footnote section
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
ec280067ef Do not retrieve relevant notes when having a general chat with Khoj
- This improves latency of @general chat by avoiding unnecessary
  compute
- It also avoids passing references in API response when they haven't
  been used to generate the chat response. So interfaces don't have to
  add logic to not render them unnecessarily
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
90439a8db1 Update Khoj subtitle to AI personal assistant for your digital brain 2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
e9ed7a19fd Update search prompt to extract PDF search type. Fix extract_question prompt 2023-06-02 10:06:03 +05:30
Debanjum Singh Solanky
bbe3bf9733 Render PDF search results in Khoj Obsidian interface
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
  - Ensure combined results are sorted by score across both types
- Jump to PDF file when its PDF search result is selected from modal
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
e3892945d4 Render PDF search results in Khoj.el Emacs interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
85144006a1 Render PDF search results in khoj web interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
acd14a5e41 Wire up PDF to jsonl processor to Khoj server layer (API, config)
- Specify PDF content to index via khoj.yml
- Index PDF content on app start, reconfigure
- Expose PDF as a search type via API
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
286b500f66 Create PDF to JSONL processor using PyPDF and LangChain
Switch `pydantic' to >= 1.9.1, else `langchain.document_loaders' starts
throwing typing errors for python 3.8, 3.9
2023-06-01 21:41:49 +05:30
Debanjum Singh Solanky
1b3effd8e6 Fork Markdown to JSONL processor as start template for PDF to Jsonl Processor 2023-06-01 09:13:31 +05:30