sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-02 20:03:01 +01:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	e3cd8b4150	Only index files returned by input-filter globs in fs_syncer Ignore .org, .pdf etc. suffixed directories under `input-filter' from being evaluated as files. Explicitly filter results by input-filter globs to only index files, not directory for each text type Add test to prevent regression Closes #448	2023-10-17 23:32:10 -07:00
Debanjum Singh Solanky	51363d280d	Do not configure khoj server for pull based indexing from khoj.el Do not make khoj server pull update index on Obsidian plugin load. Index is updated on push from plugin instead now	2023-10-17 21:47:19 -07:00
Debanjum Singh Solanky	d9d133dfb9	Read text files as utf-8, instead of default os locale On Windows, the default locale isn't utf8. Khoj had regressed to reading files in OS specified locale encoding, e.g cp1252, cp949 etc. It now explicitly uses utf8 encoding to read text files for indexing Resolves #495, resolves #472	2023-10-17 21:47:19 -07:00
Debanjum	3d4576ae38	Fix encoding binary files for sync from the Desktop, Obsidian client (#506) - Fix encoding binary files like PDFs for sync from Desktop client - Fix encoding binary files like PDFs for sync from Obsidian client	2023-10-17 15:37:22 -07:00
Debanjum Singh Solanky	c8293998d9	Fix encoding binary files like PDFs for sync from Obsidian client Use readBinary to read binary files like PDFs instead of read	2023-10-17 15:08:30 -07:00
sabaimran	ba60c869c9	Fix encoding binary files like PDFs for sync from Desktop client Use readFileSync, Buffer to pass appropriately formatted binary data	2023-10-17 15:08:23 -07:00
Andrew Spott	3d7381446d	Changed globbing. Now doesn't clobber a user's glob if they want to a… (#496) * Changed globbing. Now doesn't clobber a user's glob if they want to add it, but will (if just given a directory), add a recursive glob. Note: python's glob engine doesn't support `{}` globbing, a future option is to warn if that is included. * Fix typo in globformat variable * Use older glob pattern for plaintext files --------- Co-authored-by: Saba <narmiabas@gmail.com>	2023-10-17 11:26:06 -07:00
sabaimran	2646c8554d	Provide a default value to offline_chat configuration of the conversation processor	2023-10-17 10:35:22 -07:00
Debanjum Singh Solanky	b8976426eb	Update offline chat model config schema used by Emacs, Obsidian clients The server uses a new schema for the conversation config. The Emacs, Obsidian clients need to use this schema to update the conversation config	2023-10-17 07:01:35 -07:00
Debanjum	ecc6fbfeb2	Push Files to Index from Emacs, Obsidian & Desktop Clients using Multi-Part Forms (#499) ### Overview - Add ability to push data to index from the Emacs, Obsidian client - Switch to standard mechanism of syncing files via HTTP multi-part/form. Previously we were streaming the data as JSON - Benefits of new mechanism - No manual parsing of files to send or receive on clients or server is required as most have in-built mechanisms to send multi-part/form requests - The whole response is not required to be kept in memory to parse content as JSON. As individual files arrive they're automatically pushed to disk to conserve memory if required - Binary files don't need to be encoded on client and decoded on server ### Code Details #### Major - Use multi-part form to receive files to index on server - Use multi-part form to send files to index on desktop client - Send files to index on server from the khoj.el emacs client - Send content for indexing on server at a regular interval from khoj.el - Send files to index on server from the khoj obsidian client - Update tests to test multi-part/form method of pushing files to index #### Minor - Put indexer API endpoint under /api path segment - Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method - Improve emoji, message on content index updated via logger - Don't call khoj server on khoj.el load, only once khoj invoked explicitly by user - Improve indexing of binary files - Let fs_syncer pass PDF files directly as binary before indexing - Use encoding of each file set in indexer request to read file - Add CORS policy to khoj server. Allow requests from khoj apps, obsidian & localhost - Update indexer API endpoint URL to `index/update` from `indexer/batch` Resolves #471 #243	2023-10-17 06:05:15 -07:00
Debanjum Singh Solanky	6a4f1b2188	Add more client, request details in logs by index/update API endpoint	2023-10-17 05:43:29 -07:00
Debanjum Singh Solanky	5efae1ad55	Update indexer API endpoint query params for force, content type New URL query params, `force' and `t' match name of query parameter in existing Khoj API endpoints Update Desktop, Obsidian and Emacs client to call using these new API query params. Set `client' query param from each client for telemetry visibility	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	84654ffc5d	Update indexer API endpoint URL to index/update from indexer/batch New URL follows action oriented endpoint naming convention used for other Khoj API endpoints Update desktop, obsidian and emacs client to call this new API endpoint	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	e347823ff4	Log telemetry for index updates via push to API endpoint	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	05be6bd877	Clicking Update Index in Obsidian settings should push files to index Use the indexer/batch API endpoint to regenerate content index rather than the previous pull based content indexing API endpoint	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	13a3122bf3	Stop configuring server to pull files to index from Obsidian client Obsidian client now pushes vault files to index instead	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	99a2c934a3	Add CORS policy to allow requests from khoj apps, obsidian & localhost Using fetch from Khoj Obsidian plugin was failing due to cross-origin request and method: no-cors didn't allow passing x-api-key custom header. And using Obsidian's request with multi-part/form-data wasn't possible either.	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	541cd59a49	Let fs_syncer pass PDF files directly as binary before indexing No need to do unneeded base64 encoding/decoding to pass pdf contents for indexing from fs_syncer to pdf_to_jsonl	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	d27dc71dfe	Use encoding of each file set in indexer request to read file Get encoding type from multi-part/form-request body for each file Read text files as utf-8 and pdfs, images as binary	2023-10-17 04:58:12 -07:00
Debanjum Singh Solanky	8e627a5809	Pass any files to be deleted to indexer API via Khoj Obsidian plugin - Keep state of previously synced files to identify files to be deleted - Last synced files stored in settings for persistence of this data across Obsidian reboots	2023-10-17 03:34:49 -07:00
Debanjum Singh Solanky	f2e293a149	Push Vault files to index to Khoj server using Khoj Obsidian plugin Use the multi-part/form-data request to sync Markdown, PDF files in vault to index on khoj server Run scheduled job to push updates to vault for indexing every 1 hour	2023-10-17 03:05:30 -07:00
Debanjum Singh Solanky	6baaaaf91a	Test request body of multi-part form to update content index from khoj.el	2023-10-16 23:54:32 -07:00
Debanjum Singh Solanky	79b3f8273a	Make khoj.el send files to be deleted from index to server	2023-10-16 23:53:02 -07:00
Debanjum Singh Solanky	f64fa06e22	Initialize the Khoj Transient menu on first run instead of load This prevents Khoj from polling the Khoj server until explicitly invoked via `khoj' entrypoint function. Previously it'd make a request to the khoj server every time Emacs or khoj.el was loaded Closes #243	2023-10-16 19:11:46 -07:00
Debanjum	b4949f7f0b	Improve Offline Chat Model Experience (#494) - Make offline chat model user configurable. Use `filename` of any [GPT4All supported model](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models.json) like below: - Run GPT4All Chat Model on GPU, when available via [GPT4All Vulkan support](https://blog.nomic.ai/posts/gpt4all-gpu-inference-with-vulkan) - Use default Llama 2 supported by GPT4All - Make `tokenizer` and `max-prompt-size` of chat model user configurable. E.g When using chat models not in [this pre-defined list](https://github.com/khoj-ai/khoj/blob/master/src/khoj/processor/conversation/utils.py) that support larger context window or a different tokenizer. Closes #406, #418	2023-10-16 17:44:49 -07:00
Debanjum Singh Solanky	644c3b787f	Scale no. of chat history messages to use as context with max_prompt_size Previously lookback turns was set to a static 2. But now that we support more chat models, their prompt size vary considerably. Make lookback_turns proportional to max_prompt_size. The truncate_messages can remove messages if they exceed max_prompt_size later This lets Khoj pass more of the chat history as context for models with larger context window	2023-10-16 17:22:28 -07:00
Debanjum Singh Solanky	df1d74a879	Use max_prompt_size, tokenizer from config for chat model context stuffing	2023-10-15 16:52:53 -07:00
Debanjum Singh Solanky	116595b351	Use chat_model specified in new offline_chat section of config - Dedupe offline_chat_model variable. Only reference offline chat model stored under offline_chat. Delete the previous chat_model field under GPT4AllProcessorConfig - Set offline chat model to use via config/offline_chat API endpoint	2023-10-15 16:37:49 -07:00
Debanjum Singh Solanky	feb4f17e3d	Update chat config schema. Make max_prompt, chat tokenizer configurable This provides flexibility to use non 1st party supported chat models - Create migration script to update khoj.yml config - Put `enable_offline_chat' under new `offline-chat' section Referring code needs to be updated to accommodate this change - Move `offline_chat_model' to `chat-model' under new `offline-chat' section - Put chat `tokenizer` under new `offline-chat' section - Put `max_prompt' under existing `conversation' section As `max_prompt' size affects both openai and offline chat models	2023-10-15 16:35:11 -07:00
sabaimran	c125995d94	[Multi-User]: Part 0 - Add support for logging in with Google (#487) * Add concept of user authentication to the request session via GoogleUser	2023-10-14 19:39:13 -07:00
Debanjum Singh Solanky	247e75595c	Use AutoTokenizer to support more tokenizers	2023-10-14 16:54:52 -07:00
Saba	ff2dbadc9d	Use computed plaintext_content to set file content rather than calling f.read again	2023-10-14 13:28:34 -07:00
Debanjum Singh Solanky	1ad8b150e8	Add default tokenizer, max_prompt as fallback for non-default offline chat models Pass user configured chat model as argument to use by converse_offline The proper fix for this would allow users to configure the max_prompt and tokenizer to use (while supplying default ones, if none provided) For now, this is a reasonable start.	2023-10-13 22:48:56 -07:00
Debanjum Singh Solanky	56bd69d5af	Improve Llama v2 extract questions actor and associated prompt - Format extract questions prompt format with newlines and whitespaces - Make llama v2 extract questions prompt consistent - Remove empty questions extracted by offline extract_questions actor - Update implicit qs extraction unit test for offline search actor	2023-10-13 22:48:56 -07:00
sabaimran	09bb3686cc	Strip the incoming query from the slash conversation command (#500 ) * Strip the incoming query from the slash conversation command before passing it to the model or for search * Return q when content index not loaded * Remove -n 4 from pytest ini configuration to isolate test failures	2023-10-13 21:11:23 -07:00
Debanjum Singh Solanky	96c0b21285	Sync desktop app package.json with other Khoj clients metadata - Make `bump_version.sh' script set version for the Khoj desktop app too - Sync Khoj desktop app authors, license, description and version with the other interfaces and server - Update description in packages metadata to match project subtitle on Github	2023-10-13 20:43:55 -07:00
sabaimran	80fb56b8a5	Sync desktop app package version with the other releases	2023-10-13 19:23:00 -07:00
Debanjum Singh Solanky	b669aa2395	Clean and fix the content indexing code in the Emacs client - Pass payloads as unibyte. This was causing the request to fail for files with unicode characters - Suppress messages with file content in on index updates - Fix rendering response from server on index update API call - Extract code to populate body of index update HTTP request with files	2023-10-13 18:00:37 -07:00
Debanjum Singh Solanky	bea196aa30	Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method Previously global state of `url-request-method' would affect the kind of request made to api/config/data API endpoint as it wasn't being explicitly set before calling the API endpoint This was done with the assumption that the default value of GET for url-request-method wouldn't change globally But in some cases, experientially, it can get changed. This was resulting in khoj.el load failing as POST request was being made instead which would throw error	2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky	292f0420ad	Send content for indexing on server at a regular interval from khoj.el - Allow indexing frequency to be configurable by user - Ensure there is only one khoj indexing timer running	2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky	fc99431754	Send files to index on server from the khoj.el emacs client - Add elisp variable to set API key to engage with the Khoj server - Use multi-part form to POST the files to index to the indexer API endpoint on the khoj server	2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky	68018ef397	Use multi-part form to send files to index on desktop client - Add typing for variables in for loop and other minor formatting clean-up - Assume utf8 encoding for text files and binary for image, pdf files	2023-10-12 20:58:49 -07:00
Debanjum Singh Solanky	7190b3811d	Remove all filter terms in user query from defiltered_query Previously only the last filter's terms were getting effectively applied as the `filter.defilter' operation was being done on `user_query' but was updating the `defiltered_query'	2023-10-12 20:56:17 -07:00
Debanjum Singh Solanky	60e9a61647	Use multi-part form to receive files to index on server - This uses existing HTTP affordance to process files - Better handling of binary file formats as removes need to url encode/decode - Less memory utilization than streaming json as files get automatically written to disk once memory utilization exceeds preset limits - No manual parsing of raw files streams required	2023-10-11 23:58:23 -07:00
Debanjum Singh Solanky	9ba173bc2d	Improve emoji, message on content index updated via logger Use mailbox closed with flag down once content index completed. Use standard, existing logger messages in new indexer messages, when files to index sent by clients	2023-10-11 17:12:03 -07:00
Debanjum Singh Solanky	6aa69da3ef	Put indexer API endpoint under /api path segment Update FastAPI app router, desktop app and to use new url path to batch indexer API endpoint All api endpoints should exist under /api path segment	2023-10-09 21:35:58 -07:00
Debanjum Singh Solanky	f6f7a62d80	Wait for user to stop typing to trigger search from khoj.el in Emacs - Improves user experience by aligning idle time with search latency to avoid display jitter (to render results) while user is typing - Makes the idle time configurable Closes #480	2023-10-06 12:44:45 -07:00
sabaimran	5c4f0d42b7	Return new default config in API endpoint	2023-10-06 12:30:09 -07:00
sabaimran	052b25af0a	Update default configuration passed to Khoj clients to circumvent validation issues	2023-10-06 12:29:15 -07:00
Debanjum Singh Solanky	a85ff941ca	Make offline chat model user configurable Only GPT4All supported Llama v2 models will work given the prompt structure is not currently configurable	2023-10-04 20:41:14 -07:00
Debanjum Singh Solanky	d1ff812021	Run GPT4All Chat Model on GPU, when available GPT4All now supports running models on GPU via Vulkan	2023-10-04 18:42:12 -07:00
Debanjum Singh Solanky	13b16a4364	Use default Llama 2 supported by GPT4All Remove custom logic to download custom Llama 2 model. This was added as GPT4All didn't support Llama 2 when it was added to Khoj	2023-10-03 19:01:54 -07:00
sabaimran	4a5ed7f06c	Update Khoj package version for Electron, Desktop app (#492 ) * Address package upgrade for Electron application * Update package version for Electron desktop application	2023-10-03 12:21:32 -07:00
sabaimran	3f962a55c3	Fix Linux Desktop Application (#491 ) * Use separate functions for adding files and folders to configuration for indexing * Add a loading bar while data is syncing * Bump the minor version for the application	2023-10-03 11:43:19 -07:00
sabaimran	63b3696af0	Release Khoj version 0.12.3	2023-09-26 22:41:11 -07:00
sabaimran	d2f9bca1cf	Fix null ref issue in query method and update logic for determining whether khoj is already configured in obsidian	2023-09-26 22:33:44 -07:00
sabaimran	2f18383349	Release Khoj version 0.12.2	2023-09-26 11:59:47 -07:00
sabaimran	588f35b6e9	Add max prompt size for gpt-3.5-turbo-16k	2023-09-26 10:57:35 -07:00
sabaimran	4e370d7a18	Release Khoj version 0.12.1	2023-09-26 09:24:53 -07:00
sabaimran	3675aa348a	Update naming of Khoj in manifest.json for Obsidian	2023-09-26 09:24:36 -07:00
sabaimran	a82d1becc3	Release Khoj version 0.12.0	2023-09-26 09:17:56 -07:00
sabaimran	38f0df3d53	Remove unused icons from electron app folder	2023-09-26 07:56:29 -07:00
sabaimran	5e16074b92	Fix comparison for search type in plugins mode	2023-09-25 10:57:17 -07:00
sabaimran	2dd15e9f63	Resolve issues with GPT4All and fix prompt for yesterday extract questions date filter (#483 ) - GPT4All integration had ceased working with 0.1.7 specification. Update to use 1.0.12. At a later date, we should also use first party support for llama v2 via gpt4all - Update the system prompt for the extract_questions flow to add start and end date to the yesterday date filter example. - Update all setup data in conftest.py to use new client-server indexing pattern	2023-09-18 14:41:26 -07:00
sabaimran	b225d1188c	Fix formatting of gpt.py	2023-09-18 11:09:02 -07:00
Jonny-GM	34b202b868	More lenient date searching (#481 ) * Modify DateFilter to use compiled entry key * Instruct search to include date in query * Minor prompt change * Prompt fix	2023-09-18 10:46:00 -07:00
sabaimran	16874e1953	Provide force fallback for regeneration	2023-09-12 16:35:07 -07:00
sabaimran	9f42a1a036	Propagate flags to configure index command	2023-09-11 10:33:44 -07:00
sabaimran	343854752c	Improve docker builds for local hosting (#476 ) * Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions * Move configure_search method into indexer * Add conditional installation for gpt4all * Add hint to go to localhost:42110 in the docs. Addresses #477	2023-09-08 17:07:26 -07:00
sabaimran	dccfae3853	Remove PySide dependency and deprecate desktop builds (#475 ) * Remove PySide, gui option from code * Remove pyside 6 dependency from code * Remove workflows which build desktop applications * Update unit tests and update line in documentation * Remove additional references to pyinstaller, gui * Add uninstall steps to normal uninstall instructions	2023-09-07 11:36:27 -07:00
sabaimran	76562f4250	Add front-end Electron application for Khoj local file syncing (#473 ) * Initial version - setup a file-push architecture for generating embeddings with Khoj * Use state.host and state.port for configuring the URL for the indexer * Fix parsing of PDF files * Read markdown files from streamed data and update unit tests * On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system * Init: refactor indexer/batch endpoint to support a generic file ingestion format * Add features to better support indexing from files sent by the desktop client * Initial commit with Electron application - Adds electron app * Add import for pymupdf, remove import for pypdf * Allow user to configure khoj host URL * Remove search type configuration from index.html * Use v1 path for current indexer routes	2023-09-06 12:04:18 -07:00
bholagabbar	205dc90746	Fix notion title bug (#474 ) * Update notion_to_jsonl.py * Fix try-catch block	2023-09-05 10:47:42 -07:00
sabaimran	4854258047	Move to a push-first model for retrieving embeddings from local files (#457 ) * Initial version - setup a file-push architecture for generating embeddings with Khoj * Update unit tests to fix with new application design * Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request * Use state.host and state.port for configuring the URL for the indexer * On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system	2023-08-31 12:55:17 -07:00
sabaimran	92cbfef7ab	Skip plaintext file indexing if there's a parsing issue and log the file	2023-08-29 14:34:08 -07:00
sabaimran	74409c2c64	Release Khoj version 0.11.4	2023-08-29 11:44:35 -07:00
sabaimran	1b85958bcc	trim chat input start	2023-08-28 19:18:10 -07:00
sabaimran	e592f6eac8	Release Khoj version 0.11.3	2023-08-28 14:46:03 -07:00
sabaimran	7c35da9fc4	Fix bug in /chat endpoint for general and update dependencies	2023-08-28 14:12:11 -07:00
sabaimran	bc09143856	Release Khoj version 0.11.2	2023-08-28 10:16:13 -07:00
Debanjum Singh Solanky	01b310635e	Enable passing search query filters via chat and test it	2023-08-28 09:24:32 -07:00
Debanjum Singh Solanky	794bad8bcb	Make date_filter.extract_date_range method always return a list type	2023-08-28 00:55:28 -07:00
Debanjum Singh Solanky	d5a2de6222	Add method to extract filter terms from query to all filters - Test the get_filter_term method in all 3 word, file, date filters - Make the existing can_filter method by default in base filter abstract class	2023-08-28 00:55:28 -07:00
Debanjum	150105505b	Add Default chat command. Make Khoj ask clarifying questions (#468 ) - Make Khoj ask clarifying questions when answer not in provided context - Add default conversation command to auto switch b/w general, notes modes - Show filtered list of commands available with the currently input text - Use general prompt when no references found and not in Notes mode - Test general and notes slash commands in offline chat director tests	2023-08-28 00:52:57 -07:00
Debanjum Singh Solanky	eb6cd4f8d0	Use general prompt when no references found and not in Notes mode	2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky	edffbad837	Make Khoj ask clarifying questions when answer not in provided context Previously it would just refuse ask for clarification. This improves the chat quality score for the existing director tests	2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky	75c1016ec0	Show filtered list of commands available with the currently input text	2023-08-28 00:46:10 -07:00
Debanjum Singh Solanky	74605f6159	Add default conversation command to auto switch b/w general, notes modes This was the default behavior but behavior regressed when adding slash commands in PR #463	2023-08-28 00:46:10 -07:00
sabaimran	cbc978ea08	Update help links for notion, github to point to the main docs	2023-08-27 15:02:55 -07:00
sabaimran	b45e1d8c0d	Fix plaintext HTML parsing and rendering (#464) * Store conversation command options in an Enum * Move to slash commands instead of using @ to specify general commands * Calculate conversation command once & pass it as arg to child funcs * Add /notes command to respond using only knowledge base as context This prevents the chat model from trying to respond using only its general world knowledge, without any references pulled from the indexed knowledge base * Test general and notes slash commands in openai chat director tests --------- Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>	2023-08-27 11:24:30 -07:00
Debanjum	7919787fb7	Use Slash Commands and Add Notes Slash Command (#463) * Store conversation command options in an Enum * Move to slash commands instead of using @ to specify general commands * Calculate conversation command once & pass it as arg to child funcs * Add /notes command to respond using only knowledge base as context This prevents the chat model from trying to respond using only its general world knowledge, without any references pulled from the indexed knowledge base * Test general and notes slash commands in openai chat director tests * Update gpt4all tests to use md configuration * Add a /help tooltip * Add dynamic support for describing slash commands. Remove default and treat notes as the default type --------- Co-authored-by: sabaimran <narmiabas@gmail.com>	2023-08-26 18:11:18 -07:00
sabaimran	e64357698d	Skip indexing single bad markdown, plaintext file (#460)	2023-08-23 15:34:56 -07:00
sabaimran	84bd579077	Format the chat outputted message with code, bolding, or italics. Add a copy button for code. Closes #445.	2023-08-19 20:02:57 -07:00
sabaimran	f9e09ba490	Do not try downloading model from GPT4All if the user is not connected to the internet	2023-08-19 19:09:21 -07:00
Debanjum Singh Solanky	3ff4e19dd2	Release Khoj version 0.11.1	2023-08-16 22:53:29 -07:00
sabaimran	4fb8c2c5e1	Pass a SIGTERM to tell the uvicorn server to exit and gracefully kill the thread	2023-08-16 21:27:05 -07:00
sabaimran	4e03dfea43	Attach the parent to the server thread, allowing the kill signal to trigger a graceful exit (#446)	2023-08-16 19:36:10 -07:00
Debanjum Singh Solanky	26c3977fb9	Remove info hint to reindex khoj on unexpected search results The index corruption issue was resolved a while ago in #325 and hasn't cropped up again	2023-08-16 00:58:59 -07:00
sabaimran	def909a913	Revert "Open Web interface within Desktop app in GUI mode" (#444)	2023-08-15 23:26:28 -07:00
sabaimran	6562ec6531	Release Khoj version 0.11.0	2023-08-14 19:25:03 -07:00
sabaimran	0ea901c7c1	Allow indexing to continue even if there's an issue parsing a particular org file (#430 ) * Allow indexing to continue even if there's an issue parsing a particular org file * Use approximation in pytorch comparison in text_search UT, skip additional file parser errors for org files * Change error of expected failure	2023-08-14 07:56:33 -07:00
sabaimran	7b907add77	Add support for indexing plaintext files (#420 ) * Add support for indexing plaintext files - Adds backend support for parsing plaintext files generically (.html, .txt, .xml, .csv, .md) - Add equivalent frontend views for setting up plaintext file indexing - Update config, rawconfig, default config, search API, setup endpoints * Add a nifty plaintext file icon to configure plaintext files in the Web UI * Use generic glob path for plaintext files. Skip indexing files that aren't in whitelist	2023-08-09 15:44:40 -07:00
Ellen7ions	26bddcb65c	Add support for starting a new line with shift-enter (#412 ) * Add support for starting a new line with shift-enter * Remove useless comments. Set font-size: medium. * Update src/khoj/interface/web/chat.html Update the styling to have the padding, margin and line-height like before. Co-authored-by: Debanjum <debanjum@gmail.com> * Update src/khoj/interface/web/chat.html Make the chat-body scroll to the bottom after resizing Co-authored-by: Debanjum <debanjum@gmail.com> --------- Co-authored-by: Debanjum <debanjum@gmail.com>	2023-08-07 19:49:07 -07:00
Debanjum Singh Solanky	97609e4995	Use 500px png of khoj logo instead svg for much smaller asset size The khoj logo svg was 1.3Mb. The 500px png of it is 38Kb. Given all usage of khoj-logo are below 230px this should work fine	2023-08-07 18:27:11 -07:00
Debanjum	14a816d173	Open Web interface within Desktop app in GUI mode (#429 ) Previously the GUI mode (with khoj --gui or using the desktop app) would open the web interface in the users default web browser. Now the web interface is just rendered within the app itself using PyQT's Webview. This gives it a more proper app like feel	2023-08-07 17:48:30 -07:00
Debanjum Singh Solanky	378b96ec1b	Open the khoj app window maximized on startup	2023-08-07 15:39:05 -07:00
Debanjum Singh Solanky	ea734ba1c8	Open app in native view on starting it in GUI mode instead of on web browser - Opens settings page on first run and landing page after in GUI mode Previously was only opening the GUI on linux after first run as it doesn't have a system tray - Both the views are from the web interface but are rendered within the app instead of the browser	2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky	9c494705a8	Open the search, chat or config view in app from the system tray menu	2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky	cc36b87345	Render the web interface directly within the desktop app as a webview	2023-08-07 13:41:12 -07:00
Jason Qin	3ef1b7073d	Update obsidian/manifest.json Closes #426	2023-08-07 10:41:39 -07:00
sabaimran	738cf650b3	Explicitly set Khoj to use the default locale of the user (#425) - Explicitly set locale using `locale.setlocale(locale.LC_ALL, '')` for localization. Relevant for datetime libraries. See [Python 3 documentation](https://docs.python.org/3/library/locale.html#locale.setlocale).	2023-08-07 09:23:24 -07:00
Muftawo	c8ef619090	fixed reference link to landing page (#417 ) * Fixed zsh error no matches found * Fixed home page 404 error	2023-08-04 10:38:14 -07:00
sabaimran	78012b8111	Avoid null ref issue when setting model state for web UI. Closes #410	2023-08-03 00:39:06 -07:00
sabaimran	0baed742e4	Add checksums to verify the correct model is downloaded as expected (#405) * Add checksums to verify the correct model is downloaded as expected - This should help debug issues related to corrupted model download - If download fails, let the application continue * If the model is not downloaded as expected, add some indicators in the settings UI * Add exc_info to error log if/when download fails for llamav2 model * Simplify checksum checking logic, update key name in model state for web client	2023-08-02 23:26:52 -07:00
Debanjum Singh Solanky	e6e3acdbe4	Release Khoj version 0.10.1	2023-08-01 23:55:13 -07:00
Debanjum Singh Solanky	7c1d70aa17	Bump GPT4All response generation batch size to 512 from 256 A batch size of 512 performs ~20% better on a XPS with no GPU and 16Gb RAM. Seems worth the tradeoff for now	2023-08-01 23:34:02 -07:00
Debanjum	16c6bfce8e	Improve Quality and Reliability of Offline Chat (#393 ) # Incoming ## Major ### Fix Prompt Size Exceeded Issue - Fix issues related to prompt size, Closes #386. Use the correct tokenizer to calculate whether the input needs to be truncated or not. ### Improve Llama 2 Model Download - Use the correct download link for LlamaV2 -- should have been using the small model, but was using the medium - Add better downloading logic to retry download if it failed, Closes #379 ### Fix Segmentation Fault due to Race - Add a lock around generating chat responses from the offline model to avoid segmentation faults. Closes #367. - Add a loading symbol to the web chat UI when the model is thinking. Closes #392 ### Improve Chat Response Latency - Improve performance of offline chat by increasing batch size (via `n_batch`) to automatically engage more cores/GPU, using smaller model and fixing prompt vs response token generation numbers. Closes #363 ### Fix Fake Dialogue Continuation - Fix formatting of user query with offline chat, this was contributing to #398 - Stop Llama 2 from Creating Fake Dialogue Continuations. Closes #398 ## Minor - Improve default message for Chat window on web when it's not configured. Include hint to use offline chat. - Add null check in `perform_chat_checks` method - Add offline chat director unit tests ## Performance Analysis (Time to First Token) \| \| v0.10.0 \| this branch \| \|-\|-\|-\| \| Query 1 \| 52s \| 28s \| \| Query 2 \| 33s\| 42s \| \| Query 3 \| 67s\| 38s\|	2023-08-01 22:07:27 -07:00
Debanjum Singh Solanky	44292afff2	Put offline model response generation behind the chat lock as well Not just the chat response streaming	2023-08-01 21:53:52 -07:00
Debanjum Singh Solanky	1812473d27	Extract new schema version for each migration script into a variable This should ease readability, indicates which version this migration script will update the schema to once applied	2023-08-01 21:41:08 -07:00
Debanjum Singh Solanky	b9937549aa	Simplify migration scripts management. Make them use static version - Only make them update config when their run conditions are satisfied - Use static schema version to simplify reasoning about run conditions	2023-08-01 21:28:20 -07:00
Debanjum Singh Solanky	185a1fbed7	Remove old chat setup timer. It is mislabelled, irrelevant since streaming	2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky	c2b7a14ed5	Fix context, response size for Llama 2 to stay within max token limits Create regression test to ensure it does not throw the prompt size exceeded context window error	2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky	6e4050fa81	Make Llama 2 stop generating response on hitting specified stop words It would previously sometimes start generating fake dialogue with its internal prompt patterns of <s>[INST] in responses. This is a jarring experience. Stop generating the response on hitting <s> Resolves #398	2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky	aa6846395d	Fix offline model migration script to run for version < 0.10.1 - Use same batch_size in extract question actor as the chat actor - Log final location the chat model is to be stored in, instead of its temp filename while it is being downloaded	2023-08-01 20:51:53 -07:00
Ikko Eltociear Ashimine	49abb9df9c	Fix typo in orgnode.py (#397 ) Fix spelling of Ouput in org parser property drawer comment to Output.	2023-08-01 19:54:57 -07:00
sabaimran	f409e16137	Update some of the extract question prompts for llamav2	2023-08-01 12:23:36 -07:00
sabaimran	b11b00a9ff	Add log line for time to first response	2023-08-01 10:57:38 -07:00
sabaimran	778df6be71	Add a logline when the offline model migration script runs	2023-08-01 09:27:42 -07:00
sabaimran	3a5d93d673	Add migration script for getting the new offline model	2023-08-01 09:25:05 -07:00
sabaimran	90efc2ea7a	Update comments and add explanations	2023-08-01 09:24:03 -07:00
sabaimran	f7e03f6d63	Switch spinner snake case -> camel case	2023-08-01 08:52:25 -07:00
sabaimran	1c52a6993f	add a lock around chat operations to prevent the offline model from getting bombarded and stealing a bunch of compute resources - This also solves #367	2023-08-01 00:23:17 -07:00
sabaimran	6c3074061b	Disable the input bar when chat response is in flight	2023-08-01 00:21:39 -07:00
sabaimran	c14cbe926a	Add a loading symbol to web chat. Closes #392	2023-07-31 23:35:48 -07:00
sabaimran	8054bdc896	Use n_batch parameter to increase resource consumption on host machine (and implicitly engage GPU)	2023-07-31 23:25:08 -07:00
sabaimran	e55e9a7b67	Fix unit tests and truncation logic	2023-07-31 21:37:59 -07:00
sabaimran	2335f11b00	Add better error handling for download processes in case of failure	2023-07-31 21:07:38 -07:00
sabaimran	209975e065	Resolve merge conflicts: let Khoj fail if the model tokenizer is not found	2023-07-31 19:12:26 -07:00
sabaimran	2d6c3cd4fa	Misc. quality improvements for Llama V2 - Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S - Use a proper Llama Tokenizer for counting tokens for truncation with Llama - Add additional null checks when running	2023-07-31 19:11:20 -07:00
sabaimran	ca195097d7	Update chat hint message at first run	2023-07-31 17:46:09 -07:00
Debanjum Singh Solanky	ded606c7cb	Fix format of user query during general conversation with Llama 2	2023-07-31 17:21:14 -07:00
Debanjum Singh Solanky	48e5ac0169	Do not drop system message when truncating context to max prompt size Previously the system message was getting dropped when the context size with chat history would be more than the max prompt size supported by the chat model. Now only the previous chat messages are dropped or the current message is truncated but the system message is kept to provide guidance to the chat model	2023-07-31 17:21:14 -07:00
sabaimran	88ef86ad5c	Fix typing issues for mypy (#372 )	2023-07-30 19:27:48 -07:00
sabaimran	ca2c942b65	Add typing to compiled_references and inferred_queries	2023-07-30 19:10:30 -07:00
sabaimran	3646fd1449	Add a warning to indicate that Khoj is not configured to work with personal data sources	2023-07-30 18:52:10 -07:00
sabaimran	996832dc72	Allow user to chat even if content types aren't configured - use empty references	2023-07-30 18:47:45 -07:00
Debanjum Singh Solanky	53810a0ff7	Create khoj config dir if non-existent, before writing to khoj env file	2023-07-30 01:35:36 -07:00
sabaimran	f65d157244	Release Khoj version 0.10.0	2023-07-28 19:27:47 -07:00
Debanjum Singh Solanky	f76af869f1	Do not log the gpt4all chat response stream in khoj backend Stream floods stdout and does not provide useful info to user	2023-07-28 19:14:04 -07:00
sabaimran	5ccb01343e	Add Offline chat to Obsidian (#359 ) * Add support for configuring/using offline chat from within Obsidian * Fix type checking for search type * If Github is not configured, /update call should fail * Fix regenerate tests same as the update ones * Update help text for offline chat in obsidian * Update relevant description for Khoj settings in Obsidian * Simplify configuration logic and use smarter defaults	2023-07-28 18:47:56 -07:00
Debanjum	b3c1507708	Merge pull request #361 from khoj-ai/configure-offline-chat-from-emacs - Configure using Offline Chat from Emacs: - Enable, Disable Offline Chat from Emacs - Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup - Benefits: Offline chat models are better for privacy but not great at answering questions	2023-07-28 18:06:58 -07:00
sabaimran	9f78db0579	Let Offline chat override OpenAI API settings (#362 ) * Let Offline chat override OpenAI API settings * Download the offline model whenever offline chat is enabled * Add progressbar for download for llamav2 model to track progress * Change ordering of n due to switch of default processor * Flip ordering of offline/openai checks when extracting questions from query	2023-07-28 17:26:20 -07:00
Debanjum Singh Solanky	ebfbef1f68	Configure using offline chat from Emacs Closes #358	2023-07-28 16:07:33 -07:00
sabaimran	29081f4429	Adjust parameters for offline chat	2023-07-27 22:22:09 -07:00
sabaimran	124d97c26d	Replace Falcon 🦅 model with Llama V2 🦙 for offline chat (#352 ) * Working example with LlamaV2 running locally on my machine - Download from huggingface - Plug in to GPT4All - Update prompts to fit the llama format * Add appropriate prompts for extracting questions based on a query based on llama format * Rename Falcon to Llama and make some improvements to the extract_questions flow * Do further tuning to extract question prompts and unit tests * Disable extracting questions dynamically from Llama, as results are still unreliable	2023-07-27 20:51:20 -07:00
Debanjum Singh Solanky	715d56d4f0	Use new schema to update khoj.yml config from khoj.el	2023-07-26 17:34:16 -07:00
sabaimran	8b2af0b5ef	Add support for our first Local LLM 🤖🏠 (#330 ) * Add support for gpt4all's falcon model as an additional conversation processor - Update the UI pages to allow the user to point to the new endpoints for GPT - Update the internal schemas to support both GPT4 models and OpenAI - Add unit tests benchmarking some of the Falcon performance * Add exc_info to include stack trace in error logs for text processors * Pull shared functions into utils.py to be used across gpt4 and gpt * Add migration for new processor conversation schema * Skip GPT4All actor tests due to typing issues * Fix Obsidian processor configuration in auto-configure flow * Rename enable_local_llm to enable_offline_chat	2023-07-26 16:27:08 -07:00
sabaimran	23d77ee338	Fix import issues in desktop image builds (#343 )	2023-07-26 15:45:52 -07:00
Debanjum Singh Solanky	7722a9c347	Default to using the gpt-3.5-turbo model for chat from khoj.el	2023-07-22 00:29:26 -07:00
Debanjum Singh Solanky	f0d4a4cf9a	Revert "Make configure_content functional. Do not pass content index state to it." This reverts commit `2ddee7e745` as it broke partial updates of the content index for just the specified content types	2023-07-21 13:59:09 -07:00
sabaimran	82c725817e	Merge branch 'master' of github.com:khoj-ai/khoj	2023-07-21 13:24:05 -07:00
sabaimran	596e11ec6d	Use the same function for computing entries for IDs regardless of whether it has prev entries	2023-07-21 13:23:56 -07:00
Debanjum Singh Solanky	2ddee7e745	Make configure_content functional. Do not pass content index state to it.	2023-07-20 23:24:08 -07:00
sabaimran	1610d2ebd9	📝 Add a documentation base for Khoj! (#333 ) * Add docs for more organized, accessible information detailing Khoj setup * Delete duplicated files * Add a coverpage without enabling it. Add logo and theme * Remove obsidian README.md * Add plausible script to index.html via docsify	2023-07-20 22:34:25 -07:00
Debanjum Singh Solanky	3e59be7f1d	Release Khoj version 0.9.0	2023-07-18 19:59:27 -07:00
Debanjum Singh Solanky	d078e7b1f6	Clean up search type usage in khoj server, tests and Readme	2023-07-18 19:57:55 -07:00
Debanjum Singh Solanky	4d910936b7	Fix triggering index update on khoj server from khoj.el	2023-07-18 19:57:54 -07:00
Debanjum Singh Solanky	5c7d7f558d	Make AI model used for Khoj chat configurable from khoj.el - Fix bug. Set the unused model-name to a standard default value	2023-07-18 19:57:54 -07:00
Debanjum	5f2be2a9bb	Merge pull request #298 from HyunggyuJang/patch-1 Encode config as utf-8 during setup in khoj.el. This will allow utf-8 encoded files etc to be passed in config	2023-07-18 17:54:11 -07:00
Debanjum Singh Solanky	429e1b4b48	Regenerate index to apply corruption fixes on first run of new khoj	2023-07-18 16:10:47 -07:00
Debanjum Singh Solanky	83e1088d42	Manage khoj.yml config migrations on app start. Version the schema - Add version to khoj.yml schema Versioning the khoj.yml config schema will simplify future migrations	2023-07-18 16:10:10 -07:00
Debanjum Singh Solanky	71e8ddd9a2	Check if PDF is configured before showing it as an option in khoj.el	2023-07-17 15:49:20 -07:00
Debanjum	d00c5da8b7	Merge pull request #325 from khoj-ai/stablize-simplify-content-indexing ## Stabilize and Simplify Content Indexing ### Major Updates - `9bcca43` Unify logic to update entries when indexing from scratch or incrementally - `89c7819` Unify logic to update embeddings when indexing from scratch or incrementally - `6a0297c` Stable sort new entries when marking entries for update - `58d86d7` Unify logic to configure server from API or on server start - Create tests to ensure old entries, embeddings in index are unaffected on adding new entries - Refer: `1482fd4`, `7669b85`, `88d1a29` - `ad41ef3` Make normalization of embeddings configurable to test this in `c73feeb` ### Minor Updates - `1673bb5` Add todo state to compiled form of each entry - `6e70b91` Remove unused `dump_jsonl` helper method - `7ad9603` Improve naming of lock - `b02323a` Improve naming text search test methods Resolves #190	2023-07-17 14:51:10 -07:00
Debanjum Singh Solanky	3e3a1ecbc8	Start app even if server init fails to let user fix it Show stacktrace on error to help debugging	2023-07-17 14:33:02 -07:00
Debanjum Singh Solanky	ef6a0044f4	Drop embeddings of deleted text entries from index Previously the deleted embeddings would continue to be in the index, even after the entry was deleted	2023-07-16 03:47:05 -07:00
Debanjum Singh Solanky	ad41ef3991	Make normalizing embeddings configurable	2023-07-16 02:16:33 -07:00
Debanjum Singh Solanky	89c7819cb7	Unify logic to generate embeddings from scratch and incrementally This simplifies the `compute_embeddings' method and avoids potential later divergence in handling the index regenerate vs update scenarios	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6a0297cc86	Stable sort new entries when marking entries for update	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6e70b914c2	Remove unused dump_jsonl method The entries index is stored in gzipped jsonl files for each content type	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	9bcca43299	Use single func to handle indexing from scratch and incrementally Previous regenerate mechanism did not deduplicate entries with same key So entries looked different between regenerate and update Having single func, mark_entries_for_update, to handle both scenarios will avoid this divergence Update all text_to_jsonl methods to use the above method for generating index from scratch	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	1673bb5558	Add todo state to compiled form of each org-mode entry	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	7ad96036b0	Improve lock name to config_lock instead of search_index_lock It is used to lock updates to all app config state, including processor	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	58d86d7876	Use single func to configure server via API and on server start Improve error messages on failure to configure server components	2023-07-16 01:45:53 -07:00
sabaimran	a15711e635	Fix null type checks in get /config	2023-07-15 15:53:56 -07:00
sabaimran	e590d75b20	Start Khoj even when config is not valid (#320 ) * Add icon to indicate bad config, start Khoj even if there was an issue setting up the index	2023-07-15 14:11:54 -07:00
sabaimran	49ab201c30	Fix issues importing PySide in Docker container (#322 ) * Rather than installing PyQT dependencies, remove codepaths that require pyqt files in no-gui mode	2023-07-15 13:33:13 -07:00
sabaimran	ba47f2ab39	Merge branch 'master' of github.com:debanjum/khoj	2023-07-14 22:28:05 -07:00
sabaimran	874cffd256	Add additional support for parsing notion workspaces	2023-07-14 22:27:56 -07:00
Debanjum	52f68167ce	Merge pull request #317 from khoj-ai/reduce-memory-consumption-by-search-model-duplication Reuse Search Models across Content Types to reduce Memory Consumption - Memory consumption now only scales with search models used, not with content types. Previously each content type had its own copy of the search ML models. That'd result in 300+ MB per enabled text content type - Split model state into 2 separate state objects, `search_models` and `content_index`. This allows loading text_search and image_search models first and then reusing them across all content_types in content_index - The change should cut down memory utilization quite a bit for most users. I see a >50% drop in memory utilization on my Khoj instance. But this will vary for each user based on the amount of content indexed vs number of plugins enabled. - This change does not solve the RAM utilization scaling with size of the index, as the whole content index is still kept in RAM while Khoj is running Should help with #195, #301 and #303	2023-07-14 19:54:12 -07:00
Debanjum Singh Solanky	f08e9539f1	Release lock after updating index even if update fails to prevent deadlock Wrap acquire/release locks in try/catch/finally when updating content index and search models to prevent lock not being released on error and causing a deadlock	2023-07-14 16:57:27 -07:00
sabaimran	37f7f9fd1d	Add additional telemetry for system understanding (#316 ) * Add additional telemetry in order to understand which data sources are the most useful * Make actions side by side in the configuration page * Restore main run command * Update links to point to wiki pages for Github, Notion integrations * Standardize nomenclature of the api_type to use _config suffix Remove header fields that aren't actually helpful for understanding config usage	2023-07-14 10:14:07 -07:00
Debanjum Singh Solanky	86e2bec9a0	Reuse Search Models across Content Types to Reduce Memory Consumption - Memory consumption now only scales with search models used, not with content types as well. Previously each content type had its own copy of the search ML models. That'd result in 300+ MB per enabled content type - Split model state into 2 separate state objects, `search_models' and `content_index'. This allows loading text_search and image_search models first and then reusing them across all content_types in content_index - This should cut down memory utilization quite a bit for most users. I see a ~50% drop in memory utilization. This will, of course, vary for each user based on the amount of content indexed vs number of plugins enabled - This does not solve the RAM utilization scaling with size of the index, as the whole content index is still kept in RAM while Khoj is running Should help with #195, #301 and #303	2023-07-14 01:27:22 -07:00
Debanjum	b2718d330c	Merge pull request #304 from migrate-from-pyqt-to-pyside Migrate from PyQT6 to PySide6	2023-07-13 11:54:47 -07:00
sabaimran	31e933207f	Set default values for sys.stdout if they're unavailable	2023-07-12 22:22:49 -07:00
Debanjum Singh Solanky	9c76150895	Migrate from PyQT6 to PySide6	2023-07-11 18:43:44 -07:00
HyunggyuJang	88c42b3043	Encode data as utf-8 otherwise it will complain, see `1c85531090`	2023-07-11 17:06:05 +09:00
Debanjum Singh Solanky	f664a74e77	Update Khoj server to run on non standard port, 42110 instead of 8000 Resolves #295	2023-07-10 21:27:58 -07:00
sabaimran	effb52f859	Fix demo rendering with the new header	2023-07-10 21:16:19 -07:00
sabaimran	55f5be7b03	Release Khoj version 0.8.2	2023-07-10 14:39:32 -07:00
sabaimran	9a63f89f33	Merge branch 'master' of github.com:debanjum/khoj	2023-07-10 14:31:19 -07:00
sabaimran	53809298c0	Release Khoj version 0.8.1	2023-07-10 14:30:04 -07:00
tjsousa	5b37e988e6	Allow using configured GPT chat model (#292 ) My account doesn't have gpt-4 enabled and it wouldn't work as the default value was always used from extract_questions, where the caller could use the configured model.	2023-07-10 14:24:40 -07:00
Debanjum Singh Solanky	75ff871217	Release Khoj version 0.8.0	2023-07-10 13:37:51 -07:00
Debanjum Singh Solanky	979088b3dc	Add tooltip helper text on web settings page buttons - Provide more details on what clicking configure, initialize buttons or changing the results count slider does - This shows up on user hovering over those buttons	2023-07-10 13:32:41 -07:00
Debanjum Singh Solanky	255781e135	Use relative link on logo to jump to correct page on local and cloud	2023-07-10 13:22:20 -07:00
Debanjum Singh Solanky	b2d229c116	Move header pane style to base khoj.css for reuse. Fix logo size	2023-07-10 13:10:17 -07:00
Debanjum Singh Solanky	20cb314171	Open the Khoj config page in the browser on first run	2023-07-10 12:10:20 -07:00
sabaimran	07cf5a214a	Check if PDF files are present in the Obsidian vault before initializing the Khoj configuration (#293 )	2023-07-10 10:33:04 -07:00
sabaimran	7364bac8ae	Make the header take up less space - Use a single row for the header - Needed custom styling for each page because each of them are different in subtle ways, unfortunately	2023-07-09 22:31:37 -07:00
sabaimran	62704cac09	Add a plugin which allows users to index their Notion pages (#284 ) * For the demo instance, re-instate the scheduler, but infrequently for api updates - In constants, determine the cadence based on whether it's a demo instance or not - This allows us to collect telemetry again. This will also allow us to save the chat session * Conditionally skip updating the index altogether if it's a demo instance * Add backend support for Notion data parsing - Add a NotionToJsonl class which parses the text of Notion documents made accessible to the API token - Make corresponding updates to the default config, raw config to support the new notion addition * Add corresponding views to support configuring Notion from the web-based settings page - Support backend APIs for deleting/configuring notion setup as well - Streamline some of the index updating code * Use defaults for search and chat queries results count * Update pagination of retrieving pages from Notion * Update state conversation processor when update is hit * frequency_penalty should be passed to gpt through kwargs * Add check for notion in render_multiple method * Add headings to Notion render * Revert results count slider and split Notion files by blocks * Clean/fix misc things in the function to update index - Use the successText and errorText variables appropriately - Name parameters in function calls - Add emojis, woohoo * Clean up and further modularize code for processing data in Notion	2023-07-09 15:29:26 -07:00
Debanjum	77755c0284	Fix Packaging the Khoj Desktop Apps (#289 ) * Add langchain static files and pytorch metadata to Khoj native app * Add pillow static files, metadata & hidden imports to Khoj native app * Fix path to web interface static files on Khoj native app * Add tiktoken hidden imports to make chat work from Khoj native app * Fix Khoj native app to run with GUI mode enabled This got broken when we moved from using the --no-gui flag to using --gui in https://github.com/khoj-ai/khoj/pull/263	2023-07-09 10:21:16 -07:00
sabaimran	4c135ea316	Make streaming optional for the /chat endpoint (#287 ) * Update the /chat endpoint to conditionally support streaming - If streams are enabled, return the threadgenerator as it does currently - If stream is disabled, return a JSON response with the response/compiled references separated out - Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian - Rename chat/init/ to chat/history * Update khoj.el to use the /history endpoint - Update corresponding unit tests to use stream=true * Remove & from call to /chat for obsidian * Abstract functions out into a helpers.py file and clean up some of the error-catching	2023-07-09 10:12:09 -07:00
Debanjum Singh Solanky	0a86220d42	Use default values, delete content config on disable and update state	2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky	362063f5fe	By default, connect to Khoj server over IPv4 from Obsidian plugin	2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky	571e8c2548	Add rerank, index corruption hint on search page of web interface Similar to the hint already in the Obsidian search modal Closes #272	2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky	61e131f95c	Hide unused model field from chat settings on web interface	2023-07-07 18:43:53 -07:00
Debanjum Singh Solanky	af30d01e85	Move to newer chat models to extract questions & summarize chats Deprecate usage of the older gpt3 models in favor of the newer chat based models - text-davinci-003 is only 50% cheaper than gpt4 and less reliable for question extraction - Using gpt-3.5-turbo for summarization should reduce cost of chat - Keep conversation.chat_session as a list instead of a string - Update completion_with_backoff func to use ChatML format	2023-07-07 17:32:27 -07:00
Debanjum Singh Solanky	171ce19e1f	Update date filter to allow quoting values in single quotes	2023-07-07 17:13:47 -07:00
Debanjum Singh Solanky	e588f7c528	Deprecate unused beta search and answer API endpoints	2023-07-07 16:38:07 -07:00
Debanjum Singh Solanky	c9fc4d1296	Revert to using cross-encoder to improve search results used by chat	2023-07-07 15:31:34 -07:00
Debanjum Singh Solanky	11f0a9f196	Fix chat tests since streaming. Pass args correctly to chat methods - Fix testing gpt converse method after it started streaming responses - Pass stop in model_kwargs dictionary and api key in openai_api_key parameter to chat completion methods. This should resolve the arg warning thrown by OpenAI module	2023-07-07 15:23:44 -07:00
Debanjum Singh Solanky	48870d9170	Fix parsing questions generated by extract_questions actor into list The previous json parsing was failing to handle questions with date filters Fix the chat actor tests to run without throwing error with freezegun complaining about importing transformers.local_llama model Remove quote escapes from date filter examples provided to extract_questions actor	2023-07-07 15:18:55 -07:00
Debanjum Singh Solanky	279662620b	Move results count to settings page on web. Use it for search & chat - Before Only the search interface had the results count configuration option - After - The results count is set on the settings page instead of the search page - Both search and chat can use the configured results count instead of just search	2023-07-07 14:08:08 -07:00
Debanjum Singh Solanky	2ec8da89e8	Remove Update button from Khoj Search page on the Web interface The settings page on the Khoj web interface already has a configure button. Don't need the Update button on the search page as well	2023-07-07 12:49:58 -07:00
Debanjum Singh Solanky	bf427cd8dd	Set no. of results used to generate chat response from Khoj Emacs	2023-07-07 12:34:50 -07:00
Debanjum Singh Solanky	1d77fe712c	Set no. of results used to generate chat response from Khoj Obsidian	2023-07-07 12:32:32 -07:00
Debanjum Singh Solanky	2f31de5ed5	Set no. of references to use for chat configurable in Chat API	2023-07-07 12:29:36 -07:00
Debanjum Singh Solanky	d97682fdac	Use tooltip, placeholders to guide Khoj setup via web settings page	2023-07-06 21:37:48 -07:00
Debanjum Singh Solanky	f5cf09424b	Use more descriptive field names for content type settings on Khoj web Resolves #281	2023-07-06 20:47:39 -07:00
Debanjum Singh Solanky	a2c668268f	Use node-fetch >=3.1.0 in khoj obsidian plugin to avoid security vulnerability	2023-07-06 13:05:52 -07:00
sabaimran	d688ddf92c	Re-instate the scheduler for the demo instances (#279 ) * For the demo instance, re-instate the scheduler, but infrequently for api updates - In constants, determine the cadence based on whether it's a demo instance or not - This allows us to collect telemetry again. This will also allow us to save the chat session * Conditionally skip updating the index altogether if it's a demo instance	2023-07-06 11:01:32 -07:00
Debanjum Singh Solanky	8f36572a9b	Improve typing, null checks in controllers and gpt functions	2023-07-05 20:49:25 -07:00
Debanjum	6c2a8a5bce	⚡️ Stream Responses by Khoj Chat on Web, Obsidian - What - Stream chat responses from OpenAI API to Web, Obsidian clients - Implement using a callback function which manages a queue where new tokens can be placed as they come on. As the thread is read from, tokens are removed. - When the final token has been processed, add the `compiled_references` to the queue to be rendered by the `chat` client - When the thread has been closed, save the accumulated conversation log in the user's history using a `partial func` - Incrementally decode tokens on the front end and add them as they appear from the streamed response - Why This significantly reduces perceived latency and OpenAI API request timeouts for Chat Closes https://github.com/khoj-ai/khoj/issues/257	2023-07-05 20:02:11 -07:00
Debanjum Singh Solanky	e111eda6ae	Make client, app_config optional in telemetry logger for correct typing	2023-07-05 18:57:38 -07:00
Debanjum Singh Solanky	e562114f6b	Improve comments, var names in js for chat streaming on web interface	2023-07-05 18:57:27 -07:00
Debanjum Singh Solanky	46269ddfd3	Fix chat logging messages to get context without flooding logs	2023-07-05 18:27:06 -07:00
Debanjum Singh Solanky	0ba838b53a	Show temp status message in Khoj Obsidian chat while Khoj is thinking - Scroll to bottom after adding temporary status message and references too	2023-07-05 18:02:43 -07:00
Debanjum Singh Solanky	8271abe729	Use optional chaining operator to extract khojBannerSubmit from conditional	2023-07-05 18:02:43 -07:00
Debanjum Singh Solanky	c12ec1fd03	Show temp status message in Khoj web chat while Khoj is thinking - Scroll to bottom after adding temporary status message and references too	2023-07-05 18:02:30 -07:00
sabaimran	257a421e45	Bonus: add try-catch logic around telemetry upload in case of JSON serializability issues	2023-07-05 15:12:18 -07:00
sabaimran	4e6b66b139	Add support for streaming chat response from OpenAI to Obsidian - I needed to install node-fetch to accomplish this, as the built-in request object from Obsidian doesn't seem to support streaming and the built-in fetch object is very sensitive to any and all cross origin requests	2023-07-05 15:01:22 -07:00
sabaimran	3ff5074cf5	Log the end-to-end time of generating a streamed response from OpenAI	2023-07-05 14:59:44 -07:00
sabaimran	68e635cc32	Remove additional comments and debug statements	2023-07-05 11:33:56 -07:00
sabaimran	67a8795b1f	Clean-up commented out code	2023-07-05 11:24:40 -07:00
sabaimran	79b1b1d350	Save streamed chat conversations via partial function passed to the ThreadGenerator	2023-07-04 17:33:52 -07:00
sabaimran	afd162de01	Add reference notes to result response from GPT when streaming is completed - NOTE: results are still not being saved to conversation history	2023-07-04 12:47:50 -07:00
sabaimran	8f491d72de	Initial code with chat streaming working (warning: messy code)	2023-07-04 10:14:39 -07:00
Debanjum Singh Solanky	5889eceba4	Make text selectable in Khoj chat modal on Obsidian Previously the text in the Khoj chat modal couldn't be copied as it was not selectable Resolves #206	2023-07-03 23:24:04 -07:00
sabaimran	89354def9b	Update request timeout window to 20 seconds	2023-07-03 22:28:18 -07:00
sabaimran	b1940519c3	Log error if unable to decode chunk from Github	2023-07-03 16:29:32 -07:00
Debanjum Singh Solanky	ecf9730cd7	Disable Chat, Search on Web if Khoj not configured & show next steps	2023-07-03 16:04:32 -07:00
sabaimran	017e8c1aef	Skip indexing a PDF that has an indexing error (#274 )	2023-07-03 15:55:11 -07:00
sabaimran	a6f313589e	Release Khoj version 0.7.1	2023-07-03 12:26:41 -07:00
sabaimran	8bfd5828e6	Remove deprecation notice since we're opening the web UI by default	2023-07-03 12:01:09 -07:00
sabaimran	92d81d3b16	Initialize the search.model field to SearchModels() and fix Reinitialize API call (#273 )	2023-07-03 11:32:44 -07:00
sabaimran	61403138d5	Merge pull request #269 from khoj-ai/features/simplify-configuration-steps Simplify some common configuration steps	2023-07-03 00:16:51 -07:00
sabaimran	ea3dc2cfa3	Simplify rendering of content type pages and logic of selecting config	2023-07-03 00:15:29 -07:00
sabaimran	260272dca2	Check if state.config is populated before configuring via the update method	2023-07-03 00:10:56 -07:00
sabaimran	bf8914d0c8	Fix default config initialization for chat.html	2023-07-03 00:00:47 -07:00
Debanjum	faad1297f4	Drop Support for Org Music, Ledger Content Types Removing unused content types will reduce khoj code to manage - `0f993b3` Drop support for Ledger as a separate content type Khoj will soon get a generic text indexing content type in Index plain text files #237. This along with a file filter should suffice for searching through Ledger transactions - `c9db532` Remove unused org-music as an indexable content type from Khoj Org-music was just a custom content type that worked with org-music. It was mostly only useful for me.	2023-07-02 17:48:29 -07:00
Debanjum Singh Solanky	0f993b332e	Drop support for Ledger as a separate content type Khoj will soon get a generic text indexing content type. This along with a file filter should suffice for searching through Ledger transactions, if required. Having a specific content type for niche use-case like ledger isn't useful. Removing unused content types will reduce khoj code to manage.	2023-07-02 16:57:49 -07:00
sabaimran	fa218ff5aa	Fix call to update for Reinitialize button	2023-07-02 16:31:30 -07:00
sabaimran	a8b83da872	Merge branch 'master' of github.com:debanjum/khoj into features/simplify-configuration-steps	2023-07-02 16:21:54 -07:00
Debanjum Singh Solanky	c9db5321e7	Remove unused org-music as an indexable content type from Khoj Org-music was just a custom content type that worked with org-music. It was mostly only useful for me. Cleaning up that code will reduce number of content types for khoj to manage.	2023-07-02 16:21:21 -07:00
sabaimran	b86a3bb0c5	Merge branch 'master' of github.com:debanjum/khoj into fix/obsidian-setup-issues	2023-07-02 16:21:05 -07:00
sabaimran	a52c1c8380	Use built-in app.vault to determine whether there are any PDF files within	2023-07-02 16:20:43 -07:00
sabaimran	eff1436857	Overwrite existing PDFs in Obsidian as well, make if-block more legible	2023-07-02 16:17:25 -07:00
Debanjum Singh Solanky	30459ee4ba	Fix Khoj subtitle in desktop entry, pyproject, cli and Obsidian Readme	2023-07-02 16:09:07 -07:00
sabaimran	1a1b044d12	Simplify settings pages for configuration - Add one-click disablement - Remove fields that probably don't need to be edited (our implementation details) - Add a green tick if a given field is configured	2023-07-02 16:04:05 -07:00
sabaimran	e4c445f805	Add try-except-finally blocks around configure calls in /update	2023-07-02 13:35:02 -07:00
sabaimran	4b02a8c788	Fix PDF setup in Obsidian plugin and force Obsidian configuration for markdown	2023-07-02 12:37:24 -07:00
sabaimran	2a7e4f2b71	Escape special characters in the URL when adding a link to the remote file	2023-07-02 09:13:28 -07:00
sabaimran	c747562897	Update the GUI to just be a simple box with a button for the web UI	2023-07-01 20:37:21 -07:00
sabaimran	bab7f39d47	Move logic to open the web browser into the GUI section	2023-07-01 20:11:27 -07:00
sabaimran	36537606da	Update unit test and preserve prior operational ordering in main.py	2023-07-01 20:02:35 -07:00
sabaimran	ea9ae4ae28	Configure Khoj to automatically open the browser to their web home page when Khoj is up	2023-07-01 19:46:31 -07:00
sabaimran	d2083dd395	Remove bespoke processing for GithubToJsonl file demo	2023-07-01 19:09:22 -07:00
sabaimran	a71440f62a	Update the guidance in the error message if config is not set	2023-07-01 19:09:00 -07:00
sabaimran	7db97d8aa9	Fix: don't try to render the search_type.ALL	2023-07-01 19:08:19 -07:00
sabaimran	f0f6390366	Make --no-gui the default behavior of Khoj and update corresponding documentation	2023-07-01 19:07:59 -07:00
Debanjum Singh Solanky	d77e05c279	Release Khoj version 0.7.0	2023-07-01 05:44:22 -07:00
Debanjum Singh Solanky	30d87a9a01	Update color of Khoj chat in Obsidian plugin to Lantern theme	2023-07-01 02:18:47 -07:00
Debanjum Singh Solanky	51826d28d6	Ensure clicking Update in Khoj Obsidian indexes PDF files too	2023-07-01 02:18:47 -07:00
sabaimran	dac2d14380	Handle file names appropriately for md files and render commits in github results	2023-07-01 01:20:58 -07:00
sabaimran	dbe713604d	Fix error in tests for markdown_to_jsonl	2023-07-01 00:49:40 -07:00
sabaimran	931aab4464	Handle case for when headers value is None	2023-07-01 00:37:30 -07:00
sabaimran	d01afb3ee4	Fix path issues for URL-based markdown files	2023-07-01 00:25:11 -07:00
sabaimran	31655447e7	Add the sign-up list to the chat page as well and update copy	2023-06-30 21:43:01 -07:00
sabaimran	796102c74e	Add separate configuration if the given Khoj instance is meant for demo - In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network) - Prevent re-indexing for Github data if this is a demo instance - Fix up some issues with the CSS which made settings page small in mobile - In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page	2023-06-30 20:38:55 -07:00
sabaimran	db3026739d	Resolve diffs in api.py to make /chat endpoint async with new request parameter	2023-06-30 00:25:37 -07:00
sabaimran	ef72508914	Try/catch around github file decoding, await call to search in chat API, fix img width	2023-06-30 00:23:21 -07:00
Debanjum Singh Solanky	b950889f47	Fix org-mode web renderer to handle results containing list in block - Break out of rendering list if at end of org block in org.js - This would previously hang rendering results in web interface. Should try to fix this upstream in org.js as well	2023-06-29 19:01:25 -07:00
sabaimran	780c769567	Add additional request headers to improve telemetry	2023-06-29 18:51:24 -07:00
sabaimran	6c10d68262	Merge pull request #253 from khoj-ai/features/github-issues-indexing Support indexing Github issues as well as corresponding comments	2023-06-29 16:02:47 -07:00
sabaimran	b2dd946c6d	Rename issue to entry method for accuracy	2023-06-29 15:23:50 -07:00
Debanjum Singh Solanky	51dfa48e2b	Have Khoj support Python 3.11 as Pytorch supports it now - Previously Khoj could only support Python up to 3.10 due to pytorch. But lots of folks had python 3.11 installed by default on their machines. This required installing python 3.10 and dealing with virtual envs. With Torch >= 2.0.1 now able to support python 3.11, at least one class of installation troubles for Khoj should drop. See https://github.com/pytorch/pytorch/issues/86566 for reference - Preliminary testing indicates using the new torch 2.x may reduce search time by 25% (from 80ms to 60ms on Mac M1) - Update Docs to no longer state that python <=3.10 is required - Update Github test workflow to run khoj tests with python 3.11 too	2023-06-29 15:13:26 -07:00
sabaimran	65bf894302	Interpret org files as a list and put them in separate divs. Update styling of search results to separate into cards	2023-06-29 15:12:48 -07:00
Debanjum Singh Solanky	d212298573	Make Configure button on web interface incrementally update by default We should add a way to force index everything. But force indexing should not be the default when user is just trying to update content to index	2023-06-29 14:52:51 -07:00
Debanjum Singh Solanky	da2de21339	Only return requested result count even if search in multiple content types - Set results_count to default value at start so it is an int, never None	2023-06-29 14:49:05 -07:00
sabaimran	77672ac0ae	Demarcate different results with a border box - Add back support for searching by type Github - Remove custom class name in markdown js file	2023-06-29 14:14:25 -07:00
sabaimran	6edc32f2f4	Accept current changes to include issues in rendering flow	2023-06-29 12:25:29 -07:00

1414 commits