sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-04 21:03:01 +01:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	0885fc6c23	Handle server unavailable error on auto-index schedule job in khoj.el	2023-11-24 16:39:44 -08:00
sabaimran	c13953311a	Add reflective questions to admin pages	2023-11-23 14:01:05 -08:00
sabaimran	c42ec32a95	Merge pull request #552 from khoj-ai/features/internet-enabled-search Support internet-enabled, online searching using Serper.dev	2023-11-23 12:34:05 -08:00
sabaimran	c641b8df58	Update desktop package version	2023-11-22 17:54:53 -08:00
sabaimran	a1b2289074	Release Khoj version 1.0.1	2023-11-22 17:52:07 -08:00
sabaimran	b1b037f0ea	Fix URL configuration issues with reorganized subfolders	2023-11-22 17:03:33 -08:00
sabaimran	e0949e232b	Import random in adapters file for selecting reflective question	2023-11-22 07:52:51 -08:00
sabaimran	256e8de40a	Merge with features/internet-enabled-search	2023-11-22 07:25:24 -08:00
Debanjum Singh Solanky	fd60db766e	Clear Conversation History from the Web Client	2023-11-22 03:35:00 -08:00
Debanjum Singh Solanky	d5a4830761	Clear Conversation History from the Desktop Client	2023-11-22 03:35:00 -08:00
Debanjum Singh Solanky	3096544cf2	Create API endpoint to clear user's chat history	2023-11-22 03:34:59 -08:00
Debanjum Singh Solanky	63675b3299	Speak to Khoj from the Desktop client - Use icons to style speech to text recording state	2023-11-22 02:47:17 -08:00
Debanjum Singh Solanky	2951fc92d7	Speak to Khoj from the Web client - Use icons to style speech to text recording state	2023-11-22 02:47:17 -08:00
Debanjum Singh Solanky	cc77bc4076	Create speech to text API endpoint. Use OpenAI whisper for ASR - Wrap audio transcription in try/catch and delete audio file after processing - Use configured speech to text model, else handle error	2023-11-22 02:47:06 -08:00
Debanjum Singh Solanky	1ca99b6eb0	Add speech to text model configuration to Database	2023-11-22 02:24:31 -08:00
sabaimran	c652a7fd2d	Move text_to_entries under the new content folder	2023-11-21 22:25:17 -08:00
sabaimran	1e2af083f0	Rename the data_sources module to content	2023-11-21 22:11:32 -08:00
sabaimran	4cb28aeffb	Resolve merge conflicts with master	2023-11-21 22:07:41 -08:00
Debanjum Singh Solanky	4cdfe8fc4f	Re-enable Khoj Obsidian plugin for Mobile, as Khoj cloud is available	2023-11-21 16:33:48 -08:00
Debanjum	5d9d50157e	Clean Logs, Improve Message Rendering and Make Khoj Trusted Host Configurable (#561) - Append chat message to chat logs as TextNodes in web, desktop clients - Simplify Code to Identify Files from Github, Notion on Web, Desktop Client - Use file source to find entries from github, notion on web, desktop client - Pass file source to clients via text search API response - Make Django Logs Follow Khoj Log Format, Verbosity - Handle image search setup related warning - Format Django initializing outputs using Khoj logger format - Use `KHOJ_HOST` env var to set allowed/trusted domains to host Khoj	2023-11-21 15:14:34 -08:00
Debanjum Singh Solanky	9e736d4340	Use KHOJ_DOMAIN for CORS allow_origins list as well - Default to app.khoj.dev - Remove unnecessary any_path regex in allow_origins. It only cares about host, paths are not set in origin header	2023-11-21 14:02:04 -08:00
sabaimran	5469e81a87	Use full path for the static directory in FastAPI and reflect deeper nesting of the django app	2023-11-21 13:44:45 -08:00
sabaimran	d199c4c35f	Resolve merge conflicts with master	2023-11-21 13:35:56 -08:00
Debanjum Singh Solanky	76d041f633	Use KHOJ_HOST env var to set allowed/trusted domains to host Khoj Allows hosting Khoj behind other, non "khoj.dev" domains	2023-11-21 13:11:45 -08:00
Debanjum Singh Solanky	90d463c12a	Append chat message to chat logs as TextNodes in web, desktop clients	2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky	befcbcdd5d	Use file source to find entries from github, notion on web, desktop client This is a more robust mechanism of identification than via file name including github or notion domain names	2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky	3f0de45ec6	Pass file source to clients via text search API response Source of entry stored in DB is now passed to clients for processing	2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky	4aec581306	Handle image search setup related warning Ideally should rename model_directory to config_directory or some such but the current image search code will need to be migrated soon. So changing the variable name and creating a migration script for old khoj.yml files using model-directory variable isn't worth it Remove the explicit setting of the number of threads to use by pytorch. Use the default used by it.	2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky	b06628ee31	Format Django initializing outputs using Khoj logger format - Collect STDOUT from the `migrate', `collectstatic' commands and output using the Khoj logger format and verbosity settings - Only show Django `collectstatic' command output in verbose mode - Fix showing the Initializing Khoj log line by moving it after logger level set	2023-11-21 13:10:50 -08:00
sabaimran	341abf03ff	Handle none for search_type and use equals comparator rather than in for determining Notion type	2023-11-21 12:55:09 -08:00
sabaimran	2bb989e9d8	Resolve merge conflicts and fix some import ordering	2023-11-21 12:30:43 -08:00
sabaimran	244b76ffed	Add isort for automatic import sorting and skip main.py because it's a drama queen 👑	2023-11-21 12:20:41 -08:00
Debanjum	8a0d92e2d7	Fix Connectivity Check in Obsidian Client (#559) from dtkav/bugfix-local-connectivity-check Check connection to Khoj server for self-hosted server. This check had regressed during the cloud rearchitecture	2023-11-21 12:05:16 -08:00
sabaimran	0e6f09b241	Merge pull request #562 from khoj-ai/fix/pypi-package-app-not-included Fix PyPi package app reference issue	2023-11-21 11:54:46 -08:00
sabaimran	333cb3445c	Use colon rather than equals to indicate typing	2023-11-21 11:28:51 -08:00
Debanjum Singh Solanky	645fd96634	Search across all content types from Khoj Obsidian client Previously it was only searching for PDF and Markdown files. This was meant to show only content from current vault as results. But it has not scaled well as other clients also allow syncing PDF and markdown files now. So remove this content type filter for now. A proper solution would limit by using file/dir filters on server or client side.	2023-11-21 11:19:33 -08:00
sabaimran	a1460a5bf9	Set operations to typed empty list in migration file	2023-11-21 11:14:40 -08:00
sabaimran	71e794c26f	Remove the sys.append line in the main.py file, as it's not required	2023-11-21 10:57:21 -08:00
sabaimran	a474c31e02	Move the django app into the src/khoj folder for better organization and functionality - Our pypi package currently does not work because the django app and associated database is not included. To remedy this issue, move the app into the src/khoj folder. This has the added benefit of improved organization of the codebase, as all server related code is now in a single folder - Update associated file paths and system references	2023-11-21 10:56:04 -08:00
Debanjum Singh Solanky	c89bd49973	Fix ranking search results on Obsidian It's reversed since score of entries is now a distance metric on Khoj server. So lesser distance is better. Previously higher score was better	2023-11-21 01:24:59 -08:00
Daniel Grossmann-Kavanagh	f142999bce	fix khoj local server usage	2023-11-20 17:07:30 -08:00
Debanjum Singh Solanky	c07401cf76	Fix, Improve chat config via CLI on first run by using defaults - Fix setting prompt size for online chat - generally improve chat config via cli by using default chat model, prompt size for online and offline chat	2023-11-20 17:01:20 -08:00
sabaimran	b142de15a8	Merge branch 'features/internet-enabled-search' of github.com:khoj-ai/khoj into features/reflective-suggested-questions	2023-11-20 15:56:09 -08:00
sabaimran	a9623ef85a	Add requisite imports in order to instantiate offline model in adapters file	2023-11-20 15:27:42 -08:00
sabaimran	a8f13f334f	Fix merging issues with base after popping the stash	2023-11-20 15:22:50 -08:00
sabaimran	8fa0b69c67	Resolve merge issue with adapters methods	2023-11-20 15:21:06 -08:00
sabaimran	fee99779bf	Add subqueries for internet-connected search results and update client-side code accordingly - Add a wrapper method to help make direct queries to the LLM and determine any intermediate responses needed for handling the request	2023-11-20 15:19:15 -08:00
Debanjum Singh Solanky	d61b0dd55c	Add Khoj Django app package to sys path to load Django module via pip install	2023-11-20 14:55:00 -08:00
sabaimran	b8e6883a81	Merge branch 'master' of github.com:khoj-ai/khoj into features/internet-enabled-search	2023-11-19 16:20:08 -08:00
sabaimran	237195e20e	Make all name-related fields nullable within the GoogleUser	2023-11-19 14:22:32 -08:00
Debanjum	71799add0b	Index Parent Headings of Org-Mode Entries to Improve Search Context (#548) ### Overview The parent hierarchy of org-mode entries can store important context. This change updates OrgNode to track parent headings for each org entry and adds the parent outline for each entry to the index ### Details - Test search uses ancestor headings as context for improved results - Add ancestor headings of each org-mode entry to their compiled form - Track ancestor headings for each org-mode entry in org-node parser Resolves #85	2023-11-19 13:18:19 -08:00
sabaimran	ef5e9d66c1	Resolve merge conflicts in dependency imports	2023-11-19 11:42:20 -08:00
Debanjum Singh Solanky	c3465d6982	Release Khoj version 1.0.0	2023-11-19 09:50:25 -08:00
Debanjum	736744be3a	Update documentation to reflect new multi-user config scenario (#550) - Update docs to show how to use Khoj Cloud - Move self-hosting Khoj to separate section - Add page to setup Desktop app - Set default URL to Khoj Cloud URL in Obsidian, Emacs clients	2023-11-18 18:22:46 -08:00
Debanjum Singh Solanky	e1bf1f0e86	Update default Khoj server URL to Khoj cloud on Emacs, Obsidian clients	2023-11-18 16:25:45 -08:00
Debanjum Singh Solanky	8775ce730a	Use URL fragments to allow jumping to config page sections on Web app	2023-11-18 16:25:45 -08:00
sabaimran	f792b1e301	Remove already defined identical function	2023-11-18 14:08:50 -08:00
sabaimran	e2fff5dc47	Don't explicitly use value to get the model type value	2023-11-18 14:01:01 -08:00
sabaimran	a8a25ceac2	Honor user's chat settings when running the extract questions phase - Add marginally better error handling when GPT gives a messed up response to the extract questions method - Remove debug log lines	2023-11-18 13:31:51 -08:00
sabaimran	67156e6aec	Add new logs for debugging issues with chat references	2023-11-18 12:10:50 -08:00
sabaimran	5de2ab6098	Change parse_obj calls to use model_validate per new pydantic specification	2023-11-18 12:10:36 -08:00
sabaimran	6d249645a6	Fix interpretation of the default search type	2023-11-18 00:04:18 -08:00
sabaimran	f180b2ba94	Resolve mypy errors for various data types	2023-11-17 23:26:15 -08:00
sabaimran	3328a41f08	Update types of base config models for pydantic 2.0	2023-11-17 23:08:52 -08:00
sabaimran	f688529150	Update the default configuration for the AppConfig	2023-11-17 19:26:31 -08:00
sabaimran	11ccb92755	Fix formatting of welcome message to use markdown	2023-11-17 18:55:59 -08:00
Debanjum Singh Solanky	ca87b4ede9	Wrap common API query parameters into shared class to deduplicate code - Upgrade FastAPI to >= latest version. Required upgrade of FastAPI. Earlier version didn't support wrapping common query params in class - Use per fixture app instead of a global FastAPI app in conftest - Upgrade minimum required Django version - Fix no notes chat director test with updated no notes message No notes message was updated in commit `118f1143`	2023-11-17 18:43:49 -08:00
sabaimran	262f3ccb59	Resolve mypy issues with formatting	2023-11-17 17:11:00 -08:00
sabaimran	a7e00898cb	Fix rendering even when no online context references are returned	2023-11-17 16:41:28 -08:00
sabaimran	0fcf234f07	Add support for using serper.dev for online queries - Use the knowledgeGraph, answerBox, peopleAlsoAsk and organic responses of serper.dev to provide online context for queries made with the /online command - Add it as an additional tool for doing Google searches - Render the results appropriately in the chat web window - Pass appropriate reference data down to the LLM	2023-11-17 16:19:11 -08:00
Debanjum Singh Solanky	55785d50c3	Use title, when present, as root ancestor of entries instead of file path	2023-11-17 15:03:27 -08:00
sabaimran	bfbe273ffd	Add some styling to the copy button for programmatic output	2023-11-17 12:18:35 -08:00
sabaimran	9ddf3b58c3	Use the markdown parser for rendering the chat messages in the web interface	2023-11-17 12:14:02 -08:00
sabaimran	a0b12b001a	Provide in-line rendering when output matches certain views	2023-11-17 11:04:36 -08:00
sabaimran	ec06d2c446	Move data indexer files into a separate folder under processor. Update assoc UTs	2023-11-16 17:19:55 -08:00
sabaimran	45a42faec8	Make adjectives more positive for api token generation	2023-11-16 15:55:35 -08:00
sabaimran	118f1143ff	When user tries using the notes slash command without having any data indexed	2023-11-16 12:52:39 -08:00
sabaimran	e8a13f0813	Add multi-user support to Khoj and use Postgres for backend storage (#549) - Adds support for multiple users to be connected to the same Khoj instance using their Google login credentials - Moves storage solution from in-memory json data to a Postgres db. This stores all relevant information, including accounts, embeddings, chat history, server side chat configuration - Adds the concept of a Khoj server admin for configuring instance-wide settings regarding search model, and chat configuration - Miscellaneous updates and fixes to the UX, including chat references, colors, and an updated config page - Adds billing to allow users to subscribe to the cloud service easily - Adds a separate GitHub action for building the dockerized production (tag `prod`) and dev (tag `dev`) images, separate from the image used for local building. The production image uses `gunicorn` with multiple workers to run the server. - Updates all clients (Obsidian, Emacs, Desktop) to follow the client/server architecture. The server no longer reads from the file system at all; it only accepts data via the indexer API. In line with that, removes the functionality to configure org, markdown, plaintext, or other file-specific settings in the server. Only leaves GitHub and Notion for server-side configuration. - Changes license to GNU AGPLv3 Resolves #467 Resolves #488 Resolves #303 Resolves #345 Resolves #195 Resolves #280 Resolves #461 Closes #259 Resolves #351 Resolves #301 Resolves #296	2023-11-16 11:48:01 -08:00
Debanjum Singh Solanky	74403e3536	Add ancestor headings of each org-mode entry to their compiled form Resolves #85	2023-11-16 02:54:41 -08:00
Debanjum Singh Solanky	305c25ae1a	Track ancestor headings for each org-mode entry in org-node parser	2023-11-16 02:39:14 -08:00
Debanjum Singh Solanky	cc05013715	Update first run message on Web app with Chat models setup instructions - Link to Django admin panel for user to create Chat Models on their Khoj server - This should only get hit when user is not using Khoj cloud, as Khoj cloud would already have Chat models configured	2023-11-15 22:44:24 -08:00
Debanjum Singh Solanky	6c1693b8f4	Update first run message on Desktop app with API token setup instructions - Open Web app settings in the default browser via link click - Open Desktop app settings via link click	2023-11-15 22:44:11 -08:00
Debanjum Singh Solanky	922983bd53	Set max cos distance to 0.18. Test search API query with max distance	2023-11-15 20:26:21 -08:00
Debanjum Singh Solanky	18dbad5edb	Use Sigmoid to normalize cross-encoder score between 0-1 - While sigmoid normalization isn't required for reranking. Normalizing score to distance metrics for both encoder and cross encoder scores is useful to reason about them - Softmax wasn't required as don't need probabilities, sigmoid is good enough to get distance metric	2023-11-15 19:31:59 -08:00
sabaimran	ea144de438	Merge with master	2023-11-15 18:34:46 -08:00
Debanjum Singh Solanky	348cc0cf0e	Use better name for DB adapter func to create user by Google token	2023-11-15 17:31:50 -08:00
Debanjum Singh Solanky	08a057bdd5	Rename SearchModel to SearchModelConfig DB model, Require Cross-Encoder	2023-11-15 17:31:50 -08:00
Debanjum Singh Solanky	0679b2a7bd	Use embeddings model store from state in text to entries Do not need to instantiate it separately. In all other places we're using the embeddings model store in global state anyway	2023-11-15 17:31:50 -08:00
sabaimran	245a9cbf63	Fix return type of the update_or_create method	2023-11-15 17:31:50 -08:00
sabaimran	bbae7dd83c	Update logic for creating a new user to use aupdate_or_create	2023-11-15 17:31:50 -08:00
sabaimran	8e62af77b9	Update format for return type of the generate token method	2023-11-15 17:03:01 -08:00
sabaimran	4a487aff23	Fix return type of the update_or_create method	2023-11-15 14:35:42 -08:00
sabaimran	b63856ecb4	Update logic for creating a new user to use aupdate_or_create	2023-11-15 12:50:39 -08:00
sabaimran	b8e7488a95	Use a more permissive distance filter for search results from notes	2023-11-15 11:13:47 -08:00
sabaimran	05b7542115	Remove config lock from the state	2023-11-15 10:44:45 -08:00
sabaimran	ecd005cac0	Check if search model is already in DB before creating a new one	2023-11-15 10:41:35 -08:00
Debanjum Singh Solanky	9c6e7bdea2	Upgrade server, desktop app dependencies to resolve CVE bugs	2023-11-15 01:47:53 -08:00
Debanjum Singh Solanky	8f200cf53f	Remove unused parameter from configure_search_type method	2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky	f8e5e118e1	Only create KhojUser on login if doesn't already exist	2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky	3d8d6145f2	Add search model config from khoj.yml to Postgres DB via migration script	2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky	4af194d74b	Make search model configurable on server - Expose ability to modify search model via Django admin interface - Previously the bi_encoder and cross_encoder models to use were set in code - Now it's user configurable but with a default config generated by default	2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky	e98141f4c3	Subscribe default user to standard plan with a far away renewal date Self hosted users in anonymous mode have all capabilities unlocked	2023-11-14 16:31:39 -08:00
Debanjum Singh Solanky	9d30fda26d	Deduplicate, improve name of prompt templates for GPT4All chat models - Do not pass unused rerank_results parameter to text_search.query method	2023-11-14 16:31:09 -08:00
Debanjum Singh Solanky	795ec9eb55	Add KHOJ_ prefix to server admin credentials environment variables	2023-11-14 16:13:13 -08:00
sabaimran	ee005de662	Rename django files URL to server instead of django	2023-11-14 12:36:38 -08:00
sabaimran	20ce3d0c78	Update default docker compose configuration with Khoj local mode	2023-11-14 12:21:26 -08:00
sabaimran	8c36079f74	Add a first run experience to initialize the admin user if none exists and setup chat models	2023-11-13 21:07:12 -08:00
Debanjum Singh Solanky	e9adb58c16	Rate limit calls to the /chat API per user, per day/minute	2023-11-13 19:41:46 -08:00
Debanjum Singh Solanky	33a8eb0470	Log when new user is created	2023-11-13 19:37:24 -08:00
sabaimran	603f838115	Block input text field when waiting for chat response	2023-11-11 17:14:37 -08:00
Debanjum Singh Solanky	9c321ac070	Fix cross encoder to use softmax to convert it to a distance metric	2023-11-11 16:12:24 -08:00
sabaimran	8a824167cf	Merge branch 'fix/imports-and-references' of github.com:khoj-ai/khoj into fix/imports-and-references	2023-11-11 12:59:31 -08:00
sabaimran	fa428932a8	Update URL for downloading the desktop application	2023-11-11 12:59:15 -08:00
Debanjum Singh Solanky	941c7f23a3	Only get text search results above confidence threshold via API - During the migration, the confidence score stopped being used. It was being passed down from API to some point and went unused - Remove score thresholding for images as image search confidence score different from text search model distance score - Default score threshold of 0.15 is experimentally determined by manually looking at search results vs distance for a few queries - Use distance instead of confidence as metric for search result quality Previously we'd moved text search to a distance metric from a confidence score. Now convert even cross encoder, image search scores to distance metric for consistent results sorting	2023-11-11 04:11:33 -08:00
Debanjum Singh Solanky	e44e6df221	Reduce data dumped in console log from web, desktop app	2023-11-11 02:05:07 -08:00
Debanjum Singh Solanky	f044a89d50	Show status in Save, Reinitialize button of config page on web app - Show non-transient error message in status element if action fails - On success, just show temporary success message within button	2023-11-11 02:04:58 -08:00
Debanjum Singh Solanky	f17d9da36c	Move Configure, Reinitialize buttons into the Content section on Web app Remove the Results Count button from the web app. It's hanging weirdly with not much context to its purpose. Reintroduce it in the Search card when created under the Features section	2023-11-11 02:01:39 -08:00
Debanjum Singh Solanky	325cb0f7fb	Show message in Save button of Github, Notion config save in web app Show the success, failure message only temporarily. Previously it stuck around after clicking save until page refresh	2023-11-11 02:01:39 -08:00
Debanjum Singh Solanky	b34d4fa741	Save config, update index on save of Github, Notion config in web app Reduce user confusion by joining config update with index updation for each content type. So only a single click required to configure any content type instead of two clicks on two separate pages	2023-11-11 00:33:49 -08:00
Debanjum Singh Solanky	c4364b9100	Weaken asking follow-up qs and q&a mode in notes prompt to OpenAI models - Notes prompt doesn't need to be so tuned to question answering. User could just want to talk about life. The notes need to be used to respond to those, not necessarily only retrieve answers from notes - System and notes prompts were forcing asking follow-up questions a little too much. Reduce strength of follow-up question asking	2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky	cba371678d	Stop OpenAI chat from emitting reference notes directly in chat body The Chat models sometime output reference notes directly in the chat body in unformatted form, specifically as Notes:\n['. Prevent that. Reference notes are shown in clean, formatted form anyway	2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky	8585976f37	Revert "Use notes in system prompt, rather than in the user message" This reverts commit `e695b9ab8c`.	2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky	b6441683c6	Increase reference text on 1st expansion to 3 lines and 140 characters	2023-11-10 23:36:43 -08:00
sabaimran	55c97241b5	Merge branch 'fix/imports-and-references' of github.com:khoj-ai/khoj into fix/imports-and-references	2023-11-10 22:38:34 -08:00
sabaimran	e2e96f9aa4	Add default settings to let new users be subscribed on trial - Add the default user to a subscription trial - Update associated unit tests	2023-11-10 22:38:28 -08:00
Debanjum Singh Solanky	501e7606a0	Increase reference text on 1st expansion to 3 lines and 140 characters	2023-11-10 21:27:04 -08:00
sabaimran	0a950d9382	Fix checker to determine if obsidian client is connected	2023-11-10 19:21:58 -08:00
sabaimran	c736604366	Merge with remote	2023-11-10 17:50:15 -08:00
sabaimran	b0b07bde6c	Allow chat reference to expand enough to show the whole reference, rather than constraining the height	2023-11-10 17:49:20 -08:00
sabaimran	14f8c151c8	Fix return type of the generate_chat_response method	2023-11-10 17:48:54 -08:00
Debanjum Singh Solanky	45b8670c25	Fix return type hint for generate_chat_response func	2023-11-10 17:34:19 -08:00
Debanjum Singh Solanky	9b6c5ddba4	Update action row padding in cards on config page of web app	2023-11-10 16:53:25 -08:00
sabaimran	54d4fd0e08	Add chat_model data for logging selected models to telemetry	2023-11-10 16:46:34 -08:00
sabaimran	e695b9ab8c	Use notes in system prompt, rather than in the user message	2023-11-10 15:09:33 -08:00
sabaimran	cec932d88a	Update prompt so that GPT is more context aware with its capabilities	2023-11-10 14:37:11 -08:00
sabaimran	e62788ad79	Await result for determining if user has entries	2023-11-10 13:51:56 -08:00
sabaimran	1a56344f12	Remove the old syncData reference as it no longer exists	2023-11-10 10:10:07 -08:00
Debanjum Singh Solanky	39ad1c6ce6	Release Khoj version 0.14.0 Fix Khoj subtitle in manifest of Khoj Obsidian plugin	2023-11-10 00:28:33 -08:00
Debanjum Singh Solanky	745d6bfeed	Add detailed intro message, mention download desktop app for docs sync	2023-11-10 00:20:28 -08:00
Debanjum Singh Solanky	6eb7df717c	Only show search in web app nav pane if user has documents indexed	2023-11-09 19:14:54 -08:00
Debanjum Singh Solanky	c0789dc57b	Use email to get_user_subscription from DB and other DB adapters - Needing user subscription requires chaining function - Simplify get_file_sources DB adapter	2023-11-09 19:09:57 -08:00
Debanjum Singh Solanky	841ed95521	Move active user profile halo check into nav pane macro on web app	2023-11-09 18:05:19 -08:00
Debanjum Singh Solanky	ddac693762	Hide download desktop app message in web app if synced files exist	2023-11-09 17:47:00 -08:00
Debanjum Singh Solanky	30a9674f25	Mark generated profile pic with subscription circle in web app	2023-11-09 15:22:38 -08:00
Debanjum Singh Solanky	d6e6ed1cfa	Keep single Save button, Show next sync, default to prod Khoj URL in Desktop app - Make mutable syncing variable not a const - Show next sync time to make users aware that data sync is automated - Keep a single Save button to reduce confusion. It does what Save All previously did. Intent to manual sync should Save All - Default to using app.khoj.dev as default Khoj URL to ease setup	2023-11-09 14:04:58 -08:00
Debanjum Singh Solanky	e1f0128576	Change config migration script to update to 0.15.0 version Next release, 0.14.0 wouldn't contain the migration to Postgres	2023-11-09 12:21:58 -08:00
Debanjum Singh Solanky	17cbbb0b01	Use Consistent Environment Variable for KHOJ_DEBUG	2023-11-09 11:01:28 -08:00
Debanjum Singh Solanky	391db80499	Improve subscribed user profile pictures and nav pane selection - Add yellow halo around subscribed user profile - Fix highlighting current page in header nav pane	2023-11-09 00:57:05 -08:00
Debanjum Singh Solanky	605058c72a	Allow null user profile picture from Google OAuth in DB - Fix width of profile picture generated for user - Ignore unused Stripe webhook events	2023-11-09 00:46:59 -08:00
Debanjum Singh Solanky	a2609973b8	Disable Subscription if Stripe environment not setup Deduplicate DJANGO_SECRET_KEY and KHOJ_DJANGO_SECRET_KEY to latter name as prefixed with KHOJ as KHOJ app specific	2023-11-08 19:39:32 -08:00
Debanjum Singh Solanky	09e1235832	Auto update billing card UI on (re/un-)subscribe click on web app Previously required a page load to see the updated billing state after clicking resubscribe or unsubscribe buttons	2023-11-08 18:38:12 -08:00
Debanjum Singh Solanky	8b8bb15866	Keep sync state in memory, initialized to false in Desktop app Prevent deadlock if desktop app killed in middle of syncing	2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky	c043eb54ae	Use typed entry source instead of raw str to map source to conf in api.py	2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky	8178004e6d	Move Subscription data into separate table in DB. Merge migrations	2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky	3bb10128ef	Move subscription API to separate, independent router	2023-11-08 16:20:27 -08:00
Debanjum Singh Solanky	ec1395d072	Clean, merge subscription update events, API and functions - Reduce webhook triggers for subscription updates - Merge subscription update API endpoint, functions for (re/un-)subscribe	2023-11-08 15:55:20 -08:00
Debanjum Singh Solanky	ef5c13f968	Keep user subscription state. Update it when user has unsubscribed	2023-11-08 12:08:36 -08:00
Debanjum Singh Solanky	c52affc6d9	Get Khoj Cloud Subscription URL via environment variable	2023-11-08 12:07:53 -08:00
sabaimran	609d358b1a	Use sql datetime comparison for detecting validity of subscription renewal date - Update the unsubscribe endpoint to use query params - Use subscription id to process unsubscribe endpoint, rather than the customer id	2023-11-07 19:17:36 -08:00
sabaimran	98cf095b65	Fix bug for rendering chat references in LLM response	2023-11-07 16:44:41 -08:00
sabaimran	0e1cdb6536	Add additional error handling for processing unknown Stripe events and fix typo in STRIPE_SIGNING env variable	2023-11-07 16:43:05 -08:00
sabaimran	08c86927cb	Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into fix-improve-config-page-on-desktop-and-web-app	2023-11-07 12:46:49 -08:00
sabaimran	cec54e3a8a	Merge pull request #536 from khoj-ai/features/update-chat-ui Update the chat UI to have richer representation of the references	2023-11-07 12:34:57 -08:00
Debanjum Singh Solanky	f466751f4d	Expose card on web app config page to manage subscription to Khoj cloud	2023-11-07 10:21:00 -08:00
Debanjum Singh Solanky	9aaf475c8a	Create API webhook, endpoints for subscription payments using Stripe - Add fields to mark users as subscribed to a specific plan and subscription renewal date in DB - Add ability to unsubscribe a user using their email address - Expose webhook for stripe to callback confirming payment	2023-11-07 10:20:51 -08:00
Debanjum Singh Solanky	156421d30a	Show file type icons for each indexed file in config card of web app	2023-11-07 05:48:44 -08:00
Debanjum Singh Solanky	045c2252d6	Set content enabled status on update via config buttons on web app Previously hitting configure or disable wouldn't update the state of the content cards. It needed page refresh to see if the content was synced correctly. Now cards automatically get set to new state on hitting disable button on card or global configure buttons	2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky	7c424e0d5f	Enable deleting all indexed desktop files from Khoj via Desktop app	2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky	779fa531a5	Prevent Desktop app triggering multiple simultaneous syncs to server Lock syncing to server if a sync is already in progress. While the sync save button gets disabled while sync is in progress, the background sync job can still trigger a sync in parallel. This sync lock prevents that	2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky	404d47f1a1	Bubble up content indexing errors to notify user on client apps	2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky	6e957584ac	Create config page on web app to manage computer files indexed by Khoj Remove the table of all files indexed by Khoj. This seems overkill and doesn't match the UI semantics of the other data sources like Github, Notion. Create instead a data source card for computer files with the same update, disable semantics of the Github and Notion data source cards Users can disable each data source from its card on the main config page. They can see/delete individual files indexed from the computer data source once they click into the computer files data source card on the config page	2023-11-07 04:42:53 -08:00
Debanjum Singh Solanky	d527b644f4	Update content by source via API. Make web client use this API for config	2023-11-07 03:41:19 -08:00
Debanjum Singh Solanky	9ab327a2b6	Store the data source of each entry in database This will be useful for updating, deleting entries by their data source. Data source can be one of Computer, Github or Notion for now Store each file/entries source in database	2023-11-07 02:18:48 -08:00
Debanjum Singh Solanky	c82cd0862a	Delete deprecated content config pages for local files from web client The desktop app now manages syncing local computer files to index The server only manages "cloud" data source like github and notion.	2023-11-06 23:55:37 -08:00
Debanjum Singh Solanky	97cf8339aa	Rename Sync button, Force Sync toggle to Save, Save All buttons	2023-11-06 21:57:37 -08:00
Debanjum Singh Solanky	a08b152358	Improve log messages in text_entries and memory leak unit test	2023-11-06 19:27:31 -08:00
sabaimran	6c8689e4ae	Update corresponding chat UX in the desktop client as well	2023-11-06 16:18:41 -08:00
sabaimran	e01ecf1419	/s/references/reference to fix bug of jumping references	2023-11-06 16:12:25 -08:00
Debanjum	38f24a037d	Improve Indexing Text Entries (#535 ) Major - Ensure search results logic consistent across migration to DB, multi-user - Manually verified search results for sample queries look the same across migration - Flatten indexing code for better indexing progress tracking and code readability Minor - `a4f407f` Test memory leak on MPS device when generating vector embeddings - `ef24485` Improve Khoj with DB setup instructions in the Django app readme (for now) - `f212cc7` Arrange remaining text search tests in arrange, act, assert order - `022017d` Fix text search tests to test updated indexing log messages	2023-11-06 16:01:53 -08:00
sabaimran	270f7b3eb3	Update the chat UI to have richer representation of the references	2023-11-05 15:46:43 -08:00
sabaimran	d697d752c2	Use repeat rather than manually specify auto in grid-template-rows Co-authored-by: Debanjum <debanjum@gmail.com>	2023-11-05 15:23:42 -08:00
sabaimran	5f1e37fff0	Adjust indentation for css property	2023-11-05 14:33:23 -08:00
Debanjum Singh Solanky	a4f407f595	Test memory leak on MPS device when generating vector embeddings Slope threshold of 2.0 determined qualitatively on local Mac device Minor unused import and clean-up	2023-11-05 03:48:54 -08:00
Debanjum Singh Solanky	ef24485ada	Improve Khoj with DB setup instructions in the Django app readme (for now)	2023-11-05 02:04:52 -08:00
sabaimran	084a8becc5	Fix bug to prevent default in chat trigger	2023-11-04 20:13:33 -07:00
Debanjum Singh Solanky	5489e98b9c	Do not index org heading entries by default This is to maintain the previous default behavior	2023-11-04 20:09:25 -07:00
Debanjum Singh Solanky	34b5a86d1d	Use SentenceTransformer to disable progress bar when encoding query The Langchain HuggingFaceEmbeddings wrapper doesn't support disabling the progress bar, and especially not for queries only while keeping it for documents. This makes the logs noisy with an encoding progress bar for each incremental query No features of the Langchain wrapper for SentenceTransformer were being used anyway for now, and we can always switch back to it if required	2023-11-04 20:09:25 -07:00
Debanjum Singh Solanky	dc9946fc03	Flatten nested loops, improve progress reporting in text_to_jsonl indexer Flatten the nested loops to improve visibility into indexing progress Reduce spurious logs, report the logs at aggregated level and update the logging description text to improve indexing progress reporting	2023-11-04 20:09:25 -07:00
sabaimran	88eeee3f4b	Move try/catch for import one line later	2023-11-04 19:46:47 -07:00
sabaimran	dbaa892665	Flip catching modulenotfound to import error exception	2023-11-04 19:34:10 -07:00
sabaimran	8c3d5a49da	Add try/except around image extraction step	2023-11-04 19:27:18 -07:00
sabaimran	fdfab39942	Update the config UI to show all files indexed with option to delete - Given the separation of the client and server now, the web UI will no longer support configuration of local file paths of data to index - Expose a way to show all the files that are currently set for indexing, along with an option to delete all or specific files	2023-11-04 19:03:34 -07:00
sabaimran	800bb4f458	Remove references to demo - The demo setting is no longer necessary for the time being, as we won't have anymore demo instances	2023-11-04 17:17:04 -07:00
sabaimran	b5972e9311	Use OCR to extract image text in PDFs	2023-11-04 17:15:28 -07:00
Debanjum Singh Solanky	8273bf26b7	Fix multi-line chat input and output render on web, desktop clients - Remove spurious whitespace in chat input box on page load being added because text area element was ending on newline - Do not insert newline in message when sending a message by hitting enter key This would be more evident when sending a message with cursor in the middle of the sentence, as a newline would be inserted at the cursor point - Remove chat message separator tokens from model output. Model sometimes starts to output text in its chat format	2023-11-04 01:09:35 -07:00
Debanjum Singh Solanky	2f1756cc15	Do not use icon for each file, folder to index in desktop app. Other minor fixes based on PR feedback	2023-11-04 00:13:10 -07:00
Debanjum Singh Solanky	e8f568d79c	Make splash screen wider, opaque and fix its spinner radius Radius should be such that final spin doesn't extend out of the circle Opaque background improves contrast for better visual	2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky	3ef05f4803	Use css var for main font color in search, chat page of desktop app	2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky	a19cbde2d7	Add About page for Khoj to Desktop app. Expose it via system tray - Pass current khoj version from package.json to about page via electron IPC between backend js and frontend page - Update Khoj information in default About screen as well, in case it's exposed anywhere else	2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky	a327294ee9	Rename khoj.js to utils.js in web and desktop client apps	2023-11-03 18:13:37 -07:00
Debanjum Singh Solanky	db57eeaefe	Console log a welcome message on loading Desktop client	2023-11-03 05:15:41 -07:00
Debanjum Singh Solanky	6fae6fb2a4	Merge branch 'features/multi-user-support-khoj' into improve-client-app-theming	2023-11-03 04:58:41 -07:00
Debanjum Singh Solanky	4cd76311ad	Slow down spinning at end of splash sequence. Make animation bigger	2023-11-03 04:28:17 -07:00
Debanjum Singh Solanky	34661c33a2	Show splash screen on starting desktop app	2023-11-03 03:19:08 -07:00
Debanjum Singh Solanky	126d3f4563	Render each file, folder to index row with icon in desktop app Make the file, folders to index look less like an editable field	2023-11-03 02:48:42 -07:00
Debanjum Singh Solanky	80ae132cad	Update Desktop, Obsidian client color theme to lighter yellow - Update background color to a different shade of white - Make primary and primary hover colors less intense and more aligned with lantern flame shade - Add water, leaf, flower color variables	2023-11-03 02:48:42 -07:00
sabaimran	fb6ebd19fc	Fix refactor bugs, CSRF token issues for use in production (#531 ) Fix refactor bugs, CSRF token issues for use in production * Add flags for samesite settings to enable django admin login * Include tzdata to dependencies to work around python package issues in linux * Use DJANGO_DEBUG flag correctly * Fix naming of entry field when creating EntryDate objects * Correctly retrieve openai config settings * Fix datefilter with embeddings name for field	2023-11-02 23:02:38 -07:00
Debanjum Singh Solanky	345856e7be	Merge branch 'master' of github.com:khoj-ai/khoj into features/multi-user-support-khoj Merge changes to use latest GPT4All with GPU, GGUF model support into khoj multi-user support rearchitecture branch	2023-11-02 22:44:25 -07:00
Debanjum Singh Solanky	041074ccd6	Make chat the landing page for the desktop app Chat, unlike search, doesn't need knowledge base indexing setup. So you can get started with chat much faster.	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	3801105b2a	Make chat the landing page for the web app Chat, unlike search, doesn't need knowledge base indexing setup. So you can get started with chat much faster.	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	0d4e7d46c2	Fix color and size of profile picture circle in nav pane	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	4fbe8ac6b1	Console log a welcome message on loading web client	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	9fc6c97139	Use Khoj standard font family, weight in web client settings page	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	b6f07099cd	Simplify login page styling on web client - Center all elements: icon, text and button - Use khoj icon not logo-text - Simplify login title text	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	7b7f6d3bc8	Update web client theme to a lighter - Update background color to a different shade of white - Make primary and primary hover colors less intense and more aligned with lantern flame shade - Add water, leaf, flower color variables	2023-11-02 20:42:21 -07:00
sabaimran	fe860aaf83	Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into features/multi-user-support-khoj	2023-11-02 14:56:01 -07:00
sabaimran	2c9496bcf1	Add additional null checks in the migrate_server_pg script	2023-11-02 14:55:58 -07:00
sabaimran	20df0f5330	Use url_path_for for creating the login page URL in the application	2023-11-02 14:55:14 -07:00
sabaimran	fd11b78552	Fix migration script error when openai not available (#530 )	2023-11-02 11:28:08 -07:00
sabaimran	fe6720fa06	[Multi-User Part 8]: Make conversation processor settings server-wide (#529 ) - Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all the users re-use that code. - To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter in the details. For simplicity, the email and username should match. - Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings	2023-11-02 10:43:27 -07:00
Debanjum Singh Solanky	12b3eeae9e	Use Khoj fonts on config page of web and desktop apps too Previously pico.css font-families were being selected for the config page. This was different from the fonts used by index.html, chat.html This improves spacing issue of heading further	2023-11-01 17:50:50 -07:00
Debanjum Singh Solanky	022d695309	Switch to narrow view below width of 700px on web client This makes the dropdown menu align better to the profile picture in mobile view	2023-11-01 17:49:44 -07:00
Debanjum Singh Solanky	6a0adfbfbb	Default to profile picture with Initial if user has no profile picture	2023-11-01 17:49:44 -07:00
Tuan Nguyen	354605e73e	Autofocus to chat input when opening chat (#524)	2023-11-01 16:09:45 -07:00
Debanjum Singh Solanky	d92a2d03a7	Rename Files, Classes from X_To_JSONL to more appropriate X_To_Entries These content processors are converting content into entries in DB instead of entries in JSONL file	2023-11-01 14:51:33 -07:00
Debanjum Singh Solanky	2ad2055bcb	Remove user null check in API controllers that require authentication	2023-11-01 14:38:19 -07:00
Debanjum Singh Solanky	7ac5a4766d	Match spacing of navigation header pane in config vs search/chat pages	2023-11-01 14:38:19 -07:00
Debanjum Singh Solanky	2e3a4a6a9b	Use Jinja macro to deduplicate navigation header HTML	2023-11-01 14:38:12 -07:00
Debanjum Singh Solanky	c631b61a81	Put colors shared by index, chat html into khoj css global variables	2023-11-01 02:13:24 -07:00
Debanjum Singh Solanky	f585a71744	Put logout, settings under dropdown menu with logged in user's profile picture - Create dropdown menu. Put settings page, logout action under it - Make user's profile picture the dropdown menu heading - Create khoj.js to store shared js across web client It currently stores the dropdown menu open, close functionality - Put shared styling for khoj dropdown menu under khoj.css	2023-11-01 02:13:24 -07:00
Debanjum Singh Solanky	58a7171911	Show truncated API key for identification & restrict table width - Use a function to generate API Key table row HTML, to dedup logic - Show delete, copy icon hints on hover - Reduce length of copied message to not expand table width - Truncating API key helps keep the API key table width within width of smaller width displays	2023-10-31 23:10:26 -07:00
Debanjum Singh Solanky	9cebd7f856	Add emoji icons to Search, Chat, Settings items in nav menu of Web client Emoji icons have already been added to the Search, Chat and Settings top navigation menu in the desktop client. This change adds these to the web client as well	2023-10-31 22:38:44 -07:00
Debanjum Singh Solanky	f77336ba61	Add key icon for API keys table in Web client config page	2023-10-31 19:01:09 -07:00
Debanjum Singh Solanky	87e6b1eab9	Rename TextEmbeddings to TextEntries for improved readability Improves readability as name has closer match to underlying constructs	2023-10-31 18:55:59 -07:00
Debanjum Singh Solanky	bcbee05a9e	Rename DbModels Embeddings, EmbeddingsAdapter to Entry, EntryAdapter Improves readability as name has closer match to underlying constructs - Entry is any atomic item indexed by Khoj. This can be an org-mode entry, a markdown section, a PDF or Notion page etc. - Embeddings are semantic vectors generated by the search ML model that encodes for meaning contained in an entry's text. - An "Entry" contains "Embeddings" vectors but also other metadata about the entry like filename etc.	2023-10-31 18:50:54 -07:00
sabaimran	54a387326c	[Multi-User Part 6]: Address small bugs and upstream PR comments (#518 ) - `08654163cb`: Add better parsing for XML files - `f3acfac7fb`: Add a try/catch around the dateparser in order to avoid internal server errors in app - `7d43cd62c0`: Chunk embeddings generation in order to avoid large memory load - `e02d751eb3`: Addresses comments from PR #498 - `a3f393edb4`: Addresses comments from PR #503 - `66eb078286`: Addresses comments from PR #511 - Address various items in https://github.com/khoj-ai/khoj/issues/527	2023-10-31 17:59:53 -07:00
sabaimran	5f3f6b7c61	[Multi-User Part 5]: Add a production Docker file and use a gunicorn configuration with it (#514 ) - Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests - Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same	2023-10-26 13:15:31 -07:00
Debanjum	9acc722f7f	[Multi-User Part 4]: Authenticate using API Tokens (#513 ) ### ✨ New - Use API keys to authenticate from Desktop, Obsidian, Emacs clients - Create API, UI on web app config page to CRUD API Keys - Create user API keys table and functions to CRUD them in Database ### 🧪 Improve - Default to better search model, [gte-small](https://huggingface.co/thenlper/gte-small), to improve search quality - Only load chat model to GPU if enough space, throw error on load failure - Show encoding progress, truncate headings to max chars supported - Add instruction to create db in Django DB setup Readme ### ⚙️ Fix - Fix error handling when configure offline chat via Web UI - Do not warn in anon mode about Google OAuth env vars not being set - Fix path to load static files when server started from project root	2023-10-26 12:33:03 -07:00
sabaimran	4b6ec248a6	[Multi-User Part 3]: Separate chat sessions based on authenticated users (#511) - Add a data model which allows us to store Conversations with users. This does a minimal lift over the current setup, where the underlying data is stored in a JSON file. This maintains parity with that configuration. - There does _seem_ to be some regression in chat quality, which is most likely attributable to search results. This will help us with #275. It should become much easier to maintain multiple Conversations in a given table in the backend now. We will have to do some thinking on the UI.	2023-10-26 11:37:41 -07:00
sabaimran	a8a82d274a	[Multi-User Part 2]: Add login pages and gate access to application behind login wall (#503 ) - Make most routes conditional on authentication if anonymous mode is not enabled. If anonymous mode is enabled, it scaffolds a default user and uses that for all application interactions. - Add a basic login page and add routes for redirecting the user if logged in	2023-10-26 10:17:29 -07:00
sabaimran	216acf545f	[Multi-User Part 1]: Enable storage of settings for plaintext files based on user account (#498 ) - Partition configuration for indexing local data based on user accounts - Store indexed data in an underlying postgres db using the `pgvector` extension - Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time - Apply filters using SQL queries - Start removing many server-level configuration settings - Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB. - Update the Docker image and docker-compose.yml to work with the new application design	2023-10-26 09:42:29 -07:00
Debanjum Singh Solanky	9677eae791	Expose CLI flag to disable using GPU for offline chat model - Offline chat models outputting gibberish when loaded onto some GPU. GPU support with Vulkan in GPT4All seems a bit buggy - This change mitigates the upstream issue by allowing user to manually disable using GPU for offline chat Closes #516	2023-10-25 17:51:46 -07:00
Debanjum Singh Solanky	0f1ebcae18	Upgrade to latest GPT4All. Use Mistral as default offline chat model GPT4All now supports gguf llama.cpp chat models. Latest GPT4All (+mistral) performs at least 3x faster. On Macbook Pro at ~10s response start time vs 30s-120s earlier. Mistral is also a better chat model, although it hallucinates more than llama-2	2023-10-22 19:04:23 -07:00
sabaimran	963cd165eb	Resolve merge conflicts	2023-10-19 14:39:05 -07:00
Debanjum Singh Solanky	8346e1193c	Release Khoj version 0.13.0	2023-10-18 03:43:54 -07:00
Debanjum Singh Solanky	6631fc38db	Delete plaintext config via API. Catch any offline model loading exception	2023-10-18 03:37:45 -07:00
Debanjum Singh Solanky	53abd1a506	Mark sync completed on desktop client, even when no files to send Previously Sync spinner on desktop config screen would hang when no files to send to server & the Sync button had been manually triggered	2023-10-18 01:30:56 -07:00
Debanjum Singh Solanky	71b0012e8c	Set offline chat config to default value if unset on server load	2023-10-18 00:59:43 -07:00
Debanjum Singh Solanky	cf1cdc3fe1	Disambiguate input_filter variable names in fs_syncer functions	2023-10-17 23:32:10 -07:00
Debanjum Singh Solanky	e3cd8b4150	Only index files returned by input-filter globs in fs_syncer Ignore .org, .pdf etc. suffixed directories under `input-filter' from being evaluated as files. Explicitly filter results by input-filter globs to only index files, not directory for each text type Add test to prevent regression Closes #448	2023-10-17 23:32:10 -07:00
Debanjum Singh Solanky	51363d280d	Do not configure khoj server for pull based indexing from khoj.el Do not make khoj server pull update index on Obsidian plugin load. Index is updated on push from plugin instead now.	2023-10-17 21:47:19 -07:00
Debanjum Singh Solanky	d9d133dfb9	Read text files as utf-8, instead of default os locale On Windows, the default locale isn't utf8. Khoj had regressed to reading files in OS specified locale encoding, e.g cp1252, cp949 etc. It now explicitly uses utf8 encoding to read text files for indexing Resolves #495, resolves #472	2023-10-17 21:47:19 -07:00
Debanjum	3d4576ae38	Fix encoding binary files for sync from the Desktop, Obsidian client (#506 ) - Fix encoding binary files like PDFs for sync from Desktop client - Fix encoding binary files like PDFs for sync from Obsidian client	2023-10-17 15:37:22 -07:00
Debanjum Singh Solanky	c8293998d9	Fix encoding binary files like PDFs for sync from Obsidian client Use readBinary to read binary files like PDFs instead of read	2023-10-17 15:08:30 -07:00
sabaimran	ba60c869c9	Fix encoding binary files like PDFs for sync from Desktop client Use readFileSync, Buffer to pass appropriately formatted binary data	2023-10-17 15:08:23 -07:00
Andrew Spott	3d7381446d	Changed globbing. Now doesn't clobber a users glob if they want to a… (#496) * Changed globbing. Now doesn't clobber a user's glob if they want to add it, but will (if just given a directory), add a recursive glob. Note: python's glob engine doesn't support `{}` globbing, a future option is to warn if that is included. * Fix typo in globformat variable * Use older glob pattern for plaintext files --------- Co-authored-by: Saba <narmiabas@gmail.com>	2023-10-17 11:26:06 -07:00
sabaimran	2646c8554d	Provide a default value to offline_chat configuration of the conversation processor	2023-10-17 10:35:22 -07:00
Debanjum Singh Solanky	b8976426eb	Update offline chat model config schema used by Emacs, Obsidian clients The server uses a new schema for the conversation config. The Emacs, Obsidian clients need to use this schema to update the conversation config	2023-10-17 07:01:35 -07:00
Debanjum	ecc6fbfeb2	Push Files to Index from Emacs, Obsidian & Desktop Clients using Multi-Part Forms (#499) ### Overview - Add ability to push data to index from the Emacs, Obsidian client - Switch to standard mechanism of syncing files via HTTP multi-part/form. Previously we were streaming the data as JSON - Benefits of new mechanism - No manual parsing of files to send or receive on clients or server is required as most have in-built mechanisms to send multi-part/form requests - The whole response is not required to be kept in memory to parse content as JSON. As individual files arrive they're automatically pushed to disk to conserve memory if required - Binary files don't need to be encoded on client and decoded on server ### Code Details ### Major - Use multi-part form to receive files to index on server - Use multi-part form to send files to index on desktop client - Send files to index on server from the khoj.el emacs client - Send content for indexing on server at a regular interval from khoj.el - Send files to index on server from the khoj obsidian client - Update tests to test multi-part/form method of pushing files to index #### Minor - Put indexer API endpoint under /api path segment - Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method - Improve emoji, message on content index updated via logger - Don't call khoj server on khoj.el load, only once khoj invoked explicitly by user - Improve indexing of binary files - Let fs_syncer pass PDF files directly as binary before indexing - Use encoding of each file set in indexer request to read file - Add CORS policy to khoj server. Allow requests from khoj apps, obsidian & localhost - Update indexer API endpoint URL to `index/update` from `indexer/batch` Resolves #471 #243	2023-10-17 06:05:15 -07:00
Debanjum Singh Solanky	6a4f1b2188	Add more client, request details in logs by index/update API endpoint	2023-10-17 05:43:29 -07:00
Debanjum Singh Solanky	5efae1ad55	Update indexer API endpoint query params for force, content type New URL query params, `force' and `t' match name of query parameter in existing Khoj API endpoints Update Desktop, Obsidian and Emacs client to call using these new API query params. Set `client' query param from each client for telemetry visibility	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	84654ffc5d	Update indexer API endpoint URL to index/update from indexer/batch New URL follows action oriented endpoint naming convention used for other Khoj API endpoints Update desktop, obsidian and emacs client to call this new API endpoint	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	e347823ff4	Log telemetry for index updates via push to API endpoint	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	05be6bd877	Clicking Update Index in Obsidian settings should push files to index Use the indexer/batch API endpoint to regenerate content index rather than the previous pull based content indexing API endpoint	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	13a3122bf3	Stop configuring server to pull files to index from Obsidian client Obsidian client now pushes vault files to index instead	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	99a2c934a3	Add CORS policy to allow requests from khoj apps, obsidian & localhost Using fetch from Khoj Obsidian plugin was failing due to cross-origin request and method: no-cors didn't allow passing x-api-key custom header. And using Obsidian's request with multi-part/form-data wasn't possible either.	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	541cd59a49	Let fs_syncer pass PDF files directly as binary before indexing No need to do unneeded base64 encoding/decoding to pass pdf contents for indexing from fs_syncer to pdf_to_jsonl	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	d27dc71dfe	Use encoding of each file set in indexer request to read file Get encoding type from multi-part/form-request body for each file Read text files as utf-8 and pdfs, images as binary	2023-10-17 04:58:12 -07:00
Debanjum Singh Solanky	8e627a5809	Pass any files to be deleted to indexer API via Khoj Obsidian plugin - Keep state of previously synced files to identify files to be deleted - Last synced files stored in settings for persistence of this data across Obsidian reboots	2023-10-17 03:34:49 -07:00
Debanjum Singh Solanky	f2e293a149	Push Vault files to index to Khoj server using Khoj Obsidian plugin Use the multi-part/form-data request to sync Markdown, PDF files in vault to index on khoj server Run scheduled job to push updates to vault for indexing every 1 hour	2023-10-17 03:05:30 -07:00
Debanjum Singh Solanky	6baaaaf91a	Test request body of multi-part form to update content index from khoj.el	2023-10-16 23:54:32 -07:00
Debanjum Singh Solanky	79b3f8273a	Make khoj.el send files to be deleted from index to server	2023-10-16 23:53:02 -07:00
Debanjum Singh Solanky	f64fa06e22	Initialize the Khoj Transient menu on first run instead of load This prevents Khoj from polling the Khoj server until explicitly invoked via `khoj' entrypoint function. Previously it'd make a request to the khoj server every time Emacs or khoj.el was loaded Closes #243	2023-10-16 19:11:46 -07:00
Debanjum	b4949f7f0b	Improve Offline Chat Model Experience (#494) - Make offline chat model user configurable. Use `filename` of any [GPT4All supported model](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models.json) like below: - Run GPT4All Chat Model on GPU, when available via [GPT4All Vulkan support](https://blog.nomic.ai/posts/gpt4all-gpu-inference-with-vulkan) - Use default Llama 2 supported by GPT4All - Make `tokenizer` and `max-prompt-size` of chat model user configurable. E.g When using chat models not in [this pre-defined list](https://github.com/khoj-ai/khoj/blob/master/src/khoj/processor/conversation/utils.py) that support larger context window or a different tokenizer. Closes #406, #418	2023-10-16 17:44:49 -07:00
Debanjum Singh Solanky	644c3b787f	Scale no. of chat history messages to use as context with max_prompt_size Previously lookback turns was set to a static 2. But now that we support more chat models, their prompt size vary considerably. Make lookback_turns proportional to max_prompt_size. The truncate_messages can remove messages if they exceed max_prompt_size later This lets Khoj pass more of the chat history as context for models with larger context window	2023-10-16 17:22:28 -07:00
Debanjum Singh Solanky	df1d74a879	Use max_prompt_size, tokenizer from config for chat model context stuffing	2023-10-15 16:52:53 -07:00
Debanjum Singh Solanky	116595b351	Use chat_model specified in new offline_chat section of config - Dedupe offline_chat_model variable. Only reference offline chat model stored under offline_chat. Delete the previous chat_model field under GPT4AllProcessorConfig - Set offline chat model to use via config/offline_chat API endpoint	2023-10-15 16:37:49 -07:00
Debanjum Singh Solanky	feb4f17e3d	Update chat config schema. Make max_prompt, chat tokenizer configurable This provides flexibility to use non 1st party supported chat models - Create migration script to update khoj.yml config - Put `enable_offline_chat' under new `offline-chat' section Referring code needs to be updated to accommodate this change - Move `offline_chat_model' to `chat-model' under new `offline-chat' section - Put chat `tokenizer` under new `offline-chat' section - Put `max_prompt' under existing `conversation' section As `max_prompt' size affects both openai and offline chat models	2023-10-15 16:35:11 -07:00
sabaimran	c125995d94	[Multi-User]: Part 0 - Add support for logging in with Google (#487 ) * Add concept of user authentication to the request session via GoogleUser	2023-10-14 19:39:13 -07:00
Debanjum Singh Solanky	247e75595c	Use AutoTokenizer to support more tokenizers	2023-10-14 16:54:52 -07:00
Saba	ff2dbadc9d	Use computed plaintext_content to set file content rather than calling f.read again	2023-10-14 13:28:34 -07:00
Debanjum Singh Solanky	1ad8b150e8	Add default tokenizer, max_prompt as fallback for non-default offline chat models Pass user configured chat model as argument to use by converse_offline The proper fix for this would allow users to configure the max_prompt and tokenizer to use (while supplying default ones, if none provided) For now, this is a reasonable start.	2023-10-13 22:48:56 -07:00
Debanjum Singh Solanky	56bd69d5af	Improve Llama v2 extract questions actor and associated prompt - Format extract questions prompt format with newlines and whitespaces - Make llama v2 extract questions prompt consistent - Remove empty questions extracted by offline extract_questions actor - Update implicit qs extraction unit test for offline search actor	2023-10-13 22:48:56 -07:00
sabaimran	09bb3686cc	Strip the incoming query from the slash conversation command (#500 ) * Strip the incoming query from the slash conversation command before passing it to the model or for search * Return q when content index not loaded * Remove -n 4 from pytest ini configuration to isolate test failures	2023-10-13 21:11:23 -07:00
Debanjum Singh Solanky	96c0b21285	Sync desktop app package.json with other Khoj clients metadata - Make `bump_version.sh' script set version for the Khoj desktop app too - Sync Khoj desktop app authors, license, description and version with the other interfaces and server - Update description in packages metadata to match project subtitle on Github	2023-10-13 20:43:55 -07:00
sabaimran	80fb56b8a5	Sync desktop app package version with the other releases	2023-10-13 19:23:00 -07:00
Debanjum Singh Solanky	b669aa2395	Clean and fix the content indexing code in the Emacs client - Pass payloads as unibyte. This was causing the request to fail for files with unicode characters - Suppress messages with file content in on index updates - Fix rendering response from server on index update API call - Extract code to populate body of index update HTTP request with files	2023-10-13 18:00:37 -07:00
Debanjum Singh Solanky	bea196aa30	Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method Previously global state of `url-request-method' would affect the kind of request made to api/config/data API endpoint as it wasn't being explicitly set before calling the API endpoint This was done with the assumption that the default value of GET for url-request-method wouldn't change globally But in some cases, experientially, it can get changed. This was resulting in khoj.el load failing as POST request was being made instead which would throw error	2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky	292f0420ad	Send content for indexing on server at a regular interval from khoj.el - Allow indexing frequency to be configurable by user - Ensure there is only one khoj indexing timer running	2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky	fc99431754	Send files to index on server from the khoj.el emacs client - Add elisp variable to set API key to engage with the Khoj server - Use multi-part form to POST the files to index to the indexer API endpoint on the khoj server	2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky	68018ef397	Use multi-part form to send files to index on desktop client - Add typing for variables in for loop and other minor formatting clean-up - Assume utf8 encoding for text files and binary for image, pdf files	2023-10-12 20:58:49 -07:00
Debanjum Singh Solanky	7190b3811d	Remove all filter terms in user query from defiltered_query. Previously only the last filter's terms were getting effectively applied, as the `filter.defilter' operation was being done on `user_query' but was updating the `defiltered_query'	2023-10-12 20:56:17 -07:00
Debanjum Singh Solanky	60e9a61647	Use multi-part form to receive files to index on server - This uses existing HTTP affordance to process files - Better handling of binary file formats as removes need to url encode/decode - Less memory utilization than streaming json as files get automatically written to disk once memory utilization exceeds preset limits - No manual parsing of raw files streams required	2023-10-11 23:58:23 -07:00
Debanjum Singh Solanky	9ba173bc2d	Improve emoji, message on content index updated via logger Use mailbox closed with flag down once content index completed. Use standard, existing logger messages in new indexer messages, when files to index sent by clients	2023-10-11 17:12:03 -07:00
Debanjum Singh Solanky	6aa69da3ef	Put indexer API endpoint under /api path segment Update FastAPI app router, desktop app and to use new url path to batch indexer API endpoint All api endpoints should exist under /api path segment	2023-10-09 21:35:58 -07:00
Debanjum Singh Solanky	f6f7a62d80	Wait for user to stop typing to trigger search from khoj.el in Emacs - Improves user experience by aligning idle time with search latency to avoid display jitter (to render results) while user is typing - Makes the idle time configurable Closes #480	2023-10-06 12:44:45 -07:00
sabaimran	5c4f0d42b7	Return new default config in API endpoint	2023-10-06 12:30:09 -07:00
sabaimran	052b25af0a	Update default configuration passed to Khoj clients to circumvent validation issues	2023-10-06 12:29:15 -07:00
Debanjum Singh Solanky	a85ff941ca	Make offline chat model user configurable Only GPT4All supported Llama v2 models will work given the prompt structure is not currently configurable	2023-10-04 20:41:14 -07:00
Debanjum Singh Solanky	d1ff812021	Run GPT4All Chat Model on GPU, when available GPT4All now supports running models on GPU via Vulkan	2023-10-04 18:42:12 -07:00
Debanjum Singh Solanky	13b16a4364	Use default Llama 2 supported by GPT4All Remove custom logic to download custom Llama 2 model. This was added as GPT4All didn't support Llama 2 when it was added to Khoj	2023-10-03 19:01:54 -07:00
sabaimran	4a5ed7f06c	Update Khoj package version for Electron, Desktop app (#492 ) * Address package upgrade for Electron application * Update package version for Electron desktop application	2023-10-03 12:21:32 -07:00
sabaimran	3f962a55c3	Fix Linux Desktop Application (#491 ) * Use separate functions for adding files and folders to configuration for indexing * Add a loading bar while data is syncing * Bump the minor version for the application	2023-10-03 11:43:19 -07:00
sabaimran	63b3696af0	Release Khoj version 0.12.3	2023-09-26 22:41:11 -07:00
sabaimran	d2f9bca1cf	Fix null ref issue in query method and update logic for determining whether khoj is already configured in obsidian	2023-09-26 22:33:44 -07:00
sabaimran	2f18383349	Release Khoj version 0.12.2	2023-09-26 11:59:47 -07:00
sabaimran	588f35b6e9	Add max prompt size for gpt-3.5-turbo-16k	2023-09-26 10:57:35 -07:00
sabaimran	4e370d7a18	Release Khoj version 0.12.1	2023-09-26 09:24:53 -07:00
sabaimran	3675aa348a	Update naming of Khoj in manifest.json for Obsidian	2023-09-26 09:24:36 -07:00
sabaimran	a82d1becc3	Release Khoj version 0.12.0	2023-09-26 09:17:56 -07:00
sabaimran	38f0df3d53	Remove unused icons from electron app folder	2023-09-26 07:56:29 -07:00
sabaimran	5e16074b92	Fix comparison for search type in plugins mode	2023-09-25 10:57:17 -07:00
sabaimran	2dd15e9f63	Resolve issues with GPT4All and fix prompt for yesterday extract questions date filter (#483 ) - GPT4All integration had ceased working with 0.1.7 specification. Update to use 1.0.12. At a later date, we should also use first party support for llama v2 via gpt4all - Update the system prompt for the extract_questions flow to add start and end date to the yesterday date filter example. - Update all setup data in conftest.py to use new client-server indexing pattern	2023-09-18 14:41:26 -07:00
sabaimran	b225d1188c	Fix formatting of gpt.py	2023-09-18 11:09:02 -07:00
Jonny-GM	34b202b868	More lenient date searching (#481 ) * Modify DateFilter to use compiled entry key * Instruct search to include date in query * Minor prompt change * Prompt fix	2023-09-18 10:46:00 -07:00
sabaimran	16874e1953	Provide force fallback for regeneration	2023-09-12 16:35:07 -07:00
sabaimran	9f42a1a036	Propagate flags to configure index command	2023-09-11 10:33:44 -07:00
sabaimran	343854752c	Improve docker builds for local hosting (#476 ) * Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions * Move configure_search method into indexer * Add conditional installation for gpt4all * Add hint to go to localhost:42110 in the docs. Addresses #477	2023-09-08 17:07:26 -07:00
sabaimran	dccfae3853	Remove PySide dependency and deprecate desktop builds (#475 ) * Remove PySide, gui option from code * Remove pyside 6 dependency from code * Remove workflows which build desktop applications * Update unit tests and update line in documentation * Remove additional references to pyinstaller, gui * Add uninstall steps to normal uninstall instructions	2023-09-07 11:36:27 -07:00
sabaimran	76562f4250	Add front-end Electron application for Khoj local file syncing (#473 ) * Initial version - setup a file-push architecture for generating embeddings with Khoj * Use state.host and state.port for configuring the URL for the indexer * Fix parsing of PDF files * Read markdown files from streamed data and update unit tests * On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system * Init: refactor indexer/batch endpoint to support a generic file ingestion format * Add features to better support indexing from files sent by the desktop client * Initial commit with Electron application - Adds electron app * Add import for pymupdf, remove import for pypdf * Allow user to configure khoj host URL * Remove search type configuration from index.html * Use v1 path for current indexer routes	2023-09-06 12:04:18 -07:00
bholagabbar	205dc90746	Fix notion title bug (#474 ) * Update notion_to_jsonl.py * Fix try-catch block	2023-09-05 10:47:42 -07:00
sabaimran	4854258047	Move to a push-first model for retrieving embeddings from local files (#457 ) * Initial version - setup a file-push architecture for generating embeddings with Khoj * Update unit tests to fix with new application design * Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request * Use state.host and state.port for configuring the URL for the indexer * On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system	2023-08-31 12:55:17 -07:00
sabaimran	92cbfef7ab	Skip plaintext file indexing if there's a parsing issue and log the file	2023-08-29 14:34:08 -07:00
sabaimran	74409c2c64	Release Khoj version 0.11.4	2023-08-29 11:44:35 -07:00
sabaimran	1b85958bcc	trim chat input start	2023-08-28 19:18:10 -07:00
sabaimran	e592f6eac8	Release Khoj version 0.11.3	2023-08-28 14:46:03 -07:00
sabaimran	7c35da9fc4	Fix bug in /chat endpoint for general and update dependencies	2023-08-28 14:12:11 -07:00
sabaimran	bc09143856	Release Khoj version 0.11.2	2023-08-28 10:16:13 -07:00
Debanjum Singh Solanky	01b310635e	Enable passing search query filters via chat and test it	2023-08-28 09:24:32 -07:00
Debanjum Singh Solanky	794bad8bcb	Make date_filter.extract_date_range method always return a list type	2023-08-28 00:55:28 -07:00
Debanjum Singh Solanky	d5a2de6222	Add method to extract filter terms from query to all filters - Test the get_filter_term method in all 3 word, file, date filters - Make the existing can_filter method by default in base filter abstract class	2023-08-28 00:55:28 -07:00
Debanjum	150105505b	Add Default chat command. Make Khoj ask clarifying questions (#468 ) - Make Khoj ask clarifying questions when answer not in provided context - Add default conversation command to auto switch b/w general, notes modes - Show filtered list of commands available with the currently input text - Use general prompt when no references found and not in Notes mode - Test general and notes slash commands in offline chat director tests	2023-08-28 00:52:57 -07:00
Debanjum Singh Solanky	eb6cd4f8d0	Use general prompt when no references found and not in Notes mode	2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky	edffbad837	Make Khoj ask clarifying questions when answer not in provided context Previously it would just refuse ask for clarification. This improves the chat quality score for the existing director tests	2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky	75c1016ec0	Show filtered list of commands available with the currently input text	2023-08-28 00:46:10 -07:00
Debanjum Singh Solanky	74605f6159	Add default conversation command to auto switch b/w general, notes modes This was the default behavior but behavior regressed when adding slash commands in PR #463	2023-08-28 00:46:10 -07:00
sabaimran	cbc978ea08	Update help links for notion, github to point to the main docs	2023-08-27 15:02:55 -07:00
sabaimran	b45e1d8c0d	Fix plaintext HTML parsing and rendering (#464 ) * Store conversation command options in an Enum * Move to slash commands instead of using @ to specify general commands * Calculate conversation command once & pass it as arg to child funcs * Add /notes command to respond using only knowledge base as context This prevents the chat model from trying to respond using its general world knowledge only, without any references pulled from the indexed knowledge base * Test general and notes slash commands in openai chat director tests --------- Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>	2023-08-27 11:24:30 -07:00
Debanjum	7919787fb7	Use Slash Commands and Add Notes Slash Command (#463 ) * Store conversation command options in an Enum * Move to slash commands instead of using @ to specify general commands * Calculate conversation command once & pass it as arg to child funcs * Add /notes command to respond using only knowledge base as context This prevents the chat model from trying to respond using its general world knowledge only, without any references pulled from the indexed knowledge base * Test general and notes slash commands in openai chat director tests * Update gpt4all tests to use md configuration * Add a /help tooltip * Add dynamic support for describing slash commands. Remove default and treat notes as the default type --------- Co-authored-by: sabaimran <narmiabas@gmail.com>	2023-08-26 18:11:18 -07:00
sabaimran	e64357698d	Skip indexing single bad markdown, plaintext file (#460 )	2023-08-23 15:34:56 -07:00
sabaimran	84bd579077	Format the chat outputted message with code, bolding, or italics. Add a copy button for code. Closes #445 .	2023-08-19 20:02:57 -07:00
sabaimran	f9e09ba490	Do not try downloading model from GPT4All if the user is not connected to the internet	2023-08-19 19:09:21 -07:00
Debanjum Singh Solanky	3ff4e19dd2	Release Khoj version 0.11.1	2023-08-16 22:53:29 -07:00
sabaimran	4fb8c2c5e1	Pass a SIGTERM to tell the uvicorn server to exit and gracefully kill the thread	2023-08-16 21:27:05 -07:00
sabaimran	4e03dfea43	Attach the parent to the server thread, allowing the kill signal to trigger a graceful exit (#446 )	2023-08-16 19:36:10 -07:00
Debanjum Singh Solanky	26c3977fb9	Remove info hint to reindex khoj on unexpected search results. The index corruption issue was resolved a while ago in #325 and hasn't cropped up again	2023-08-16 00:58:59 -07:00
sabaimran	def909a913	Revert "Open Web interface within Desktop app in GUI mode" (#444 )	2023-08-15 23:26:28 -07:00
sabaimran	6562ec6531	Release Khoj version 0.11.0	2023-08-14 19:25:03 -07:00
sabaimran	0ea901c7c1	Allow indexing to continue even if there's an issue parsing a particular org file (#430 ) * Allow indexing to continue even if there's an issue parsing a particular org file * Use approximation in pytorch comparison in text_search UT, skip additional file parser errors for org files * Change error of expected failure	2023-08-14 07:56:33 -07:00
sabaimran	7b907add77	Add support for indexing plaintext files (#420 ) * Add support for indexing plaintext files - Adds backend support for parsing plaintext files generically (.html, .txt, .xml, .csv, .md) - Add equivalent frontend views for setting up plaintext file indexing - Update config, rawconfig, default config, search API, setup endpoints * Add a nifty plaintext file icon to configure plaintext files in the Web UI * Use generic glob path for plaintext files. Skip indexing files that aren't in whitelist	2023-08-09 15:44:40 -07:00
Ellen7ions	26bddcb65c	Add support for starting a new line with shift-enter (#412 ) * Add support for starting a new line with shift-enter * Remove useless comments. Set font-size: medium. * Update src/khoj/interface/web/chat.html Update the styling to have the padding, margin and line-height like before. Co-authored-by: Debanjum <debanjum@gmail.com> * Update src/khoj/interface/web/chat.html Make the chat-body scroll to the bottom after resizing Co-authored-by: Debanjum <debanjum@gmail.com> --------- Co-authored-by: Debanjum <debanjum@gmail.com>	2023-08-07 19:49:07 -07:00
Debanjum Singh Solanky	97609e4995	Use 500px png of khoj logo instead svg for much smaller asset size The khoj logo svg was 1.3Mb. The 500px png of it is 38Kb. Given all usage of khoj-logo are below 230px this should work fine	2023-08-07 18:27:11 -07:00
Debanjum	14a816d173	Open Web interface within Desktop app in GUI mode (#429 ) Previously the GUI mode (with khoj --gui or using the desktop app) would open the web interface in the users default web browser. Now the web interface is just rendered within the app itself using PyQT's Webview. This gives it a more proper app like feel	2023-08-07 17:48:30 -07:00
Debanjum Singh Solanky	378b96ec1b	Open the khoj app window maximized on startup	2023-08-07 15:39:05 -07:00
Debanjum Singh Solanky	ea734ba1c8	Open app in native view on starting it in GUI mode instead of on web browser - Opens settings page on first run and landing page after in GUI mode Previously was only opening the GUI on linux after first run as it doesn't have a system tray - Both the views are from the web interface but are rendered within the app instead of the browser	2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky	9c494705a8	Open the search, chat or config view in app from the system tray menu	2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky	cc36b87345	Render the web interface directly within the desktop app as a webview	2023-08-07 13:41:12 -07:00
Jason Qin	3ef1b7073d	Update obsidian/manifest.json Closes #426	2023-08-07 10:41:39 -07:00
sabaimran	738cf650b3	Explicitly set Khoj to use the default locale of the user (#425 ) - Explicitly set locale using `locale.setlocale(locale.LC_ALL, '')` for localization. Relevant for datetime libraries. See [Python 3 documentation](https://docs.python.org/3/library/locale.html#locale.setlocale).	2023-08-07 09:23:24 -07:00
Muftawo	c8ef619090	fixed reference link to landing page (#417 ) * Fixed zsh error no matches found * Fixed home page 404 error	2023-08-04 10:38:14 -07:00
sabaimran	78012b8111	Avoid null ref issue when setting model state for web UI. Closes #410	2023-08-03 00:39:06 -07:00
sabaimran	0baed742e4	Add checksums to verify the correct model is downloaded as expected (#405 ) * Add checksums to verify the correct model is downloaded as expected - This should help debug issues related to corrupted model download - If download fails, let the application continue * If the model is not downloaded as expected, add some indicators in the settings UI * Add exc_info to error log if/when download fails for llamav2 model * Simplify checksum checking logic, update key name in model state for web client	2023-08-02 23:26:52 -07:00
Debanjum Singh Solanky	e6e3acdbe4	Release Khoj version 0.10.1	2023-08-01 23:55:13 -07:00
Debanjum Singh Solanky	7c1d70aa17	Bump GPT4All response generation batch size to 512 from 256 A batch size of 512 performs ~20% better on a XPS with no GPU and 16Gb RAM. Seems worth the tradeoff for now	2023-08-01 23:34:02 -07:00
Debanjum	16c6bfce8e	Improve Quality and Reliability of Offline Chat (#393 ) # Incoming ## Major ### Fix Prompt Size Exceeded Issue - Fix issues related to prompt size, Closes #386. Use the correct tokenizer to calculate whether the input needs to be truncated or not. ### Improve Llama 2 Model Download - Use the correct download link for LlamaV2 -- should have been using the small model, but was using the medium - Add better downloading logic to retry download if it failed, Closes #379 ### Fix Segmentation Fault due to Race - Add a lock around generating chat responses from the offline model to avoid segmentation faults. Closes #367. - Add a loading symbol to the web chat UI when the model is thinking. Closes #392 ### Improve Chat Response Latency - Improve performance of offline chat by increasing batch size (via `n_batch`) to automatically engage more cores/GPU, using smaller model and fixing prompt vs response token generation numbers. Closes #363 ### Fix Fake Dialogue Continuation - Fix formatting of user query with offline chat, this was contributing to #398 - Stop Llama 2 from Creating Fake Dialogue Continuations. Closes #398 ## Minor - Improve default message for Chat window on web when it's not configured. Include hint to use offline chat. - Add null check in `perform_chat_checks` method - Add offline chat director unit tests ## Performance Analysis (Time to First Token) \| \| v0.10.0 \| this branch \| \|-\|-\|-\| \| Query 1 \| 52s \| 28s \| \| Query 2 \| 33s\| 42s \| \| Query 3 \| 67s\| 38s\|	2023-08-01 22:07:27 -07:00
Debanjum Singh Solanky	44292afff2	Put offline model response generation behind the chat lock as well Not just the chat response streaming	2023-08-01 21:53:52 -07:00
Debanjum Singh Solanky	1812473d27	Extract new schema version for each migration script into a variable This should ease readability, indicates which version this migration script will update the schema to once applied	2023-08-01 21:41:08 -07:00
Debanjum Singh Solanky	b9937549aa	Simplify migration scripts management. Make them use static version - Only make them update config when their run conditions are satisfied - Use static schema version to simplify reasoning about run conditions	2023-08-01 21:28:20 -07:00
Debanjum Singh Solanky	185a1fbed7	Remove old chat setup timer. It is mislabelled, irrelevant since streaming	2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky	c2b7a14ed5	Fix context, response size for Llama 2 to stay within max token limits. Create regression test to ensure it does not throw the prompt size exceeded context window error	2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky	6e4050fa81	Make Llama 2 stop generating response on hitting specified stop words. It would previously sometimes start generating fake dialogue with its internal prompt patterns of <s>[INST] in responses. This is a jarring experience. Stop generating the response on hitting <s>. Resolves #398	2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky	aa6846395d	Fix offline model migration script to run for version < 0.10.1 - Use same batch_size in extract question actor as the chat actor - Log final location the chat model is to be stored in, instead of its temp filename while it is being downloaded	2023-08-01 20:51:53 -07:00
Ikko Eltociear Ashimine	49abb9df9c	Fix typo in orgnode.py (#397 ) Fix spelling of Ouput in org parser property drawer comment to Output.	2023-08-01 19:54:57 -07:00
sabaimran	f409e16137	Update some of the extract question prompts for llamav2	2023-08-01 12:23:36 -07:00
sabaimran	b11b00a9ff	Add log line for time to first response	2023-08-01 10:57:38 -07:00
sabaimran	778df6be71	Add a logline when the offline model migration script runs	2023-08-01 09:27:42 -07:00
sabaimran	3a5d93d673	Add migration script for getting the new offline model	2023-08-01 09:25:05 -07:00
sabaimran	90efc2ea7a	Update comments and add explanations	2023-08-01 09:24:03 -07:00
sabaimran	f7e03f6d63	Switch spinner snake case -> camel case	2023-08-01 08:52:25 -07:00
sabaimran	1c52a6993f	add a lock around chat operations to prevent the offline model from getting bombarded and stealing a bunch of compute resources - This also solves #367	2023-08-01 00:23:17 -07:00
sabaimran	6c3074061b	Disable the input bar when chat response is in flight	2023-08-01 00:21:39 -07:00
sabaimran	c14cbe926a	Add a loading symbol to web chat. Closes #392	2023-07-31 23:35:48 -07:00
sabaimran	8054bdc896	Use n_batch parameter to increase resource consumption on host machine (and implicitly engage GPU)	2023-07-31 23:25:08 -07:00
sabaimran	e55e9a7b67	Fix unit tests and truncation logic	2023-07-31 21:37:59 -07:00
sabaimran	2335f11b00	Add better error handling for download processes in case of failure	2023-07-31 21:07:38 -07:00
sabaimran	209975e065	Resolve merge conflicts: let Khoj fail if the model tokenizer is not found	2023-07-31 19:12:26 -07:00
sabaimran	2d6c3cd4fa	Misc. quality improvements for Llama V2 - Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S - Use a proper Llama Tokenizer for counting tokens for truncation with Llama - Add additional null checks when running	2023-07-31 19:11:20 -07:00
sabaimran	ca195097d7	Update chat hint message at first run	2023-07-31 17:46:09 -07:00
Debanjum Singh Solanky	ded606c7cb	Fix format of user query during general conversation with Llama 2	2023-07-31 17:21:14 -07:00
Debanjum Singh Solanky	48e5ac0169	Do not drop system message when truncating context to max prompt size. Previously the system message was getting dropped when the context size with chat history would be more than the max prompt size supported by the chat model. Now only the previous chat messages are dropped or the current message is truncated, but the system message is kept to provide guidance to the chat model	2023-07-31 17:21:14 -07:00
sabaimran	88ef86ad5c	Fix typing issues for mypy (#372 )	2023-07-30 19:27:48 -07:00
sabaimran	ca2c942b65	Add typing to compiled_references and inferred_queries	2023-07-30 19:10:30 -07:00
sabaimran	3646fd1449	Add a warning to indicate that Khoj is not configured to work with personal data sources	2023-07-30 18:52:10 -07:00
sabaimran	996832dc72	Allow user to chat even if content types aren't configured - use empty references	2023-07-30 18:47:45 -07:00
Debanjum Singh Solanky	53810a0ff7	Create khoj config dir if non-existent, before writing to khoj env file	2023-07-30 01:35:36 -07:00
sabaimran	f65d157244	Release Khoj version 0.10.0	2023-07-28 19:27:47 -07:00
Debanjum Singh Solanky	f76af869f1	Do not log the gpt4all chat response stream in khoj backend Stream floods stdout and does not provide useful info to user	2023-07-28 19:14:04 -07:00
sabaimran	5ccb01343e	Add Offline chat to Obsidian (#359 ) * Add support for configuring/using offline chat from within Obsidian * Fix type checking for search type * If Github is not configured, /update call should fail * Fix regenerate tests same as the update ones * Update help text for offline chat in obsidian * Update relevant description for Khoj settings in Obsidian * Simplify configuration logic and use smarter defaults	2023-07-28 18:47:56 -07:00
Debanjum	b3c1507708	Merge pull request #361 from khoj-ai/configure-offline-chat-from-emacs - Configure using Offline Chat from Emacs: - Enable, Disable Offline Chat from Emacs - Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup - Benefits: Offline chat models are better for privacy but not great at answering questions	2023-07-28 18:06:58 -07:00
sabaimran	9f78db0579	Let Offline chat override OpenAI API settings (#362 ) * Let Offline chat override OpenAI API settings * Download the offline model whenever offline chat is enabled * Add progressbar for download for llamav2 model to track progress * Change ordering of n due to switch of default processor * Flip ordering of offline/openai checks when extracting questions from query	2023-07-28 17:26:20 -07:00

1763 commits