sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-30 16:18:08 +00:00

Author	SHA1	Message	Date
sabaimran	dbdca7d8d1	Disable swagger UI docs in production	2024-01-24 15:23:39 +05:30
sabaimran	ddf6fd9c09	Remove valid number alert	2024-01-23 17:57:27 +05:30
Debanjum Singh Solanky	17107a0337	Release Khoj version 1.4.0	2024-01-23 10:18:31 +05:30
sabaimran	679db51453	Add support for phone number authentication with Khoj (part 2) (#621) * Allow users to configure phone numbers with the Khoj server * Integration of API endpoint for updating phone number * Add phone number association and OTP via Twilio for users connecting to WhatsApp - When verified, store the result as such in the KhojUser object * Add a Whatsapp.svg for configuring phone number * Change setup hint depending on whether the user has a number already connected or not * Add an integrity check for the intl tel js dependency * Customize the UI based on whether the user has verified their phone number - Update API routes to make nomenclature for phone addition and verification more straightforward (just /config/phone, etc). - If user has not verified, prompt them for another verification code (if verification is enabled) in the configuration page * Use the verified filter only if the user is linked to an account with an email * Add some basic documentation for using the WhatsApp client with Khoj * Point help text to the docs, rather than landing page info * Update messages on various callbacks and add link to docs page to learn more about the integration	2024-01-22 18:14:58 -08:00
sabaimran	58bf917775	Update the font used across Khoj desktop and web to be Tajawal (#622)	2024-01-20 23:13:33 +05:30
Debanjum	679f0f24a4	Improve Chat Input Pane Actions. Move to 1 Click Audio Chat on Mobile (#624) ## Major ### Move to single click audio chat UX on Obsidian, Desktop, Web clients New default UX has 1 long-press on mobile, 2-click on desktop to send transcribed audio message - New Audio Chat Flow 1. Record audio while microphone button pressed 2. Show auto-send 3s countdown timer UI for audio chat message Provide a visual cue around send button for how long before audio message is automatically sent to Khoj for response 3. Auto-send msg in 3s unless stop send message button clicked - Why - Removes the previous default of 3 clicks required to send audio message The record > stop > send process to send audio messages was unclear and effortful - Still allows stopping message from being sent, to make correction to transcribed audio - Removes inadvertent long audio transcriptions if users forget to press stop while recording ### Improve chat input pane actions & icons on Obsidian, Desktop, Web clients - Use SVG icons in chat footer on web, desktop app - Move delete icon to left of chat input. This makes it harder to inadvertently click it - Add send button to chat input pane - Color chat message send button to make it primary CTA - Make chat footer shorter. Use no or round border on action buttons ## Minor - Stop rendering empty starter questions element when no questions present - Add round border, hover color to starter questions in web, desktop apps - Fix auto resizing chat input box when transcribed text added - Convert chat input into a text area in the Obsidian client	2024-01-20 21:52:56 +05:30
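The auto-send step in the flow above (wait 3s, send unless the stop button is clicked) amounts to a cancellable countdown. Below is a minimal Python asyncio sketch of that idea, assuming a hypothetical `auto_send_after` helper and a `send` callback; the Obsidian, desktop and web clients implement this in their own front-end code, not like this.

```python
import asyncio
from typing import Callable

async def auto_send_after(
    message: str,
    cancel_event: asyncio.Event,
    send: Callable[[str], None],
    delay: float = 3.0,
) -> bool:
    """Send `message` after `delay` seconds unless `cancel_event` is set first.

    Returns True if the message was sent, False if the user cancelled.
    """
    try:
        # Returns early only if the user clicks the (hypothetical) stop-send button
        await asyncio.wait_for(cancel_event.wait(), timeout=delay)
    except asyncio.TimeoutError:
        # Countdown elapsed without cancellation: send the transcribed message
        send(message)
        return True
    # Cancelled: keep the transcription in the input box so the user can edit it
    return False
```

Using an event with a timeout keeps the "cancel" and "time elapsed" outcomes mutually exclusive, so a message can never be both sent and kept for editing.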
Debanjum Singh Solanky	ec3b837d00	Send audio message in 2-clicks on desktop to avoid holding down mic button	2024-01-20 21:40:38 +05:30
Debanjum Singh Solanky	f0daa45ae0	Move to single click audio chat UX on Obsidian client - Capability New default UX has 1 long-press to send transcribed audio message - Removes the previous default of 3 clicks required to send audio message - The record > stop > send process to send audio messages was unclear - Still allows stopping message from being sent, if users want to make correction to transcribed audio - Removes inadvertent long audio transcriptions if user forgets to press stop when recording - Changes - Record audio while microphone button pressed - Show auto-send 3s countdown timer UI for audio chat message Provide a visual cue around send button for how long before audio message is automatically sent to Khoj for response - Auto-send msg in 3s unless stop send message button clicked	2024-01-20 16:07:12 +05:30
Debanjum Singh Solanky	29a581d2b0	Move to single click audio chat UX on desktop app - Capability New default UX has 1 long-press to send transcribed audio message - Removes the previous default of 3 clicks required to send audio message - The record > stop > send process to send audio messages was unclear - Still allows stopping message from being sent, if users want to make correction to transcribed audio - Removes inadvertent long audio transcriptions if user forgets to press stop when recording - Changes - Record audio while microphone button pressed - Show auto-send 3s countdown timer UI for audio chat message Provide a visual cue around send button for how long before audio message is automatically sent to Khoj for response - Auto-send msg in 3s unless stop send message button clicked	2024-01-20 16:03:51 +05:30
Debanjum Singh Solanky	699e9ff878	Move to single click audio chat UX on web app - Capability New default UX has 1 long-press to send transcribed audio message - Removes the previous default of 3 clicks required to send audio message - The record > stop > send process to send audio messages was unclear - Still allows stopping message from being sent, if users want to make correction to transcribed audio - Removes inadvertent long audio transcriptions if user forgets to press stop when recording - Changes - Record audio while microphone button pressed - Show auto-send 3s countdown timer UI for audio chat message Provide a visual cue around send button for how long before audio message is automatically sent to Khoj for response - Auto-send msg in 3s unless stop send message button clicked	2024-01-20 15:56:46 +05:30
Debanjum Singh Solanky	26bd3533d8	Stop rendering empty starter questions element when no questions present	2024-01-20 11:39:58 +05:30
Debanjum Singh Solanky	7c8c475c3a	Add round border, hover color to starter questions in web, desktop apps	2024-01-20 00:51:11 +05:30
Debanjum Singh Solanky	8a488b9e39	Fix auto resizing chat input box when transcribed text added	2024-01-20 00:48:56 +05:30
Debanjum Singh Solanky	07ca137bdf	Convert chat input into a text area in the Obsidian client This allows for better readability of multi-line messages by users. The chat input is a text area in the other clients as well.	2024-01-20 00:48:56 +05:30
Debanjum Singh Solanky	d4552117f6	Add and improve chat input pane, actions, icons on Obsidian client - Move delete icon to left of chat input. This makes it harder to inadvertently click - Add send button to chat footer. Enter being the only way to send messages is not intuitive, outside standard modern UI patterns - Color chat message send button to make it primary CTA on web client - Make chat footer shorter. Use no or round border on action buttons	2024-01-20 00:48:56 +05:30
Debanjum Singh Solanky	c0ad64d9a3	Add and improve chat input pane, actions, icons on desktop client - Use SVG icons in chat footer on web - Move delete icon to left of chat input. This makes it harder to inadvertently click - Add send button to chat footer. Enter being the only way to send messages is not intuitive, outside standard modern UI patterns - Color chat message send button to make it primary CTA on web client - Make chat footer shorter. Use no or round border on action buttons	2024-01-20 00:29:49 +05:30
Debanjum Singh Solanky	ea85ebdacb	Add and improve chat input pane, actions, icons on web client - Use SVG icons in chat footer on web - Move delete icon to left of chat input. This makes it harder to inadvertently click - Add send button to chat footer. Enter being the only way to send messages is not intuitive, outside standard modern UI patterns - Color chat message send button to make it primary CTA on web client - Make chat footer shorter. Use no or round border on action buttons	2024-01-19 20:40:42 +05:30
sabaimran	039ed78253	Add support for a first-party client app to call into Khoj (Part 1) (#601) * Add support for a first party client app - Based on a client id and client secret, allow a first party app to call into the Khoj backend with a phone number identifier - Add migration to add phone numbers to the KhojUser object * Add plus in front of country code when registering a phone number. - Decrease free tier limit to 5 (from 10) - Return a response object when handling stripe webhooks * Fix telemetry method which references authenticated user's client app * Add better error handling for null phone numbers, simplify logic of authenticating user * Pull the client_secret in the API call from the authorization header * Add a migration merge to resolve phone number and other changes	2024-01-18 19:24:14 +05:30
Debanjum Singh Solanky	9dfe1bb003	Fix updating subscription when invoice paid. Revert renewal_date logic The actual issue was that `get_or_create_user_by_email' tried to create a subscription even if it already existed. With updated logic: - New subscription is only created when it doesn't already exist in `get_or_create_user_by_email' - `set_user_subscription' just updates the subscription state as user subscription object creation is already managed by `get_or_create_user_by_email'. So the other conditionals are unnecessary	2024-01-18 16:20:18 +05:30
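The corrected flow is a get-or-create pattern: create the subscription only if it is missing, and have the invoice-paid handler just update its state. A minimal in-memory Python sketch, assuming a dict keyed by email in place of the database; `Subscription`, `get_or_create_subscription` and `set_subscription_state` are hypothetical names, not Khoj's actual Django models or adapters.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Subscription:
    email: str
    plan: str = "trial"
    is_recurring: bool = False

# Hypothetical in-memory stand-in for the subscriptions table
SUBSCRIPTIONS: Dict[str, Subscription] = {}

def get_or_create_subscription(email: str) -> Subscription:
    # Create a subscription only if this user doesn't already have one
    if email not in SUBSCRIPTIONS:
        SUBSCRIPTIONS[email] = Subscription(email=email)
    return SUBSCRIPTIONS[email]

def set_subscription_state(email: str, plan: str, is_recurring: bool) -> Subscription:
    # Only update state; creation stays the responsibility of get_or_create_subscription
    subscription = get_or_create_subscription(email)
    subscription.plan = plan
    subscription.is_recurring = is_recurring
    return subscription
```

Keeping creation in one place means a webhook firing for an existing user can only update that user's row, never add a duplicate.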
Debanjum Singh Solanky	9b1a66c969	Fix updating subscription renewal date when invoice paid	2024-01-18 14:46:10 +05:30
sabaimran	93d5cb128c	Initialize embeddings to empty list before processing	2024-01-18 13:27:04 +05:30
Debanjum Singh Solanky	24af888c41	Release Khoj version 1.3.0	2024-01-18 11:42:13 +05:30
Debanjum	8b4dd16255	Fix markdownRenderer arg to allow chat responses in Obsidian plugin (#619) - Issue Users with Dataview plugin would have error as its markdown post-processor expects the sourcePath to be a string This prevents Khoj from responding to chat messages in the Obsidian chat modal. Search via Obsidian still works but it throws the same dataview plugin error - Fix Pass a string as sourcePath to markdownRenderer to fix failing chat response and stop throwing dataview errors on search Resolves #614, Resolves #606	2024-01-18 10:18:31 +05:30
Debanjum	c8dbe8ee7b	Improve server status check and message in Obsidian client (#617) - Update health API to pass authenticated users their info - Improve Khoj server status check in Khoj Obsidian client - Show Khoj Obsidian commands even if no connection to server - Show Khoj chat by default in Obsidian side pane instead of search	2024-01-18 10:17:35 +05:30
Debanjum Singh Solanky	f9420e1209	Show Khoj Obsidian commands even if no connection to server Server connection check can be a little flaky in Obsidian. Don't gate the commands behind it to improve usability of Khoj. Previously the commands would get disabled when server connection check failed, even though server was actually accessible	2024-01-18 10:09:20 +05:30
Debanjum Singh Solanky	36bf42a860	Show Khoj chat by default in Obsidian side pane instead of search	2024-01-18 10:09:20 +05:30
Debanjum Singh Solanky	aab75a6ead	Improve Khoj server status check in Khoj Obsidian client - Update server connection status on every edit of khoj url, api key in settings instead of only on plugin load The error message was stale if connection fixed after changes in Khoj plugin settings to URL or API key, like on plugin install - Show better welcome message on first plugin install. Include API key setup instruction - Show logged in user email on Khoj settings page	2024-01-18 10:09:20 +05:30
Debanjum Singh Solanky	1a46734485	Fix markdownRenderer arg to allow chat responses in Obsidian plugin - Issue: Users with Dataview plugin would have error as its markdown post-processor expects the sourcePath to be a string This prevents Khoj from responding to chat messages in the Obsidian chat modal. Search via Obsidian still works but it throws the same dataview error - Fix: Pass a string as sourcePath to markdownRenderer to fix failing chat response Resolves #614, Resolves #606	2024-01-18 10:02:50 +05:30
sabaimran	e9e49ea098	Allow custom inference endpoint for the crossencoder model (#616) * Add support for custom inference endpoints for the cross encoder model - Since there's not a good out of the box solution, I've deployed a custom model/handler via huggingface to support this use case. * Use langchain.community for pdf, openai chat modules * Add an explicit stipulation that the api endpoint for crossencoder inference should be for huggingface for now	2024-01-18 10:02:12 +05:30
Debanjum Singh Solanky	870af19ba4	Update health API to pass authenticated users their info This allows Khoj clients to get email address associated with user's API token for display in client UX In anonymous mode, default user information is passed	2024-01-17 13:38:57 +05:30
Debanjum	4d30f7d1d9	Short-circuit API rate limiter for unauthenticated users (#607) ### Major - Short-circuit API rate limiter for unauthenticated user Calls by unauthenticated users were failing at API rate limiter as it failed to access user info object. This is a bug. API rate limiter should short-circuit for unauthenticated users so a proper Forbidden response can be returned by API Add regression test to verify that unauthenticated users get 403 response when calling the /chat API endpoint ### Minor - Remove trailing slash to normalize khoj url in obsidian plugin settings - Move used /api/config API controllers into separate module - Delete unused /api/beta API endpoint - Fix error message rendering in khoj.el, khoj obsidian chat - Handle deprecation warnings for subscribe renew date, langchain, pydantic & logger.warn	2024-01-17 00:59:52 +05:30
Debanjum Singh Solanky	2752e0d607	Update jinja2 and axios min supported package versions	2024-01-16 18:45:38 +05:30
Debanjum Singh Solanky	7039c202c8	Merge branch 'master' into short-circuit-api-rate-limiter	2024-01-16 18:18:34 +05:30
Debanjum Singh Solanky	8917228dbb	Remove unused, deprecated /api/config/data API endpoints - Use /api/health for server up check instead of api/config/default - Remove unused `khoj--post-new-config' method - Remove the now unused /config/data GET, POST API endpoints	2024-01-16 18:15:06 +05:30
Debanjum Singh Solanky	6ded4c1d75	Merge branch 'master' into fix-1000-file-index-update-limit	2024-01-16 16:50:58 +05:30
Debanjum Singh Solanky	16175137e5	Decode URL encoded query string in chat API endpoint before processing	2024-01-16 13:09:28 +05:30
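A minimal sketch of the decoding step, assuming the query arrives still percent-encoded; `normalize_chat_query` is a hypothetical helper, while `urllib.parse.unquote` is the Python standard library function.

```python
from urllib.parse import unquote

def normalize_chat_query(q: str) -> str:
    # Turn percent-escapes such as %20 and %3F back into their characters
    return unquote(q)
```

`unquote` leaves `+` untouched; `urllib.parse.unquote_plus` additionally maps `+` to a space, which matters only if a client form-encodes spaces.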
Debanjum Singh Solanky	9fe1c8ae13	Make references and online_results optional params to converse_offline Fixes all the failing GPT4All tests because they were missing the online_results argument	2024-01-16 13:09:28 +05:30
Debanjum Singh Solanky	d74f8e03d3	Pass max context length to fix using updated GPT4All.list_gpu method Its signature was updated in GPT4All 2.1.0 pypi release. Resolves #610	2024-01-16 12:23:45 +05:30
Debanjum Singh Solanky	1ae6669fbf	Correctly handle API response when no files to index	2024-01-16 11:57:40 +05:30
sabaimran	50575b749b	Add option to use HuggingFace's inference endpoint for generating embeddings (#609) * Support using hosted Huggingface inference endpoint for embeddings generation * Since the huggingface inference endpoint is model-specific, make the URL an optional property of the search model config * Handle ECONNREFUSED error in desktop app * Drive API key via the search model config model and use more generic names	2024-01-16 08:58:24 +05:30
Debanjum Singh Solanky	ba37b28fb5	Improve batched error handling. Catch can't connect to server error Break out of batch processing when unable to connect to server or when requests throttled by server	2024-01-14 01:04:44 +05:30
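Breaking out of batch processing on the first unrecoverable error can be sketched as below. This is a language-agnostic illustration in Python, assuming a hypothetical `upload` callable and a hypothetical `ThrottledError` for HTTP 429 responses; the actual client code differs.

```python
from typing import Callable, Dict, List

class ThrottledError(Exception):
    """Hypothetical error raised when the server responds with HTTP 429."""

def upload_in_batches(batches: List[Dict[str, str]], upload: Callable[[Dict[str, str]], None]) -> int:
    """Upload batches in order, stopping at the first connection or throttling error.

    Returns the number of batches uploaded successfully.
    """
    uploaded = 0
    for batch in batches:
        try:
            upload(batch)
        except (ConnectionError, ThrottledError) as error:
            # Later batches would fail the same way, so stop instead of trying each one
            print(f"Stopping indexing after {uploaded} batch(es): {error!r}")
            break
        uploaded += 1
    return uploaded
```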
Debanjum Singh Solanky	7dfbcd2e5a	Handle subscribe renew date, langchain, pydantic & logger.warn warnings - Ensure langchain less than 0.2.0 is used, to prevent breaking ChatOpenAI, PyMuPDF usage due to their deprecation after 0.2.0 - Set subscription renewal date to a timezone aware datetime - Use logger.warning instead of logger.warn as latter is deprecated - Use `model_dump' not deprecated dict to get all configured content_types	2024-01-12 01:46:52 +05:30
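A minimal sketch of a timezone-aware renewal date, assuming a 30-day renewal period (an illustrative value, not taken from Khoj); `next_renewal_date` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

def next_renewal_date(days: int = 30) -> datetime:
    # datetime.now(timezone.utc) is timezone-aware, unlike a bare datetime.now();
    # saving aware datetimes avoids Django's naive-datetime RuntimeWarning when USE_TZ=True
    return datetime.now(timezone.utc) + timedelta(days=days)
```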
Debanjum Singh Solanky	5f97357fe0	Delete unused /api/beta API endpoint	2024-01-12 01:11:05 +05:30
Debanjum Singh Solanky	bb1c1b39d8	Move /api/config API controllers into separate module for code modularity	2024-01-12 01:11:04 +05:30
Debanjum Singh Solanky	ba99089a12	Short-circuit API rate limiter for unauthenticated user Calls by unauthenticated users were failing at API rate limiter as it failed to access user info object. This is a bug. API rate limiter should short-circuit for unauthenticated users so a proper Forbidden response can be returned by API Add regression test to verify that unauthenticated users get 403 response when calling the /chat API endpoint	2024-01-12 00:23:50 +05:30
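A minimal sketch of a rate limiter that short-circuits for unauthenticated callers, so the authentication layer can return the 403. It assumes a sliding-window policy and hypothetical names (`SlidingWindowRateLimiter`, `RateLimitExceeded`); Khoj's actual limiter is not shown in this log and may work differently.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class RateLimitExceeded(Exception):
    """Hypothetical error; a web framework would map this to HTTP 429."""

class SlidingWindowRateLimiter:
    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, user_id: Optional[str], now: Optional[float] = None) -> None:
        # Short-circuit: with no authenticated user there is nothing to rate limit,
        # so let the auth layer reject the request with 403 Forbidden instead
        if user_id is None:
            return
        now = time.monotonic() if now is None else now
        timestamps = self.requests[user_id]
        # Drop requests that have fallen out of the sliding window
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            raise RateLimitExceeded(f"{user_id} exceeded {self.max_requests} requests per {self.window_seconds}s")
        timestamps.append(now)
```

Returning before touching any per-user state is what avoids the failure described above, where the limiter crashed trying to read user info that does not exist.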
Debanjum Singh Solanky	b1269fdad2	Remove trailing slash to normalize khoj url in obsidian plugin settings	2024-01-11 21:56:36 +05:30
Debanjum Singh Solanky	ffdb291fe0	Fix error message rendering in khoj.el, khoj obsidian chat - Fix failed to index error message in khoj.el - Fix chat model not configured message in khoj obsidian chat	2024-01-11 21:55:54 +05:30
Debanjum Singh Solanky	af9ceb00a0	Show relevant error msg in desktop app, e.g. when can't connect to server	2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky	43423432ce	Pass indexed filenames in API response for client validation	2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky	5f9ac5a630	Collect files to index in single dict to simplify index/update controller Simplifies code while maintaining typing	2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky	efe41aaaca	Push 1000 files at a time from the Desktop client for indexing FastAPI API endpoints only support uploading 1000 files at a time. So split all files to index into groups of 1000 for upload to index/update API endpoint	2024-01-09 23:09:34 +05:30
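Splitting the files into groups of 1000 is a plain chunking step. A minimal Python sketch, assuming the files to index are held in a path-to-content dict; `batch_files` is a hypothetical helper, and the desktop and Obsidian clients do this in their own JavaScript/TypeScript code.

```python
from itertools import islice
from typing import Dict, Iterator

def batch_files(files: Dict[str, str], batch_size: int = 1000) -> Iterator[Dict[str, str]]:
    """Yield successive dicts holding at most `batch_size` files (path -> content)."""
    items = iter(files.items())
    # islice consumes the shared iterator, so each batch picks up where the last one stopped
    while batch := dict(islice(items, batch_size)):
        yield batch
```

Each yielded dict would then be sent as one request to the index/update endpoint.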
Debanjum Singh Solanky	b6d5392c0c	Release Khoj version 1.2.1	2024-01-04 18:45:37 +05:30
Debanjum Singh Solanky	fca7a5ff32	Push 1000 files at a time from the Obsidian client for indexing FastAPI API endpoints only support uploading 1000 files at a time. So split all files to index into groups of 1000 for upload to index/update API endpoint	2024-01-04 18:43:22 +05:30
Debanjum Singh Solanky	4a234c8db3	Use default offline/openai chat model to extract DB search queries Make usage of the first offline/openai chat model as the default LLM to use for background tasks more explicit The idea is to use the default/first chat model for all background activities, like user message to extract search queries to perform. This is controlled by the server admin. The chat model set by the user is used for user-facing functions like generating chat responses	2024-01-03 14:04:49 +05:30
Debanjum Singh Solanky	e28adf2884	Also index pdf, markdown and plaintext files using khoj emacs client Previously you could only index org-mode files and directories from khoj.el Mark the `khoj-org-directories', `khoj-org-files' variables for deprecation, since `khoj-index-directories', `khoj-index-files' replace them as more appropriate names for the more general case Resolves #597	2024-01-03 11:46:17 +05:30
Debanjum Singh Solanky	5abaed9d08	Use user chosen OpenAI model to extract DB search questions from query Previously Khoj was selecting the first OpenAI model configured on server and not the OpenAI model configured by the user for themselves	2024-01-03 11:45:06 +05:30
Debanjum Singh Solanky	05536aab6b	Merge how users can share personal information in personality prompt	2024-01-03 11:40:14 +05:30
Liam Swayne	455f78b178	Replace var declarations with let declarations (#576) * Replace var declaration with let declaration	2023-12-29 10:20:48 +05:30
sabaimran	79913d4c17	Add isort to the pre-commit configuration and apply it to the whole project (#595) * Apply isort to the entire repository * Fix missing import issues in text_to_entries * Fix imports in migration files	2023-12-28 18:04:02 +05:30
sabaimran	442c913de3	Update telemetry state for search model only if one is found, fix alt text for language setting	2023-12-28 12:53:53 +05:30
sabaimran	d3ab3f1b70	Rename matrix_blog to web and move the language setting into the content section	2023-12-28 12:44:49 +05:30
sabaimran	00af6baeb6	Resolve merge conflicts with intro message in chat.html web view	2023-12-23 17:52:58 +05:30
sabaimran	afec4394f9	Merge pull request #592 from ayushjha119/Fixed-Health-Check-to-Khoj-api Fixed health check to khoj api	2023-12-23 13:04:50 +05:30
sabaimran	c50eb8a691	Fix mypy/pre-commit issues	2023-12-23 11:44:37 +05:30
Debanjum Singh Solanky	21c55b4c0d	Release Khoj version 1.2.0	2023-12-22 21:43:47 +05:30
Debanjum Singh Solanky	6a8c1fe423	Sanitize rendering chat references in Web, Desktop and Obsidian clients Use textContent instead of innerHTML to append references Resolves #583	2023-12-22 18:11:49 +05:30
Debanjum	6879daccc6	Fix Chat Streaming on Obsidian, Docker Image Version and First-Run, Chat Error Messages in Clients (#589) - Fix streaming chat response in Obsidian client - Fix first-run, chat error message in obsidian, desktop and web clients - Set Khoj app version to latest version in Docker images - Tag Khoj Docker image built on release with the `latest` tag This aligns docker image release cadence with client, server releases	2023-12-22 04:13:01 -08:00
Debanjum Singh Solanky	d101297995	Use markdown formatted chat message in chat modal	2023-12-22 17:01:31 +05:30
Debanjum Singh Solanky	350fd89c8d	Clear chat history html in Obsidian if getChatHistory works too	2023-12-22 17:01:31 +05:30
ayushjha119	e487ec5370	fixed app to api health Check	2023-12-21 17:51:30 +05:30
Debanjum Singh Solanky	70607cbbbb	Update FRE message to get any Khoj client to sync files with server	2023-12-21 15:23:47 +05:30
ayushjha119	b3d7d6a79d	used the Response class from fastapi.responses and set the input for status_code to 200	2023-12-21 14:26:40 +05:30
sabaimran	e1aaff2053	Add more details about functionality in Khoj's intro message	2023-12-21 10:09:30 +05:30
sabaimran	a1211f40d7	Fix type declaration for the cross_encoder_model state variable. Update name of the new update API	2023-12-21 09:15:13 +05:30
sabaimran	089e4bee12	Fix unit tests with new search model configurations	2023-12-20 21:50:44 +05:30
Debanjum Singh Solanky	447c1b90e7	Fix streaming chat response in Obsidian client - Convert renderIncrementalMessage to an async method as MarkdownRenderer is an async method - Simplify code, remove unneeded JSON check	2023-12-20 14:51:19 +05:30
sabaimran	aa23da60a3	Add a notification banner to show temporary messages	2023-12-20 14:22:08 +05:30
Debanjum Singh Solanky	e04fe921eb	Fix first-run, chat error message in obsidian, desktop and web clients - Disable chat input field if getChatHistory had error as Khoj may not be setup correctly to chat	2023-12-20 14:03:07 +05:30
sabaimran	5ff9df9d4c	Add support per user for configuring the preferred search model from the config page - Honor this setting across the relevant places where embeddings are used - Convert the VectorField object to have None for dimensions in order to make the search model easily configurable	2023-12-20 13:25:43 +05:30
sabaimran	0f6e4ff683	Add a model that specifies the user's search model configuration - Update all endpoints that generate embeddings to use the new model. Incl. generating text embeddings, creating embeddings for a search query	2023-12-20 09:22:26 +05:30
sabaimran	6dd2b05bf5	Rebase with master	2023-12-19 21:02:49 +05:30
sabaimran	e3557cd8b7	Update the personality prompt to make Khoj aware that users can share data via the desktop app	2023-12-19 16:42:45 +05:30
sabaimran	927e477f68	Ignore typing error in custom action short description	2023-12-19 16:10:58 +05:30
sabaimran	946305d977	Add function to export conversations for debugging	2023-12-19 16:05:20 +05:30
sabaimran	903a01745f	Use 0px for padding for input row buttons in web	2023-12-18 16:09:06 +05:30
sabaimran	5b092d59f4	Ignore dict assignment typing error	2023-12-17 22:34:54 +05:30
sabaimran	03cb86ee46	Update typing and object assignment for new text to image method return	2023-12-17 21:28:33 +05:30
sabaimran	0288804f2e	Render the inferred query along with the image that Khoj returns	2023-12-17 21:02:55 +05:30
sabaimran	49af2148fe	Miscellaneous improvements to image generation - Improve the prompt before sending it for image generation - Update the help message to include online, image functionality - Improve styling for the voice, trash buttons	2023-12-17 20:25:35 +05:30
sabaimran	7cb64cb2f9	Add telemetry for image generation conversation command	2023-12-17 18:25:03 +05:30
sabaimran	09544dee09	Add TextToImageModelConfig to the admin page	2023-12-17 16:44:19 +05:30
sabaimran	0459666beb	CSRF Cookie not set error in prod. Try fixing https forwarding for mitigation	2023-12-17 12:55:18 +05:30
sabaimran	61dde8ed89	If text to image config isn't set, send back an error message to the client	2023-12-17 12:54:50 +05:30
sabaimran	3065cea562	Address mypy typing issues	2023-12-16 09:24:26 +05:30
sabaimran	5f6dcf9f2e	Add a rate limiter for the transcribe API endpoint	2023-12-16 09:18:56 +05:30
sabaimran	73a107690d	Add a ConversationCommand rate limiter for the chat endpoint	2023-12-16 09:03:52 +05:30
sabaimran	9b961ed496	Merge pull request #580 from khoj-ai/fix-upgrade-chat-to-create-images Support Image Generation with Khoj	2023-12-07 21:17:58 +05:30
Debanjum Singh Solanky	7504669f2b	Fix rendering image on chat response in obsidian client	2023-12-05 03:48:07 -05:00
Debanjum Singh Solanky	408b7413e9	Use global openai client for transcribe, image	2023-12-05 03:36:33 -05:00
Debanjum Singh Solanky	162b219f2b	Throw unsupported error when server not configured for image, speech-to-text	2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky	8f2f053968	Fix rendering image on chat response in web, desktop client	2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky	d124266923	Reduce promise based nesting in chat JS func used in desktop, web client Use async/await to reduce .then() based nesting to improve code readability	2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky	6e3f66c0f1	Use base64 encoded image instead of source URL for persistence The source URL returned by OpenAI would expire soon. This would make the chat sessions contain non-accessible images/messages if using OpenAI image URL Get base64 encoded image from OpenAI and store directly in conversation logs. This resolves the image link expiring issue	2023-12-05 01:51:14 -05:00
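Storing the image itself rather than a URL can be sketched with a base64 data URI. This is a minimal Python sketch, assuming raw image bytes are available (OpenAI's Images API can also return them already base64-encoded via `response_format="b64_json"`); `to_data_uri` and `from_data_uri` are hypothetical helpers, and the exact format Khoj stores in its conversation logs may differ.

```python
import base64

def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    # Embed the bytes directly so the stored message never depends on an expiring URL
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def from_data_uri(data_uri: str) -> bytes:
    # Strip the "data:<mime>;base64," header and decode the payload
    _, encoded = data_uri.split(",", 1)
    return base64.b64decode(encoded)
```

A data URI can be set directly as an `<img>` source in the web, desktop and Obsidian clients, at the cost of a log that is roughly a third larger than the raw bytes.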
Debanjum Singh Solanky	52c5f4170a	Show generated images in the chat modal of the Khoj Obsidian plugin	2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky	8016a57b5e	Show generated images in chat interface on Desktop client	2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky	cc051ceb4b	Show generated images in chat interface on Web client	2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky	252b35b2f0	Support /image slash command to generate images using the chat API	2023-12-05 01:51:14 -05:00
sabaimran	ef21d78c99	Initial changes to support multiple search model configurations - All search models are loaded into memory, and stored in a dictionary indexed by name - Still need to add database migrations and create a UI for user to select their choice. Presently, it uses the default option	2023-12-05 00:35:40 -05:00
Debanjum Singh Solanky	1d9c1333f2	Configure text to image models available on server - Currently supports OpenAI text to image model, by default dall-e-3 - Allow setting the text to image model via CLI during server setup	2023-12-04 21:27:53 -05:00
Debanjum Singh Solanky	f0222f6d08	Make save_to_conversation_log helper function reusable - Move it out to conversation.utils from generate_chat_response function - Log new optional intent_type argument to capture type of response expected. This captures the type of response by Khoj, e.g. speech, image. It can be used to render responses by Khoj appropriately on clients - Make user_message_time argument optional, set the time to now by default if not passed by calling function	2023-12-04 19:42:12 -05:00
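The two optional arguments described above follow the usual default-to-`None` pattern. A minimal Python sketch, assuming an in-memory list as the conversation log; `append_exchange` and the `by`/`created`/`intent` keys are hypothetical and do not reproduce the real `save_to_conversation_log` signature.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

def append_exchange(
    conversation_log: List[Dict[str, Any]],
    query: str,
    response: str,
    user_message_time: Optional[str] = None,
    intent_type: Optional[str] = None,
) -> None:
    """Append one user/assistant exchange to an in-memory conversation log."""
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Optional argument: default the user message time to now if the caller omits it
    conversation_log.append({"by": "you", "message": query, "created": user_message_time or now})
    conversation_log.append(
        {
            "by": "khoj",
            "message": response,
            "created": now,
            # Records the kind of response (e.g. "speech", "image") so clients can render it appropriately
            "intent": {"type": intent_type},
        }
    )
```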
sabaimran	d2ddbef08f	Use a unique name for the temp PDF generated	2023-12-04 19:27:00 -05:00
sabaimran	d20746613a	Properly filter out empty PDFs for indexing	2023-12-04 16:15:17 -05:00
Debanjum Singh Solanky	316b7d471a	Handle offline chat model retrieval when no internet Offline chat shouldn't fail on retrieve_model when no internet, if model was previously downloaded and usable offline	2023-12-04 13:46:25 -05:00
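A minimal sketch of the fallback, assuming a hypothetical `retrieve` callable that stands in for the networked model lookup; GPT4All's own `retrieve_model` is not reproduced here, and `resolve_model_path` is a hypothetical helper.

```python
import os
from typing import Callable

def resolve_model_path(model_name: str, model_dir: str, retrieve: Callable[[str, str], str]) -> str:
    """Return a usable path for `model_name`, tolerating a missing internet connection."""
    local_path = os.path.join(model_dir, model_name)
    try:
        # May need the network, e.g. to fetch model metadata or download the weights
        return retrieve(model_name, model_dir)
    except OSError:
        # ConnectionError subclasses OSError. Without internet, reuse a previously downloaded copy
        if os.path.isfile(local_path):
            return local_path
        raise
```

Re-raising when no local copy exists keeps the original network error visible instead of failing later on a missing file.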
Debanjum Singh Solanky	2b09caa237	Make online results an optional argument to the gpt converse method	2023-12-04 12:15:29 -05:00
Debanjum Singh Solanky	7009793170	Migrate to OpenAI Python library >= 1.0	2023-12-03 18:16:00 -05:00
sabaimran	cc064ea57d	Fix circular import issue	2023-12-03 17:46:44 -05:00
sabaimran	21f8d63e89	If a user subscribes to Khoj with an email address that's not present in the DB, create an account	2023-12-03 17:28:40 -05:00
sabaimran	c5d297a9ed	Recursively search through folders for indexing	2023-12-03 16:17:28 -05:00
Debanjum Singh Solanky	a57d529f39	Fix path to system tray icon of Khoj desktop app	2023-12-03 00:12:50 -08:00
Debanjum Singh Solanky	106cdbe455	Release Khoj version 1.1.0	2023-11-30 20:09:08 -08:00
Debanjum Singh Solanky	10ce4ee11c	Ignore null params type check for markdown renderer in Obsidian client	2023-11-30 20:09:08 -08:00
sabaimran	a5ffa2342f	Add documentation for local setup and fix admin panel bugs - Wasn't able to login to the admin panel when KHOJ_DEBUG was not True. Fix this error so self-hosted users can get unblocked from accessing the admin settings - Don't force users to set their KHOJ_DJANGO_SECRET_KEY	2023-11-30 17:55:27 -08:00
Debanjum Singh Solanky	d587632700	Clear result before render thinking placeholder emoji in Obsidian chat	2023-11-30 13:53:09 -08:00
Debanjum Singh Solanky	48719ee0dd	Render newline separation in chat references to improve readability	2023-11-30 13:16:48 -08:00
Debanjum Singh Solanky	1a31a2efcf	Render Khoj chat streaming response as md & show refs in Obsidian - Use new style references for Khoj chat modal in Obsidian - Khoj Chat responses in Obsidian had regressed to not show references for new questions after modal has been opened. Now even those are rendered, and use new references style - Render chat response as markdown while it's being streamed	2023-11-30 13:02:00 -08:00
Debanjum Singh Solanky	0430fa67b6	Show temporary status message when copied to clipboard	2023-11-29 13:49:33 -08:00
Debanjum Singh Solanky	491a1a949a	Render chat responses as markdown in Desktop client too	2023-11-29 13:49:33 -08:00
Debanjum Singh Solanky	20ef5bfc93	Properly stop mediaRecorder stream to clear microphone in-use state	2023-11-29 13:48:35 -08:00
Debanjum Singh Solanky	8faa63c3c6	Convert config page buttons to use stronger yellow	2023-11-28 19:55:43 -08:00
Debanjum Singh Solanky	a6ca2076d5	Open link to Khoj app landing page from nav pane in current tab	2023-11-28 14:20:37 -08:00
Debanjum Singh Solanky	643e018947	Handle if user subscription field doesn't exist in telemetry func Avoid null ref in the method when running Khoj server in anon mode	2023-11-28 14:15:14 -08:00
Debanjum Singh Solanky	110d7646fc	Use milder yellow as primary Khoj theme color for chat, buttons etc.	2023-11-28 14:15:14 -08:00
sabaimran	18254850ab	Set a default value for the khoj django secret key and add additional guidance for setting environment variables on first run	2023-11-28 09:39:44 -08:00
sabaimran	6290b463f5	Compute size of the indexed data only if explicitly requested to avoid heavy load on the DB	2023-11-27 12:05:00 -08:00
sabaimran	eb5e3096e0	Change subscribed scope to premium	2023-11-27 11:39:20 -08:00
sabaimran	6e1ba11e59	Resolve merge conflicts for rendering chat response	2023-11-27 11:33:13 -08:00
Debanjum Singh Solanky	71f2d54258	Render chat response as markdown while streaming on Web, Desktop clients	2023-11-26 20:27:10 -08:00
Debanjum Singh Solanky	9e714d032b	Fix Khoj telemetry server. Add server_version column	2023-11-26 15:05:43 -08:00
Debanjum Singh Solanky	b249bbb5b5	Limit max audio file size allowed for transcription on API endpoint	2023-11-26 14:19:46 -08:00
Debanjum Singh Solanky	a79604b601	Fix return types of offline, online transcribe methods for python 3.9	2023-11-26 06:26:34 -08:00
Debanjum Singh Solanky	06f99ceb3c	Rename /api/speak API endpoint to /api/transcribe	2023-11-26 06:18:44 -08:00
Debanjum Singh Solanky	56a1a61c77	Remove unused button element retrieval code from web, desktop	2023-11-26 06:17:56 -08:00
Debanjum Singh Solanky	877532a167	Speak to Khoj from the Obsidian client - Add transcription button with mic icon - Collect audio recording on pressing mic - Process and send audio recording to server for transcription - Extract the functionality to flash status in chat input for reuse	2023-11-26 06:17:54 -08:00
Debanjum Singh Solanky	cc9eae5d18	Update default chat model to Mistral in GPT4AllProcessor config	2023-11-26 05:55:43 -08:00
Debanjum Singh Solanky	4636390f7f	Transcribe speech to text offline with Whisper - Allow server admin to configure offline speech to text model during initialization - Use offline speech to text model to transcribe audio from clients - Set offline whisper as default speech to text model as no setup api key reqd	2023-11-26 05:55:11 -08:00
Debanjum Singh Solanky	a0a7ab7ec8	Rename conversation.gpt4all package to conversation.offline	2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky	499adf86a0	Move transcription using OpenAI API into independent package	2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky	897170ab15	Use single db migration script for transcribe model, related updates	2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky	28090216f6	Show transcription error status in chatInput placeholder on web, desktop - Extract flashing status message in chat input placeholder into reusable function - Use emoji prefixes for status messages - Improve alt text of transcribe button to indicate what the button does	2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky	fc040825b2	Default to Offline chat with Mistral as minimal setup, no API key reqd.	2023-11-26 01:07:20 -08:00
Debanjum Singh Solanky	5a6547677c	Add type of operation variable in latest migration	2023-11-26 00:38:52 -08:00
Debanjum Singh Solanky	3e252036c3	Remove whitespace: pre-line from chat html, since markdown rendering	2023-11-26 00:27:29 -08:00
Debanjum Singh Solanky	b484795b8e	Merge branch 'master' into add-speak-to-chat - Conflicts: - src/interface/desktop/chat.html Combine and use common class names for speak component - src/khoj/database/adapters/__init__.py Combine imports - src/khoj/interface/web/chat.html Combine and use common class names for speak component - src/khoj/routers/api.py Combine imports	2023-11-26 00:26:21 -08:00
sabaimran	6233a957b4	Merge branch 'master' of github.com:khoj-ai/khoj into features/enforce-subscription-status	2023-11-25 22:46:10 -08:00
sabaimran	52b88de7f4	Indicate in the desktop if the user gets rate limited for indexing	2023-11-25 22:31:23 -08:00
Debanjum	e0a59cff68	Delete Conversation History from Web, Desktop, Obsidian Clients (#551 ) Add delete button to clear conversation history from Web, Desktop and Obsidian Khoj clients Resolves #523	2023-11-25 22:24:12 -08:00
Debanjum Singh Solanky	d0e294d8a5	Clear Conversation History from the Obsidian client - Fix font color for Khoj chat responses in Obsidian. Previous color had too low a contrast to be readable	2023-11-25 22:16:13 -08:00
sabaimran	b2afbaa315	Add support for rate limiting the amount of data indexed - Add a dependency on the indexer API endpoint that rounds up the amount of data indexed and uses that to determine whether the next set of data should be processed - Delete any files that are being removed for administering the calculation - Show current amount of data indexed in the config page	2023-11-25 20:28:04 -08:00
Debanjum Singh Solanky	07bf365c7c	Clear any network connections to khoj server via khoj.el on reindex - Ignore errors in deleting network requests to khoj server - Also delete open network connection to khoj server on auto reindex Otherwise when server is unreachable a bunch of failed network connections accrue in the processes list	2023-11-25 20:19:41 -08:00
sabaimran	dd1badae81	Use userwithtoken.user when authenticating with an API key	2023-11-24 22:18:45 -08:00
sabaimran	48b9116195	Fix to use user rather than user_with_token in authenticated credentials	2023-11-24 22:18:00 -08:00
sabaimran	771f9bcfa1	If the user subscription was created over 7 days ago, then their trial is expired	2023-11-24 22:08:32 -08:00
sabaimran	e5b1350523	Enforce API use limits depending on whether the server has billing enabled and whether the given user is subscribed	2023-11-24 21:55:16 -08:00
sabaimran	9c868ee10b	Use the state.billing_enabled field to determine whether to use the subscribed scope	2023-11-24 20:41:19 -08:00
sabaimran	69c8f45830	Use scopes to represent whether the user has a valid subscription in the middleware	2023-11-24 20:29:36 -08:00
Debanjum	25f3f2367e	Handle Server Unavailable Error from Khoj.el (#568 ) - Make auto-update of content index user configurable from khoj.el - Handle server unavailable error on auto-index schedule job in khoj.el Resolves #567	2023-11-24 16:46:07 -08:00
Debanjum Singh Solanky	138f4e3f3c	Make auto-update of content index user configurable from khoj.el	2023-11-24 16:40:50 -08:00
Debanjum Singh Solanky	0885fc6c23	Handle server unavailable error on auto-index schedule job in khoj.el	2023-11-24 16:39:44 -08:00
sabaimran	c13953311a	Add reflective questions to admin pages	2023-11-23 14:01:05 -08:00
sabaimran	c42ec32a95	Merge pull request #552 from khoj-ai/features/internet-enabled-search Support internet-enabled, online searching using Serper.dev	2023-11-23 12:34:05 -08:00
sabaimran	c641b8df58	Update desktop package version	2023-11-22 17:54:53 -08:00
sabaimran	a1b2289074	Release Khoj version 1.0.1	2023-11-22 17:52:07 -08:00
sabaimran	b1b037f0ea	Fix URL configuration issues with reorganized subfolders	2023-11-22 17:03:33 -08:00
sabaimran	e0949e232b	Import random in adapters file for selecting reflective question	2023-11-22 07:52:51 -08:00
sabaimran	256e8de40a	Merge with features/internet-enabled-search	2023-11-22 07:25:24 -08:00
Debanjum Singh Solanky	fd60db766e	Clear Conversation History from the Web Client	2023-11-22 03:35:00 -08:00
Debanjum Singh Solanky	d5a4830761	Clear Conversation History from the Desktop Client	2023-11-22 03:35:00 -08:00
Debanjum Singh Solanky	3096544cf2	Create API endpoint to clear user's chat history	2023-11-22 03:34:59 -08:00
Debanjum Singh Solanky	63675b3299	Speak to Khoj from the Desktop client - Use icons to style speech to text recording state	2023-11-22 02:47:17 -08:00
Debanjum Singh Solanky	2951fc92d7	Speak to Khoj from the Web client - Use icons to style speech to text recording state	2023-11-22 02:47:17 -08:00
Debanjum Singh Solanky	cc77bc4076	Create speech to text API endpoint. Use OpenAI whisper for ASR - Wrap audio transcription in try/catch and delete audio file after processing - Use configured speech to text model, else handle error	2023-11-22 02:47:06 -08:00
Debanjum Singh Solanky	1ca99b6eb0	Add speech to text model configuration to Database	2023-11-22 02:24:31 -08:00
sabaimran	c652a7fd2d	Move text_to_entries under the new content folder	2023-11-21 22:25:17 -08:00
sabaimran	1e2af083f0	Rename the data_sources module to content	2023-11-21 22:11:32 -08:00
sabaimran	4cb28aeffb	Resolve merge conflicts with master	2023-11-21 22:07:41 -08:00
Debanjum Singh Solanky	4cdfe8fc4f	Re-enable Khoj Obsidian plugin for Mobile, as Khoj cloud is available	2023-11-21 16:33:48 -08:00
Debanjum	5d9d50157e	Clean Logs, Improve Message Rendering and Make Khoj Trusted Host Configurable (#561 ) - Append chat message to chat logs as TextNodes in web, desktop clients - Simplify Code to Identify Files from Github, Notion on Web, Desktop Client - Use file source to find entries from github, notion on web, desktop client - Pass file source to clients via text search API response - Make Django Logs Follow Khoj Log Format, Verbosity - Handle image search setup related warning - Format Django initializing outputs using Khoj logger format - Use `KHOJ_HOST` env var to set allowed/trusted domains to host Khoj	2023-11-21 15:14:34 -08:00
Debanjum Singh Solanky	9e736d4340	Use KHOJ_DOMAIN for CORS allow_origins list as well - Default to app.khoj.dev - Remove unnecessary any_path regex in allow_origins. It only cares about host, paths are not set in origin header	2023-11-21 14:02:04 -08:00
sabaimran	5469e81a87	Use full path for the static directory in FastAPI and reflect deeper nesting of the django app	2023-11-21 13:44:45 -08:00
sabaimran	d199c4c35f	Resolve merge conflicts with master	2023-11-21 13:35:56 -08:00
Debanjum Singh Solanky	76d041f633	Use KHOJ_HOST env var to set allowed/trusted domains to host Khoj Allows hosting Khoj behind other, non "khoj.dev" domains	2023-11-21 13:11:45 -08:00
Debanjum Singh Solanky	90d463c12a	Append chat message to chat logs as TextNodes in web, desktop clients	2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky	befcbcdd5d	Use file source to find entries from github, notion on web, desktop client This is a more robust mechanism of identification than via file name including github or notion domain names	2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky	3f0de45ec6	Pass file source to clients via text search API response Source of entry stored in DB is now passed to clients for processing	2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky	4aec581306	Handle image search setup related warning Ideally should rename model_directory to config_directory or some such but the current image search code will need to be migrated soon. So changing the variable name and creating a migration script for old khoj.yml files using model-directory variable isn't worth it Remove the explicitly set number of threads used by pytorch. Use its default instead.	2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky	b06628ee31	Format Django initializing outputs using Khoj logger format - Collect STDOUT from the `migrate', `collectstatic' commands and output using the Khoj logger format and verbosity settings - Only show Django `collectstatic' command output in verbose mode - Fix showing the Initializing Khoj log line by moving it after logger level set	2023-11-21 13:10:50 -08:00
sabaimran	341abf03ff	Handle none for search_type and use equals comparator rather than in for determining Notion type	2023-11-21 12:55:09 -08:00
sabaimran	2bb989e9d8	Resolve merge conflicts and fix some import ordering	2023-11-21 12:30:43 -08:00
sabaimran	244b76ffed	Add isort for automatic import sorting and skip main.py because it's a drama queen 👑	2023-11-21 12:20:41 -08:00
Debanjum	8a0d92e2d7	Fix Connectivity Check in Obsidian Client (#559 ) from dtkav/bugfix-local-connectivity-check Check connection to Khoj server for self-hosted server. This check had regressed during the cloud rearchitecture	2023-11-21 12:05:16 -08:00
sabaimran	0e6f09b241	Merge pull request #562 from khoj-ai/fix/pypi-package-app-not-included Fix PyPi package app reference issue	2023-11-21 11:54:46 -08:00
sabaimran	333cb3445c	Use colon rather than equals to indicate typing	2023-11-21 11:28:51 -08:00
Debanjum Singh Solanky	645fd96634	Search across all content types from Khoj Obsidian client Previously it was only searching for PDF and Markdown files. This was meant to show only content from current vault as results. But it has not scaled well as other clients also allow syncing PDF and markdown files now. So remove this content type filter for now. A proper solution would limit by using file/dir filters on server or client side.	2023-11-21 11:19:33 -08:00
sabaimran	a1460a5bf9	Set operations to typed empty list in migration file	2023-11-21 11:14:40 -08:00
sabaimran	71e794c26f	Remove the sys.append line in the main.py file, as it's not required	2023-11-21 10:57:21 -08:00
sabaimran	a474c31e02	Move the django app into the src/khoj folder for better organization and functionality - Our pypi package currently does not work because the django app and associated database is not included. To remedy this issue, move the app into the src/khoj folder. This has the added benefit of improved organization of the codebase, as all server related code is now in a single folder - Update associated file paths and system references	2023-11-21 10:56:04 -08:00
Debanjum Singh Solanky	c89bd49973	Fix ranking search results on Obsidian It's reversed since score of entries is now a distance metric on Khoj server. So lesser distance is better. Previously higher score was better	2023-11-21 01:24:59 -08:00
Daniel Grossmann-Kavanagh	f142999bce	fix khoj local server usage	2023-11-20 17:07:30 -08:00
Debanjum Singh Solanky	c07401cf76	Fix, Improve chat config via CLI on first run by using defaults - Fix setting prompt size for online chat - generally improve chat config via cli by using default chat model, prompt size for online and offline chat	2023-11-20 17:01:20 -08:00
sabaimran	b142de15a8	Merge branch 'features/internet-enabled-search' of github.com:khoj-ai/khoj into features/reflective-suggested-questions	2023-11-20 15:56:09 -08:00
sabaimran	a9623ef85a	Add requisite imports in order to instantiate offline model in adapters file	2023-11-20 15:27:42 -08:00
sabaimran	a8f13f334f	Fix merging issues with base after popping the stash	2023-11-20 15:22:50 -08:00
sabaimran	8fa0b69c67	Resolve merge issue with adapters methods	2023-11-20 15:21:06 -08:00
sabaimran	fee99779bf	Add subqueries for internet-connected search results and update client-side code accordingly - Add a wrapper method to help make direct queries to the LLM and determine any intermediate responses needed for handling the request	2023-11-20 15:19:15 -08:00
Debanjum Singh Solanky	d61b0dd55c	Add Khoj Django app package to sys path to load Django module via pip install	2023-11-20 14:55:00 -08:00
sabaimran	b8e6883a81	Merge branch 'master' of github.com:khoj-ai/khoj into features/internet-enabled-search	2023-11-19 16:20:08 -08:00
sabaimran	237195e20e	Make all name-related fields nullable within the GoogleUser	2023-11-19 14:22:32 -08:00
Debanjum	71799add0b	Index Parent Headings of Org-Mode Entries to Improve Search Context (#548 ) ### Overview The parent hierarchy of org-mode entries can store important context. This change updates OrgNode to track parent headings for each org entry and adds the parent outline for each entry to the index ### Details - Test search uses ancestor headings as context for improved results - Add ancestor headings of each org-mode entry to their compiled form - Track ancestor headings for each org-mode entry in org-node parser Resolves #85	2023-11-19 13:18:19 -08:00
sabaimran	ef5e9d66c1	Resolve merge conflicts in dependency imports	2023-11-19 11:42:20 -08:00
Debanjum Singh Solanky	c3465d6982	Release Khoj version 1.0.0	2023-11-19 09:50:25 -08:00
Debanjum	736744be3a	Update documentation to reflect new multi-user config scenario (#550 ) - Update docs to show how to use Khoj Cloud - Move self-hosting Khoj to separate section - Add page to setup Desktop app - Set default URL to Khoj Cloud URL in Obsidian, Emacs clients	2023-11-18 18:22:46 -08:00
Debanjum Singh Solanky	e1bf1f0e86	Update default Khoj server URL to Khoj cloud on Emacs, Obsidian clients	2023-11-18 16:25:45 -08:00
Debanjum Singh Solanky	8775ce730a	Use URL fragments to allow jumping to config page sections on Web app	2023-11-18 16:25:45 -08:00
sabaimran	f792b1e301	Remove already defined identical function	2023-11-18 14:08:50 -08:00
sabaimran	e2fff5dc47	Don't explicitly use value to get the model type value	2023-11-18 14:01:01 -08:00
sabaimran	a8a25ceac2	Honor user's chat settings when running the extract questions phase - Add marginally better error handling when GPT gives a messed up response to the extract questions method - Remove debug log lines	2023-11-18 13:31:51 -08:00
sabaimran	67156e6aec	Add new logs for debugging issues with chat references	2023-11-18 12:10:50 -08:00
sabaimran	5de2ab6098	Change parse_obj calls to use model_validate per new pydantic specification	2023-11-18 12:10:36 -08:00
sabaimran	6d249645a6	Fix interpretation of the default search type	2023-11-18 00:04:18 -08:00
sabaimran	f180b2ba94	Resolve mypy errors for various data types	2023-11-17 23:26:15 -08:00
sabaimran	3328a41f08	Update types of base config models for pydantic 2.0	2023-11-17 23:08:52 -08:00
sabaimran	f688529150	Update the default configuration for the AppConfig	2023-11-17 19:26:31 -08:00
sabaimran	11ccb92755	Fix formatting of welcome message to use markdown	2023-11-17 18:55:59 -08:00
Debanjum Singh Solanky	ca87b4ede9	Wrap common API query parameters into shared class to deduplicate code - Upgrade FastAPI to >= latest version. Required upgrade of FastAPI. Earlier version didn't support wrapping common query params in class - Use per fixture app instead of a global FastAPI app in conftest - Upgrade minimum required Django version - Fix no notes chat director test with updated no notes message No notes message was updated in commit `118f1143`	2023-11-17 18:43:49 -08:00
sabaimran	262f3ccb59	Resolve mypy issues with formatting	2023-11-17 17:11:00 -08:00
sabaimran	a7e00898cb	Fix rendering even when no online context references are returned	2023-11-17 16:41:28 -08:00
sabaimran	0fcf234f07	Add support for using serper.dev for online queries - Use the knowledgeGraph, answerBox, peopleAlsoAsk and organic responses of serper.dev to provide online context for queries made with the /online command - Add it as an additional tool for doing Google searches - Render the results appropriately in the chat web window - Pass appropriate reference data down to the LLM	2023-11-17 16:19:11 -08:00
Debanjum Singh Solanky	55785d50c3	Use title, when present, as root ancestor of entries instead of file path	2023-11-17 15:03:27 -08:00
sabaimran	bfbe273ffd	Add some styling to the copy button for programmatic output	2023-11-17 12:18:35 -08:00
sabaimran	9ddf3b58c3	Use the markdown parser for rendering the chat messages in the web interface	2023-11-17 12:14:02 -08:00
sabaimran	a0b12b001a	Provide in-line rendering when output matches certain views	2023-11-17 11:04:36 -08:00
sabaimran	ec06d2c446	Move data indexer files into a separate folder under processor. Update assoc UTs	2023-11-16 17:19:55 -08:00
sabaimran	45a42faec8	Make adjectives more positive for api token generation	2023-11-16 15:55:35 -08:00
sabaimran	118f1143ff	When user tries using the notes slash command without having any data indexed	2023-11-16 12:52:39 -08:00
sabaimran	e8a13f0813	Add multi-user support to Khoj and use Postgres for backend storage (#549 ) - Adds support for multiple users to be connected to the same Khoj instance using their Google login credentials - Moves storage solution from in-memory json data to a Postgres db. This stores all relevant information, including accounts, embeddings, chat history, server side chat configuration - Adds the concept of a Khoj server admin for configuring instance-wide settings regarding search model, and chat configuration - Miscellaneous updates and fixes to the UX, including chat references, colors, and an updated config page - Adds billing to allow users to subscribe to the cloud service easily - Adds a separate GitHub action for building the dockerized production (tag `prod`) and dev (tag `dev`) images, separate from the image used for local building. The production image uses `gunicorn` with multiple workers to run the server. - Updates all clients (Obsidian, Emacs, Desktop) to follow the client/server architecture. The server no longer reads from the file system at all; it only accepts data via the indexer API. In line with that, removes the functionality to configure org, markdown, plaintext, or other file-specific settings in the server. Only leaves GitHub and Notion for server-side configuration. - Changes license to GNU AGPLv3 Resolves #467 Resolves #488 Resolves #303 Resolves #345 Resolves #195 Resolves #280 Resolves #461 Closes #259 Resolves #351 Resolves #301 Resolves #296	2023-11-16 11:48:01 -08:00
Debanjum Singh Solanky	74403e3536	Add ancestor headings of each org-mode entry to their compiled form Resolves #85	2023-11-16 02:54:41 -08:00
Debanjum Singh Solanky	305c25ae1a	Track ancestor headings for each org-mode entry in org-node parser	2023-11-16 02:39:14 -08:00
Debanjum Singh Solanky	cc05013715	Update first run message on Web app with Chat models setup instructions - Link to Django admin panel for user to create Chat Models on their Khoj server - This should only get hit when user is not using Khoj cloud, as Khoj cloud would already have Chat models configured	2023-11-15 22:44:24 -08:00
Debanjum Singh Solanky	6c1693b8f4	Update first run message on Desktop app with API token setup instructions - Open Web app settings in the default browser via link click - Open Desktop app settings via link click	2023-11-15 22:44:11 -08:00
Debanjum Singh Solanky	922983bd53	Set max cos distance to 0.18. Test search API query with max distance	2023-11-15 20:26:21 -08:00
Debanjum Singh Solanky	18dbad5edb	Use Sigmoid to normalize cross-encoder score between 0-1 - While sigmoid normalization isn't required for reranking. Normalizing score to distance metrics for both encoder and cross encoder scores is useful to reason about them - Softmax wasn't required as don't need probabilities, sigmoid is good enough to get distance metric	2023-11-15 19:31:59 -08:00
sabaimran	ea144de438	Merge with master	2023-11-15 18:34:46 -08:00
Debanjum Singh Solanky	348cc0cf0e	Use better name for DB adapter func to create user by Google token	2023-11-15 17:31:50 -08:00
Debanjum Singh Solanky	08a057bdd5	Rename SearchModel to SearchModelConfig DB model, Require Cross-Encoder	2023-11-15 17:31:50 -08:00
Debanjum Singh Solanky	0679b2a7bd	Use embeddings model store from state in text to entries No need to instantiate it separately. In all other places we're using the embeddings model store in global state anyway	2023-11-15 17:31:50 -08:00
sabaimran	245a9cbf63	Fix return type of the update_or_create method	2023-11-15 17:31:50 -08:00
sabaimran	bbae7dd83c	Update logic for creating a new user to use aupdate_or_create	2023-11-15 17:31:50 -08:00
sabaimran	8e62af77b9	Update format for return type of the generate token method	2023-11-15 17:03:01 -08:00
sabaimran	4a487aff23	Fix return type of the update_or_create method	2023-11-15 14:35:42 -08:00
sabaimran	b63856ecb4	Update logic for creating a new user to use aupdate_or_create	2023-11-15 12:50:39 -08:00
sabaimran	b8e7488a95	Use a more permissive distance filter for search results from notes	2023-11-15 11:13:47 -08:00
sabaimran	05b7542115	Remove config lock from the state	2023-11-15 10:44:45 -08:00
sabaimran	ecd005cac0	Check if search model is already in DB before creating a new one	2023-11-15 10:41:35 -08:00
Debanjum Singh Solanky	9c6e7bdea2	Upgrade server, desktop app dependencies to resolve CVE bugs	2023-11-15 01:47:53 -08:00
Debanjum Singh Solanky	8f200cf53f	Remove unused parameter from configure_search_type method	2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky	f8e5e118e1	Only create KhojUser on login if doesn't already exist	2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky	3d8d6145f2	Add search model config from khoj.yml to Postgres DB via migration script	2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky	4af194d74b	Make search model configurable on server - Expose ability to modify search model via Django admin interface - Previously the bi_encoder and cross_encoder models to use were set in code - Now it's user configurable but with a default config generated by default	2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky	e98141f4c3	Subscribe default user to standard plan with a far away renewal date Self hosted users in anonymous mode have all capabilities unlocked	2023-11-14 16:31:39 -08:00
Debanjum Singh Solanky	9d30fda26d	Deduplicate, improve name of prompt templates for GPT4All chat models - Do not pass unused rerank_results parameter to text_search.query method	2023-11-14 16:31:09 -08:00
Debanjum Singh Solanky	795ec9eb55	Add KHOJ_prefix to server admin credentials environment variables	2023-11-14 16:13:13 -08:00
sabaimran	ee005de662	Rename django files URL to server instead of django	2023-11-14 12:36:38 -08:00
sabaimran	20ce3d0c78	Update default docker compose configuration with Khoj local mode	2023-11-14 12:21:26 -08:00
sabaimran	8c36079f74	Add a first run experience to initialize the admin user if none exists and setup chat models	2023-11-13 21:07:12 -08:00
Debanjum Singh Solanky	e9adb58c16	Rate limit calls to the /chat API per user, per day/minute	2023-11-13 19:41:46 -08:00
Debanjum Singh Solanky	33a8eb0470	Log when new user is created	2023-11-13 19:37:24 -08:00
sabaimran	603f838115	Block input text field when waiting for chat response	2023-11-11 17:14:37 -08:00
Debanjum Singh Solanky	9c321ac070	Fix cross encoder to use softmax to convert it to a distance metric	2023-11-11 16:12:24 -08:00
sabaimran	8a824167cf	Merge branch 'fix/imports-and-references' of github.com:khoj-ai/khoj into fix/imports-and-references	2023-11-11 12:59:31 -08:00
sabaimran	fa428932a8	Update URL for downloading the desktop application	2023-11-11 12:59:15 -08:00
Debanjum Singh Solanky	941c7f23a3	Only get text search results above confidence threshold via API - During the migration, the confidence score stopped being used. It was being passed down from API to some point and went unused - Remove score thresholding for images as image search confidence score different from text search model distance score - Default score threshold of 0.15 is experimentally determined by manually looking at search results vs distance for a few queries - Use distance instead of confidence as metric for search result quality Previously we'd moved text search to a distance metric from a confidence score. Now convert even cross encoder, image search scores to distance metric for consistent results sorting	2023-11-11 04:11:33 -08:00
Debanjum Singh Solanky	e44e6df221	Reduce data dumped in console log from web, desktop app	2023-11-11 02:05:07 -08:00
Debanjum Singh Solanky	f044a89d50	Show status in Save, Reinitialize button of config page on web app - Show non-transient error message in status element if action fails - On success, just show temporary success message within button	2023-11-11 02:04:58 -08:00
Debanjum Singh Solanky	f17d9da36c	Move Configure, Reinitialize buttons into the Content section on Web app Remove the Results Count button from the web app. It's hanging weirdly with not much context to its purpose. Reintroduce it in the Search card when created under the Features section	2023-11-11 02:01:39 -08:00
Debanjum Singh Solanky	325cb0f7fb	Show message in Save button of Github, Notion config save in web app Show the success, failure message only temporarily. Previously it stuck around after clicking save until page refresh	2023-11-11 02:01:39 -08:00
Debanjum Singh Solanky	b34d4fa741	Save config, update index on save of Github, Notion config in web app Reduce user confusion by joining config update with index updation for each content type. So only a single click required to configure any content type instead of two clicks on two separate pages	2023-11-11 00:33:49 -08:00
Debanjum Singh Solanky	c4364b9100	Weaken asking follow-up qs and q&a mode in notes prompt to OpenAI models - Notes prompt doesn't need to be so tuned to question answering. User could just want to talk about life. The notes need to be used to respond to those, not necessarily only retrieve answers from notes - System and notes prompts were forcing asking follow-up questions a little too much. Reduce strength of follow-up question asking	2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky	cba371678d	Stop OpenAI chat from emitting reference notes directly in chat body The Chat models sometime output reference notes directly in the chat body in unformatted form, specifically as Notes:\n['. Prevent that. Reference notes are shown in clean, formatted form anyway	2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky	8585976f37	Revert "Use notes in system prompt, rather than in the user message" This reverts commit `e695b9ab8c`.	2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky	b6441683c6	Increase reference text on 1st expansion to 3 lines and 140 characters	2023-11-10 23:36:43 -08:00
sabaimran	55c97241b5	Merge branch 'fix/imports-and-references' of github.com:khoj-ai/khoj into fix/imports-and-references	2023-11-10 22:38:34 -08:00
sabaimran	e2e96f9aa4	Add default settings to let new users be subscribed on trial - Add the default user to a subscription trial - Update associated unit tests	2023-11-10 22:38:28 -08:00
Debanjum Singh Solanky	501e7606a0	Increase reference text on 1st expansion to 3 lines and 140 characters	2023-11-10 21:27:04 -08:00
sabaimran	0a950d9382	Fix checker to determine if obsidian client is connected	2023-11-10 19:21:58 -08:00
sabaimran	c736604366	Merge with remote	2023-11-10 17:50:15 -08:00
sabaimran	b0b07bde6c	Allow chat reference to expand enough to show the whole reference, rather than constraining the height	2023-11-10 17:49:20 -08:00
sabaimran	14f8c151c8	Fix return type of the generate_chat_response method	2023-11-10 17:48:54 -08:00
Debanjum Singh Solanky	45b8670c25	Fix return type hint for generate_chat_response func	2023-11-10 17:34:19 -08:00
Debanjum Singh Solanky	9b6c5ddba4	Update action row padding in cards on config page of web app	2023-11-10 16:53:25 -08:00
sabaimran	54d4fd0e08	Add chat_model data for logging selected models to telemetry	2023-11-10 16:46:34 -08:00
sabaimran	e695b9ab8c	Use notes in system prompt, rather than in the user message	2023-11-10 15:09:33 -08:00
sabaimran	cec932d88a	Update prompt so that GPT is more context aware with its capabilities	2023-11-10 14:37:11 -08:00
sabaimran	e62788ad79	Await result for determining if user has entries	2023-11-10 13:51:56 -08:00
sabaimran	1a56344f12	Remove the old syncData reference as it no longer exists	2023-11-10 10:10:07 -08:00
Debanjum Singh Solanky	39ad1c6ce6	Release Khoj version 0.14.0 Fix Khoj subtitle in manifest of Khoj Obsidian plugin	2023-11-10 00:28:33 -08:00
Debanjum Singh Solanky	745d6bfeed	Add detailed intro message, mention download desktop app for docs sync	2023-11-10 00:20:28 -08:00
Debanjum Singh Solanky	6eb7df717c	Only show search in web app nav pane if user has documents indexed	2023-11-09 19:14:54 -08:00
Debanjum Singh Solanky	c0789dc57b	Use email to get_user_subscription from DB and other DB adapters - Needing user subscription requires chaining function - Simplify get_file_sources DB adapter	2023-11-09 19:09:57 -08:00
Debanjum Singh Solanky	841ed95521	Move active user profile halo check into nav pane macro on web app	2023-11-09 18:05:19 -08:00
Debanjum Singh Solanky	ddac693762	Hide download desktop app message in web app if synced files exist	2023-11-09 17:47:00 -08:00
Debanjum Singh Solanky	30a9674f25	Mark generated profile pic with subscription circle in web app	2023-11-09 15:22:38 -08:00
Debanjum Singh Solanky	d6e6ed1cfa	Keep single Save button, Show next sync, default to prod Khoj URL in Desktop app - Make mutable syncing variable not a const - Show next sync time to make users aware of data sync is automated - Keep a single Save button to reduce confusion. It does what Save All previously did. Intent to manual sync should Save All - Default to using app.khoj.dev as default Khoj URL to ease setup	2023-11-09 14:04:58 -08:00
Debanjum Singh Solanky	e1f0128576	Change config migration script to update to 0.15.0 version Next release, 0.14.0 wouldn't contain the migration to Postgres	2023-11-09 12:21:58 -08:00
Debanjum Singh Solanky	17cbbb0b01	Use Consistent Environment Variable for KHOJ_DEBUG	2023-11-09 11:01:28 -08:00
Debanjum Singh Solanky	391db80499	Improve subscribed user profile pictures and nav pane selection - Add yellow halo around subscribed user profile - Fix highlighting current page in header nav pane	2023-11-09 00:57:05 -08:00
Debanjum Singh Solanky	605058c72a	Allow null user profile picture from Google OAuth in DB - Fix width of profile picture generated for user - Ignore unused Stripe webhook events	2023-11-09 00:46:59 -08:00
Debanjum Singh Solanky	a2609973b8	Disable Subscription if Stripe environment not set up Deduplicate DJANGO_SECRET_KEY and KHOJ_DJANGO_SECRET_KEY into the latter name, as the KHOJ prefix marks it as Khoj app specific	2023-11-08 19:39:32 -08:00
Debanjum Singh Solanky	09e1235832	Auto update billing card UI on (re/un-)subscribe click on web app Previously required a page load to see the updated billing state after clicking resubscribe or unsubscribe buttons	2023-11-08 18:38:12 -08:00
Debanjum Singh Solanky	8b8bb15866	Keep sync state in memory, initialized to false in Desktop app Prevent deadlock if desktop app killed in middle of syncing	2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky	c043eb54ae	Use typed entry source instead of raw str to map source to conf in api.py	2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky	8178004e6d	Move Subscription data into separate table in DB. Merge migrations	2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky	3bb10128ef	Move subscription API to separate, independent router	2023-11-08 16:20:27 -08:00
Debanjum Singh Solanky	ec1395d072	Clean, merge subscription update events, API and functions - Reduce webhook triggers for subscription updates - Merge subscription update API endpoint, functions for (re/un-)subscribe	2023-11-08 15:55:20 -08:00
Debanjum Singh Solanky	ef5c13f968	Keep user subscription state. Update it when user has unsubscribed	2023-11-08 12:08:36 -08:00
Debanjum Singh Solanky	c52affc6d9	Get Khoj Cloud Subscription URL via environment variable	2023-11-08 12:07:53 -08:00
sabaimran	609d358b1a	Use sql datetime comparison for detecting validity of subscription renewal date - Update the unsubscribe endpoint to use query params - Use subscription id to process unsubscribe endpoint, rather than the customer id	2023-11-07 19:17:36 -08:00
sabaimran	98cf095b65	Fix bug for rendering chat references in LLM response	2023-11-07 16:44:41 -08:00
sabaimran	0e1cdb6536	Add additional error handling for processing unknown Stripe events and fix typo in STRIPE_SIGNING env variable	2023-11-07 16:43:05 -08:00
sabaimran	08c86927cb	Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into fix-improve-config-page-on-desktop-and-web-app	2023-11-07 12:46:49 -08:00
sabaimran	cec54e3a8a	Merge pull request #536 from khoj-ai/features/update-chat-ui Update the chat UI to have richer representation of the references	2023-11-07 12:34:57 -08:00
Debanjum Singh Solanky	f466751f4d	Expose card on web app config page to manage subscription to Khoj cloud	2023-11-07 10:21:00 -08:00
Debanjum Singh Solanky	9aaf475c8a	Create API webhook, endpoints for subscription payments using Stripe - Add fields to mark users as subscribed to a specific plan and subscription renewal date in DB - Add ability to unsubscribe a user using their email address - Expose webhook for stripe to callback confirming payment	2023-11-07 10:20:51 -08:00
Debanjum Singh Solanky	156421d30a	Show file type icons for each indexed file in config card of web app	2023-11-07 05:48:44 -08:00
Debanjum Singh Solanky	045c2252d6	Set content enabled status on update via config buttons on web app Previously hitting configure or disable wouldn't update the state of the content cards. It needed page refresh to see if the content was synced correctly. Now cards automatically get set to new state on hitting disable button on card or global configure buttons	2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky	7c424e0d5f	Enable deleting all indexed desktop files from Khoj via Desktop app	2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky	779fa531a5	Prevent Desktop app triggering multiple simultaneous syncs to server Lock syncing to server if a sync is already in progress. Although the sync Save button is disabled while a sync is in progress, the background sync job can still trigger a sync in parallel. This sync lock prevents that	2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky	404d47f1a1	Bubble up content indexing errors to notify user on client apps	2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky	6e957584ac	Create config page on web app to manage computer files indexed by Khoj Remove the table of all files indexed by Khoj. This seems overkill and doesn't match the UI semantics of the other data sources like Github, Notion. Create instead a data source card for computer files with the same update, disable semantics of the Github and Notion data source cards Users can disable each data source from its card on the main config page. They can see/delete individual files indexed from the computer data source once they click into the computer files data source card on the config page	2023-11-07 04:42:53 -08:00
Debanjum Singh Solanky	d527b644f4	Update content by source via API. Make web client use this API for config	2023-11-07 03:41:19 -08:00
Debanjum Singh Solanky	9ab327a2b6	Store the data source of each entry in database This will be useful for updating, deleting entries by their data source. Data source can be one of Computer, Github or Notion for now Store each file's/entry's source in database	2023-11-07 02:18:48 -08:00
Debanjum Singh Solanky	c82cd0862a	Delete deprecated content config pages for local files from web client The desktop app now manages syncing local computer files to index The server only manages "cloud" data source like github and notion.	2023-11-06 23:55:37 -08:00
Debanjum Singh Solanky	97cf8339aa	Rename Sync button, Force Sync toggle to Save, Save All buttons	2023-11-06 21:57:37 -08:00
Debanjum Singh Solanky	a08b152358	Improve log messages in text_entries and memory leak unit test	2023-11-06 19:27:31 -08:00
sabaimran	6c8689e4ae	Update corresponding chat UX in the desktop client as well	2023-11-06 16:18:41 -08:00
sabaimran	e01ecf1419	/s/references/reference to fix bug of jumping references	2023-11-06 16:12:25 -08:00
Debanjum	38f24a037d	Improve Indexing Text Entries (#535 ) Major - Ensure search results logic consistent across migration to DB, multi-user - Manually verified search results for sample queries look the same across migration - Flatten indexing code for better indexing progress tracking and code readability Minor - `a4f407f` Test memory leak on MPS device when generating vector embeddings - `ef24485` Improve Khoj with DB setup instructions in the Django app readme (for now) - `f212cc7` Arrange remaining text search tests in arrange, act, assert order - `022017d` Fix text search tests to test updated indexing log messages	2023-11-06 16:01:53 -08:00
sabaimran	270f7b3eb3	Update the chat UI to have richer representation of the references	2023-11-05 15:46:43 -08:00
sabaimran	d697d752c2	Use repeat rather than manually specify auto in grid-template-rows Co-authored-by: Debanjum <debanjum@gmail.com>	2023-11-05 15:23:42 -08:00
sabaimran	5f1e37fff0	Adjust indentation for css property	2023-11-05 14:33:23 -08:00
Debanjum Singh Solanky	a4f407f595	Test memory leak on MPS device when generating vector embeddings Slope threshold of 2.0 determined qualitatively on local Mac device Minor unused import and clean-up	2023-11-05 03:48:54 -08:00
Debanjum Singh Solanky	ef24485ada	Improve Khoj with DB setup instructions in the Django app readme (for now)	2023-11-05 02:04:52 -08:00
sabaimran	084a8becc5	Fix bug to prevent default in chat trigger	2023-11-04 20:13:33 -07:00
Debanjum Singh Solanky	5489e98b9c	Do not index org heading entries by default This is to maintain the previous default behavior	2023-11-04 20:09:25 -07:00
Debanjum Singh Solanky	34b5a86d1d	Use SentenceTransformer to disable progress bar when encoding query The Langchain HuggingFaceEmbeddings wrapper doesn't support disabling the progress bar, especially not for queries only while keeping it for documents. This makes the logs noisy with an encoding progress bar for each incremental query No features of the Langchain wrapper for SentenceTransformer were being used anyway, and we can always switch back to it if required	2023-11-04 20:09:25 -07:00
Debanjum Singh Solanky	dc9946fc03	Flatten nested loops, improve progress reporting in text_to_jsonl indexer Flatten the nested loops to improve visibility into indexing progress Reduce spurious logs, report the logs at aggregated level and update the logging description text to improve indexing progress reporting	2023-11-04 20:09:25 -07:00
sabaimran	88eeee3f4b	Move try/catch for import one line later	2023-11-04 19:46:47 -07:00
sabaimran	dbaa892665	Flip catching modulenotfound to import error exception	2023-11-04 19:34:10 -07:00
sabaimran	8c3d5a49da	Add try/except around image extraction step	2023-11-04 19:27:18 -07:00
sabaimran	fdfab39942	Update the config UI to show all files indexed with option to delete - Given the separation of the client and server now, the web UI will no longer support configuration of local file paths of data to index - Expose a way to show all the files that are currently set for indexing, along with an option to delete all or specific files	2023-11-04 19:03:34 -07:00
sabaimran	800bb4f458	Remove references to demo - The demo setting is no longer necessary for the time being, as we won't have any more demo instances	2023-11-04 17:17:04 -07:00
sabaimran	b5972e9311	Use OCR to extract image text in PDFs	2023-11-04 17:15:28 -07:00
Debanjum Singh Solanky	8273bf26b7	Fix multi-line chat input and output render on web, desktop clients - Remove spurious whitespace in chat input box on page load being added because text area element was ending on newline - Do not insert newline in message when sending message by hitting enter key This would be more evident when sending a message with cursor in the middle of the sentence, as a newline would be inserted at the cursor point - Remove chat message separator tokens from model output. Model sometimes starts to output text in its chat format	2023-11-04 01:09:35 -07:00
Debanjum Singh Solanky	2f1756cc15	Do not use icon for each file, folder to index in desktop app. Other minor fixes based on PR feedback	2023-11-04 00:13:10 -07:00
Debanjum Singh Solanky	e8f568d79c	Make splash screen wider, opaque and fix its spinner radius Radius should be such that final spin doesn't extend out of the circle Opaque background improves contrast for better visual	2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky	3ef05f4803	Use css var for main font color in search, chat page of desktop app	2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky	a19cbde2d7	Add About page for Khoj to Desktop app. Expose it via system tray - Pass current khoj version from package.json to about page via electron IPC between backend js and frontend page - Update Khoj information in default About screen as well, in case it's exposed anywhere else	2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky	a327294ee9	Rename khoj.js to utils.js in web and desktop client apps	2023-11-03 18:13:37 -07:00
Debanjum Singh Solanky	db57eeaefe	Console log a welcome message on loading Desktop client	2023-11-03 05:15:41 -07:00
Debanjum Singh Solanky	6fae6fb2a4	Merge branch 'features/multi-user-support-khoj' into improve-client-app-theming	2023-11-03 04:58:41 -07:00
Debanjum Singh Solanky	4cd76311ad	Slow down spinning at end of splash sequence. Make animation bigger	2023-11-03 04:28:17 -07:00
Debanjum Singh Solanky	34661c33a2	Show splash screen on starting desktop app	2023-11-03 03:19:08 -07:00
Debanjum Singh Solanky	126d3f4563	Render each file, folder to index row with icon in desktop app Make the file, folders to index look less like an editable field	2023-11-03 02:48:42 -07:00
Debanjum Singh Solanky	80ae132cad	Update Desktop, Obsidian client color theme to lighter yellow - Update background color to a different shade of white - Make primary and primary hover colors less intense and more aligned with lantern flame shade - Add water, leaf, flower color variables	2023-11-03 02:48:42 -07:00
sabaimran	fb6ebd19fc	Fix refactor bugs, CSRF token issues for use in production (#531 ) Fix refactor bugs, CSRF token issues for use in production * Add flags for samesite settings to enable django admin login * Include tzdata to dependencies to work around python package issues in linux * Use DJANGO_DEBUG flag correctly * Fix naming of entry field when creating EntryDate objects * Correctly retrieve openai config settings * Fix datefilter with embeddings name for field	2023-11-02 23:02:38 -07:00
Debanjum Singh Solanky	345856e7be	Merge branch 'master' of github.com:khoj-ai/khoj into features/multi-user-support-khoj Merge changes to use latest GPT4All with GPU, GGUF model support into khoj multi-user support rearchitecture branch	2023-11-02 22:44:25 -07:00
Debanjum Singh Solanky	041074ccd6	Make chat the landing page for the desktop app Chat, unlike search, doesn't require knowledge base indexing setup. So you can get started with chat much faster.	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	3801105b2a	Make chat the landing page for the web app Chat, unlike search, doesn't require knowledge base indexing setup. So you can get started with chat much faster.	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	0d4e7d46c2	Fix color and size of profile picture circle in nav pane	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	4fbe8ac6b1	Console log a welcome message on loading web client	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	9fc6c97139	Use Khoj standard font family, weight in web client settings page	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	b6f07099cd	Simplify login page styling on web client - Center all elements: icon, text and button - Use khoj icon not logo-text - Simplify login title text	2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky	7b7f6d3bc8	Update web client theme to a lighter - Update background color to a different shade of white - Make primary and primary hover colors less intense and more aligned with lantern flame shade - Add water, leaf, flower color variables	2023-11-02 20:42:21 -07:00
sabaimran	fe860aaf83	Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into features/multi-user-support-khoj	2023-11-02 14:56:01 -07:00
sabaimran	2c9496bcf1	Add additional null checks in the migrate_server_pg script	2023-11-02 14:55:58 -07:00
sabaimran	20df0f5330	Use url_path_for for creating the login page URL in the application	2023-11-02 14:55:14 -07:00
sabaimran	fd11b78552	Fix migration script error when openai not available (#530 )	2023-11-02 11:28:08 -07:00
sabaimran	fe6720fa06	[Multi-User Part 8]: Make conversation processor settings server-wide (#529 ) - Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all the users re-use that code. - To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter in the details. For simplicity, the email and username should match. - Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings	2023-11-02 10:43:27 -07:00
Debanjum Singh Solanky	12b3eeae9e	Use Khoj fonts on config page of web and desktop apps too Previously pico.css font-families were being selected for the config page. This was different from the fonts used by index.html, chat.html This further improves the heading spacing issue	2023-11-01 17:50:50 -07:00
Debanjum Singh Solanky	022d695309	Switch to narrow view below width of 700px on web client This makes the dropdown menu align better to the profile picture in mobile view	2023-11-01 17:49:44 -07:00
Debanjum Singh Solanky	6a0adfbfbb	Default to profile picture with Initial if user has no profile picture	2023-11-01 17:49:44 -07:00
Tuan Nguyen	354605e73e	Autofocus to chat input when opening chat (#524 )	2023-11-01 16:09:45 -07:00
Debanjum Singh Solanky	d92a2d03a7	Rename Files, Classes from X_To_JSONL to more appropriate X_To_Entries These content processors are converting content into entries in DB instead of entries in JSONL file	2023-11-01 14:51:33 -07:00
Debanjum Singh Solanky	2ad2055bcb	Remove user null check in API controllers that require authentication	2023-11-01 14:38:19 -07:00
Debanjum Singh Solanky	7ac5a4766d	Match spacing of navigation header pane in config vs search/chat pages	2023-11-01 14:38:19 -07:00
Debanjum Singh Solanky	2e3a4a6a9b	Use Jinja macro to deduplicate navigation header HTML	2023-11-01 14:38:12 -07:00
Debanjum Singh Solanky	c631b61a81	Put colors shared by index, chat html into khoj css global variables	2023-11-01 02:13:24 -07:00
Debanjum Singh Solanky	f585a71744	Put logout, settings under dropdown menu with logged in user's profile picture - Create dropdown menu. Put settings page, logout action under it - Make user's profile picture the dropdown menu heading - Create khoj.js to store shared js across web client It currently stores the dropdown menu open, close functionality - Put shared styling for khoj dropdown menu under khoj.css	2023-11-01 02:13:24 -07:00
Debanjum Singh Solanky	58a7171911	Show truncated API key for identification & restrict table width - Use a function to generate API Key table row HTML, to dedup logic - Show delete, copy icon hints on hover - Reduce length of copied message to not expand table width - Truncating API key helps keep the API key table width within width of smaller width displays	2023-10-31 23:10:26 -07:00
Debanjum Singh Solanky	9cebd7f856	Add emoji icons to Search, Chat, Settings items in nav menu of Web client Emoji icons have already been added to the Search, Chat and Settings top navigation menu in the desktop client. This change adds these to the web client as well	2023-10-31 22:38:44 -07:00
Debanjum Singh Solanky	f77336ba61	Add key icon for API keys table in Web client config page	2023-10-31 19:01:09 -07:00
Debanjum Singh Solanky	87e6b1eab9	Rename TextEmbeddings to TextEntries for improved readability Improves readability as name has closer match to underlying constructs	2023-10-31 18:55:59 -07:00
Debanjum Singh Solanky	bcbee05a9e	Rename DbModels Embeddings, EmbeddingsAdapter to Entry, EntryAdapter Improves readability as name has closer match to underlying constructs - Entry is any atomic item indexed by Khoj. This can be an org-mode entry, a markdown section, a PDF or Notion page etc. - Embeddings are semantic vectors generated by the search ML model that encode the meaning contained in an entry's text. - An "Entry" contains "Embeddings" vectors but also other metadata about the entry like filename etc.	2023-10-31 18:50:54 -07:00
sabaimran	54a387326c	[Multi-User Part 6]: Address small bugs and upstream PR comments (#518 ) - `08654163cb`: Add better parsing for XML files - `f3acfac7fb`: Add a try/catch around the dateparser in order to avoid internal server errors in app - `7d43cd62c0`: Chunk embeddings generation in order to avoid large memory load - `e02d751eb3`: Addresses comments from PR #498 - `a3f393edb4`: Addresses comments from PR #503 - `66eb078286`: Addresses comments from PR #511 - Address various items in https://github.com/khoj-ai/khoj/issues/527	2023-10-31 17:59:53 -07:00
sabaimran	5f3f6b7c61	[Multi-User Part 5]: Add a production Docker file and use a gunicorn configuration with it (#514 ) - Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests - Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same	2023-10-26 13:15:31 -07:00
Debanjum	9acc722f7f	[Multi-User Part 4]: Authenticate using API Tokens (#513 ) ### ✨ New - Use API keys to authenticate from Desktop, Obsidian, Emacs clients - Create API, UI on web app config page to CRUD API Keys - Create user API keys table and functions to CRUD them in Database ### 🧪 Improve - Default to better search model, [gte-small](https://huggingface.co/thenlper/gte-small), to improve search quality - Only load chat model to GPU if enough space, throw error on load failure - Show encoding progress, truncate headings to max chars supported - Add instruction to create db in Django DB setup Readme ### ⚙️ Fix - Fix error handling when configure offline chat via Web UI - Do not warn in anon mode about Google OAuth env vars not being set - Fix path to load static files when server started from project root	2023-10-26 12:33:03 -07:00
sabaimran	4b6ec248a6	[Multi-User Part 3]: Separate chat sessions based on authenticated users (#511 ) - Add a data model which allows us to store Conversations with users. This does a minimal lift over the current setup, where the underlying data is stored in a JSON file. This maintains parity with that configuration. - There does _seem_ to be some regression in chat quality, which is most likely attributable to search results. This will help us with #275. It should become much easier to maintain multiple Conversations in a given table in the backend now. We will have to do some thinking on the UI.	2023-10-26 11:37:41 -07:00
sabaimran	a8a82d274a	[Multi-User Part 2]: Add login pages and gate access to application behind login wall (#503 ) - Make most routes conditional on authentication if anonymous mode is not enabled. If anonymous mode is enabled, it scaffolds a default user and uses that for all application interactions. - Add a basic login page and add routes for redirecting the user if logged in	2023-10-26 10:17:29 -07:00
sabaimran	216acf545f	[Multi-User Part 1]: Enable storage of settings for plaintext files based on user account (#498 ) - Partition configuration for indexing local data based on user accounts - Store indexed data in an underlying postgres db using the `pgvector` extension - Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time - Apply filters using SQL queries - Start removing many server-level configuration settings - Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB. - Update the Docker image and docker-compose.yml to work with the new application design	2023-10-26 09:42:29 -07:00
Debanjum Singh Solanky	9677eae791	Expose CLI flag to disable using GPU for offline chat model - Offline chat models outputting gibberish when loaded onto some GPUs. GPU support with Vulkan in GPT4All seems a bit buggy - This change mitigates the upstream issue by allowing user to manually disable using GPU for offline chat Closes #516	2023-10-25 17:51:46 -07:00
Debanjum Singh Solanky	0f1ebcae18	Upgrade to latest GPT4All. Use Mistral as default offline chat model GPT4all now supports gguf llama.cpp chat models. Latest GPT4All (+mistral) is at least 3x faster. On a Macbook Pro, responses start in ~10s vs 30s-120s earlier. Mistral is also a better chat model, although it hallucinates more than llama-2	2023-10-22 19:04:23 -07:00
sabaimran	963cd165eb	Resolve merge conflicts	2023-10-19 14:39:05 -07:00
Debanjum Singh Solanky	8346e1193c	Release Khoj version 0.13.0	2023-10-18 03:43:54 -07:00
Debanjum Singh Solanky	6631fc38db	Delete plaintext config via API. Catch any offline model loading exception	2023-10-18 03:37:45 -07:00
Debanjum Singh Solanky	53abd1a506	Mark sync completed on desktop client, even when no files to send Previously Sync spinner on desktop config screen would hang when no files to send to server & the Sync button had been manually triggered	2023-10-18 01:30:56 -07:00
Debanjum Singh Solanky	71b0012e8c	Set offline chat config to default value if unset on server load	2023-10-18 00:59:43 -07:00
Debanjum Singh Solanky	cf1cdc3fe1	Disambiguate input_filter variable names in fs_syncer functions	2023-10-17 23:32:10 -07:00
Debanjum Singh Solanky	e3cd8b4150	Only index files returned by input-filter globs in fs_syncer Ignore .org, .pdf etc. suffixed directories under `input-filter' from being evaluated as files. Explicitly filter results by input-filter globs to only index files, not directory for each text type Add test to prevent regression Closes #448	2023-10-17 23:32:10 -07:00
Debanjum Singh Solanky	51363d280d	Do not configure khoj server for pull based indexing from khoj.el Do not make khoj server pull update index on Obsidian plugin load. Index is updated on push from plugin instead now.	2023-10-17 21:47:19 -07:00
Debanjum Singh Solanky	d9d133dfb9	Read text files as utf-8, instead of default os locale On Windows, the default locale isn't utf8. Khoj had regressed to reading files in OS specified locale encoding, e.g cp1252, cp949 etc. It now explicitly uses utf8 encoding to read text files for indexing Resolves #495, resolves #472	2023-10-17 21:47:19 -07:00
Debanjum	3d4576ae38	Fix encoding binary files for sync from the Desktop, Obsidian client (#506 ) - Fix encoding binary files like PDFs for sync from Desktop client - Fix encoding binary files like PDFs for sync from Obsidian client	2023-10-17 15:37:22 -07:00
Debanjum Singh Solanky	c8293998d9	Fix encoding binary files like PDFs for sync from Obsidian client Use readBinary to read binary files like PDFs instead of read	2023-10-17 15:08:30 -07:00
sabaimran	ba60c869c9	Fix encoding binary files like PDFs for sync from Desktop client Use readFileSync, Buffer to pass appropriately formatted binary data	2023-10-17 15:08:23 -07:00
Andrew Spott	3d7381446d	Changed globbing. Now doesn't clobber a users glob if they want to a… (#496 ) * Changed globbing. Now doesn't clobber a user's glob if they want to add it, but will (if just given a directory) add a recursive glob. Note: python's glob engine doesn't support `{}` globbing, a future option is to warn if that is included. * Fix typo in globformat variable * Use older glob pattern for plaintext files --------- Co-authored-by: Saba <narmiabas@gmail.com>	2023-10-17 11:26:06 -07:00
sabaimran	2646c8554d	Provide a default value to offline_chat configuration of the conversation processor	2023-10-17 10:35:22 -07:00
Debanjum Singh Solanky	b8976426eb	Update offline chat model config schema used by Emacs, Obsidian clients The server uses a new schema for the conversation config. The Emacs, Obsidian clients need to use this schema to update the conversation config	2023-10-17 07:01:35 -07:00
Debanjum	ecc6fbfeb2	Push Files to Index from Emacs, Obsidian & Desktop Clients using Multi-Part Forms (#499 ) ### Overview - Add ability to push data to index from the Emacs, Obsidian client - Switch to standard mechanism of syncing files via HTTP multi-part/form. Previously we were streaming the data as JSON - Benefits of new mechanism - No manual parsing of files to send or receive on clients or server is required as most have in-built mechanisms to send multi-part/form requests - The whole response is not required to be kept in memory to parse content as JSON. As individual files arrive they're automatically pushed to disk to conserve memory if required - Binary files don't need to be encoded on client and decoded on server ### Code Details ### Major - Use multi-part form to receive files to index on server - Use multi-part form to send files to index on desktop client - Send files to index on server from the khoj.el emacs client - Send content for indexing on server at a regular interval from khoj.el - Send files to index on server from the khoj obsidian client - Update tests to test multi-part/form method of pushing files to index #### Minor - Put indexer API endpoint under /api path segment - Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method - Improve emoji, message on content index updated via logger - Don't call khoj server on khoj.el load, only once khoj invoked explicitly by user - Improve indexing of binary files - Let fs_syncer pass PDF files directly as binary before indexing - Use encoding of each file set in indexer request to read file - Add CORS policy to khoj server. Allow requests from khoj apps, obsidian & localhost - Update indexer API endpoint URL to `index/update` from `indexer/batch` Resolves #471 #243	2023-10-17 06:05:15 -07:00
Debanjum Singh Solanky	6a4f1b2188	Add more client, request details in logs by index/update API endpoint	2023-10-17 05:43:29 -07:00
Debanjum Singh Solanky	5efae1ad55	Update indexer API endpoint query params for force, content type New URL query params, `force' and `t' match name of query parameter in existing Khoj API endpoints Update Desktop, Obsidian and Emacs client to call using these new API query params. Set `client' query param from each client for telemetry visibility	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	84654ffc5d	Update indexer API endpoint URL to index/update from indexer/batch New URL follows action oriented endpoint naming convention used for other Khoj API endpoints Update desktop, obsidian and emacs client to call this new API endpoint	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	e347823ff4	Log telemetry for index updates via push to API endpoint	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	05be6bd877	Clicking Update Index in Obsidian settings should push files to index Use the indexer/batch API endpoint to regenerate content index rather than the previous pull based content indexing API endpoint	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	13a3122bf3	Stop configuring server to pull files to index from Obsidian client Obsidian client now pushes vault files to index instead	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	99a2c934a3	Add CORS policy to allow requests from khoj apps, obsidian & localhost Using fetch from Khoj Obsidian plugin was failing due to cross-origin request and method: no-cors didn't allow passing x-api-key custom header. And using Obsidian's request with multi-part/form-data wasn't possible either.	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	541cd59a49	Let fs_syncer pass PDF files directly as binary before indexing No need to do base64 encoding/decoding to pass pdf contents for indexing from fs_syncer to pdf_to_jsonl	2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky	d27dc71dfe	Use encoding of each file set in indexer request to read file Get encoding type from multi-part/form-request body for each file Read text files as utf-8 and pdfs, images as binary	2023-10-17 04:58:12 -07:00
Debanjum Singh Solanky	8e627a5809	Pass any files to be deleted to indexer API via Khoj Obsidian plugin - Keep state of previously synced files to identify files to be deleted - Last synced files stored in settings for persistence of this data across Obsidian reboots	2023-10-17 03:34:49 -07:00
Debanjum Singh Solanky	f2e293a149	Push Vault files to index to Khoj server using Khoj Obsidian plugin Use the multi-part/form-data request to sync Markdown, PDF files in vault to index on khoj server Run scheduled job to push updates to vault for indexing every 1 hour	2023-10-17 03:05:30 -07:00
Debanjum Singh Solanky	6baaaaf91a	Test request body of multi-part form to update content index from khoj.el	2023-10-16 23:54:32 -07:00
Debanjum Singh Solanky	79b3f8273a	Make khoj.el send files to be deleted from index to server	2023-10-16 23:53:02 -07:00
Debanjum Singh Solanky	f64fa06e22	Initialize the Khoj Transient menu on first run instead of load This prevents Khoj from polling the Khoj server until explicitly invoked via `khoj' entrypoint function. Previously it'd make a request to the khoj server every time Emacs or khoj.el was loaded Closes #243	2023-10-16 19:11:46 -07:00
Debanjum	b4949f7f0b	Improve Offline Chat Model Experience (#494 ) - Make offline chat model user configurable. Use `filename` of any [GPT4All supported model](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models.json) like below: - Run GPT4All Chat Model on GPU, when available via [GPT4All Vulcan support](https://blog.nomic.ai/posts/gpt4all-gpu-inference-with-vulkan) - Use default Llama 2 supported by GPT4All - Make `tokenizer` and `max-prompt-size` of chat model user configurable. E.g When using chat models not in [this pre-defined list](https://github.com/khoj-ai/khoj/blob/master/src/khoj/processor/conversation/utils.py) that support larger context window or a different tokenizer. Closes #406, #418	2023-10-16 17:44:49 -07:00
Debanjum Singh Solanky	644c3b787f	Scale no. of chat history messages to use as context with max_prompt_size. Previously lookback turns was set to a static 2. But now that we support more chat models, their prompt sizes vary considerably. Make lookback_turns proportional to max_prompt_size. truncate_messages can still remove messages later if they exceed max_prompt_size. This lets Khoj pass more of the chat history as context for models with larger context window	2023-10-16 17:22:28 -07:00
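The scaling described above can be sketched in a few lines of Python. This is only an illustration. The budget of 750 tokens per turn and both function names are assumptions, not values or identifiers taken from this log.

```python
def lookback_turns_for(max_prompt_size: int, tokens_per_turn: int = 750) -> int:
    """Scale the number of past chat turns to include with the model's prompt size.

    tokens_per_turn is an assumed average budget per turn. At least one turn is always kept.
    """
    return max(1, max_prompt_size // tokens_per_turn)


def select_chat_history(turns: list, max_prompt_size: int) -> list:
    """Keep only the most recent turns that the lookback budget allows."""
    return turns[-lookback_turns_for(max_prompt_size):]
```

With these assumptions, a 2048-token model gets 2 turns, which matches the old static value, and a 16k-token model gets 21. Any overflow is still left to the later truncation step.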
Debanjum Singh Solanky	df1d74a879	Use max_prompt_size, tokenizer from config for chat model context stuffing	2023-10-15 16:52:53 -07:00
Debanjum Singh Solanky	116595b351	Use chat_model specified in new offline_chat section of config - Dedupe offline_chat_model variable. Only reference offline chat model stored under offline_chat. Delete the previous chat_model field under GPT4AllProcessorConfig - Set offline chat model to use via config/offline_chat API endpoint	2023-10-15 16:37:49 -07:00
Debanjum Singh Solanky	feb4f17e3d	Update chat config schema. Make max_prompt, chat tokenizer configurable. This provides flexibility to use non 1st party supported chat models - Create migration script to update khoj.yml config - Put `enable_offline_chat' under new `offline-chat' section. Referring code needs to be updated to accommodate this change - Move `offline_chat_model' to `chat-model' under new `offline-chat' section - Put chat `tokenizer` under new `offline-chat' section - Put `max_prompt' under existing `conversation' section, as `max_prompt' size affects both openai and offline chat models	2023-10-15 16:35:11 -07:00
sabaimran	c125995d94	[Multi-User]: Part 0 - Add support for logging in with Google (#487) * Add concept of user authentication to the request session via GoogleUser	2023-10-14 19:39:13 -07:00
Debanjum Singh Solanky	247e75595c	Use AutoTokenizer to support more tokenizers	2023-10-14 16:54:52 -07:00
Saba	ff2dbadc9d	Use computed plaintext_content to set file content rather than calling f.read again	2023-10-14 13:28:34 -07:00
Debanjum Singh Solanky	1ad8b150e8	Add default tokenizer, max_prompt as fallback for non-default offline chat models. Pass user configured chat model as argument for converse_offline to use. The proper fix for this would allow users to configure the max_prompt and tokenizer to use (while supplying default ones, if none provided). For now, this is a reasonable start.	2023-10-13 22:48:56 -07:00
Debanjum Singh Solanky	56bd69d5af	Improve Llama v2 extract questions actor and associated prompt - Format extract questions prompt format with newlines and whitespaces - Make llama v2 extract questions prompt consistent - Remove empty questions extracted by offline extract_questions actor - Update implicit qs extraction unit test for offline search actor	2023-10-13 22:48:56 -07:00
sabaimran	09bb3686cc	Strip the incoming query from the slash conversation command (#500) * Strip the incoming query from the slash conversation command before passing it to the model or for search * Return q when content index not loaded * Remove -n 4 from pytest ini configuration to isolate test failures	2023-10-13 21:11:23 -07:00
Debanjum Singh Solanky	96c0b21285	Sync desktop app package.json with other Khoj clients metadata - Make `bump_version.sh' script set version for the Khoj desktop app too - Sync Khoj desktop app authors, license, description and version with the other interfaces and server - Update description in packages metadata to match project subtitle on Github	2023-10-13 20:43:55 -07:00
sabaimran	80fb56b8a5	Sync desktop app package version with the other releases	2023-10-13 19:23:00 -07:00
Debanjum Singh Solanky	b669aa2395	Clean and fix the content indexing code in the Emacs client - Pass payloads as unibyte. This was causing the request to fail for files with unicode characters - Suppress messages with file content on index updates - Fix rendering response from server on index update API call - Extract code to populate body of index update HTTP request with files	2023-10-13 18:00:37 -07:00
Debanjum Singh Solanky	bea196aa30	Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method. Previously the global state of `url-request-method' would affect the kind of request made to the api/config/data API endpoint, as it wasn't being explicitly set before calling the API endpoint. This was done with the assumption that the default value of GET for url-request-method wouldn't change globally. But in some cases, in practice, it can get changed. This was resulting in khoj.el load failing, as a POST request was being made instead, which would throw an error	2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky	292f0420ad	Send content for indexing on server at a regular interval from khoj.el - Allow indexing frequency to be configurable by user - Ensure there is only one khoj indexing timer running	2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky	fc99431754	Send files to index on server from the khoj.el emacs client - Add elisp variable to set API key to engage with the Khoj server - Use multi-part form to POST the files to index to the indexer API endpoint on the khoj server	2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky	68018ef397	Use multi-part form to send files to index on desktop client - Add typing for variables in for loop and other minor formatting clean-up - Assume utf8 encoding for text files and binary for image, pdf files	2023-10-12 20:58:49 -07:00
Debanjum Singh Solanky	7190b3811d	Remove all filter terms in user query from defiltered_query. Previously only the last filter's terms were getting effectively applied, as the `filter.defilter' operation was being done on `user_query' but was updating the `defiltered_query'	2023-10-12 20:56:17 -07:00
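The bug above comes from where each `defilter` call gets its input. The Python sketch below contrasts the two versions using hypothetical regex-based filters. The `RegexFilter` class and its patterns are illustrative stand-ins for Khoj's word and file filters, not their actual implementation.

```python
import re


class RegexFilter:
    """Hypothetical query filter: removes every match of `pattern` from the query."""

    def __init__(self, pattern: str):
        self.pattern = re.compile(pattern)

    def defilter(self, query: str) -> str:
        return self.pattern.sub("", query).strip()


def defilter_buggy(user_query: str, filters: list) -> str:
    defiltered_query = user_query
    for f in filters:
        # Bug: each pass restarts from user_query, so only the last filter's removal survives
        defiltered_query = f.defilter(user_query)
    return defiltered_query


def defilter_fixed(user_query: str, filters: list) -> str:
    defiltered_query = user_query
    for f in filters:
        # Fix: feed each filter the output of the previous one
        defiltered_query = f.defilter(defiltered_query)
    return defiltered_query


word_filter = RegexFilter(r'[+-]"[^"]+"')   # illustrative: +"khoj" or -"draft"
file_filter = RegexFilter(r'file:"[^"]+"')  # illustrative: file:"todo.org"
```

For the query `notes +"khoj" file:"todo.org"`, the buggy loop returns `notes +"khoj"`. The chained loop returns `notes`.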
Debanjum Singh Solanky	60e9a61647	Use multi-part form to receive files to index on server - This uses existing HTTP affordance to process files - Better handling of binary file formats as removes need to url encode/decode - Less memory utilization than streaming json as files get automatically written to disk once memory utilization exceeds preset limits - No manual parsing of raw files streams required	2023-10-11 23:58:23 -07:00
Debanjum Singh Solanky	9ba173bc2d	Improve emoji, message on content index updated via logger. Use mailbox closed with flag down once content index completed. Use standard, existing logger messages in new indexer messages, when files to index sent by clients	2023-10-11 17:12:03 -07:00
Debanjum Singh Solanky	6aa69da3ef	Put indexer API endpoint under /api path segment. Update FastAPI app router, desktop app and to use new url path to batch indexer API endpoint. All API endpoints should exist under /api path segment	2023-10-09 21:35:58 -07:00
Debanjum Singh Solanky	f6f7a62d80	Wait for user to stop typing to trigger search from khoj.el in Emacs - Improves user experience by aligning idle time with search latency to avoid display jitter (to render results) while user is typing - Makes the idle time configurable Closes #480	2023-10-06 12:44:45 -07:00
sabaimran	5c4f0d42b7	Return new default config in API endpoint	2023-10-06 12:30:09 -07:00
sabaimran	052b25af0a	Update default configuration passed to Khoj clients to circumvent validation issues	2023-10-06 12:29:15 -07:00
Debanjum Singh Solanky	a85ff941ca	Make offline chat model user configurable. Only GPT4All supported Llama v2 models will work, given the prompt structure is not currently configurable	2023-10-04 20:41:14 -07:00
Debanjum Singh Solanky	d1ff812021	Run GPT4All Chat Model on GPU, when available. GPT4All now supports running models on GPU via Vulkan	2023-10-04 18:42:12 -07:00
Debanjum Singh Solanky	13b16a4364	Use default Llama 2 supported by GPT4All. Remove custom logic to download custom Llama 2 model. This was added as GPT4All didn't support Llama 2 when it was added to Khoj	2023-10-03 19:01:54 -07:00
sabaimran	4a5ed7f06c	Update Khoj package version for Electron, Desktop app (#492) * Address package upgrade for Electron application * Update package version for Electron desktop application	2023-10-03 12:21:32 -07:00
sabaimran	3f962a55c3	Fix Linux Desktop Application (#491) * Use separate functions for adding files and folders to configuration for indexing * Add a loading bar while data is syncing * Bump the minor version for the application	2023-10-03 11:43:19 -07:00
sabaimran	63b3696af0	Release Khoj version 0.12.3	2023-09-26 22:41:11 -07:00
sabaimran	d2f9bca1cf	Fix null ref issue in query method and update logic for determining whether khoj is already configured in obsidian	2023-09-26 22:33:44 -07:00
sabaimran	2f18383349	Release Khoj version 0.12.2	2023-09-26 11:59:47 -07:00
sabaimran	588f35b6e9	Add max prompt size for gpt-3.5-turbo-16k	2023-09-26 10:57:35 -07:00
sabaimran	4e370d7a18	Release Khoj version 0.12.1	2023-09-26 09:24:53 -07:00
sabaimran	3675aa348a	Update naming of Khoj in manifest.json for Obsidian	2023-09-26 09:24:36 -07:00
sabaimran	a82d1becc3	Release Khoj version 0.12.0	2023-09-26 09:17:56 -07:00
sabaimran	38f0df3d53	Remove unused icons from electron app folder	2023-09-26 07:56:29 -07:00
sabaimran	5e16074b92	Fix comparison for search type in plugins mode	2023-09-25 10:57:17 -07:00
sabaimran	2dd15e9f63	Resolve issues with GPT4All and fix prompt for yesterday extract questions date filter (#483) - GPT4All integration had ceased working with 0.1.7 specification. Update to use 1.0.12. At a later date, we should also use first party support for llama v2 via gpt4all - Update the system prompt for the extract_questions flow to add start and end date to the yesterday date filter example. - Update all setup data in conftest.py to use new client-server indexing pattern	2023-09-18 14:41:26 -07:00
sabaimran	b225d1188c	Fix formatting of gpt.py	2023-09-18 11:09:02 -07:00
Jonny-GM	34b202b868	More lenient date searching (#481) * Modify DateFilter to use compiled entry key * Instruct search to include date in query * Minor prompt change * Prompt fix	2023-09-18 10:46:00 -07:00
sabaimran	16874e1953	Provide force fallback for regeneration	2023-09-12 16:35:07 -07:00
sabaimran	9f42a1a036	Propagate flags to configure index command	2023-09-11 10:33:44 -07:00
sabaimran	343854752c	Improve docker builds for local hosting (#476) * Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions * Move configure_search method into indexer * Add conditional installation for gpt4all * Add hint to go to localhost:42110 in the docs. Addresses #477	2023-09-08 17:07:26 -07:00
sabaimran	dccfae3853	Remove PySide dependency and deprecate desktop builds (#475) * Remove PySide, gui option from code * Remove pyside 6 dependency from code * Remove workflows which build desktop applications * Update unit tests and update line in documentation * Remove additional references to pyinstaller, gui * Add uninstall steps to normal uninstall instructions	2023-09-07 11:36:27 -07:00
sabaimran	76562f4250	Add front-end Electron application for Khoj local file syncing (#473) * Initial version - setup a file-push architecture for generating embeddings with Khoj * Use state.host and state.port for configuring the URL for the indexer * Fix parsing of PDF files * Read markdown files from streamed data and update unit tests * On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system * Init: refactor indexer/batch endpoint to support a generic file ingestion format * Add features to better support indexing from files sent by the desktop client * Initial commit with Electron application - Adds electron app * Add import for pymupdf, remove import for pypdf * Allow user to configure khoj host URL * Remove search type configuration from index.html * Use v1 path for current indexer routes	2023-09-06 12:04:18 -07:00
bholagabbar	205dc90746	Fix notion title bug (#474) * Update notion_to_jsonl.py * Fix try-catch block	2023-09-05 10:47:42 -07:00
sabaimran	4854258047	Move to a push-first model for retrieving embeddings from local files (#457) * Initial version - setup a file-push architecture for generating embeddings with Khoj * Update unit tests to fix with new application design * Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request * Use state.host and state.port for configuring the URL for the indexer * On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system	2023-08-31 12:55:17 -07:00
sabaimran	92cbfef7ab	Skip plaintext file indexing if there's a parsing issue and log the file	2023-08-29 14:34:08 -07:00
sabaimran	74409c2c64	Release Khoj version 0.11.4	2023-08-29 11:44:35 -07:00
sabaimran	1b85958bcc	trim chat input start	2023-08-28 19:18:10 -07:00
sabaimran	e592f6eac8	Release Khoj version 0.11.3	2023-08-28 14:46:03 -07:00
sabaimran	7c35da9fc4	Fix bug in /chat endpoint for general and update dependencies	2023-08-28 14:12:11 -07:00
sabaimran	bc09143856	Release Khoj version 0.11.2	2023-08-28 10:16:13 -07:00
Debanjum Singh Solanky	01b310635e	Enable passing search query filters via chat and test it	2023-08-28 09:24:32 -07:00
Debanjum Singh Solanky	794bad8bcb	Make date_filter.extract_date_range method always return a list type	2023-08-28 00:55:28 -07:00
Debanjum Singh Solanky	d5a2de6222	Add method to extract filter terms from query to all filters - Test the get_filter_term method in all 3 word, file, date filters - Make the existing can_filter method by default in base filter abstract class	2023-08-28 00:55:28 -07:00
Debanjum	150105505b	Add Default chat command. Make Khoj ask clarifying questions (#468) - Make Khoj ask clarifying questions when answer not in provided context - Add default conversation command to auto switch b/w general, notes modes - Show filtered list of commands available with the currently input text - Use general prompt when no references found and not in Notes mode - Test general and notes slash commands in offline chat director tests	2023-08-28 00:52:57 -07:00
Debanjum Singh Solanky	eb6cd4f8d0	Use general prompt when no references found and not in Notes mode	2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky	edffbad837	Make Khoj ask clarifying questions when answer not in provided context. Previously it would just refuse instead of asking for clarification. This improves the chat quality score for the existing director tests	2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky	75c1016ec0	Show filtered list of commands available with the currently input text	2023-08-28 00:46:10 -07:00
Debanjum Singh Solanky	74605f6159	Add default conversation command to auto switch b/w general, notes modes. This was the default behavior, but it regressed when adding slash commands in PR #463	2023-08-28 00:46:10 -07:00
sabaimran	cbc978ea08	Update help links for notion, github to point to the main docs	2023-08-27 15:02:55 -07:00
sabaimran	b45e1d8c0d	Fix plaintext HTML parsing and rendering (#464) * Store conversation command options in an Enum * Move to slash commands instead of using @ to specify general commands * Calculate conversation command once & pass it as arg to child funcs * Add /notes command to respond using only knowledge base as context. This prevents the chat model from trying to respond using only its general world knowledge, without any references pulled from the indexed knowledge base * Test general and notes slash commands in openai chat director tests --------- Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>	2023-08-27 11:24:30 -07:00
Debanjum	7919787fb7	Use Slash Commands and Add Notes Slash Command (#463) * Store conversation command options in an Enum * Move to slash commands instead of using @ to specify general commands * Calculate conversation command once & pass it as arg to child funcs * Add /notes command to respond using only knowledge base as context. This prevents the chat model from trying to respond using only its general world knowledge, without any references pulled from the indexed knowledge base * Test general and notes slash commands in openai chat director tests * Update gpt4all tests to use md configuration * Add a /help tooltip * Add dynamic support for describing slash commands. Remove default and treat notes as the default type --------- Co-authored-by: sabaimran <narmiabas@gmail.com>	2023-08-26 18:11:18 -07:00
sabaimran	e64357698d	Skip indexing single bad markdown, plaintext file (#460)	2023-08-23 15:34:56 -07:00
sabaimran	84bd579077	Format the chat outputted message with code, bolding, or italics. Add a copy button for code. Closes #445.	2023-08-19 20:02:57 -07:00
sabaimran	f9e09ba490	Do not try downloading model from GPT4All if the user is not connected to the internet	2023-08-19 19:09:21 -07:00
Debanjum Singh Solanky	3ff4e19dd2	Release Khoj version 0.11.1	2023-08-16 22:53:29 -07:00
sabaimran	4fb8c2c5e1	Pass a SIGTERM to tell the uvicorn server to exit and gracefully kill the thread	2023-08-16 21:27:05 -07:00
sabaimran	4e03dfea43	Attach the parent to the server thread, allowing the kill signal to trigger a graceful exit (#446)	2023-08-16 19:36:10 -07:00
Debanjum Singh Solanky	26c3977fb9	Remove info hint to reindex khoj on unexpected search results. The index corruption issue was resolved a while ago in #325 and hasn't cropped up again	2023-08-16 00:58:59 -07:00
sabaimran	def909a913	Revert "Open Web interface within Desktop app in GUI mode" (#444)	2023-08-15 23:26:28 -07:00
sabaimran	6562ec6531	Release Khoj version 0.11.0	2023-08-14 19:25:03 -07:00
sabaimran	0ea901c7c1	Allow indexing to continue even if there's an issue parsing a particular org file (#430) * Allow indexing to continue even if there's an issue parsing a particular org file * Use approximation in pytorch comparison in text_search UT, skip additional file parser errors for org files * Change error of expected failure	2023-08-14 07:56:33 -07:00
sabaimran	7b907add77	Add support for indexing plaintext files (#420) * Add support for indexing plaintext files - Adds backend support for parsing plaintext files generically (.html, .txt, .xml, .csv, .md) - Add equivalent frontend views for setting up plaintext file indexing - Update config, rawconfig, default config, search API, setup endpoints * Add a nifty plaintext file icon to configure plaintext files in the Web UI * Use generic glob path for plaintext files. Skip indexing files that aren't in whitelist	2023-08-09 15:44:40 -07:00
Ellen7ions	26bddcb65c	Add support for starting a new line with shift-enter (#412) * Add support for starting a new line with shift-enter * Remove useless comments. Set font-size: medium. * Update src/khoj/interface/web/chat.html Update the styling to have the padding, margin and line-height like before. Co-authored-by: Debanjum <debanjum@gmail.com> * Update src/khoj/interface/web/chat.html Make the chat-body scroll to the bottom after resizing Co-authored-by: Debanjum <debanjum@gmail.com> --------- Co-authored-by: Debanjum <debanjum@gmail.com>	2023-08-07 19:49:07 -07:00
Debanjum Singh Solanky	97609e4995	Use 500px png of khoj logo instead of svg for much smaller asset size. The khoj logo svg was 1.3MB. The 500px png of it is 38KB. Given all usages of khoj-logo are below 230px, this should work fine	2023-08-07 18:27:11 -07:00
Debanjum	14a816d173	Open Web interface within Desktop app in GUI mode (#429) Previously the GUI mode (with khoj --gui or using the desktop app) would open the web interface in the user's default web browser. Now the web interface is just rendered within the app itself using PyQt's webview. This gives it a more proper app-like feel	2023-08-07 17:48:30 -07:00
Debanjum Singh Solanky	378b96ec1b	Open the khoj app window maximized on startup	2023-08-07 15:39:05 -07:00
Debanjum Singh Solanky	ea734ba1c8	Open app in native view on starting it in GUI mode instead of in web browser - Opens settings page on first run and landing page after in GUI mode. Previously was only opening the GUI on Linux after first run, as it doesn't have a system tray - Both the views are from the web interface but are rendered within the app instead of the browser	2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky	9c494705a8	Open the search, chat or config view in app from the system tray menu	2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky	cc36b87345	Render the web interface directly within the desktop app as a webview	2023-08-07 13:41:12 -07:00
Jason Qin	3ef1b7073d	Update obsidian/manifest.json Closes #426	2023-08-07 10:41:39 -07:00
sabaimran	738cf650b3	Explicitly set Khoj to use the default locale of the user (#425) - Explicitly set locale using `locale.setlocale(locale.LC_ALL, '')` for localization. Relevant for datetime libraries. See [Python 3 documentation](https://docs.python.org/3/library/locale.html#locale.setlocale).	2023-08-07 09:23:24 -07:00
Muftawo	c8ef619090	fixed reference link to landing page (#417) * Fixed zsh error no matches found * Fixed home page 404 error	2023-08-04 10:38:14 -07:00
sabaimran	78012b8111	Avoid null ref issue when setting model state for web UI. Closes #410	2023-08-03 00:39:06 -07:00
sabaimran	0baed742e4	Add checksums to verify the correct model is downloaded as expected (#405) * Add checksums to verify the correct model is downloaded as expected - This should help debug issues related to corrupted model download - If download fails, let the application continue * If the model is not downloaded as expected, add some indicators in the settings UI * Add exc_info to error log if/when download fails for llamav2 model * Simplify checksum checking logic, update key name in model state for web client	2023-08-02 23:26:52 -07:00
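The checksum verification described above can be sketched with Python's `hashlib`. This sketch assumes an MD5 digest is published for the model file, since the log does not say which hash Khoj compares. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB model files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_model(path: Path, expected_md5: str) -> bool:
    """Return True if the downloaded file matches the published checksum."""
    return file_md5(path) == expected_md5.strip().lower()
```

On a mismatch, a caller could log the error and let the application continue. The commit describes doing this when a download fails.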
Debanjum Singh Solanky	e6e3acdbe4	Release Khoj version 0.10.1	2023-08-01 23:55:13 -07:00
Debanjum Singh Solanky	7c1d70aa17	Bump GPT4All response generation batch size to 512 from 256. A batch size of 512 performs ~20% better on an XPS with no GPU and 16GB RAM. Seems worth the tradeoff for now	2023-08-01 23:34:02 -07:00
Debanjum	16c6bfce8e	Improve Quality and Reliability of Offline Chat (#393) # Incoming ## Major ### Fix Prompt Size Exceeded Issue - Fix issues related to prompt size, Closes #386. Use the correct tokenizer to calculate whether the input needs to be truncated or not. ### Improve Llama 2 Model Download - Use the correct download link for LlamaV2 -- should have been using the small model, but was using the medium - Add better downloading logic to retry download if it failed, Closes #379 ### Fix Segmentation Fault due to Race - Add a lock around generating chat responses from the offline model to avoid segmentation faults. Closes #367. - Add a loading symbol to the web chat UI when the model is thinking. Closes #392 ### Improve Chat Response Latency - Improve performance of offline chat by increasing batch size (via `n_batch`) to automatically engage more cores/GPU, using smaller model and fixing prompt vs response token generation numbers. Closes #363 ### Fix Fake Dialogue Continuation - Fix formatting of user query with offline chat, this was contributing to #398 - Stop Llama 2 from Creating Fake Dialogue Continuations. Closes #398 ## Minor - Improve default message for Chat window on web when it's not configured. Include hint to use offline chat. - Add null check in `perform_chat_checks` method - Add offline chat director unit tests ## Performance Analysis (Time to First Token, v0.10.0 vs. this branch): Query 1: 52s vs. 28s; Query 2: 33s vs. 42s; Query 3: 67s vs. 38s	2023-08-01 22:07:27 -07:00
Debanjum Singh Solanky	44292afff2	Put offline model response generation behind the chat lock as well, not just the chat response streaming	2023-08-01 21:53:52 -07:00
Debanjum Singh Solanky	1812473d27	Extract new schema version for each migration script into a variable. This should ease readability and indicate which version this migration script will update the schema to once applied	2023-08-01 21:41:08 -07:00
Debanjum Singh Solanky	b9937549aa	Simplify migration scripts management. Make them use static version - Only make them update config when their run conditions are satisfied - Use static schema version to simplify reasoning about run conditions	2023-08-01 21:28:20 -07:00
Debanjum Singh Solanky	185a1fbed7	Remove old chat setup timer. It is mislabelled, irrelevant since streaming	2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky	c2b7a14ed5	Fix context, response size for Llama 2 to stay within max token limits. Create regression test to ensure it does not throw the prompt size exceeded context window error	2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky	6e4050fa81	Make Llama 2 stop generating response on hitting specified stop words. It would previously sometimes start generating fake dialogue with its internal prompt patterns of <s>[INST] in responses. This is a jarring experience. Stop generating response when it hits <s>. Resolves #398	2023-08-01 20:52:00 -07:00
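One way to apply the stop-word cutoff described above to a token stream is sketched below in Python. It is a generic illustration, not GPT4All's own stop mechanism. `stream_until_stop` is a hypothetical name, and it assumes tokens arrive as plain strings.

```python
from typing import Iterable, Iterator


def stream_until_stop(tokens: Iterable[str], stop_words: tuple = ("<s>",)) -> Iterator[str]:
    """Yield streamed tokens until any stop word appears in the accumulated text.

    Text before the stop word is still emitted. The stop word and everything after it
    are dropped. Limitation of this sketch: if a stop word is split across tokens, the
    part that arrived in an earlier token has already been yielded.
    """
    emitted = ""
    for token in tokens:
        candidate = emitted + token
        hits = [candidate.find(w) for w in stop_words if w in candidate]
        if hits:
            cut = min(hits)
            if cut > len(emitted):
                # Emit the tail of this token that precedes the stop word
                yield candidate[len(emitted):cut]
            return
        emitted = candidate
        yield token
```

For example, `["Hi", " there<s>", "[INST]"]` streams as `"Hi"` followed by `" there"`, and generation stops there.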
Debanjum Singh Solanky	aa6846395d	Fix offline model migration script to run for version < 0.10.1 - Use same batch_size in extract question actor as the chat actor - Log final location the chat model is to be stored in, instead of its temp filename while it is being downloaded	2023-08-01 20:51:53 -07:00
Ikko Eltociear Ashimine	49abb9df9c	Fix typo in orgnode.py (#397) Fix spelling of Ouput in org parser property drawer comment to Output.	2023-08-01 19:54:57 -07:00
sabaimran	f409e16137	Update some of the extract question prompts for llamav2	2023-08-01 12:23:36 -07:00
sabaimran	b11b00a9ff	Add log line for time to first response	2023-08-01 10:57:38 -07:00
sabaimran	778df6be71	Add a logline when the offline model migration script runs	2023-08-01 09:27:42 -07:00
sabaimran	3a5d93d673	Add migration script for getting the new offline model	2023-08-01 09:25:05 -07:00
sabaimran	90efc2ea7a	Update comments and add explanations	2023-08-01 09:24:03 -07:00
sabaimran	f7e03f6d63	Switch spinner snake case -> camel case	2023-08-01 08:52:25 -07:00
sabaimran	1c52a6993f	add a lock around chat operations to prevent the offline model from getting bombarded and stealing a bunch of compute resources - This also solves #367	2023-08-01 00:23:17 -07:00
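A single process-wide lock, as described above, serializes requests to the offline model. Below is a minimal Python sketch using `threading.Lock`. The names `chat_lock` and `generate_response` are hypothetical, and `model` stands for any callable that produces a response.

```python
import threading

# One lock shared by every chat request (hypothetical name)
chat_lock = threading.Lock()


def generate_response(model, prompt: str) -> str:
    """Run the offline model for one prompt, serializing concurrent requests.

    Only one thread can hold chat_lock at a time, so parallel chat requests queue up
    instead of driving the model simultaneously.
    """
    with chat_lock:
        return model(prompt)
```

Because waiting requests block instead of running in parallel, latency grows under load. In exchange, concurrent calls can no longer crash the model or compete for its compute.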
sabaimran	6c3074061b	Disable the input bar when chat response is in flight	2023-08-01 00:21:39 -07:00
sabaimran	c14cbe926a	Add a loading symbol to web chat. Closes #392	2023-07-31 23:35:48 -07:00
sabaimran	8054bdc896	Use n_batch parameter to increase resource consumption on host machine (and implicitly engage GPU)	2023-07-31 23:25:08 -07:00
sabaimran	e55e9a7b67	Fix unit tests and truncation logic	2023-07-31 21:37:59 -07:00
sabaimran	2335f11b00	Add better error handling for download processes incase of failure	2023-07-31 21:07:38 -07:00
sabaimran	209975e065	Resolve merge conflicts: let Khoj fail if the model tokenizer is not found	2023-07-31 19:12:26 -07:00
sabaimran	2d6c3cd4fa	Misc. quality improvements for Llama V2 - Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S - Use a proper Llama Tokenizer for counting tokens for truncation with Llama - Add additional null checks when running	2023-07-31 19:11:20 -07:00
sabaimran	ca195097d7	Update chat hint message at first run	2023-07-31 17:46:09 -07:00
Debanjum Singh Solanky	ded606c7cb	Fix format of user query during general conversation with Llama 2	2023-07-31 17:21:14 -07:00
Debanjum Singh Solanky	48e5ac0169	Do not drop system message when truncating context to max prompt size. Previously the system message was getting dropped when the context size with chat history would be more than the max prompt size supported by the chat model. Now only the previous chat messages are dropped or the current message is truncated, but the system message is kept to provide guidance to the chat model	2023-07-31 17:21:14 -07:00
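The truncation order described above can be sketched in Python as follows. First drop the oldest history messages, then trim the current message, and never touch the system message. The `Message` type and function names are hypothetical, and whitespace word counting stands in for the model's real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): whitespace-separated words
    return len(text.split())


def truncate_messages(messages: list, max_prompt_size: int) -> list:
    """messages[0] is the system message, messages[-1] the current message, history in between."""
    system, history, current = messages[0], list(messages[1:-1]), messages[-1]

    def total(ms):
        return sum(count_tokens(m.content) for m in ms)

    # Drop the oldest chat history first; the system message is never dropped
    while history and total([system, *history, current]) > max_prompt_size:
        history.pop(0)
    # If still too long, truncate the current message to the remaining budget
    budget = max(max_prompt_size - count_tokens(system.content), 0)
    if count_tokens(current.content) > budget:
        current = Message(current.role, " ".join(current.content.split()[:budget]))
    return [system, *history, current]
```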
sabaimran	88ef86ad5c	Fix typing issues for mypy (#372)	2023-07-30 19:27:48 -07:00
sabaimran	ca2c942b65	Add typing to compiled_references and inferred_queries	2023-07-30 19:10:30 -07:00
sabaimran	3646fd1449	Add a warning to indicate that Khoj is not configured to work with personal data sources	2023-07-30 18:52:10 -07:00
sabaimran	996832dc72	Allow user to chat even if content types aren't configured - use empty references	2023-07-30 18:47:45 -07:00
Debanjum Singh Solanky	53810a0ff7	Create khoj config dir if non-existent, before writing to khoj env file	2023-07-30 01:35:36 -07:00
sabaimran	f65d157244	Release Khoj version 0.10.0	2023-07-28 19:27:47 -07:00
Debanjum Singh Solanky	f76af869f1	Do not log the gpt4all chat response stream in khoj backend. Stream floods stdout and does not provide useful info to user	2023-07-28 19:14:04 -07:00
sabaimran	5ccb01343e	Add Offline chat to Obsidian (#359) * Add support for configuring/using offline chat from within Obsidian * Fix type checking for search type * If Github is not configured, /update call should fail * Fix regenerate tests same as the update ones * Update help text for offline chat in obsidian * Update relevant description for Khoj settings in Obsidian * Simplify configuration logic and use smarter defaults	2023-07-28 18:47:56 -07:00
Debanjum	b3c1507708	Merge pull request #361 from khoj-ai/configure-offline-chat-from-emacs - Configure using Offline Chat from Emacs: - Enable, Disable Offline Chat from Emacs - Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup - Benefits: Offline chat models are better for privacy but not great at answering questions	2023-07-28 18:06:58 -07:00
sabaimran	9f78db0579	Let Offline chat override OpenAI API settings (#362) * Let Offline chat override OpenAI API settings * Download the offline model whenever offline chat is enabled * Add progressbar for download for llamav2 model to track progress * Change ordering of n due to switch of default processor * Flip ordering of offline/openai checks when extracting questions from query	2023-07-28 17:26:20 -07:00
Debanjum Singh Solanky	ebfbef1f68	Configure using offline chat from Emacs. Closes #358	2023-07-28 16:07:33 -07:00
sabaimran	29081f4429	Adjust parameters for offline chat	2023-07-27 22:22:09 -07:00
sabaimran	124d97c26d	Replace Falcon 🦅 model with Llama V2 🦙 for offline chat (#352) * Working example with LlamaV2 running locally on my machine - Download from huggingface - Plug in to GPT4All - Update prompts to fit the llama format * Add appropriate prompts for extracting questions based on a query based on llama format * Rename Falcon to Llama and make some improvements to the extract_questions flow * Do further tuning to extract question prompts and unit tests * Disable extracting questions dynamically from Llama, as results are still unreliable	2023-07-27 20:51:20 -07:00
Debanjum Singh Solanky	715d56d4f0	Use new schema to update khoj.yml config from khoj.el	2023-07-26 17:34:16 -07:00
sabaimran	8b2af0b5ef	Add support for our first Local LLM 🤖🏠 (#330) * Add support for gpt4all's falcon model as an additional conversation processor - Update the UI pages to allow the user to point to the new endpoints for GPT - Update the internal schemas to support both GPT4 models and OpenAI - Add unit tests benchmarking some of the Falcon performance * Add exc_info to include stack trace in error logs for text processors * Pull shared functions into utils.py to be used across gpt4 and gpt * Add migration for new processor conversation schema * Skip GPT4All actor tests due to typing issues * Fix Obsidian processor configuration in auto-configure flow * Rename enable_local_llm to enable_offline_chat	2023-07-26 16:27:08 -07:00
sabaimran	23d77ee338	Fix import issues in desktop image builds (#343)	2023-07-26 15:45:52 -07:00
Debanjum Singh Solanky	7722a9c347	Default to using the gpt-3.5-turbo model for chat from khoj.el	2023-07-22 00:29:26 -07:00
Debanjum Singh Solanky	f0d4a4cf9a	Revert "Make configure_content functional. Do not pass content index state to it." This reverts commit `2ddee7e745` as it broke partial updates of the content index for just the specified content types	2023-07-21 13:59:09 -07:00
sabaimran	82c725817e	Merge branch 'master' of github.com:khoj-ai/khoj	2023-07-21 13:24:05 -07:00
sabaimran	596e11ec6d	Use the same function for computing entries for IDs regardless of whether it has prev entries	2023-07-21 13:23:56 -07:00
Debanjum Singh Solanky	2ddee7e745	Make configure_content functional. Do not pass content index state to it.	2023-07-20 23:24:08 -07:00
sabaimran	1610d2ebd9	📝 Add a documentation base for Khoj! (#333 ) * Add docs for more organized, accessible information detailing Khoj setup * Delete duplicated files * Add a coverpage without enabling it. Add logo and theme * Remove obsidian README.md * Add plausible script to index.html via docsify	2023-07-20 22:34:25 -07:00
Debanjum Singh Solanky	3e59be7f1d	Release Khoj version 0.9.0	2023-07-18 19:59:27 -07:00
Debanjum Singh Solanky	d078e7b1f6	Clean up search type usage in khoj server, tests and Readme	2023-07-18 19:57:55 -07:00
Debanjum Singh Solanky	4d910936b7	Fix triggering index update on khoj server from khoj.el	2023-07-18 19:57:54 -07:00
Debanjum Singh Solanky	5c7d7f558d	Make AI model used for Khoj chat configurable from khoj.el - Fix bug. Set the unused model-name to a standard default value	2023-07-18 19:57:54 -07:00
Debanjum	5f2be2a9bb	Merge pull request #298 from HyunggyuJang/patch-1 Encode config as utf-8 during setup in khoj.el. This will allow utf-8 encoded files etc to be passed in config	2023-07-18 17:54:11 -07:00
Debanjum Singh Solanky	429e1b4b48	Regenerate index to apply corruption fixes on first run of new khoj	2023-07-18 16:10:47 -07:00
Debanjum Singh Solanky	83e1088d42	Manage khoj.yml config migrations on app start. Version the schema - Add version to khoj.yml schema Versioning the khoj.yml config schema will simplify future migrations	2023-07-18 16:10:10 -07:00
Debanjum Singh Solanky	71e8ddd9a2	Check if PDF is configured before showing it as an option in khoj.el	2023-07-17 15:49:20 -07:00
Debanjum	d00c5da8b7	Merge pull request #325 from khoj-ai/stablize-simplify-content-indexing ## Stabilize and Simplify Content Indexing ### Major Updates - `9bcca43` Unify logic to update entries when indexing from scratch or incrementally - `89c7819` Unify logic to update embeddings when indexing from scratch or incrementally - `6a0297c` Stable sort new entries when marking entries for update - `58d86d7` Unify logic to configure server from API or on server start - Create tests to ensure old entries, embeddings in index are unaffected on adding new entries - Refer: `1482fd4`, `7669b85`, `88d1a29` - `ad41ef3` Make normalization of embeddings configurable to test this in `c73feeb` ### Minor Updates - `1673bb5` Add todo state to compiled form of each entry - `6e70b91` Remove unused `dump_jsonl` helper method - `7ad9603` Improve naming of lock - `b02323a` Improve naming text search test methods Resolves #190	2023-07-17 14:51:10 -07:00
Debanjum Singh Solanky	3e3a1ecbc8	Start app even if server init fails to let user fix it Show stacktrace on error to help debugging	2023-07-17 14:33:02 -07:00
Debanjum Singh Solanky	ef6a0044f4	Drop embeddings of deleted text entries from index Previously the deleted embeddings would continue to be in the index, even after the entry was deleted	2023-07-16 03:47:05 -07:00
Debanjum Singh Solanky	ad41ef3991	Make normalizing embeddings configurable	2023-07-16 02:16:33 -07:00
Debanjum Singh Solanky	89c7819cb7	Unify logic to generate embeddings from scratch and incrementally This simplifies the `compute_embeddings' method and avoids potential later divergence in handling the index regenerate vs update scenarios	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6a0297cc86	Stable sort new entries when marking entries for update	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6e70b914c2	Remove unused dump_jsonl method The entries index is stored in gzipped jsonl files for each content type	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	9bcca43299	Use single func to handle indexing from scratch and incrementally Previous regenerate mechanism did not deduplicate entries with same key So entries looked different between regenerate and update Having single func, mark_entries_for_update, to handle both scenarios will avoid this divergence Update all text_to_jsonl methods to use the above method for generating index from scratch	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	1673bb5558	Add todo state to compiled form of each org-mode entry	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	7ad96036b0	Improve lock name to config_lock instead of search_index_lock It is used to lock updates to all app config state, including processor	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	58d86d7876	Use single func to configure server via API and on server start Improve error messages on failure to configure server components	2023-07-16 01:45:53 -07:00
sabaimran	a15711e635	Fix null type checks in get /config	2023-07-15 15:53:56 -07:00
sabaimran	e590d75b20	Start Khoj even when config is not valid (#320 ) * Add icon to indicate bad config, start Khoj even if there was an issue setting up the index	2023-07-15 14:11:54 -07:00
sabaimran	49ab201c30	Fix issues importing PySide in Docker container (#322 ) * Rather than installing PyQT dependencies, remove codepaths that require pyqt files in no-gui mode	2023-07-15 13:33:13 -07:00
sabaimran	ba47f2ab39	Merge branch 'master' of github.com:debanjum/khoj	2023-07-14 22:28:05 -07:00
sabaimran	874cffd256	Add additional support for parsing notion workspaces	2023-07-14 22:27:56 -07:00
Debanjum	52f68167ce	Merge pull request #317 from khoj-ai/reduce-memory-consumption-by-search-model-duplication Reuse Search Models across Content Types to reduce Memory Consumption - Memory consumption now only scales with search models used, not with content types. Previously each content type had its own copy of the search ML models. That'd result in 300+ MB per enabled text content type - Split model state into 2 separate state objects, `search_models` and `content_index`. This allows loading text_search and image_search models first and then reusing them across all content_types in content_index - The change should cut down memory utilization quite a bit for most users. I see a >50% drop in memory utilization on my Khoj instance. But this will vary for each user based on the amount of content indexed vs number of plugins enabled. - This change does not solve the RAM utilization scaling with size of the index, as the whole content index is still kept in RAM while Khoj is running Should help with #195, #301 and #303	2023-07-14 19:54:12 -07:00
Debanjum Singh Solanky	f08e9539f1	Release lock after updating index even if update fails to prevent deadlock Wrap acquire/release locks in try/catch/finally when updating content index and search models to prevent lock not being released on error and causing a deadlock	2023-07-14 16:57:27 -07:00
sabaimran	37f7f9fd1d	Add additional telemetry for system understanding (#316 ) * Add additional telemetry in order to understand which data sources are the most useful * Make actions side by side in the configuration page * Restore main run command * Update links to point to wiki pages for Github, Notion integrations * Standardize nomenclature of the api_type to use _config suffix Remove header fields that aren't actually helpful for understanding config usage	2023-07-14 10:14:07 -07:00
Debanjum Singh Solanky	86e2bec9a0	Reuse Search Models across Content Types to Reduce Memory Consumption - Memory consumption now only scales with search models used, not with content types as well. Previously each content type had its own copy of the search ML models. That'd result in 300+ MB per enabled content type - Split model state into 2 separate state objects, `search_models' and `content_index'. This allows loading text_search and image_search models first and then reusing them across all content_types in content_index - This should cut down memory utilization quite a bit for most users. I see a ~50% drop in memory utilization. This will, of course, vary for each user based on the amount of content indexed vs number of plugins enabled - This does not solve the RAM utilization scaling with size of the index. As the whole content index is still kept in RAM while Khoj is running Should help with #195, #301 and #303	2023-07-14 01:27:22 -07:00
Debanjum	b2718d330c	Merge pull request #304 from migrate-from-pyqt-to-pyside Migrate from PyQT6 to PySide6	2023-07-13 11:54:47 -07:00
sabaimran	31e933207f	Set default values for sys.stdout if they're unavailable	2023-07-12 22:22:49 -07:00
Debanjum Singh Solanky	9c76150895	Migrate from PyQT6 to PySide6	2023-07-11 18:43:44 -07:00
HyunggyuJang	88c42b3043	Encode data as utf-8 otherwise it will complain, see `1c85531090`	2023-07-11 17:06:05 +09:00
Debanjum Singh Solanky	f664a74e77	Update Khoj server to run on non standard port, 42110 instead of 8000 Resolves #295	2023-07-10 21:27:58 -07:00
sabaimran	effb52f859	Fix demo rendering with the new header	2023-07-10 21:16:19 -07:00
sabaimran	55f5be7b03	Release Khoj version 0.8.2	2023-07-10 14:39:32 -07:00
sabaimran	9a63f89f33	Merge branch 'master' of github.com:debanjum/khoj	2023-07-10 14:31:19 -07:00
sabaimran	53809298c0	Release Khoj version 0.8.1	2023-07-10 14:30:04 -07:00
tjsousa	5b37e988e6	Allow using configured GPT chat model (#292 ) My account doesn't have gpt-4 enabled and it wouldn't work as the default value was always used from extract_questions, where the caller could use the configured model.	2023-07-10 14:24:40 -07:00
Debanjum Singh Solanky	75ff871217	Release Khoj version 0.8.0	2023-07-10 13:37:51 -07:00
Debanjum Singh Solanky	979088b3dc	Add tooltip helper text on web settings page buttons - Provide more details on what clicking configure, initialize buttons or changing the results count slider does - This shows up on user hovering over those buttons	2023-07-10 13:32:41 -07:00
Debanjum Singh Solanky	255781e135	Use relative link on logo to jump to correct page on local and cloud	2023-07-10 13:22:20 -07:00
Debanjum Singh Solanky	b2d229c116	Move header pane style to base khoj.css for reuse. Fix logo size	2023-07-10 13:10:17 -07:00
Debanjum Singh Solanky	20cb314171	Open the Khoj config page in the browser on first run	2023-07-10 12:10:20 -07:00
sabaimran	07cf5a214a	Check if PDF files are present in the Obsidian vault before initializing the Khoj configuration (#293 )	2023-07-10 10:33:04 -07:00
sabaimran	7364bac8ae	Make the header take up less space - Use a single row for the header - Needed custom styling for each page because each of them are different in subtle ways, unfortunately	2023-07-09 22:31:37 -07:00
sabaimran	62704cac09	Add a plugin which allows users to index their Notion pages (#284 ) * For the demo instance, re-instate the scheduler, but infrequently for api updates - In constants, determine the cadence based on whether it's a demo instance or not - This allows us to collect telemetry again. This will also allow us to save the chat session * Conditionally skip updating the index altogether if it's a demo instance * Add backend support for Notion data parsing - Add a NotionToJsonl class which parses the text of Notion documents made accessible to the API token - Make corresponding updates to the default config, raw config to support the new notion addition * Add corresponding views to support configuring Notion from the web-based settings page - Support backend APIs for deleting/configuring notion setup as well - Streamline some of the index updating code * Use defaults for search and chat queries results count * Update pagination of retrieving pages from Notion * Update state conversation processor when update is hit * frequency_penalty should be passed to gpt through kwargs * Add check for notion in render_multiple method * Add headings to Notion render * Revert results count slider and split Notion files by blocks * Clean/fix misc things in the function to update index - Use the successText and errorText variables appropriately - Name parameters in function calls - Add emojis, woohoo * Clean up and further modularize code for processing data in Notion	2023-07-09 15:29:26 -07:00
Debanjum	77755c0284	Fix Packaging the Khoj Desktop Apps (#289 ) * Add langchain static files and pytorch metadata to Khoj native app * Add pillow static files, metadata & hidden imports to Khoj native app * Fix path to web interface static files on Khoj native app * Add tiktoken hidden imports to make chat work from Khoj native app * Fix Khoj native app to run with GUI mode enabled This got broken when we moved from using the --no-gui flag to using --gui in https://github.com/khoj-ai/khoj/pull/263	2023-07-09 10:21:16 -07:00
sabaimran	4c135ea316	Make streaming optional for the /chat endpoint (#287 ) * Update the /chat endpoint to conditionally support streaming - If streams are enabled, return the threadgenerator as it does currently - If stream is disabled, return a JSON response with the response/compiled references separated out - Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian - Rename chat/init/ to chat/history * Update khoj.el to use the /history endpoint - Update corresponding unit tests to use stream=true * Remove & from call to /chat for obsidian * Abstract functions out into a helpers.py file and clean up some of the error-catching	2023-07-09 10:12:09 -07:00
Debanjum Singh Solanky	0a86220d42	Use default values, delete content config on disable and update state	2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky	362063f5fe	By default, connect to Khoj server over IPv4 from Obsidian plugin	2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky	571e8c2548	Add rerank, index corruption hint on search page of web interface Similar to the hint already in the Obsidian search modal Closes #272	2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky	61e131f95c	Hide unused model field from chat settings on web interface	2023-07-07 18:43:53 -07:00
Debanjum Singh Solanky	af30d01e85	Move to newer chat models to extract questions & summarize chats Deprecate usage of the older gpt3 models in place of the newer chat based models - text-davinci-003 is only 50% cheaper than gpt4 and less reliable for question extraction - Using gpt-3.5-turbo for summarization should reduce cost of chat - Keep conversation.chat_session as a list instead of a string - Update completion_with_backoff func to use ChatML format	2023-07-07 17:32:27 -07:00
Debanjum Singh Solanky	171ce19e1f	Update date filter to allow quoting values in single quotes	2023-07-07 17:13:47 -07:00
Debanjum Singh Solanky	e588f7c528	Deprecate unused beta search and answer API endpoints	2023-07-07 16:38:07 -07:00
Debanjum Singh Solanky	c9fc4d1296	Revert to using cross-encoder to improve search results used by chat	2023-07-07 15:31:34 -07:00
Debanjum Singh Solanky	11f0a9f196	Fix chat tests since streaming. Pass args correctly to chat methods - Fix testing gpt converse method after it started streaming responses - Pass stop in model_kwargs dictionary and api key in openai_api_key parameter to chat completion methods. This should resolve the arg warning thrown by OpenAI module	2023-07-07 15:23:44 -07:00
Debanjum Singh Solanky	48870d9170	Fix parsing questions generated by extract_questions actor into list The previous json parsing was failing to handle questions with date filters Fix the chat actor tests to run without throwing error with freezegun complaining about importing transformers.local_llama model Remove quote escapes from date filter examples provided to extract_questions actor	2023-07-07 15:18:55 -07:00
Debanjum Singh Solanky	279662620b	Move results count to settings page on web. Use it for search & chat - Before Only the search interface had the results count configuration option - After - The results count is set on the settings page instead of the search page - Both search and chat can use the configured results count instead of just search	2023-07-07 14:08:08 -07:00
Debanjum Singh Solanky	2ec8da89e8	Remove Update button from Khoj Search page on the Web interface The settings page on the Khoj web interface already has a configure button. Don't need the Update button on the search page as well	2023-07-07 12:49:58 -07:00
Debanjum Singh Solanky	bf427cd8dd	Set no. of results used to generate chat response from Khoj Emacs	2023-07-07 12:34:50 -07:00
Debanjum Singh Solanky	1d77fe712c	Set no. of results used to generate chat response from Khoj Obsidian	2023-07-07 12:32:32 -07:00
Debanjum Singh Solanky	2f31de5ed5	Set no. of references to use for chat configurable in Chat API	2023-07-07 12:29:36 -07:00
Debanjum Singh Solanky	d97682fdac	Use tooltip, placeholders to guide Khoj setup via web settings page	2023-07-06 21:37:48 -07:00
Debanjum Singh Solanky	f5cf09424b	Use more descriptive field names for content type settings on Khoj web Resolves #281	2023-07-06 20:47:39 -07:00
Debanjum Singh Solanky	a2c668268f	Use node-fetch >=3.1.0 in khoj obsidian plugin to avoid security vulnerability	2023-07-06 13:05:52 -07:00
sabaimran	d688ddf92c	Re-instate the scheduler for the demo instances (#279 ) * For the demo instance, re-instate the scheduler, but infrequently for api updates - In constants, determine the cadence based on whether it's a demo instance or not - This allows us to collect telemetry again. This will also allow us to save the chat session * Conditionally skip updating the index altogether if it's a demo instance	2023-07-06 11:01:32 -07:00
Debanjum Singh Solanky	8f36572a9b	Improve typing, null checks in controllers and gpt functions	2023-07-05 20:49:25 -07:00
Debanjum	6c2a8a5bce	⚡️ Stream Responses by Khoj Chat on Web, Obsidian - What - Stream chat responses from OpenAI API to Web, Obsidian clients - Implement using a callback function which manages a queue where new tokens can be placed as they come in. As the thread is read from, tokens are removed. - When the final token has been processed, add the `compiled_references` to the queue to be rendered by the `chat` client - When the thread has been closed, save the accumulated conversation log in the user's history using a `partial func` - Incrementally decode tokens on the front end and add them as they appear from the streamed response - Why This significantly reduces perceived latency and OpenAI API request timeouts for Chat Closes https://github.com/khoj-ai/khoj/issues/257	2023-07-05 20:02:11 -07:00
Debanjum Singh Solanky	e111eda6ae	Make client, app_config optional in telemetry logger for correct typing	2023-07-05 18:57:38 -07:00
Debanjum Singh Solanky	e562114f6b	Improve comments, var names in js for chat streaming on web interface	2023-07-05 18:57:27 -07:00
Debanjum Singh Solanky	46269ddfd3	Fix chat logging messages to get context without flooding logs	2023-07-05 18:27:06 -07:00
Debanjum Singh Solanky	0ba838b53a	Show temp status message in Khoj Obsidian chat while Khoj is thinking - Scroll to bottom after adding temporary status message and references too	2023-07-05 18:02:43 -07:00
Debanjum Singh Solanky	8271abe729	Use optional chaining operator to extract khojBannerSubmit from conditional	2023-07-05 18:02:43 -07:00
Debanjum Singh Solanky	c12ec1fd03	Show temp status message in Khoj web chat while Khoj is thinking - Scroll to bottom after adding temporary status message and references too	2023-07-05 18:02:30 -07:00
sabaimran	257a421e45	Bonus: add try-catch logic around telemetry upload in case of JSON serializability issues	2023-07-05 15:12:18 -07:00
sabaimran	4e6b66b139	Add support for streaming chat response from OpenAI to Obsidian - I needed to install node-fetch to accomplish this, as the built-in request object from Obsidian doesn't seem to support streaming and the built-in fetch object is very sensitive to any and all cross origin requests	2023-07-05 15:01:22 -07:00
sabaimran	3ff5074cf5	Log the end-to-end time of generating a streamed response from OpenAI	2023-07-05 14:59:44 -07:00
sabaimran	68e635cc32	Remove additional comments and debug statements	2023-07-05 11:33:56 -07:00
sabaimran	67a8795b1f	Clean-up commented out code	2023-07-05 11:24:40 -07:00
sabaimran	79b1b1d350	Save streamed chat conversations via partial function passed to the ThreadGenerator	2023-07-04 17:33:52 -07:00
sabaimran	afd162de01	Add reference notes to result response from GPT when streaming is completed - NOTE: results are still not being saved to conversation history	2023-07-04 12:47:50 -07:00
sabaimran	8f491d72de	Initial code with chat streaming working (warning: messy code)	2023-07-04 10:14:39 -07:00
Debanjum Singh Solanky	5889eceba4	Make text selectable in Khoj chat modal on Obsidian Previously the text in the Khoj chat modal couldn't be copied as it was not selectable Resolves #206	2023-07-03 23:24:04 -07:00
sabaimran	89354def9b	Update request timeout window to 20 seconds	2023-07-03 22:28:18 -07:00
sabaimran	b1940519c3	Log error if unable to decode chunk from Github	2023-07-03 16:29:32 -07:00
Debanjum Singh Solanky	ecf9730cd7	Disable Chat, Search on Web if Khoj not configured & show next steps	2023-07-03 16:04:32 -07:00
sabaimran	017e8c1aef	Skip indexing a PDF that has an indexing error (#274 )	2023-07-03 15:55:11 -07:00
sabaimran	a6f313589e	Release Khoj version 0.7.1	2023-07-03 12:26:41 -07:00
sabaimran	8bfd5828e6	Remove deprecation notice since we're opening the web UI by default	2023-07-03 12:01:09 -07:00
sabaimran	92d81d3b16	Initialize the search.model field to SearchModels() and fix Reinitialize API call (#273 )	2023-07-03 11:32:44 -07:00
sabaimran	61403138d5	Merge pull request #269 from khoj-ai/features/simplify-configuration-steps Simplify some common configuration steps	2023-07-03 00:16:51 -07:00
sabaimran	ea3dc2cfa3	Simplify rendering of content type pages and logic of selecting config	2023-07-03 00:15:29 -07:00
sabaimran	260272dca2	Check if state.config is populated before configuring via the update method	2023-07-03 00:10:56 -07:00
sabaimran	bf8914d0c8	Fix default config initialization for chat.html	2023-07-03 00:00:47 -07:00
Debanjum	faad1297f4	Drop Support for Org Music, Ledger Content Types Removing unused content types will reduce khoj code to manage - `0f993b3` Drop support for Ledger as a separate content type Khoj will soon get a generic text indexing content type in Index plain text files #237. This along with a file filter should suffice for searching through Ledger transactions - `c9db532` Remove unused org-music as an indexable content type from Khoj Org-music was just a custom content type that worked with org-music. It was mostly only useful for me.	2023-07-02 17:48:29 -07:00
Debanjum Singh Solanky	0f993b332e	Drop support for Ledger as a separate content type Khoj will soon get a generic text indexing content type. This along with a file filter should suffice for searching through Ledger transactions, if required. Having a specific content type for niche use-case like ledger isn't useful. Removing unused content types will reduce khoj code to manage.	2023-07-02 16:57:49 -07:00
sabaimran	fa218ff5aa	Fix call to update for Reinitialize button	2023-07-02 16:31:30 -07:00
sabaimran	a8b83da872	Merge branch 'master' of github.com:debanjum/khoj into features/simplify-configuration-steps	2023-07-02 16:21:54 -07:00
Debanjum Singh Solanky	c9db5321e7	Remove unused org-music as an indexable content type from Khoj Org-music was just a custom content type that worked with org-music. It was mostly only useful for me. Cleaning up that code will reduce number of content types for khoj to manage.	2023-07-02 16:21:21 -07:00
sabaimran	b86a3bb0c5	Merge branch 'master' of github.com:debanjum/khoj into fix/obsidian-setup-issues	2023-07-02 16:21:05 -07:00
sabaimran	a52c1c8380	Use built-in app.vault to determine whether there are any PDF files within	2023-07-02 16:20:43 -07:00
sabaimran	eff1436857	Overwrite existing PDFs in Obsidian as well, make if-block more legible	2023-07-02 16:17:25 -07:00
Debanjum Singh Solanky	30459ee4ba	Fix Khoj subtitle in desktop entry, pyproject, cli and Obsidian Readme	2023-07-02 16:09:07 -07:00
sabaimran	1a1b044d12	Simplify settings pages for configuration - Add one-click disablement - Remove fields that probably don't need to be edited (our implementation details) - Add a green tick if a given field is configured	2023-07-02 16:04:05 -07:00
sabaimran	e4c445f805	Add try-except-finally blocks around configure calls in /update	2023-07-02 13:35:02 -07:00
sabaimran	4b02a8c788	Fix PDF setup in Obsidian plugin and force Obsidian configuration for markdown	2023-07-02 12:37:24 -07:00
sabaimran	2a7e4f2b71	Escape special characters in the URL when adding a link to the remote file	2023-07-02 09:13:28 -07:00
sabaimran	c747562897	Update the GUI to just be a simple box with a button for the web UI	2023-07-01 20:37:21 -07:00
sabaimran	bab7f39d47	Move logic to open the web browser into the GUI section	2023-07-01 20:11:27 -07:00
sabaimran	36537606da	Update unit test and preserve prior operational ordering in main.py	2023-07-01 20:02:35 -07:00
sabaimran	ea9ae4ae28	Configure Khoj to automatically open the browser to their web home page when Khoj is up	2023-07-01 19:46:31 -07:00
sabaimran	d2083dd395	Remove bespoke processing for GithubToJsonl file demo	2023-07-01 19:09:22 -07:00
sabaimran	a71440f62a	Update the guidance in the error message if config is not set	2023-07-01 19:09:00 -07:00
sabaimran	7db97d8aa9	Fix: don't try to render the search_type.ALL	2023-07-01 19:08:19 -07:00
sabaimran	f0f6390366	Make --no-gui the default behavior of Khoj and update corresponding documentation	2023-07-01 19:07:59 -07:00
Debanjum Singh Solanky	d77e05c279	Release Khoj version 0.7.0	2023-07-01 05:44:22 -07:00
Debanjum Singh Solanky	30d87a9a01	Update color of Khoj chat in Obsidian plugin to Lantern theme	2023-07-01 02:18:47 -07:00
Debanjum Singh Solanky	51826d28d6	Ensure clicking Update in Khoj Obsidian indexes PDF files too	2023-07-01 02:18:47 -07:00
sabaimran	dac2d14380	Handle file names appropriately for md files and render commits in github results	2023-07-01 01:20:58 -07:00
sabaimran	dbe713604d	Fix error in tests for markdown_to_jsonl	2023-07-01 00:49:40 -07:00
sabaimran	931aab4464	Handle case for when headers value is None	2023-07-01 00:37:30 -07:00
sabaimran	d01afb3ee4	Fix path issues for URL-based markdown files	2023-07-01 00:25:11 -07:00
sabaimran	31655447e7	Add the sign-up list to the chat page as well and update copy	2023-06-30 21:43:01 -07:00
sabaimran	796102c74e	Add separate configuration if the given Khoj instance is meant for demo - In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network) - Prevent re-indexing for Github data if this is a demo instance - Fix up some issues with the CSS which made settings page small in mobile - In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page	2023-06-30 20:38:55 -07:00
sabaimran	db3026739d	Resolve diffs in api.py to make /chat endpoint async with new request parameter	2023-06-30 00:25:37 -07:00
sabaimran	ef72508914	Try/catch around github file decoding, await call to search in chat API, fix img width	2023-06-30 00:23:21 -07:00
Debanjum Singh Solanky	b950889f47	Fix org-mode web renderer to handle results containing list in block - Break out of rendering list if at end of org block in org.js - This would previously hang rendering results in web interface Should try to fix this upstream in org.js as well	2023-06-29 19:01:25 -07:00
sabaimran	780c769567	Add additional request headers to improve telemetry	2023-06-29 18:51:24 -07:00
sabaimran	6c10d68262	Merge pull request #253 from khoj-ai/features/github-issues-indexing Support indexing Github issues as well as corresponding comments	2023-06-29 16:02:47 -07:00
sabaimran	b2dd946c6d	Rename issue to entry method for accuracy	2023-06-29 15:23:50 -07:00
Debanjum Singh Solanky	51dfa48e2b	Have Khoj support Python 3.11 as Pytorch supports it now - Previously Khoj could only support Python up to 3.10 due to pytorch. But lots of folks had python 3.11 installed by default on their machines. This required installing python 3.10 and dealing with virtual envs. With Torch >= 2.0.1 now able to support python 3.11, at least one class of installation troubles for Khoj should drop. See https://github.com/pytorch/pytorch/issues/86566 for reference - Preliminary testing indicates using the new torch 2.x may reduce search time by 25% (from 80ms to 60ms on Mac M1) - Update Docs to no longer state that python <=3.10 is required - Update Github test workflow to run khoj tests with python 3.11 too	2023-06-29 15:13:26 -07:00
sabaimran	65bf894302	Interpret org files as a list and put them in separate divs. Update styling of search results to separate into cards	2023-06-29 15:12:48 -07:00
Debanjum Singh Solanky	d212298573	Make Configure button on web interface incrementally update by default We should add a way to force index everything. But force indexing should not be the default when user is just trying update content to index	2023-06-29 14:52:51 -07:00
Debanjum Singh Solanky	da2de21339	Only return requested result count even if search in multiple content types - Set results_count to default value at start so it is an int, never None	2023-06-29 14:49:05 -07:00
sabaimran	77672ac0ae	Demarcate different results with a border box - Add back support for searching by type Github - Remove custom class name in markdown js file	2023-06-29 14:14:25 -07:00
sabaimran	6edc32f2f4	Accept current changes to include issues in rendering flow	2023-06-29 12:25:29 -07:00
sabaimran	ab7dabe74f	Explicitly use Union type for function parameters for lint checks	2023-06-29 11:44:30 -07:00
sabaimran	fecf6700d2	Limit small image rendering to just the avatar images	2023-06-29 11:27:18 -07:00
sabaimran	70e550250a	Add an additional data source for issues from Github repositories + quality of life updates - Use a request session to reduce the overhead of setting up a new connection with the Github URL each request - Use the streaming feature for the REST api to reduce some of the memory footprint	2023-06-29 10:59:54 -07:00
Debanjum Singh Solanky	5f2717cc4b	Use logger.warning since logger.warn is deprecated	2023-06-28 22:15:27 -07:00
Debanjum Singh Solanky	56ce97ef9e	Use async/await in tests for query method of text and image search The text, image search query method has become async. So async/await is required to get results correctly in tests etc	2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky	b1767f93d6	Get any configured asymmetric search model to encode query for search - Set image_search.query to async to use it with multi-threading This is same as text_search.query being set to an async method - Exit search early if no search_model is defined in state.model	2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky	8eae7c898c	Put each result under org heading when query for "all" content type in khoj.el - Add "all" as default content type when no content type retrieved from server	2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky	630bf995f1	Style each result based on its content type in same view on Khoj web - So when searching across content types (with content-type = "all") org-mode results get rendered differently than markdown, PDF etc. results - Set div class for each result separately instead of a single uber div for styling. This allows styling div of each result based on the content-type of that result - No need to create placeholder "all" content type on web interface as server is passing an all content type by itself	2023-06-28 22:07:01 -07:00
Debanjum Singh Solanky	1773a78339	Fix createRequestUrl method signature to fetch results from khoj web	2023-06-28 12:10:45 -07:00
Debanjum Singh Solanky	212b1a96c8	Create "all" search type for search across all content types on khoj server Allows moving logic to handle search across all content types to server from clients	2023-06-28 11:34:26 -07:00
Debanjum Singh Solanky	0636ceaf14	Merge branch 'master' of github.com:khoj-ai/khoj into parallelize-search-across-all-asymmetric-text-content-types Conflicts: - src/khoj/routers/api.py: Use theirs	2023-06-27 16:10:32 -07:00
Debanjum Singh Solanky	510bb7e684	Use typing union in text_search for python 3.8 compatible type hinting	2023-06-27 15:59:50 -07:00
Debanjum Singh Solanky	1b11d5723d	Extract search request URL builder into js function in web interface	2023-06-27 15:50:41 -07:00
Debanjum Singh Solanky	09f739b8cc	Null check config, log warning instead of error when configuring search	2023-06-27 15:48:48 -07:00
sabaimran	9d62d66a77	Simplify construction of repo shorthand in GithubToJsonl	2023-06-27 15:05:03 -07:00
sabaimran	227169ebde	Support configuration of multiple Github repositories in the settings interface - Add cards to configure each of the Github repositories - Fix a bug in the API which caused all other settings to be wiped when updating one of the content types - Provide an error message to the user if they have a misconfiguration in their chat settings	2023-06-27 14:10:09 -07:00
sabaimran	37a1f15c38	Add backend support for indexing multiple repositories - Add support for indexing org files as well as markdown files from the Github repository and update corresponding search view - Support indexing a list of repositories	2023-06-27 12:06:15 -07:00
sabaimran	ddd550e6f4	Add call to use X-CSRFToken in relevant POST methods	2023-06-26 12:38:00 -07:00
sabaimran	35e24d7851	Fix null checking in state for content config API and telemetry API	2023-06-26 11:37:34 -07:00
sabaimran	5e39421f56	Merge branch 'master' of github.com:debanjum/khoj	2023-06-25 11:41:47 -07:00
sabaimran	4410a3bb4b	Limit max width of the pre tag to 100% of the screen width	2023-06-25 11:41:15 -07:00
sabaimran	ffe66b848a	Use a single column template for config plugins when on mobile	2023-06-25 11:27:41 -07:00
Debanjum Singh Solanky	b1890aa050	Null check intermediary objects when config not fully initialized	2023-06-24 15:34:18 -07:00
Debanjum Singh Solanky	946af0889d	Improve showing status message on saving config via web interface - Show success/failure status message much closer to the save button Previously status message was shown on top of the page, which wasn't always in view and wasn't easily seen - Improve the status message to more clearly show next steps on success	2023-06-24 00:49:57 -07:00
Debanjum Singh Solanky	40d1abfe50	Update the new /config APIs to configure Khoj for first time users - Setup state.config and sub-components from unset state - Setup search types with default settings	2023-06-24 00:45:30 -07:00
Debanjum Singh Solanky	edabede93a	Fix post configuration state update on error or success on config html	2023-06-23 14:52:25 -07:00
Debanjum Singh Solanky	4744d69221	Resolve button name, anchor tag feedback. Add status message to settings page - Use "Configure" name for settings config action - Use more standard anchor tag instead of button - Add configure status message	2023-06-23 09:48:38 -07:00
Debanjum Singh Solanky	26abafa658	Highlight currently active tab in web interface for orientation	2023-06-22 00:33:28 -07:00
Debanjum Singh Solanky	2728c714d7	Put pico.css in local assets. Move common css styling into khoj.css	2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky	20a37697de	Add Khoj header with navigation pane to Search and Chat Interfaces	2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky	c467a0cbb0	Update UI of config sub pages to use khoj lantern theme styling	2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky	0ce2ec590a	Update main config page on khoj server to match khoj lantern theme	2023-06-21 20:25:25 -07:00
Debanjum Singh Solanky	d30a9ddd33	Use Khoj Logo on Search, Chat pages of Web Interface	2023-06-21 12:34:53 -07:00
Debanjum Singh Solanky	6d4aad57e1	Use new Khoj Lantern Logo in Web, Emacs, Obsidian UIs and Docs	2023-06-21 01:57:22 -07:00
Debanjum Singh Solanky	69d4fa6525	Rename project links across repo from debanjum/khoj to khoj-ai/khoj	2023-06-21 00:13:21 -07:00
Debanjum Singh Solanky	5c4eb950d5	Search across all content types via khoj.el on Emacs If no content-type selected in transient menu option, khoj.el queries khoj server without content-type parameter (t) set. This results in search across all enabled asymmetric search text content types	2023-06-20 23:39:56 -07:00
Debanjum Singh Solanky	2cd3e799d3	Improve null and type checks	2023-06-20 23:30:59 -07:00
Debanjum Singh Solanky	d5fb4196de	Update web interface to allow querying all content types at once	2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky	5c7c8d1f46	Use async/await to fix parallelization of search across content types	2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky	1192e49307	Pass default value matching argument types expected by text_search methods	2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky	0144e610d6	Only search across content types that work with asymmetric search	2023-06-20 22:21:46 -07:00
Debanjum Singh Solanky	f6a7aa6c96	Style Khoj chat on web interface with new lantern theme - Color khoj chat message with new yellow theme color - Update Khoj chat emoji to lantern - Add page type to title of pages on web interface	2023-06-20 01:39:33 -07:00
Debanjum Singh Solanky	6d94d6e75a	Encode the asymmetric, symmetric search queries in parallel for speed Use timer to measure time to encode queries and total search time	2023-06-20 01:18:17 -07:00
Debanjum Singh Solanky	d292dc03b3	Use new Khoj Logotype in Web interface	2023-06-20 01:13:06 -07:00
Debanjum Singh Solanky	db07362ca3	Encode user query as same across search types to speed up query time - Add new filter abstract method to remove filter terms from query - Use the filter method to remove filter terms, encode this defiltered query and pass it to the query methods of each search types TODO: Encoding query is still taking 100-200 ms unlike before. Need to investigate why	2023-06-19 23:29:54 -07:00
Debanjum Singh Solanky	285d17af2a	Search in parallel across all enabled content types requested via API - Update API to return content from all enabled content types when type is not set to specific type in HTTP request param - To do this efficiently run the search queries in parallel threads	2023-06-19 23:29:06 -07:00
Debanjum Singh Solanky	79d325fbb6	Fix triggering @general queries in Khoj Chat	2023-06-19 23:05:33 -07:00
Debanjum Singh Solanky	e97a20d70c	Set conversation type if query param set, else return chat history Only initialize variables if query is not empty, to avoid unnecessary compute, variable null checks etc. Fixes #230	2023-06-19 19:59:16 -07:00
sabaimran	4722a2c16d	Add Github configuration page and success notifications	2023-06-18 10:06:45 -07:00
sabaimran	668135c763	Merge branch 'master' of github.com:debanjum/khoj into features/pretty-config-page	2023-06-18 08:35:09 -07:00
sabaimran	81183a1fe1	Address misc PR comments and update logo in all clients - Rename the new logo to reflect accuracy on size (e.g., 128x128) - Update the icns file for Mac - Update nomenclature in settings pages	2023-06-18 08:34:58 -07:00
Debanjum Singh Solanky	a44cde2865	Show hint to re-index vault if wonky results in Obsidian search modal Remove spurious indentation in Obsidian styles.css Resolves #207	2023-06-18 04:53:51 -07:00
Debanjum Singh Solanky	595cc5b0f5	Use printer icon for PDF logs. Only split lines if file at web link in web interface	2023-06-18 02:26:03 -07:00
Debanjum Singh Solanky	e31a540a5e	Get all md files recursively in repository by passing recursive param Previously the `get_markdown_files' method was only getting files at root of the repository Fix, improve logger messages in github to jsonl processor	2023-06-18 01:47:15 -07:00
Debanjum Singh Solanky	6fdac24416	Set page size to 100 to reduce requests required to Github API to 1/3 - Default is 30. So number of paginated requests required to get all items (commits, files) will reduce by 67% - No need to increase page size for the get tree Github API request from `get_markdown_files' The get tree Github API doesn't support pagination and returns up to 100K items in its response. This should be way more than enough for our current use-cases	2023-06-18 01:44:36 -07:00
Debanjum Singh Solanky	87975e589a	Fix passing auth token to Github API to increase rate limits by x85 - Previously wasn't prefixing "token" to PAT token in Auth header This resulted in the request being considered unauthenticated - Unauthenticated requests to Github API are limited to 60 requests/hour Authenticated requests to Github API are allowed 5000 requests/hour	2023-06-18 01:19:26 -07:00
Debanjum Singh Solanky	9c70af960c	Extract logic to get file content from Github into a separate method	2023-06-18 01:19:13 -07:00
Debanjum Singh Solanky	10d4c38ce9	Extract Wait for rate limit reset logic into a function for reuse	2023-06-18 01:06:46 -07:00
sabaimran	aad7f825e0	Remove music configuration	2023-06-17 21:23:56 -07:00
sabaimran	5f97afbfac	Ignore type checks from mypy in subindexed fields	2023-06-17 16:53:36 -07:00
sabaimran	c2d46de8bc	Add endpoint for regenerating directly from the config page and add music content-type	2023-06-17 15:47:33 -07:00
sabaimran	ded3100caf	Update the configuration page to make config management easier - Add a central configuration management page to make management of config details easier - Add relevant api endpoints both for client and server to update/request data as necessary - Attempt to update the favicon	2023-06-17 15:21:28 -07:00
Debanjum Singh Solanky	3f24e53b6e	Render URL as link in web interface if file param of result is a web link	2023-06-17 04:26:40 -07:00
Debanjum Singh Solanky	63ec84ad78	Store Github URL of Markdown files on Github in file jsonl param	2023-06-17 04:23:01 -07:00
Debanjum Singh Solanky	0c1c7583b5	Handle pagination, API rate limits. Get all commits from Github repo	2023-06-17 04:21:39 -07:00
Debanjum Singh Solanky	31d17d0b22	Index commit messages from repository with the github plugin	2023-06-17 02:59:54 -07:00
Debanjum Singh Solanky	c29c141a7e	Use Github Rest API to index Markdown files in Github Repository The Llama_Hub Github plugin is fairly limited. The Github Rest API is well supported and can easily be extended to index commit messages, issues, discussions, PRs etc.	2023-06-17 02:16:13 -07:00
Saba	ac96f43b1b	Remove try-catch specific to Github plugin; consolidate GUI logic	2023-06-16 23:46:25 -07:00
Saba	019d3732de	Rename orgmode_search to org_search	2023-06-13 16:06:54 -07:00
Saba	08d79f5ba4	Unify types used in Github and other text-based configs. Fix typing issues	2023-06-13 15:52:36 -07:00
Saba	a6cd96a6a9	Add a Github plugin which can be used to read from a Github repository	2023-06-13 14:40:06 -07:00
Debanjum	c68cde4803	Log clients calling API endpoints on Khoj server - Make API endpoints on Khoj server accept `client` as request parameter - Khoj API endpoints: /chat, /search, /update - Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server - Khoj clients: Emacs, Obsidian and Web - Also log khoj server_version running to telemetry server	2023-06-09 18:36:49 +05:30
sabaimran	59fa48036f	Merge pull request #224 from debanjum/fix/message-exceeds-prompt-size Pass truncated message as string in ChatMessage when exceeding max prompt size	2023-06-08 17:32:53 -07:00
Debanjum Singh Solanky	139a3ba060	Update server to log new server version field to telemetry db	2023-06-08 14:14:21 +05:30
Saba	5d5ebcbf7c	Rename truncate messages method and update unit tests to simplify assertion logic	2023-06-06 23:25:43 -07:00
Saba	7119ed0849	Run pre-commit script	2023-06-05 19:29:23 -07:00
Saba	6212d7c2e8	Remove debug line	2023-06-05 19:00:25 -07:00
Saba	f65ff9815d	Move message truncation logic into a separate function. Add unit tests with factory boy.	2023-06-05 18:58:29 -07:00
Debanjum Singh Solanky	eb6175e9b0	Update description field in webmanifest of Khoj, Khoj Chat PWA	2023-06-06 01:53:42 +05:30
Debanjum Singh Solanky	bb2363f324	Set client request param when calling khoj server APIs from Web	2023-06-06 00:05:00 +05:30
Debanjum Singh Solanky	caab55fbdd	Set client request param when calling khoj server APIs from Obsidian	2023-06-06 00:04:46 +05:30
Debanjum Singh Solanky	de2494154f	Set client request param when calling khoj server APIs from Emacs	2023-06-06 00:02:10 +05:30
Debanjum Singh Solanky	168c11cea7	Make server API endpoints accept client as query param - The chat, search and update APIs will accept client as request param. - This will allow logging the client from which these APIs were called.	2023-06-05 23:57:08 +05:30
Debanjum Singh Solanky	8617cf1389	Push telemetry to Posthog to grok Khoj usage	2023-06-05 22:47:49 +05:30
Debanjum Singh Solanky	d13db2e666	Make old telemetry server forward requests to new server	2023-06-05 13:06:45 +05:30
Saba	5f4223efb4	Increase timeout to OpenAI call	2023-06-04 20:49:47 -07:00
Saba	0e63a90377	Fix the mechanism to retrieve the message content	2023-06-04 20:25:37 -07:00
Saba	f0efe0177e	Pass truncated message as string in ChatMessage when exceeding max prompt size	2023-06-04 19:33:46 -07:00
Saba	068ee0ac5e	Swap elif with else, as usage of this method does not use openai_api_key	2023-06-04 02:25:08 -07:00
Saba	6508379d7b	Use api_key keyword argument to set the openai_api_key parameter for GPT	2023-06-04 00:57:00 -07:00
Debanjum Singh Solanky	7af8a56434	Remove filename from reference before rendering references in khoj.el Fixes bug where the actual reference heading on the next line was jumping out of the references footnote section	2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky	ec280067ef	Do not retrieve relevant notes when having a general chat with Khoj - This improves latency of @general chat by avoiding unnecessary compute - It also avoids passing references in API response when they haven't been used to generate the chat response. So interfaces don't have to add logic to not render them unnecessarily	2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky	90439a8db1	Update Khoj subtitle to AI personal assistant for your digital brain	2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky	e9ed7a19fd	Update search prompt to extract PDF search type. Fix extract_question prompt	2023-06-02 10:06:03 +05:30
Debanjum Singh Solanky	bbe3bf9733	Render PDF search results in Khoj Obsidian interface - Make plugin update khoj server config to index PDF files in vault too - Make Obsidian plugin update index for PDF files in vault too - Show PDF results in Khoj Search modal as well - Ensure combined results are sorted by score across both types - Jump to PDF file when its search result is selected from the modal	2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky	e3892945d4	Render PDF search results in Khoj.el Emacs interface	2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky	85144006a1	Render PDF search results in khoj web interface	2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky	acd14a5e41	Wire up PDF to jsonl processor to Khoj server layer (API, config) - Specify PDF content to index via khoj.yml - Index PDF content on app start, reconfigure - Expose PDF as a search type via API	2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky	286b500f66	Create PDF to JSONL processor using PyPDF and LangChain Switch `pydantic' to >= 1.9.1 else `langchain.document_loaders' starts throwing typing error for python 3.8, 3.9	2023-06-01 21:41:49 +05:30
Debanjum Singh Solanky	1b3effd8e6	Fork Markdown to JSONL processor as start template for PDF to Jsonl Processor	2023-06-01 09:13:31 +05:30
Debanjum Singh Solanky	1cd9ecd449	Truncate last message if still over max supported prompt size by model	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	ed4d0f9076	Simplify argument names used in khoj openai completion functions - Match argument names passed to khoj openai completion funcs with arguments passed to langchain calls to OpenAI - This simplifies the logic in the khoj openai completion funcs	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	703a7c89c0	Reduce retry count and request timeout for faster response or failure - Fix bug where both LangChain and Khoj retry requests 6 times each. So a total of 12 requests at >1minute intervals for each chat response in case of OpenAI API being down - Retrying too many times when the API is failing doesn't help - The earlier 60 second request timeout was spacing out the interval between retries way too much. This slowed down chat response times quite a bit when API was being flaky - With these updates you'll know if call to chat API failed in under a minute	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	18081b3bc6	Use LangChain to call GPT over API	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	277d2f5c96	Do not add "Notes:" suffix to chat messages when no notes retrieved This was causing spurious "Notes:" suffix being added to Khoj Chat in response	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	334be4e600	Use LangChain to call OpenAI for Khoj Chat - Use ChatModel and ChatOpenAI to call OpenAI chat model instead of using OpenAI package directly - This is being done as part of migration to rely on LangChain for creating agents and managing their state	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	efcf7d1508	Extract prompts as LangChain Prompt Templates into a separate module Improves code modularity, cleanliness. Reduces bloat in GPT.py module	2023-06-01 08:50:58 +05:30
Debanjum Singh Solanky	b484953bb3	Import app state correctly to generate embeddings with OpenAI model Resolves #216	2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky	a0d0dbaca7	Fix link to Khoj Obsidian Demo video in Readmes	2023-05-23 04:23:08 +05:30
Debanjum Singh Solanky	ebb5d7b8e5	Release Khoj version 0.6.2	2023-05-17 20:04:20 +05:30
Debanjum Singh Solanky	d02415edcc	Write generated server id to env file when env file does not contain it	2023-05-17 19:38:44 +05:30
Debanjum Singh Solanky	dc0626856e	Put the telemetry db in a separate directory by default	2023-05-17 18:58:47 +05:30
Debanjum Singh Solanky	e9f04dc644	Add dockerfile to containerize telemetry server	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	07b19964d4	Schedule jobs at (co-)prime intervals to reduce overlap in job runs	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	d42f0f5055	Add basic telemetry server for khoj	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	134cce9d32	Batch upload telemetry data at regular interval instead of while querying	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	3ede919c66	Log usage of /search, /chat, /update API endpoints to telemetry server	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	f2e89f6f46	Add khoj app helper methods to log app usage to a telemetry server	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	9ca61d62ff	Enable/disable logging telemetry by setting bool in khoj.yml config We log usage telemetry by default, unless setting explicitly set in khoj.yml	2023-05-15 23:26:38 +08:00
Debanjum Singh Solanky	131b8407b5	Allow Khoj Chat to respond to general queries not in reference notes - Khoj chat will now respond to general queries if: 1. no relevant reference notes are available or 2. explicitly induced by prefixing the chat message with "@general" - Previously Khoj Chat would often refuse to respond to general queries not answerable from reference notes or chat history - Make chat quality tests more robust - Add more equivalent chat response options refusing to answer - Force haiku writing to not give any preamble, just the haiku	2023-05-12 18:42:40 +08:00
Debanjum Singh Solanky	cc75f986b2	Test text search index only updates on changes to text content	2023-05-12 17:37:34 +08:00
Debanjum Singh Solanky	f9ccce430e	Allow configuring OpenAI chat model for Khoj chat - Simplifies switching between different OpenAI chat models. E.g GPT4 - It was previously hard-coded to use gpt-3.5-turbo. Now it just defaults to using gpt-3.5-turbo, unless chat-model field under conversation processor updated in khoj.yml	2023-05-03 23:01:13 +08:00
Debanjum Singh Solanky	6b535cc345	Snip prepended heading to avoid crossing model max_token limits Otherwise if heading > max_tokens then the search models will just see a heading (with repeated filename) for each compiled entry and not actual content. 100 characters should be sufficient to include filename (not path) and entry heading. If longer, truncate it so the entry's unique text is still passed to the model for search context	2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky	02aeee60aa	Set filename as top heading of org entries for better search context Previously filename was only being appended to markdown entries. Test filename getting prepended to compiled entry as heading	2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky	94825a70b9	Set heading of md entries to improve search context for long entries Otherwise if a markdown entry is longer than max_tokens, the split entries (apart from first one) do not get their heading context set	2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky	5de04621b5	Set filename as top heading of md entries for better search context Previously filename was appended to the end of the compiled entry. This didn't provide appropriate structured context Test filename getting prepended as heading to compiled entry	2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky	0e3fb59e09	Entries with no md headings should not get heading prefix prepended Files with no headings would previously get their entry be prefixed with a markdown heading prefix (#)	2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky	45a991d75c	Prepend entry heading to all compiled org snippets to improve search context All compiled snippets split by max tokens (apart from first) do not get the heading as context. This limits search context required to retrieve these continuation entries	2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky	3386cc92b5	Fix khoj server config update in khoj.el by unquoting list to cl-push to - cl-push expects a generalized variable, else it throws a (setf quote) undefined warning - This results in the config call failing on calling khoj entrypoint	2023-05-03 15:10:56 +08:00
Debanjum Singh Solanky	948a4274e4	Fix documentation strings and simplify not null checks	2023-05-02 21:47:50 +08:00
Debanjum Singh Solanky	731ef5688f	Use cl-pushnew to fix byte-compile errors with using add-to-list	2023-05-02 21:47:38 +08:00
Debanjum Singh Solanky	f046523b33	Improve khoj.el messages to convey state of khoj server - Remove waiting for server message as it hides the messages from the server - Fix nil messages that were being rendered, by checking messages from server before showing them - Consistently prefix messages from khoj with khoj.el	2023-04-28 11:15:13 +08:00
Debanjum Singh Solanky	76df393eb5	Only call khoj server configure API from khoj.el when config updated Previously khoj.el was calling the server configure API even when config was same as before. This had broken the khoj search as you type experience from emacs Also show more details to user about what in khoj is being configured	2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky	ceae06ae9d	Fix khoj.el compilation warnings around unused variables	2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky	8269adf849	Refactor khoj-setup in khoj.el for readability. No functional change	2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky	865d12b6f2	Fix escaping quote in chat references to prevent it breaking out of html	2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky	26cb878327	Add Yarn lockfile for Khoj Obsidian	2023-04-18 00:57:11 +07:00
Debanjum Singh Solanky	e3180d63e6	Sync Khoj Obsidian Tagline with Khoj tagline	2023-04-18 00:56:50 +07:00
Debanjum Singh Solanky	62e6e09521	Release Khoj version 0.6.1	2023-04-17 23:31:35 +07:00
Debanjum Singh Solanky	b079fb31bc	Replace Windows path separators in indexName configured via Khoj Obsidian Resolves #185, #199 - Issue IndexName created from Obsidian Absolute Vault path wasn't replacing windows path, drive separators with underscore. It was only replacing unix path separators - Fix Also replace windows drive and path separators with _ while creating IndexName in Khoj Obsidian plugin	2023-04-17 16:55:33 +07:00
Debanjum Singh Solanky	d90df966a9	Make khoj logger use utf-8 encoding when writing to khoj log file Resolve logger error issue mentioned in #199	2023-04-17 16:55:07 +07:00
Debanjum Singh Solanky	dc3f399f91	Fix to get score associated with SearchResponse in result as string	2023-04-16 20:22:51 +07:00
Debanjum Singh Solanky	d5000c63e1	Update Readmes to use python -m pip install khoj-assistant Makes it easier to tell pip associated with which python is being used. Easier to debug when users have different versions of python installed (e.g 3.10 and 3.11)	2023-04-16 20:17:20 +07:00
Debanjum Singh Solanky	453c84ab79	Add Screenshots of Khoj Chat Interface on Emacs, Obsidian to Readmes	2023-04-07 23:19:47 +07:00
Debanjum Singh Solanky	35aa06067f	Release Khoj version 0.6.0 Upload styles.css via release workflow	2023-03-31 18:13:16 +07:00
Debanjum Singh Solanky	5673bd5b96	Keep original formatting in compiled text entry strings - Explicitly split entry string by space during split by max_tokens - Prevent formatting of compiled entry from being lost - The formatting itself contains useful information No point in dropping the formatting unnecessarily, even if (say) the current search models don't account for it (yet)	2023-03-30 14:02:46 +07:00
Debanjum Singh Solanky	a2ab68a7a2	Include filename of markdown entries for search indexing Append originating filename to compiled string of each entry for better search quality by providing more context to model Update markdown_to_jsonl tests to ensure filename being added Resolves #142	2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky	67129964a7	Create Note with Query as title from within Khoj Search Modal This follows expected behavior for Obsidian search modals E.g Omnisearch and default Obsidian search. The note creation code is borrowed from Omnisearch. Resolves #133	2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky	d3257cb24e	Style the search result. Use Obsidian theme colors and font-size Based on PR #135	2023-03-30 12:35:29 +07:00
Debanjum Singh Solanky	40091489c0	For each result: snip it by lines, show filename, remove frontmatter Based on PR #135 Resolves #134	2023-03-30 12:34:55 +07:00
Debanjum Singh Solanky	240db7b4f0	Add screenshot of Khoj chat on Obsidian to Readme. Fix links	2023-03-30 02:49:05 +07:00
Debanjum Singh Solanky	234be96e53	Fix processor key used to configure chat model in khoj obsidian	2023-03-30 01:47:09 +07:00
Debanjum Singh Solanky	c8c0cfd10e	Add Chat features, setup and usage to Khoj Obsidian plugin Readme	2023-03-30 00:32:24 +07:00
Debanjum Singh Solanky	7ecae224e7	Configure OpenAI API Key from the Khoj plugin setting in Obsidian	2023-03-29 23:54:08 +07:00
Debanjum Singh Solanky	3d616c8d65	Use Obsidian font sizes. Improve input field, reference indexing - Give space in the input field. Too narrow previously - References should be indexed from 1 instead of 0 - Use Obsidian font size variables to scale fonts in chat appropriately	2023-03-29 22:13:55 +07:00
Debanjum Singh Solanky	23bd737f6b	Use chat input element to send message on Enter. No send button required	2023-03-29 22:13:30 +07:00
Debanjum Singh Solanky	81e98c3079	Scroll to bottom of modal on open and message send	2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky	59ff1ae27f	Use obsidian theme colors for bg, text. Restrict css namespace via prefix	2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky	001ac7b5eb	Style Obsidian Chat Modal like Khoj Chat Web Interface - Add message sender, date metadata as message footer - Use css directly from Khoj Chat Web Interface. - Modify it to work under an Obsidian modal - So replace html, body styling from web interface with styling of a new "khoj-chat" class attached to contentEl of modal	2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky	112f388ada	Render references next to chat responses by khoj in chat modal	2023-03-28 18:11:03 +07:00
Debanjum Singh Solanky	1d3d949962	Render conversation logs on page load	2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky	cd46a17e5f	Add Khoj Chat Modal, Command in Khoj Obsidian to Chat using API	2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky	c0972e09e6	Rename KhojModal to KhojSearchModal, a more specific name for it In preparation to introduce Khoj chat in Obsidian	2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky	64fff1d372	Release Khoj version 0.5.0	2023-03-28 03:35:59 +07:00
Debanjum Singh Solanky	fc218508f9	Update khoj.el docs and Emacs Readme for chat, simplified setup	2023-03-27 22:02:47 +07:00
Debanjum Singh Solanky	83a7ccd729	Fix docstrings and method ordering in khoj.el	2023-03-27 18:33:09 +07:00
Debanjum Singh Solanky	5c2327ee4f	Configure org directories to index from khoj.el Converts paths to glob style regexes that will index all org files recursively under the specified list of paths Should help setup for org-roam users from khoj.el	2023-03-27 18:30:53 +07:00
Debanjum Singh Solanky	6e8a40906d	Allow disabling automatic server setup. Fix server start vs ready logic - khoj-auto-setup controls whether to automatically check for and setup khoj server from within Emacs - extract install, start, configure sequence into public, interactive method. Allows calling khoj-setup during package load via init.el - Fix: Do not attempt to configure or wait for server ready if user has said no to auto-setup request - Fix logic to mark server started vs ready - Previously the started/running vs ready variables defs were getting intertwined - Server started indicates server bootup has been triggered - Server ready indicates server API ready to accept requests	2023-03-27 17:53:08 +07:00
Debanjum Singh Solanky	526a927bce	Fix org entry extraction test, variable prefixed with khoj in khoj.el Discovered via failing build and test workflows on Github	2023-03-27 16:44:50 +07:00
Debanjum Singh Solanky	7243059507	Track index update asynchronously via moon phase progressbar in khoj.el	2023-03-27 06:01:04 +07:00
Debanjum Singh Solanky	8a9055f918	Restrict server messages show in echo area to main server files	2023-03-27 04:59:55 +07:00
Debanjum Singh Solanky	ae535a06eb	Configure Khoj chat using khoj.el by setting OpenAI API key in Emacs	2023-03-27 04:59:54 +07:00
Debanjum Singh Solanky	36b17d4ae0	Generalize the directory from config extraction elisp method	2023-03-27 03:44:03 +07:00
Debanjum Singh Solanky	924424c754	Throw actionable exceptions when content types or chat not configured	2023-03-27 02:47:44 +07:00
Debanjum Singh Solanky	359a2cacef	Fix khoj--server-running to work with unconfigured or external server - If khoj server started outside emacs, khoj--server-ready should be set to true by khoj--server-running method (instead of waiting for proc msg) - If khoj server is unconfigured the /config/types endpoint wouldn't return anything. Using config/data/default allows checking khoj server running status without requiring it to be configured as well	2023-03-27 02:45:59 +07:00
Debanjum Singh Solanky	d7fb9a596e	Auto configure server before loading khoj-menu If the config hasn't changed there'll be no update. If config has changed indexing will get triggered asynchronously. But user cannot make a query till indexing is done. It is easier to know when the server is ready to configure	2023-03-27 02:44:02 +07:00
Debanjum Singh Solanky	8a21aff438	Make khoj.el server start, stop, restart, setup methods interactive No need to erase temporary buffers before working on them	2023-03-27 01:53:15 +07:00
Debanjum Singh Solanky	cb40a96c85	Index configured org files from khoj.el - Set `khoj-org-files-index' to list of files to index - Defaults to indexing org-agenda-files - Uses khoj server api to configure org files to index	2023-03-27 01:05:26 +07:00
Debanjum Singh Solanky	50760acc37	Wait for Khoj server to get ready before opening khoj.el transient menu - Use process filter, sentinel to mark when khoj server is ready or not - Display server messages for visibility into server boot-up process - Wait until server ready to open khoj transient menu in Emacs Until then khoj features wouldn't work anyway, so avoids confusion	2023-03-26 13:00:01 +07:00
Debanjum Singh Solanky	82eb4bfd0d	Setup Khoj server on opening khoj from within Emacs - Create helper methods to check, stop, restart, setup khoj server - (Ask to) setup khoj server on calling khoj main entrypoint function	2023-03-26 10:12:06 +07:00
Debanjum Singh Solanky	99d19dcf43	Start Khoj server from Emacs using khoj.el	2023-03-26 09:38:46 +07:00
Debanjum Singh Solanky	c92d79118a	Install Khoj server from Emacs using khoj.el	2023-03-26 08:50:03 +07:00
Debanjum Singh Solanky	e281a498b4	Style Khoj search org buffer via elisp instead of in-buffer settings	2023-03-26 06:34:18 +07:00
Debanjum Singh Solanky	4f655d20ae	Style Khoj chat directly via elisp instead of via in-buffer settings	2023-03-26 06:03:30 +07:00
Debanjum Singh Solanky	f6ff7b1beb	Render footnote reference links as superscript for Khoj Chat on Emacs	2023-03-26 05:33:08 +07:00
Debanjum Singh Solanky	67c850a4ac	Add retry logic to OpenAI API queries to increase Chat tenacity - Move completion and chat_completion into helper methods under utils.py - Add retry with exponential backoff on OpenAI exceptions using tenacity package. This is officially suggested and used by other popular GPT based libraries	2023-03-26 05:12:35 +07:00
Debanjum Singh Solanky	ff846f05c5	Clean-up khoj.el based on linting helpers and manual review	2023-03-25 05:47:49 +07:00
Debanjum Singh Solanky	7e36f421f9	Truncate message logs to below max supported prompt size by model - Use tiktoken to count tokens for chat models - Make conversation turns to add to prompt configurable via method argument to generate_chatml_messages_with_context method	2023-03-25 05:13:56 +07:00
Debanjum Singh Solanky	4725416fbd	Use shortcut keybindings in buffer to ease sending messages to Khoj	2023-03-25 05:06:01 +07:00
Debanjum Singh Solanky	508b2176b7	Update Chat API, Logs, Interfaces to store, use references as list - Remove the need to split by magic string in emacs and chat interfaces - Move compiling references into string as context for GPT to GPT layer - Update setup in tests to use new style of setting references - Name first argument to converse as more appropriate "references"	2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky	b08745b541	Keep chat messages at 1 empty line visible distance in khoj.el - Clean redundant concat, format string - Improve variable name to emojified sender	2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky	27217a330d	Time chat API sub-components for performance analysis Time the search query extraction, search and response generation components	2023-03-24 20:39:41 +07:00
Debanjum Singh Solanky	5e9558d39d	Stylize references shown as footnote links in chat messages - Render references as superscript - Show reference definitions on hover over reference links to ease access - Truncate reference def shown on hover to 70 char - Add continuation suffix, ..., when reference definition truncated	2023-03-24 20:38:05 +07:00
Debanjum Singh Solanky	cf28f104c7	Register separate timestamps for user query and response by Khoj Chat	2023-03-24 18:31:58 +07:00
Debanjum Singh Solanky	93e2aff786	Add references as org footnotes instead of links	2023-03-24 18:31:42 +07:00
Debanjum Singh Solanky	d78454d4ad	Load Khoj Chat buffer before asking for query to provide context	2023-03-24 13:43:46 +07:00
Debanjum Singh Solanky	863933daaa	Resolve build issues found by melpazoid	2023-03-23 02:25:34 +04:00
Debanjum Singh Solanky	e9ca04af0d	Require dash, org to run ERT tests for khoj.el	2023-03-23 01:46:26 +04:00
Debanjum Singh Solanky	06df394d6c	Style chat messages as org-mode entries in Emacs - Style Message as Org Entries instead of List - Put khoj response as child of user query entry - Improves color coding for readability - Allows folding each back-n-forth - Put timestamp of message received into property drawer - Use standardized time format for new and old chat messages	2023-03-22 12:00:43 -06:00
Debanjum Singh Solanky	364e6c11af	Render chat history from API in chat buffer on first run - Generalize the render-chat-response method to handle rendering history or chat response from chat API response - Trigger rendering of khoj chat history if Khoj chat buffer not created for this session yet	2023-03-22 12:00:35 -06:00
Debanjum Singh Solanky	36b52fdd0a	Properly escape reference links before rendering - Use org-insert-link method to improve link rendering robustness Previous simple mechanism to create org-links would result in links escaping out of formatting. Use a user-facing org-mode method to remove/reduce probability of this - Replace newlines with space to render reference notes as links	2023-03-22 11:05:38 -06:00
Debanjum Singh Solanky	72f63a6ef7	Add basic chat interface for Khoj on Emacs - Query khoj chat API to get Khoj Chat response to user message - Render chat messages as an org-mode list in format: - [sender-name]: [message] - /[receive-date]/ - Add references as org links with context visible on hover, but no jump to note - Require dash library for khoj.el to simplify list manipulation. Use `-map-indexed' method from dash	2023-03-22 10:47:55 -06:00
Debanjum Singh Solanky	e4d67694e1	Add search to method, variable names meant for khoj search in khoj.el In preparation to introduce Khoj chat in Emacs	2023-03-21 21:44:11 -06:00
Debanjum Singh Solanky	2f6284872d	Mention Khoj needs Python version 3.10 or lower in docs	2023-03-20 15:18:19 -06:00
Debanjum Singh Solanky	601ff2541b	Revert to using GPT to extract search queries from users message - Reasons: - GPT can extract date aware search queries with date filters better than ChatGPT given the same prompt. - Need quality more than cost savings for now. - Need to figure ways to improve prompt for ChatGPT before using it	2023-03-18 17:56:13 -06:00
Debanjum Singh Solanky	e28526bbc9	Extract search queries from users message using ChatGPT as Search Actor - Reasons - ChatGPT should be better at following instructions than GPT - At 1/10th the cost, it's much cheaper than using older GPT models	2023-03-18 16:33:24 -06:00
Debanjum Singh Solanky	939d7731da	Fix-up Search Actor GPT's response for decoding it as valid JSON	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	f63fd0995e	Pass more search results as context to Chat Actor to improve inference	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	10836dedee	Search should return user message if GPT response is not valid JSON Previously would throw if GPT response is not valid JSON. Better to return original message to use for search instead	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	08f5fb315f	Add answers to context for Search Actor to generate relevant queries Update Search Actor prompt with answers, more precise primer and two more examples for context Mark the 3 chat quality tests using answer as context to generate queries as expected to pass. Verify that the 3 tests pass now, unlike before when the Search Actor did not have the answers for context	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	45cb510421	Loosen search results score threshold used by chat for more context	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	d871e04a81	Use past user messages, inferred questions as context to extract questions - Keep inferred questions in logs - Improve prompt to GPT to try to use past questions as context - Pass past user message and inferred questions as context to help GPT extract complete questions - This should improve search results quality - Example Expected Inferred Questions from User Message using History: 1. "What is the name of Arun's daughter?" => "What is the name of Arun's daughter" 2. "Where does she study?" => "Where does Arun's daughter study?" OR => "Where does Arun's daughter, Reena study?"	2023-03-18 16:30:50 -06:00
Debanjum Singh Solanky	1a5d1130f4	Generate search queries from message to answer users chat questions The Search Actor allows for 1. Looking up multiple pieces of information from the notes E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches 2. Allow date aware user queries in Khoj chat Answer time range based questions Limit search to specified timeframe in question using date filter E.g "What national parks did I visit last year?" adds dt>="2022-01-01" dt<"2023-01-01" to Khoj search Note: Temperature set to 0. Message to search queries should be deterministic	2023-03-18 16:28:51 -06:00
Debanjum	e75e13d788	Create Tests to Measure Chat Quality, Capabilities Create Rubric to Test Chat Quality and Capabilities ### Issues - Previously the improvements in quality of Khoj Chat on changes were uncertain - Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities ### Fix 1. Create an Evaluation Dataset to assess Chat Capabilities - Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋) - Add a few of Paul Graham's more personal essays. [Easy to get as markdown](https://github.com/ofou/graham-essays) 2. Write Unit Tests to Measure Chat Capabilities - Measure quality at 2 separate layers - Chat Actor: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py` - Chat Director: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the user's message. This is what the `/api/chat` API exposes. - Mark desired but not currently available capabilities as expected to fail <br /> This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat	2023-03-16 11:30:52 -06:00
Debanjum Singh Solanky	7526a50dd4	Extract conversation processor utility funcs from gpt.py into utils.py	2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky	24ddebf3ce	Make converse prompt more precise. Fix default arg vals in gpt methods - Set conversation_log arg default to dict - Increase default temperature to 0.2 for a little creativity in answering - Make GPT be more reliable in looking at past conversations for forming response	2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky	8609e3129e	Fix, improve displaying chat messages, sources by Khoj in web interface Pretty print json in conversation logs	2023-03-14 11:24:47 -06:00
Debanjum	6c0e82b2d6	Merge Improve Khoj Chat PR #183 from debanjum/improve-chat-interface # Improve Khoj Chat ## Main Changes - Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost - Improve Prompt to answer query using indexed notes - Previously was asking GPT to summarize the notes - Both the chat and answer API use this new prompt - Support Multi-Turn conversations - Pass previous messages and associated reference notes to ChatGPT for context - Show note snippets referenced to generate response - Allows fact-checking, getting details - Simplify chat interface by using only single unified chat type for now ## Miscellaneous - Replace summarize with answer API. Summarize via API not useful for now - Only pass Khoj search results above a threshold confidence to GPT for context - Allows Khoj to say don't know if it can't find answer to query from notes - Allows relying on (only) conversation history to generate response in multi-turn conversation - Move Chat API out of beta. Update Readme	2023-03-10 19:03:44 -06:00
Debanjum Singh Solanky	cccd225247	Deduplicate and simplify logic to render chat message with reference	2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky	b9caad458e	Type score_threshold with union, not |, to support python <3.10	2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky	a71f168273	Move the chat API out of beta. Save chat sessions at 15min intervals	2023-03-10 17:20:52 -06:00
Debanjum Singh Solanky	8bb8824d0c	Bump khoj versions in obsidian, emacs files	2023-03-10 15:23:17 -06:00
Debanjum Singh Solanky	e16d0b6d7e	Open reference notes used for chat on mobile too (by clicking) Requires clicking the reference as hover doesn't work on mobile	2023-03-09 17:13:07 -06:00
Debanjum Singh Solanky	c3c7b8a951	Make Khoj chat a separate Progressive Web App (PWA) for easier access	2023-03-09 13:45:06 -06:00
Debanjum Singh Solanky	3838f9d8e3	Remove explicitly asking GPT to say I don't know in prompt for now GPT still mostly says I don't know when answer not in notes or chats But with this it's more inclined to answer general questions not in chats or notes while informing user that the information is not from existing chats or notes	2023-03-09 12:11:44 -06:00
Debanjum Singh Solanky	f7b8cdd02e	Log prompts being passed to GPT for debugging	2023-03-08 19:17:52 -06:00
Debanjum Singh Solanky	2739a492b4	Log message metadata along with Khoj message instead of user message References should be attached to the khoj chat message rather than the user's message in the chat interface	2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky	87d1e1341d	Show reference notes used as response context in chat interface	2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky	280061e1fa	Do not deduplicate search results used for chat context - Chat uses compiled form of search results, not the raw entries to provide context for chat. The compiled snipped search results themselves are unique and using multiple of them for context from the same raw note is fine if they cross the score and rank thresholds This should improve the context provided for chat - Also apply score_threshold, no deduplication to the answers API	2023-03-06 23:51:31 -06:00
Debanjum Singh Solanky	672f61529e	Make getting deduped search results configurable via Search API	2023-03-06 23:48:46 -06:00
Debanjum Singh Solanky	4fb628975c	Fix jumping to note from Khoj Obsidian search modal result on Windows - Issue The file path separators used by the khoj server and the Obsidian vault were different on Windows - Fix Normalize file path to use forward slash(/) to find the matching note file in the Obsidian vault to jump to it Resolves #177	2023-03-05 21:07:54 -06:00
Debanjum Singh Solanky	b6cdc5c7cb	Do not expose answer API as a chat type in chat web interface or API Answer does not rely on past conversations, just the knowledge base. It is meant for one off interactions, like search rather than a continuing conversation like chat For now it is only exposed via API. Later it will be exposed in the interfaces as well Remove ability to select different chat types from the chat web interface as there is only a single chat type Stop appending answers to the conversation logs	2023-03-05 18:21:59 -06:00
Debanjum Singh Solanky	7f994274bb	Support multi-turn conversations in chat mode - Only use decent quality search results, if any, as context - Pass source results used by previous chat messages as context - Loosen prompt to allow looking at previous chats and notes to answer - Pass current date for context - Make GPT provide reason when it can't answer the question. Gives user context to tune their questions	2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky	d73042426d	Support filtering for results above threshold score in search API	2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky	45f461d175	Keep search results passed to GPT as context in conversation logs This will be useful to 1. Show source references used to arrive at answer 2. Carry out multi-turn conversations	2023-03-05 16:00:19 -06:00
Debanjum Singh Solanky	7cad1c9428	Only use past chat message, not session summaries as chat context Passing only chat messages for the current active session, and summaries for past sessions isn't currently as useful	2023-03-05 16:00:18 -06:00
Debanjum Singh Solanky	ad1f1cf620	Improve and simplify Khoj Chat using ChatGPT - Set context by either including last 2 chat messages from active session or past 2 conversation summaries from conversation logs - Set personality in system message - Place personality system message before last completed back & forth This may stop ChatGPT forgetting its personality as conversation progresses given: - The conditioning based on system role messages is light - If system message is too far back in conversation history, the model may forget its personality conditioning - If system message at end of conversation, the model can think it's the start of a new conversation - Inserting the system message before last completed back & forth should prevent ChatGPT from assuming it's the start of a new conversation while not losing personality conditioning from the system message - Simplify the Khoj Chat API to, for now, just answer from the user's notes instead of trying to infer other potential interaction types. - This is the default expected behavior from the feature anyway - Use the compiled text of the top 2 search results for context - Benefits of using ChatGPT - Better model - 1/10th the price - No hand rolled prompt required to make GPT provide more chatty, assistant type responses	2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky	9d42b5d60d	Use multiple compiled search results for more relevant context to GPT Increase temperature to allow GPT to collect answer across multiple notes	2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky	c3b624e351	Introduce improved answer API and prompt. Use by default in chat web interface - Improve GPT prompt - Make GPT answer users query based on provided notes instead of summarizing the provided notes - Make GPT be truthful using prompt and reduced temperature - Use Official OpenAI Q&A prompt from cookbook as starting reference - Replace summarize API with the improved answer API endpoint - Default to answer type in chat web interface. The chat type is not fit for default consumption yet	2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky	7184508784	Mention Python and Pip need to be installed in Main and Emacs Readme	2023-03-02 21:28:54 -06:00
Debanjum Singh Solanky	211e460398	Output date filter from cache log at debug level. Remove unused imports Other logs not directly useful to user have already been converted to debug log levels in `1ae4016`. Just forgot to convert this log line too	2023-03-02 15:41:32 -06:00
Debanjum Singh Solanky	b6dbe4dd1d	Do not try to retrieve an unconfigured core content type in Config GUI Previous behavior was resulting in a null reference error, as the key for the core content/search type was not present in the current config Fallback to using default config for unconfigured core content type instead See #165 for details	2023-03-02 11:09:31 -06:00
Debanjum Singh Solanky	1ae40163a9	Show user friendly information logs by default for context - Use emojis to make info logs easier to read - Inform when khoj is ready to use - Provide information on what khoj is doing while starting up - Inform when content/search types and processors are setup - Inform when models are being loaded from the web as this step can take time - Convert all other info logs to be only shown in verbose mode	2023-03-01 16:39:07 -06:00
Debanjum Singh Solanky	fe03ba3dce	Index intro text before headings in org files - Text before headings was not being indexed due to buggy orgnode parsing logic - Resolved indexing intro text from files with and without headings in them - Ensure intro text node has heading set to all title lines collected from the file Resolves #165	2023-03-01 12:11:33 -06:00
Debanjum Singh Solanky	7ad251b8ef	Log and Continue on OSError while collating dates for date filters Log to understand if error, date can be handled better Mitigates #172	2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky	2bed4c3b50	Fix configuring search types & /config/types API when no plugin configured - Test /config/types API when no plugin configured, only plugin configured and no content configured scenarios - Do not throw null reference exception while configuring search types when no plugin configured - Do not throw null reference exception on calling /config/types API when no plugin configured Resolves bug introduced by #173	2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky	8914dbd073	Fix creating GUI panels for unconfigured search, processor types Repro: 1. Open khoj server with `khoj` on first run 2. Install/enable Khoj Obsidian plugin (to configure khoj server) 3. Restart khoj server with `khoj` Bug: - Unconfigured processor and search_types are instantiated as None in self.current_config - While creating the desktop GUI, these null configs are attempted to be accessed as valid dictionaries for creating their GUI panels - This results in the null ref errors Fix: Use default config to create their GUI elements for unconfigured search and processor types Resolves #167	2023-03-01 01:20:58 -06:00
Debanjum Singh Solanky	b09350c052	Fix to return only enabled content types via the new config/types API - Previously was returning all core content types even if they had not been setup - Add test to validate only configured content types are returned by the api/config/types API endpoint	2023-02-28 22:08:26 -06:00
Debanjum Singh Solanky	b177adf3a7	Return value of search_type in /config/type API endpoint - Remove need for interfaces to downcase content types returned by API before using the type in search and other API endpoint - Fix to check for search_type.name in plugin keys instead of value	2023-02-28 21:49:26 -06:00
Debanjum Singh Solanky	88344f9ed2	Improve rendering search results of plugin content types on web interface Render only the entry from plugin search response instead of raw json Use the results-ledger styling for results-plugin styling	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	c2814fce58	Improve rendering search results of plugin content types in khoj.el Render only the entry from plugin search response instead of raw json	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	f3f24387ec	Use new config/types API to set enabled content types on web interface	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	1e43f1a12e	Use new config/types API to set enabled content types in khoj.el menu	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	9d38eadd42	Return enabled content types via api/config/types API endpoint Simplifies dynamically populating enabled content types for interfaces	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	68bd5d9ebc	Configure API routes after set up search types while configuring server Configure app routes after configuring server. Import API routers after search type is dynamically populated. Allow API to recognize the dynamically populated plugin search types as valid type query param. Enable searching for plugin type content.	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	d91c7e2761	Search for plugin content via the search API	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	47b58a2a4d	Configure, use dynamically instantiated SearchType enum on app start The SearchType is now dynamically populated with core and configured plugin types Use the new dynamic SearchType enum from state.py across codebase	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	ab0d3a08e2	Index configured plugins on app start and via update API endpoint	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	55a032e8c4	Add processor to index entries from jsonl files for plugins - Read, merge entries from input jsonl files and filters - Mark new, modified entries for update	2023-02-24 02:54:12 -06:00
Debanjum Singh Solanky	fcbbe8c759	Read content plugin configs from Khoj config YAML Configure external text content plugins via the Khoj YAML Reuse existing TextContentConfig definition for external text content plugins	2023-02-23 23:57:32 -06:00
Debanjum Singh Solanky	61b6ee2857	Use helper script to bump khoj pre-release versions	2023-02-17 20:31:51 -06:00
Debanjum Singh Solanky	053d6141f3	Ignore ts typing error, Fix SPDX license identifier in Obsidian plugin	2023-02-17 18:19:01 -06:00
Debanjum Singh Solanky	36be3c4b8f	Fix or ignore MyPy issues in PyQt desktop GUI code - Remove unneeded type ignore for mps with the latest mypy - Stop excluding PyQT desktop GUI code from MyPy checks - Do not warn about unused ignores. Some issue with mypy giving different errors in different environments (venv, system and pre-commit)	2023-02-17 16:13:05 -06:00
Debanjum Singh Solanky	051f0e3fb5	Add, configure and run pre-commit locally and in test workflow	2023-02-17 13:31:36 -06:00
Debanjum Singh Solanky	5e83baab21	Use Black to format Khoj server code and tests	2023-02-17 11:55:17 -06:00
Debanjum Singh Solanky	8b293edd7c	Move mypy config into pyproject.toml. Ignore 2 remaining mypy issues	2023-02-16 03:33:08 -06:00
Debanjum Singh Solanky	c641eb4ad6	Improve rendering log and error stacktraces using the Rich package - Use Rich to render uvicorn, fastAPI logs as well The previous CustomFormatter only worked on khoj logs - Improve rendering stacktrace on errors using Rich	2023-02-15 16:19:32 -06:00
Debanjum Singh Solanky	bc7477ea3e	Move Emacs, Obsidian plugin code out from under src/khoj directory - What - The Emacs and Obsidian interfaces stay in their original directories under src/ - src/khoj now only contains code meant for pypi packaging - Benefits - This avoids having to update khoj MELPA, Obsidian plugin config as the Emacs, Obsidian code is under their original directories - It separates the code in src/khoj meant for python packaging from code for external interfaces like Emacs and Obsidian	2023-02-14 15:44:22 -06:00
Debanjum Singh Solanky	25a749ca1d	Use the src/ layout to fix packaging Khoj for PyPi - Why The khoj pypi packages should be installed in `khoj' directory. Previously it was being installed into `src' directory, which is a generic top level directory name that is discouraged from being used - Changes - move src/* to src/khoj/* - update `setup.py' to `find_packages' in `src' instead of project root - rename imports to form `from khoj.*' in complete project - update `constants.web_directory' path to use `khoj' directory - rename root logger to `khoj' in `main.py' - fix image_search tests to use the newly renamed `khoj' logger - update config, docs, workflows to reference new path `src/khoj'	2023-02-14 15:19:06 -06:00
Debanjum	84322b2a45	Demo using Search in Khoj Obsidian Plugin	2023-02-14 08:43:50 -08:00
Debanjum Singh Solanky	a4dcb20622	Add setting to toggle auto configuring of khoj backend from Obsidian - By default the obsidian plugin automatically configures the khoj backend to index the current vault - For more complex scenarios, users can manage their ~/.khoj/khoj.yml manually by toggling the auto-configure setting off in the khoj plugin settings Resolves #156	2023-02-13 20:15:28 -06:00
Debanjum Singh Solanky	24aa696ef5	Indicate indexing active on Update button in Obsidian plugin settings Use moon rotating through phases to indicate notes indexing in progress Resolves #129	2023-02-13 19:28:19 -06:00
Debanjum Singh Solanky	11517ba8eb	Encode jsonl data as utf8 for gzip write for consistent read/write encoding Should help with issue #89	2023-02-12 17:33:23 -06:00
Debanjum Singh Solanky	3ec41c4d64	Wrap lines for org, markdown results in khoj search results buffer	2023-02-12 07:33:50 -06:00
Debanjum Singh Solanky	9a013ec48f	Add more details to setup Khoj backend in Obsidian plugin readme	2023-02-12 07:31:13 -06:00
Jason Axelson	6d5930363a	Fix obsidian plugins doc link Also make it more obvious where the link is going, initially I thought the link was to another official khoj documentation site.	2023-02-10 07:11:21 -10:00
Debanjum Singh Solanky	215235efd2	Bump khoj pre-release version	2023-02-08 20:24:36 -03:00
Debanjum Singh Solanky	2445664d40	Deprioritize searching for Music content over other text content	2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky	2e052913b6	Search in first configured content type when no search type set Instead of searching through all configured content types but only returning results of the last configured content type	2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky	a26ab31d20	Allow chat with markdown notes if no org-mode content configured	2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky	fbb7747dcc	Read Markdown file as utf8 instead of the default encoding used by OS - Background 1. Obsidian stores markdown notes as utf8[1] 2. By default, the python `open' command uses the OS locale encoding[2] This was causing the `UnicodeDecodeError: <locale_encoding> codec can't decode byte' error - Fix - Read markdown files as utf8 The Obsidian plugin is the main use-case for markdown files in khoj currently and that stores md files as utf8. Do not assume utf8 for other content types like org-mode, beancount for now. - Fail if error in reading file as utf8, instead of ignoring errors. Would rather have user realize that their files are not going to get indexed correctly. [1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3 [2]: https://docs.python.org/3/library/functions.html#open	2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky	66dca6cf33	Add Docs to Search across Languages, Uninstall Khoj to Readme Add details and fixes to Obsidian, Main readme based on feedback, confusion from the Obsidian plugin announcement	2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky	cba9a6a703	Use List, Tuple, Set from typing to support Python 3.8 for khoj Before Python 3.9, you can't directly use list, tuple, set etc for type hinting Resolves #130	2023-02-06 01:23:52 -03:00
Debanjum Singh Solanky	f26cee604d	Update Khoj Plugin Install Instructions. Rename main Readme to README Khoj plugin page from within Obsidian isn't recognized. Seems like it needs an uppercase readme file only. So it doesn't show the Khoj readme from within Obsidian itself.	2023-01-27 20:01:31 -03:00
Debanjum Singh Solanky	2e13e15625	Ensure markdown entries in khoj.el results separated by empty line - Update khoj.el test to reflect updated rendering logic - Move ledger render function before image render function to group functions with similar logic closer	2023-01-26 19:13:02 -03:00
Debanjum Singh Solanky	85ae46f429	Use thread_last to make results rendering funcs more readable in khoj.el	2023-01-26 18:59:44 -03:00
Debanjum Singh Solanky	b415f87093	Split code in onChooseSuggestion method to make it more readable Split find file, jump to file code to make onChooseSuggestion more readable - Use find, instead of using return in forEach to get first match - Move the jump to file+heading code out from forEach	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	37063f6a38	Truncate query to 8k chars for find similar notes from obsidian plugin Truncate current file data passed to khoj backend API via query string below default query size supported by popular servers	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	4456cf5c8f	No need to use then or finally in async functions after an await	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	4070be637c	Pass app object from plugin instance to child objects and functions Do not reference global app object from child objects and funcs directly. It is only available for debugging purposes and access to it may be dropped in the future.	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	c203c6a3fd	Use Sentence case for Find Similar Note command name in Khoj Obsidian	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	e18124ef6f	Add badge for tests and update project subtitle in khoj.el Readme	2023-01-23 20:52:03 -03:00
Debanjum Singh Solanky	86e808abfb	Test get-current-text helpers for Find Similar feature in khoj.el	2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky	be6acda212	Create khoj.el tests. Test rendering results of each content types	2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky	0d0bf3b5aa	Simplify get-current-text functions for Find Similar in khoj.el Use existing functions like `string-trim', `thing-at-point' and remove unneeded code from the two functions	2023-01-23 19:15:52 -03:00
Debanjum Singh Solanky	07e9e4ecc3	Get current paragraph text when point at start of paragraph in khoj.el Previously if cursor was at start of current paragraph, it would get text for the current and next paragraph, instead of just the current one	2023-01-23 18:05:54 -03:00
Debanjum Singh Solanky	a0b03c8bb1	Get current entry text when point at heading for Find Similar in khoj.el Previously if cursor was at heading of current entry, it would find entries similar to the previous outline heading, instead of the current one	2023-01-23 10:01:25 -03:00
Debanjum Singh Solanky	013c7c10a4	Bump khoj pre-release version	2023-01-22 18:45:56 -03:00
Debanjum Singh Solanky	ad3c9b5f44	Bump khoj version to 0.2.5 in preparation for release	2023-01-22 18:18:21 -03:00
Debanjum Singh Solanky	9ed056c7e7	Use consistent indentation in Khoj Emacs Readme	2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky	0980c6e87f	Update Emacs Usage section in Readme. Add find-similar, menu usage	2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky	6908b6eed3	Truncate image queries below max tokens length supported by ML model This would previously return the infamous tensor size mismatch error Verify this error is not raised since adding the query truncation logic	2023-01-21 14:11:00 -03:00
Debanjum Singh Solanky	3d9ed91e42	Search by image at path only if query of form "file:/path/to/image" Previously no query syntax helpers, like the "file:" prefix, were used before checking if query contains file path. This made query to image search brittle to misinterpretation and pointless checking Add test to verify search by image at file works as expected	2023-01-21 14:06:56 -03:00
Debanjum Singh Solanky	b7aa22a059	Change order of arg passed to query-api-and-render-results by importance	2023-01-20 22:13:24 -03:00
Debanjum Singh Solanky	936a88fa7e	Find items of specified type similar to current text item at point - Support querying with text surrounding point in any text buffer Previously could only find items similar to org entry at point - Find similar items of specified content type indexed on khoj Previously only looked for similar org entries indexed on khoj Now uses the content-type configured in khoj transient menu to find items of the specified content type - Details - Generalize the get-current-org-entry-text func to get text for any outline section - Replace leading whitespaces from query text as well - Create method to get current paragraph text from non-outline mode buffers - Update transient, find-similar funcs to pass, use content-type configured in khoj transient menu - Generalize query title creation logic to remove markdown headings prefix (#) apart from org heading prefix (*) as well - Update last used khoj content-type and results from the find-similar and update funcs for later reuse - Jump to top of results buffer after results rendered	2023-01-20 22:12:54 -03:00
Debanjum Singh Solanky	17aaadea1f	Find notes similar to current org entry at point	2023-01-20 05:14:54 -03:00
Debanjum Singh Solanky	44bbc0a417	Add section separators to khoj.el for easier code traversal	2023-01-19 23:36:54 -03:00
Debanjum Singh Solanky	48ad3c535e	Use default content types if fail to call backend on khoj.el load Do not want khoj.el to fail on init/load if khoj backend not running	2023-01-19 20:13:49 -03:00
Debanjum Singh Solanky	9f0bd0a361	Add Github workflow for khoj.el build and quality checks Add khoj.el build badge to khoj.el Readme	2023-01-19 20:13:19 -03:00
Debanjum Singh Solanky	0dd1cba272	Rename configuration sections in khoj.el transient menu	2023-01-19 03:03:08 -03:00
Debanjum Singh Solanky	5d0f369186	Add ability to quit khoj transient with standard q keybinding	2023-01-19 02:47:07 -03:00
Debanjum Singh Solanky	87c7cf4272	Use single khoj func as entrypoint. Group khoj.el code into sections - Give more relevant, specific name to khoj suffix commands - Remove `khoj-simple'. Have single `khoj' function for entrypoint	2023-01-19 02:38:19 -03:00
Debanjum Singh Solanky	9d64a009fd	Allow updating khoj content index from within khoj.el - Split transient config menu by type	2023-01-18 23:07:59 -03:00
Debanjum Singh Solanky	a8d0c7d905	Rename search type to more apt content type in khoj.el	2023-01-18 22:13:49 -03:00
Debanjum Singh Solanky	00daea16df	Allow setting default-search-type to image. Make docstrings compact	2023-01-18 22:01:17 -03:00
Debanjum Singh Solanky	216b17cfd0	Dynamically populate content type choices when khoj transient invoked	2023-01-18 22:00:56 -03:00
Debanjum Singh Solanky	5f446b1440	Convert main khoj.el entrypoint into transient menu for richer configuration	2023-01-18 21:50:07 -03:00
Debanjum Singh Solanky	5c07dcd219	Fix, update Obsidian Readme. Add Find Similar Notes to Implementation section	2023-01-18 00:22:26 -03:00
Debanjum	b7fc344be1	Search for Similar Notes from Obsidian Plugin Enable searching for notes similar to the current note being viewed ## Main Changes - `39a18e2` Extend search modal to search for similar notes - Hide input field on init, Trigger search on opening modal when in similar notes mode - Set input to contents of current markdown file and get notes similar to it - Re-rank, by default, when searching for similar notes - Filter out current note from similar note search results - `0bed410` Only show `Find Similar Note' command in Editor	2023-01-18 00:10:10 -03:00
Debanjum Singh Solanky	6119d0a69e	Add usage of "Find Similar Notes" command to the Khoj Obsidian Readme	2023-01-18 00:03:13 -03:00
Debanjum Singh Solanky	657e455785	Remove unused `onunload' method in main.ts of khoj obsidian plugin	2023-01-17 23:46:38 -03:00
Debanjum Singh Solanky	0bed410712	Limit Find Similar Note command to be triggered from Editor Fixup indentation and comments	2023-01-17 19:34:48 -03:00
Debanjum Singh Solanky	39a18e2080	Add ability to search for similar notes in Khoj Obsidian - Hide input field on init, Trigger search on opening modal in similar notes mode - Set input to current markdown file and get similar notes to it - Enable rerank when searching for similar notes - Filter out current note from similar note search results	2023-01-17 19:07:18 -03:00
Debanjum Singh Solanky	ffaef92476	Encode query string before passing as query param to search API	2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky	d5a7cc5b0f	Compact code to map results from search API into SearchResult objects Make code compact for readability Remove unneeded temporary variables and return statements	2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky	8ab7a26bde	Update Khoj on Obsidian screenshots in Main and Plugin Readme - Screenshot querying "Setup Editor" on test vault with Khoj Readmes - New features showcase: - information keybindings, rerank keybinding at bottom of modal - fixed top level headings in search results - search results snipped if greater than N words	2023-01-17 13:58:50 -03:00
Debanjum Singh Solanky	7b4f78776c	Fix extracting Markdown Entries with Top Level Headings - Previously top level headings would get stripped of the space between heading text and the prefix # symbols. That is, `# Top Level Heading' would get converted to `#Top Level Heading' - This would mess up their rendering as a heading in search results - Add unit tests to text_to_jsonl processors to prevent regression	2023-01-17 13:06:28 -03:00
Debanjum Singh Solanky	1a296518c5	Limit total words for each Search Result rendered in search modal Provides a more consistent rendering of results in modal. Makes it easier to see more results in modal. To see complete entry, user can always just jump to entry from modal	2023-01-17 13:06:14 -03:00
Debanjum Singh Solanky	e7b89f7fd0	Return compiled entry in additional details of /api/search response This can be used to highlight the relevant portion of the raw entry and to pass to the summarizer to stay within the max_tokens limit supported by GPT models	2023-01-16 22:56:06 -03:00
Debanjum Singh Solanky	7071d081e9	Increase max_tokens returned by GPT summarizer. Remove default params	2023-01-16 22:55:36 -03:00
Debanjum Singh Solanky	3d9cdadbbb	Add codebase visualization of Khoj Obsidian to Khoj Obsidian Readme	2023-01-15 14:09:21 -03:00
Debanjum Singh Solanky	d02ba325aa	Handle empty chat history returned by API to chat.html on web interface	2023-01-15 13:51:16 -03:00
Debanjum	3f2ea039a7	Add Chat page to the Khoj Web Interface ### Overview - Provide a chat interface to engage with and inquire your notes - Simplify interacting with the beta `chat` and `summarize` APIs ### Use - Open `<khoj-url>/chat`, by default at http://localhost:8000/chat?type=summarize - Type your queries, see summarized response by Khoj from your notes Note: - You will need to add an API key from OpenAI to your khoj.yml - Your query and top note from search result will be sent to OpenAI for processing ## Details - `177756b` Show chat history on loading chat page on web interface - `d8ee0f0` Save chat history to disk for persistence, seeing chat logs - `5294693` Style chat messages as speech bubbles - `d170747` Add khoj web interface and chat styling to new chat page on khoj web - `de6c146` Implement functional, unstyled chat page for khoj web interface	2023-01-13 23:02:19 -03:00
Debanjum Singh Solanky	16d4560ff8	Comment css styling of chat page for later reference	2023-01-13 22:40:01 -03:00
Debanjum Singh Solanky	cfef346d03	Do not update query field on every chat message It doesn't work as well with chat as it does for the search page Use more appropriate thinking face emoji for you instead of surprise face	2023-01-13 22:24:26 -03:00
Debanjum Singh Solanky	177756be7e	Fetch chat history from backend and render it on chat page load	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	330febaa1a	Update conversation logs from /beta/summary API endpoint too	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	cb6f0b53c9	Make user_message_metadata arg to message_to_log in gpt.py optional - Use a default user_message_metadata if arg not set - Update conversation to use `by' as `you' and `khoj'	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	cc2456e411	Update /beta/chat API to return chat history if no query param passed	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	d8ee0f0e9a	Use scheduler to save chat history to disk every 5 minutes - The previous mechanism to trigger saving on shutdown event did not work - Use scheduler to persist chat sessions to disk at a 5 minute interval - This improves time granularity with a fixed interval for saving chat logs - It may lose ~5 minutes of chat history until mechanism to also write on shutdown found/resolved - Create conversation directory if it doesn't exist before attempting write - Reset chat_session after writing it to disk	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	5294693e97	Style message as speech bubbles on chat page of web interface - Wrap messages into speech bubbles - Color messages by khoj blue, sender grey - Add those standard protrusions to the speech bubbles for fun - Align bubbles left or right based on sender - messages by khoj are left aligned, message by self are right aligned - Put message metadata like sender and time under speech bubble - use data-* attribute and ::after css pseudo-selector for this - Update renderMessage func to accept time param, remove unused type_ param	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	7723d656dc	Do not force GPT to summarize note using past tense Not all notes are in the past. Notes can be about stuff in the future. Casting them to past tense gives the impression that they've already happened / been done.	2023-01-13 13:10:35 -03:00
Debanjum Singh Solanky	2842e3a035	Automatically scroll to bottom of chat body on new messages	2023-01-13 13:09:51 -03:00
Debanjum Singh Solanky	34014635d0	Improve colors, fix contrast for accessibility on web interface - Changes - Use blue color for khoj heading font - This fixes the title color issue - Update background to lighter shade - This fixes the body text color issue - Update colors for todo, done, miscellaneous todo state, tag color - This does not fix the color contrast issue but seems like an acceptable solution - Using white text rather than black text on blue background seems better even though the black text on blue background passes the WCAG acceptable contrast score - For details see blog post: https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/ - Add border to tags to give them tag pills look and differentiate from todo states - Buttons and inputs - Change background color of input fields like type dropdown, update button and results count counter, to match background color of page - Add shadow on hover over button, dropdowns Resolves #111	2023-01-12 21:59:50 -03:00
Debanjum Singh Solanky	d170747ec2	Add khoj web interface & chat styling to new chat page on khoj web - Ensure message input box sticks to bottom of screen - Ensure chat logs div is scrollable when logs become longer than screen Do not make the whole page scroll, just the chat logs body div	2023-01-12 21:58:46 -03:00
Debanjum Singh Solanky	de6c146290	Implement functional, unstyled chat page for khoj web interface Expose it at /chat URL	2023-01-12 21:53:25 -03:00
Debanjum Singh Solanky	e6793816f9	Upgrade Khoj.el Readme. Add TOC, Screenshot, Features Sections - Update Query filter details	2023-01-12 02:14:02 -03:00
Debanjum Singh Solanky	26f791e9ad	Update Obsidian Plugin Readme. Add Khoj icon to Khoj Modal Placeholder text - Fold Query Filter, Demo Description - Add Limitations to Readme - Add Update index bullet to Troubleshooting Options	2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky	3e63af5c94	Constrain grid rows to fix layout of Khoj web interface on Chrome	2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky	50c797962c	Jump to Search Result from Khoj Modal even on Obsidian Android Uses longest file path match to find markdown file in vault corresponding to file of search result returned by Khoj Allow jumping to search result from khoj plugin modal on Android too	2023-01-11 19:44:11 -03:00
Debanjum Singh Solanky	51ea6d9c9b	Do not force index update when configuring backend on plugin load - Backend can handle incremental updates - Avoid khoj usability delay caused by recomputing the index every time the vault is opened	2023-01-11 17:17:08 -03:00
Debanjum Singh Solanky	5996d47d7c	Trigger input event to Get, Render Reranked results from Khoj backend Previous mechanism of manually triggering getSuggestions, renderSuggestions flow was corrupting traversing and opening reranked search results in KhojModal Emulate event that would anyway trigger the get & render of results in modal. This lets obsidian core handle the flow without digging too deep into obsidian core's handling of the flow. Lowers the chance of breakage	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	1c813a6884	Convert results count setting to slider in plugin settings pane	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	4e1abd1b72	Disable update button while indexing vault in plugin settings	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	513c86c6a1	Set index file paths relative to current or default path on khoj backend We need the index file paths to make sense on the khoj backend server Having path of index on backend relative to current vault directory on frontend ignores the fact that the frontend may be on a different machine than the khoj backend server Using unique index name per vault allows switching vaults without overwriting indices of other vaults created on khoj backend when khoj obsidian plugin is loaded on opening a different vault	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	4407e23c19	Only index current vault on Khoj. Remove plugin setting to configure it - Overview Limits using Khoj with a single vault at a time. This is automatically configured to the most recently opened vault. Once directory filters are supported on backend, the plugin will be updated to index multiple vaults but search only the current vault from the current vault's khoj obsidian plugin - Code Details - Remove setting to configure Vault directory from Khoj Obsidian plugin - Automatically configure Khoj to index only current Vault. - Overwrites any previous vaults that were intended to be indexed by Khoj backend - Force update of index after configuring vault - Why It's not helpful for now and can lead to more problems, confusion. Once directory filters	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	86a1e43605	Return HTTP Exception on /api/update API call failure - Previously the backend was just throwing backend error. The frontend calling the /update API wasn't getting notified - Now the frontend can react appropriately and make the issue visible to the user	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	5af2b68e2b	Update plugin notifications for errors and success - Only show notification on plugin load and failure. - In settings page, set current backend status at top of pane instead of showing notification Notice bubbles cluttered the UI while typing updates to settings - Show notification once index updated via settings pane button click There was no notification on index update, which usually takes time on the backend	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	853192932a	setCTA on Khoj Obsidian plugin button. Minor cleanup of space, tabs	2023-01-10 23:36:02 -03:00
Debanjum Singh Solanky	da49ea272c	Add placeholder text to modal in Khoj Obsidian plugin	2023-01-10 22:50:11 -03:00
Debanjum Singh Solanky	580f4aca23	Add hints to Modal for available Keybindings	2023-01-10 22:03:47 -03:00
Debanjum Singh Solanky	b52cd85c76	Allow Reranking Results using Keybinding from Khoj Search Modal	2023-01-10 21:59:38 -03:00
Debanjum Singh Solanky	7991ab7a86	Add button in Obsidian plugin settings to force re-indexing your vault	2023-01-10 19:49:12 -03:00
Debanjum Singh Solanky	f046a95f3d	Track connectedToBackend as a setting. Use it across obsidian plugin - Display warning at top of khoj obsidian plugin settings - Make search command available only if connected to backend - Show warning notice on clicking khoj search ribbon button - Call saveData after configureKhojBackend to ensure connectedToBackend setting is saved after being (potentially) updated in configureKhojBackend function	2023-01-10 17:28:47 -03:00
Debanjum Singh Solanky	768e874185	Load obsidian plugin even if fail to connect to backend but show warning - Previously the plugin would not load if cannot connect to Khoj backend - Silently failing to load with no reason provided is not helpful - Load plugin to allow user to fix the Khoj URL in their plugin setting - Show reason for khoj plugin not working. More helpful than failing silently	2023-01-10 17:20:02 -03:00
Debanjum Singh Solanky	aa22d83172	Create and use a context manager to time code Use the timer context manager in all places where code was being timed - Benefits - Deduplicate timing code scattered across codebase. - Provides single place to manage perf timing code - Use consistent timing log patterns	2023-01-09 19:48:16 -03:00
Debanjum Singh Solanky	93f39dbd43	Add typing to text_search. Reformat code to set existing_embedding	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	db7483329c	Only import type hint packages for type checking. Avoids circular imports Use annotations from the __future__ package to avoid having to quote type hints. This import will not be required after Python 3.11	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	e5254a8e56	Create BaseEncoder class. Make OpenAI encoder its child. Use for typing - Set type of all bi_encoders to BaseEncoder - Make load_model return type Union of CrossEncoder and BaseEncoder	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	cf7400759b	Remove unused render_results method from text and image search It's a relic from when khoj was being used as a python module	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	afcfc3cd62	Split text_search.query logic into separate methods for modularity The query method had become too big. Extract out filter, score, sort and deduplicate logic used by text_search.query into separate methods. This should improve readability of code.	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	8dc6ee8b6c	Pass `model' arg to extract_search_type method from beta search API Issue caught by mypy	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	8498903641	Fix, add typing to Filter and TextSearchModel classes - Changes - Fix method signatures of BaseFilter subclasses. Else typing information isn't translating to them - Explicitly pass `entries: list[Entry]' as arg to `load' method - Fix type of `raw_entries' arg to `apply' method to list[Entry] from list[str] - Rename `raw_entries' arg to `apply' method to `entries' - Fix `raw_query' arg used in `apply' method of subclasses to `query' - Set type of entries, corpus_embeddings in TextSearchModel - Verification Ran `mypy --config-file .mypy.ini src' to verify typing	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	eace7c6215	Use torch.tensor as torch.Tensor cannot create tensor on MPS device - `torch.Tensor' is apparently a legacy tensor constructor - Using that to create tensor on MPS devices throws error: RuntimeError: legacy constructor expects device type: cpu but device type: mps was passed - `torch.tensor' can handle creating tensors on Mac GPU (MPS) fine	2023-01-09 19:47:19 -03:00
Debanjum Singh Solanky	9def3f8c6f	Add exception handling to beta APIs, in case OpenAI API call fails	2023-01-09 01:27:06 -03:00
Debanjum Singh Solanky	7b164de021	Add beta API to summarize top search result using an OpenAI model This is unlike the more general chat API that combines summarization of top search result and conversing with the OpenAI model This should give faster summary results, as no intent categorization API call is required	2023-01-09 01:25:59 -03:00
Debanjum Singh Solanky	d36da46f7b	Truncate prompt to not exceed OpenAI prompt limit Truncate prompt containing the top retrieved entry to 500 words to avoid triggering the max_token limit error	2023-01-09 00:51:46 -03:00
Debanjum Singh Solanky	237123d18c	Fix tests for the conversation processor - Use latest davinci model for tests - Wrap prompt in triple quotes to improve legibility - `understand' method returns dictionary instead of string. Fix its test - Fix prompt for new model to pass `chat_with_history' test	2023-01-09 00:22:26 -03:00
Debanjum Singh Solanky	918af5e6f8	Make OpenAI conversation model configurable via khoj.yml - Default to using `text-davinci-003' if conversation model not explicitly configured by user. Stop using the older `davinci' and `davinci-instruct' models - Use `model' instead of `engine' as parameter. Usage of `engine' parameter in OpenAI API is deprecated	2023-01-09 00:17:51 -03:00
Debanjum Singh Solanky	74e779f8d0	Fix /beta/chat API to use Entry class instead of old dictionary pattern Search returns response of type SearchResponse instead of a dict now	2023-01-08 15:28:26 -03:00
Debanjum Singh Solanky	f2436039a0	Improve readability of GPT prompt strings in conversation processor	2023-01-08 15:27:41 -03:00
Debanjum Singh Solanky	6119005838	Improve comments, exceptions, typing and init of OpenAI model code	2023-01-08 00:36:18 -03:00
Debanjum Singh Solanky	c0ae8eee99	Allow using OpenAI models for search in Khoj - Init processor before search to instantiate `openai_api_key' from `khoj.yml'. The key is used to configure search with openai models - To use OpenAI models for search in Khoj - Set `encoder' to name of an OpenAI model. E.g text-embedding-ada-002 - Set `encoder-type' in `khoj.yml' to `src.utils.models.OpenAI' - Set `model-directory' to `null', as online model cannot be stored on disk	2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky	826f9dc054	Drop long words from compiled entries to be within max token limit of models Long words (>500 characters) provide less useful context to models. Dropping very long words allows models to create better embeddings by passing more of the useful context from the entry to the model	2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky	6a30a13326	Only create model directory if the optional field is set in SearchConfig	2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky	2fe37a090f	Make type of encoder to use for embeddings configurable via khoj.yml - Previously `model_type' was set in the setup of each `search_type' - All encoders were of type `SentenceTransformer' - All cross_encoders were of type `CrossEncoder' - Now `encoder-type' can be configured via the new `encoder_type' field in `TextSearchConfig' under `search-type` in `khoj.yml`. - All the specified `encoder-type' class needs is an `encode' method that takes entries and returns embedding vectors	2023-01-07 23:09:12 -03:00
Debanjum Singh Solanky	d55d7d53dc	Fix GPU usage by Khoj on Macs to speed up search and indexing - Ensure all tensors are on MPS device before doing operations across them - Background - GPU is used by default for Khoj on MacOS now - Needed PyTorch > 1.13.0 on Macs to use GPU, which we do now - MPS should speed up search and indexing on MacOS	2023-01-05 15:39:09 -03:00
Debanjum	abd035e2fa	Merge PR #112 to fix quote usage in khoj.el docstring from suliveevil/master Fix usage warning for unescaped single quote in `khoj.el' docstring. Converts usage of '<text>' into `<text>' to use the correct quote forms in generated docs	2023-01-05 13:24:11 -03:00
Debanjum Singh Solanky	e792523849	Bump version in metadata packages for khoj, khoj.el and obsidian plugin	2023-01-05 12:50:27 -03:00
suliveevil	b2812b409f	fix docstring usage warning ⛔ Warning (comp): khoj.el:119:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting) ⛔ Warning (comp): khoj.el:120:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting) ⛔ Warning (comp): khoj.el:121:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting) ⛔ Warning (comp): khoj.el:168:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)	2023-01-05 16:47:38 +08:00
Debanjum Singh Solanky	47015ee6cc	Fold Demo video descriptions, analysis by default in main Readme	2023-01-04 20:13:43 -03:00
Debanjum Singh Solanky	da17ff6ac8	Add Upgrade instructions for Khoj.el Readme. Fix version of khoj.el	2023-01-04 20:06:39 -03:00
Debanjum Singh Solanky	66ccd0c970	Create Obsidian plugin for Khoj - Features - Search using Khoj from within the Obsidian app Allow Natural language search on your (markdown) notes in Obsidian Vault - Show search results as rendered (instead of raw) Markdown Improve legibility of the results - Jump to selected note from search result in Khoj search modal Simplify seeing result within its original note context - Automatically configure khoj to index markdown files in current vault Reduce khoj setup steps for plugin users by using reasonable defaults - Code updates the markdown config in khoj.yml and triggers index update - It can be configured by user in khoj plugin settings, if required - Add Demo and detailed Readme for the Obsidian plugin Ease setup and usage. Give context about capabilities - Miscellaneous - Trying to keep a mono repo until the Khoj project is mature enough to reduce maintenance burden	2023-01-04 18:28:16 -03:00
Debanjum Singh Solanky	feddb6ce62	Add start_url to khoj webmanifest to show Khoj as PWA on Chrome	2023-01-04 13:37:56 -03:00
Debanjum Singh Solanky	3dee1aed9e	Create /config/data/default API endpoint to serve default khoj config This can ease configuring khoj from the different interfaces - Don't need to know all the (default) config used by khoj. - Just get default config by calling the above API endpoint. - Then modify desired portions and call POST /api/config/data to configure khoj.	2023-01-03 21:52:34 -03:00
Debanjum Singh Solanky	ce945f7a90	Configure processors too on calling /update API - Previously only search was being reconfigured - But Processors are configured on app start too - Match that behavior on calling /update API	2023-01-03 21:51:02 -03:00
Debanjum Singh Solanky	9d31988f42	Allow starting khoj in non-GUI mode without config file instantiated - Start khoj server (in non-GUI mode) without needing config file already instantiated. - But throw warning to configure khoj to use it - This allows plugins to configure the app via the /config/data APIs - To be used by the Khoj obsidian plugin to configure markdown content in khoj	2023-01-03 21:36:59 -03:00
Debanjum Singh Solanky	52664dd96c	Allow recursive glob pattern (**) to add files to search index - Simplify configuring files to index For Obsidian/Org-Roam type systems with lots of small files in khoj.yml using `input-filter'	2023-01-03 01:32:58 -03:00
Debanjum Singh Solanky	152e5f1661	Return the file of each search result in response - Useful for enabling jump to note functionality in interfaces - It will be used in the Khoj plugin for Obsidian	2023-01-03 01:25:34 -03:00
Debanjum Singh Solanky	c535953915	Update index automatically in non GUI mode too - Poll scheduler every minute using threading.Timer - Use 60 seconds polling interval to avoid fork bombing - Schedule next via the same poll scheduler - Allow clean program interrupt by running scheduler in daemon mode	2023-01-01 21:03:19 -03:00
Debanjum Singh Solanky	701d92e17b	Lock the index before updating it via API or Scheduler - There are 3 paths to updating/setting the index (stored in state.model) - App start - API - Scheduler - Put all updates to the index behind a lock, as there are multiple update paths that could (potentially) run at the same time (via API or Scheduler)	2023-01-01 17:09:36 -03:00
Debanjum Singh Solanky	3b0783aab9	Automate updating embeddings, search index on an hourly schedule - Use the schedule pypi package - Use QTimer to poll schedule.run_pending() regularly for jobs to run	2023-01-01 17:09:36 -03:00
Debanjum	06c25682c9	Split text entries by max tokens supported by ML models ### Background There is a limit to the maximum input tokens (words) that an ML model can encode into an embedding vector. For the models used for text search in khoj, a max token size of 256 words is appropriate [1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1#:~:text=model%20was%20just%20trained%20on%20input%20text%20up%20to%20250%20word%20pieces),[2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#:~:text=input%20text%20longer%20than%20256%20word%20pieces%20is%20truncated) ### Issue Until now entries exceeding max token size would silently get truncated during embedding generation. So the truncated portion of the entries would be ignored when matching queries with entries This would degrade the quality of the results ### Fix - `e057c8e` Add method to split entries by specified max tokens limit - Split entries by max tokens while converting [Org](https://github.com/debanjum/khoj/commit/c79919b), [Markdown](https://github.com/debanjum/khoj/commit/f209e30) and [Beancount](https://github.com/debanjum/khoj/commit/17fa123) entries to JSONL - `b283650` Deduplicate results for user query by raw text before returning results ### Results - The quality of the search results should improve - Relevant, long entries should show up in results more often	2022-12-26 18:23:43 +00:00
Debanjum Singh Solanky	17fa123b4e	Split entries by max tokens while converting Beancount entries To JSONL	2022-12-26 15:14:32 -03:00
Debanjum Singh Solanky	f209e30a3b	Split entries by max tokens while converting Markdown entries To JSONL	2022-12-26 13:14:15 -03:00
Debanjum Singh Solanky	24676f95d8	Fix comments, use minimal test case, regenerate test index, merge debug logs - Remove property drawer from test entry for max_words splitting test - Property drawer is not required for the test - Keep minimal test case to reduce chance for confusion	2022-12-25 22:33:04 -03:00
Debanjum Singh Solanky	b283650991	Deduplicate results for user query by raw text before returning results - Required because entries are now split by the max_word count supported by the ML models - This would now result in potentially duplicate hits, entries being returned to user - Do deduplication after ranking to get the top ranked deduplicated results	2022-12-25 21:36:15 -03:00
Debanjum Singh Solanky	53cd2e5605	Regenerate initial model in asymmetric reload test to reduce flakiness - Fix logger message when converting org node to entries - Remove unused import from conftest	2022-12-25 21:36:15 -03:00
Debanjum Singh Solanky	c79919bd68	Split entries by max tokens while converting Org entries To JSONL - Test usage of entry splitting by max tokens in text search	2022-12-25 21:36:00 -03:00
Debanjum Singh Solanky	08dc5e3324	Update instructions in khoj.el to install it from MELPA stable - The instructions suggest installing khoj-assistant via pip install. This installs the latest tagged/release version of khoj - To match that version user should install khoj.el from MELPA stable instead of MELPA	2022-12-23 19:08:38 -03:00
Debanjum Singh Solanky	e057c8e208	Add method to split entries by specified max tokens limit - Issue ML Models truncate entries exceeding some max token limit. This lowers the quality of search results - Fix Split entries by max tokens before indexing. This should improve searching for content in longer entries. - Miscellaneous - Test method to split entries by max tokens	2022-12-23 16:24:04 -03:00
Debanjum Singh Solanky	d3e175370f	Update readme to install khoj.el from MELPA stable unless using pre-release khoj Update readme to ask user to install khoj.el from MELPA when a pre-release version of the main khoj app is installed. Else install khoj.el from MELPA Stable	2022-12-20 23:29:22 -03:00
Debanjum Singh Solanky	cd463c5085	Update Khoj.el Install Instructions on Emacs	2022-12-20 11:06:33 -03:00
Debanjum Singh Solanky	23ca5a2d43	Improve (un-)quoting of funcs used in `khoj--get-enabled-content-types' - Based on melpa package feedback for khoj.el - Verified these changes don't affect behavior of the function	2022-12-19 18:02:23 -03:00
Debanjum Singh Solanky	5db3a67df5	Fix Khoj Emacs package URL in khoj.el	2022-12-14 22:49:19 -03:00
Debanjum Singh Solanky	abad6d5f44	Declare external khoj.el funcs. Remove undefined func warnings on install	2022-12-14 22:36:04 -03:00
Debanjum Singh Solanky	c52383b11c	Delete stale, unused installation helper script	2022-12-03 13:36:47 -03:00
Debanjum Singh Solanky	1990d09032	Bump khoj version in setup.py, khoj.el to 0.2.0	2022-12-02 14:58:54 -03:00
Debanjum Singh Solanky	a9cfd8b800	Extract hash func for incremental text indexing into separate method	2022-10-26 13:56:58 +05:30
Debanjum Singh Solanky	0de2ff9c97	Add __init__.py to routers directory to register it as a package	2022-10-25 20:40:40 +05:30
Debanjum Singh Solanky	55d2fea9be	Move Custom Formatter class for logger to util.helper module from main.py	2022-10-20 00:32:24 +05:30
Debanjum Singh Solanky	1c40f97114	Merge branch 'master' of github.com:debanjum/khoj into modularize-api-and-increase-typing - Conflicts: - src/interface/emacs/khoj.el Use our update to `config-url', use their `url-request-method'	2022-10-19 16:46:53 +05:30
Debanjum Singh Solanky	e1b5a87920	Rename Frontend Router to Web Client. Fix logger usage in routers - Use logger in api_beta router instead of print statements - Remove unused logger in web client router	2022-10-19 16:36:48 +05:30
Debanjum	4abd51cb04	Merge pull request #99 from telotortium/method Explicitly set `url-request-method' to GET in khoj.el	2022-10-19 10:31:37 +00:00
Debanjum Singh Solanky	c467df8fa3	Setup `mypy' for static type checking	2022-10-08 17:33:13 +03:00
Debanjum Singh Solanky	d292bdcc11	Do not version API. Premature given current state of the codebase - Reason - All clients that currently consume the API are part of Khoj - Any breaking API changes will be fixed in clients immediately - So decoupling client from API is not required - This removes the burden of maintaining multiple versions of the API	2022-10-08 16:32:46 +03:00
Debanjum Singh Solanky	7e9298f315	Use new Text Entry class to track text entries in Intermediate Format - Context - The app maintains all text content in a standard, intermediate format - The intermediate format was loaded, passed around as a dictionary for easier, faster updates to the intermediate format schema initially - The intermediate format is reasonably stable now, given its usage by all 3 text content types currently implemented - Changes - Concretize text entries into `Entries' class instead of using dictionaries - Code is updated to load, pass around entries as `Entries' objects instead of as dictionaries - `text_search' and `text_to_jsonl' methods are annotated with type hints for the new `Entries' type - Code and Tests referencing entries are updated to use class style access patterns instead of the previous dictionary access patterns - Move `mark_entries_for_update' method into `TextToJsonl' base class - This is a more natural location for the method as it is only (to be) used by `text_to_jsonl' classes - Avoid circular reference issues on importing `Entries' class	2022-10-08 12:06:05 +03:00
Debanjum Singh Solanky	99754970ab	Type the /search API response to better document the response schema - Both Text, Image Search were already giving list of entry, score - This change just concretizes that schema and exposes it in the API documentation (i.e. OpenAPI, Swagger, Redocs)	2022-10-08 12:06:05 +03:00
Debanjum Singh Solanky	0521ea10d6	Put image score breakdown under `additional' field in search response - Update web, emacs interfaces to consume the scores from new schema	2022-10-08 12:06:01 +03:00
Debanjum Singh Solanky	e42a38e825	Version Khoj API, Update frontends, tests and docs to reflect it - Split router.py into v1.0, beta and frontend (no-prefix) api modules under new router package. Version tag in main.py via prefix - Update frontends to use the versioned api endpoints - Update tests to work with versioned api endpoints - Update docs to mention, reference only versioned api endpoints	2022-09-28 20:08:38 +03:00
Robert Irelan	d25e1d8e86	fix: explicitly set url-request-method In my installation, it appears that `url-request-method` is sometimes set globally to POST. Need to explicitly set it to ensure that GET is always used as intended.	2022-09-19 15:46:46 -04:00
Debanjum Singh Solanky	ee65a4f2c7	Merge /reload, /regenerate into single /update API endpoint - Pass force=true to /update API to force regenerating index from scratch - Otherwise calls to the /update API endpoint will result in an incremental update to index	2022-09-16 00:53:19 +03:00
Debanjum Singh Solanky	02d944030f	Use Base TextToJsonl class to standardize <text>_to_jsonl processors - Start standardizing implementation of the `text_to_jsonl' processors - `text_to_jsonl' scripts already had a shared structure - This change starts to codify that implicit structure - Benefits - Ease adding more `text_to_jsonl' processors - Allow merging shared functionality - Help with type hinting - Drawbacks - Lower agility to change. But this was already an implicit issue as the text_to_jsonl processors got more deeply wired into the app	2022-09-16 00:53:11 +03:00
Debanjum Singh Solanky	c16ae9e344	Ignore "Legacy way to download model" warning for upstream dependency	2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky	3169e3b78e	Use ellipsis instead of pass in base filter abstract methods for aesthetic	2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky	bf1ae038cb	Get XMP metadata from image using Pillow. Remove ExifTool dependency - Pillow already supports reading XMP metadata from Images - Removes need to maintain my fork of unmaintained PyExiftool - This also removes dependency on system Exiftool package for XMP metadata extraction - Add test to verify XMP metadata extracted from test images - Remove references to Exiftool from Documentation	2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky	8f57a62675	Remove unused imports. Fix typing and indentation - Typing issues discovered using `mypy'. Fixed manually - Unused imports discovered and fixed using `autoflake' - Fix indentation in `org_to_jsonl' manually	2022-09-14 04:56:52 +03:00
Debanjum Singh Solanky	be57c711fd	Revert OrgNode.hasTag func to method instead of property as accepts argument	2022-09-14 04:56:48 +03:00
Debanjum Singh Solanky	0109c7bd91	Disable ability to call <text>_to_jsonl, <type>_search packages directly - This code is de-synced with expected args by above scripts - Better to remove unused capability that needlessly increases maintenance burden	2022-09-14 04:56:48 +03:00
Debanjum Singh Solanky	1680a617da	Reflect updates to query and results count in URL - Simplify tracking khoj query history, saving/sharing links - Do not execute search, when query only contains whitespaces - Prevents error when try process results of empty query	2022-09-13 23:39:24 +03:00
Debanjum Singh Solanky	34314e859a	Call /reload instead of /regenerate API to update index from web interface - As `/reload` updates index incrementally, it's relatively quick - This makes `/reload` a better default to expose via the web interface than the `/regenerate` endpoint	2022-09-12 23:39:10 +03:00
Debanjum Singh Solanky	13b5d5082f	Create input field to set results count on the web interface Resolves #96	2022-09-12 23:24:46 +03:00
Debanjum Singh Solanky	1bfe9c4ef2	Handle filter only queries. Short-circuit and return filtered results - For queries with only filters in them short-circuit and return filtered results. No need to run semantic search, re-ranking. - Add client test for filter only query and quote query in client tests	2022-09-12 17:13:05 +03:00
Debanjum Singh Solanky	afc84de234	Make word filter regex explicit. Allow hyphen in word filters Helps with #88	2022-09-12 17:05:29 +03:00
Debanjum Singh Solanky	536f03af8f	Process text content files in sorted order for stable indexing - Image search already uses a sorted list of images to process - Prevents index of entries to desync when entries, embeddings generated by a separate server/app instance	2022-09-12 11:09:40 +03:00
Debanjum Singh Solanky	a701ad08b9	Support multiple input-filters to configure content to index via khoj.yml - Update existing code, tests to process input-filters as list instead of str - Test `text_to_jsonl' get files methods to work with combination of `input-files' and `input-filters' Resolves #84	2022-09-12 11:08:59 +03:00
Debanjum Singh Solanky	940c8fac8c	Use app LRU, not functools LRU decorator, to cache search results in router - Provides more control to invalidate cache on update to entries, embeddings - Allows logging when results are being returned from cache etc - FastAPI, Swagger API docs look better as the `search' controller not wrapped in generically named function when using functools LRU decorator	2022-09-12 09:38:48 +03:00
Debanjum Singh Solanky	c6fa09d8fc	Fix querying with include word filter from web interface - Not encoding the `query' string before querying the backend API with it was causing the "+" prefix for include word filter to be lost	2022-09-12 09:27:02 +03:00
Debanjum Singh Solanky	1502fbc9e9	Add index_heading_entries flag to default and sample khoj configs	2022-09-11 17:33:37 +03:00
Debanjum Singh Solanky	7216cdff58	Add Date, Word filter for Org-Music content	2022-09-11 17:29:34 +03:00
Debanjum Singh Solanky	9d369ae4df	Fix OrgNode render of entries with property drawers and empty body - Issue - Indent regex was previously catching escape sequences like newlines - This was resulting in entries with only escape sequences in body to be prepended to property drawers etc during rendering - Fix - Update indent regex to only look for spaces in each line - Only render body when body contains non-escape characters - Create test to prevent this regression from silently resurfacing	2022-09-11 16:09:19 +03:00
Debanjum Singh Solanky	253c9eae9a	Set index_heading_entries field in config to index entries with no body - Previously heading entries were not indexed to maintain search quality - But given that there are use-cases for indexing entries with no body - Add a configurable `index_heading_entries' field to index heading entries - This `TextContentConfig' field is currently only used for OrgMode content	2022-09-11 16:09:19 +03:00
Debanjum Singh Solanky	1d3b3d5f39	Convert field get/set methods in OrgNode class to @property - Use more descriptive variable names in OrgNode parser and class - Convert OrgNode fields to private/protected, use property methods to get/set them	2022-09-11 14:59:28 +03:00
Debanjum Singh Solanky	db37e38df7	Create OrgNode hasBody method. Use it in org_to_jsonl checks	2022-09-11 12:50:03 +03:00
Debanjum Singh Solanky	b4878d76ea	Extract entries from scratch when regenerate requested - Do not rely on previously extracted entries to find new entries in regenerate scenario	2022-09-11 12:50:03 +03:00
Debanjum Singh Solanky	52e3dd9835	Pass the whole TextContentConfig as argument to text_to_jsonl methods - Let the specific text_to_jsonl method decide which of the TextContentConfig fields it needs to convert <text> type to jsonl - This simplifies extending TextContentConfig for a specific type without modifying all text_to_jsonl methods - It keeps the number of args being passed to the `text_to_jsonl' methods in check	2022-09-11 12:49:56 +03:00
Debanjum Singh Solanky	e951ba37ad	Raise exception when org file not found - No need to catch the IOError in OrgNode	2022-09-11 01:09:24 +03:00
Debanjum Singh Solanky	2e1bbe0cac	Fix stripping empty escape sequences from strings - Fix log message on jsonl write	2022-09-10 23:57:05 +03:00
Debanjum Singh Solanky	a7cf6c8458	Use dictionary instead of list to track entry to file maps	2022-09-10 23:08:30 +03:00
Debanjum Singh Solanky	3e1323971b	Stack function calls in jsonl converters to avoid unneeded variables	2022-09-10 22:56:06 +03:00
Debanjum Singh Solanky	4eb84c7f51	Log performance metrics for beancount, markdown to jsonl conversion	2022-09-10 22:47:54 +03:00
Debanjum Singh Solanky	ebd5039bd1	Merge branch 'master' into support-incremental-updates-of-embeddings	2022-09-10 22:37:13 +03:00
Debanjum Singh Solanky	030fab9bb2	Support incremental update of Markdown entries, embeddings	2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky	91aac83c6a	Support incremental update of Beancount transactions, embeddings	2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky	b01b4d7daa	Extract logic to mark entries for embeddings update into helper function - This could be re-used by other text_to_jsonl converters like markdown, beancount	2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky	f97308bef2	Fix log message on writing JSONL data to file	2022-09-10 21:40:08 +03:00
Debanjum Singh Solanky	c17a0fd05b	Do not store word filters index to file. Not necessary for now - It's more of a hassle to not let word filter go stale on entry updates - Generating index on 120K lines of notes takes 1s. Loading from file takes 0.2s. For less content load time difference will be even smaller - Let go of startup time improvement for simplicity for now	2022-09-10 21:01:54 +03:00
Debanjum Singh Solanky	91d11ccb49	Only hash compiled entry to identify new/updated entries to update - Comparing compiled entries is the appropriately narrow target to identify entries that need to encode their embedding vectors, given we pass the compiled form of the entry to the model for encoding - Hashing the whole entry along with its raw form was resulting in a bunch of entries being marked for update, as LINE: <entry_line_no> is a string added to each entry's raw format. - This resulted in an update to a single entry marking all entries below it in the file for update (as all their line numbers have changed) - Log performance metrics for steps to convert org entries to jsonl	2022-09-10 21:01:44 +03:00
Debanjum Singh Solanky	b9a6e80629	Make OrgNode tags stable sorted to find new entries for incremental updates - Having Tags as sets was returning them in a different order every time - This resulted in spuriously identifying existing entries as new because their tags ordering changed - Converting tags to list fixes the issue and identifies updated new entries for incremental update correctly	2022-09-10 20:59:52 +03:00
Debanjum Singh Solanky	2f7a6af56a	Support incremental update of org-mode entries and embeddings - What - Hash the entries and compare to find new/updated entries - Reuse embeddings encoded for existing entries - Only encode embeddings for updated or new entries - Merge the existing and new entries and embeddings to get the updated entries, embeddings - Why - Given most note text entries are expected to be unchanged across time. Reusing their earlier encoded embeddings should significantly speed up embeddings updates - Previously we were regenerating embeddings for all entries, even if they had existed in previous runs	2022-09-10 20:58:33 +03:00
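The incremental update described in the two commits above can be sketched as: hash only the compiled text, reuse embeddings whose hash was seen before, and encode just the new or changed entries. The names, the MD5 choice and the hash-keyed embedding store are assumptions for illustration; `encode` stands in for the model's batch encoder.

```python
import hashlib
from typing import Callable, Dict, List, Sequence


def entry_hash(compiled: str) -> str:
    # Hash only the compiled text: it is what the model encodes, and it
    # excludes volatile raw-form details such as LINE: <entry_line_no>
    return hashlib.md5(compiled.encode("utf-8")).hexdigest()


def incremental_embeddings(
    new_entries: Sequence[str],
    old_embeddings: Dict[str, List[float]],
    encode: Callable[[List[str]], List[List[float]]],
) -> List[List[float]]:
    """Return one embedding per entry, encoding only entries not seen before."""
    hashes = [entry_hash(e) for e in new_entries]
    # Collect unseen entries once each (identical compiled text shares a hash)
    to_encode: Dict[str, str] = {}
    for h, e in zip(hashes, new_entries):
        if h not in old_embeddings and h not in to_encode:
            to_encode[h] = e
    encoded = encode(list(to_encode.values())) if to_encode else []
    fresh = dict(zip(to_encode.keys(), encoded))
    # Merge: reuse previous embeddings where available, else use the fresh ones
    return [old_embeddings[h] if h in old_embeddings else fresh[h] for h in hashes]
```

Because only compiled text is hashed, an edit to one entry no longer shifts the hashes of every entry below it in the file.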
Debanjum Singh Solanky	ec675d27d3	Suppress non-actionable HuggingFace FutureWarning shown on app start	2022-09-10 16:43:14 +03:00
Debanjum Singh Solanky	1ac6a71ff0	Add --version flag to show installed version of khoj	2022-09-10 16:40:19 +03:00
Debanjum Singh Solanky	976397bd82	Ignore empty #+TITLE, merge multiple #+TITLE for 0th level headings	2022-09-10 15:34:47 +03:00
Debanjum Singh Solanky	11917c6ddd	Do not normalize absolute filenames for creating links in OrgNode	2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky	07b98d35f1	Use filename or #+TITLE as heading for 0th level content in org files - Set LINE, SOURCE link properties in property drawer correctly for content which falls under no heading - See Issue #83 for more details	2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky	d6bd7bf3e1	Fix initializing OrgNode level to string to parse org files - Parsed `level` argument passed to OrgNode during init is expected to be a string, not an integer - This was resulting in app failure only when parsing org files with no headings, like in issue #83, as level is set to string of `*`s the moment a heading is found in the current file	2022-09-10 14:21:08 +03:00
Debanjum Singh Solanky	d835467f2c	Throw exception if no valid entries found in specified content files - Previously, when no valid entries were found, we failed later while computing embeddings. This obscured the actual issue of no valid entries found in the specified content files - Throwing an exception early with a clear message when no entries are found should clarify the issue to be fixed - See issue #83 for details	2022-09-10 14:20:10 +03:00
Debanjum Singh Solanky	e00bb53336	Init word filter dictionary with default value as set to simplify code	2022-09-10 12:19:09 +03:00
Debanjum Singh Solanky	4d776d9c7a	Bump khoj version to 0.1.9	2022-09-09 07:50:15 +03:00
Debanjum Singh Solanky	588f598949	Pass empty list of `input_files' to FileBrowser on first run - Default config has `input_files' set to None - This was being passed to `FileBrowser' on Initialization - But `FileBrowser' expects `content_files' of list type, not None - This resulted in an unexpected NoneType failure	2022-09-09 07:26:40 +03:00
Debanjum Singh Solanky	3ddffdfba4	Create config directory before setting up logging to file under it - The logging to file code expects the config directory to already be setup - But parent directory of config file was being set up later in code - This resulted in app start failing with ~/.khoj dir does not exist error	2022-09-09 07:21:42 +03:00
Debanjum Singh Solanky	762607fc9f	Log processed entries by org_to_jsonl only if verbosity > 2 Output too verbose for even debug mode logging. So gated behind -vvv	2022-09-06 23:03:29 +03:00
Debanjum Singh Solanky	490157cafa	Setup File Filter for Markdown and Ledger content types - Pass file associated with entries in markdown, beancount to json converters - Add File, Word, Date Filters to Ledger, Markdown Types - Word, Date Filters were accidentally removed from the above types yesterday - File Filter is the only filter that newly got added	2022-09-06 15:31:26 +03:00
Debanjum Singh Solanky	94cf3e97f3	Log app logs to file for posthoc debugging and performance analysis	2022-09-06 14:51:48 +03:00
Debanjum Singh Solanky	3707a4cdd4	Improve date filter perf. Precompute date to entry map, Cache results - Precompute date to entry map - Cache results for faster recall - Log performance timers in date filter	2022-09-05 18:21:29 +03:00
Debanjum Singh Solanky	31503e7afd	Do not pass embeddings as argument to filter.apply method	2022-09-05 15:46:54 +03:00
Debanjum Singh Solanky	965bd052f1	Make search filters return entry ids satisfying filter - Filter entries, embeddings by ids satisfying all filters in query func, after each filter has returned entry ids satisfying their individual acceptance criteria - Previously each filter would return a filtered list of entries. Each filter would be applied on entries filtered by previous filters. This made the filtering order dependent - Benefits - Filters can be applied independently of their order of execution - Precomputed indexes for each filter are not in danger of running into index out of bound errors, as filters run on original entries instead of on entries filtered by filters that have run before it - Extract entries satisfying filter only once instead of doing this for each filter - Costs - Each filter has to process all entries even if previous filters may have already marked them as non-satisfactory	2022-09-05 15:21:40 +03:00
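A minimal sketch of order-independent filtering: every filter sees the original entries and returns the set of ids it accepts, and the query function intersects those sets. The `Filter` callable signature is a simplification assumed for illustration, not Khoj's `BaseFilter` interface.

```python
from typing import Callable, List, Set

# Assumed shape: a filter maps (query, entries) to the ids of entries it accepts
Filter = Callable[[str, List[str]], Set[int]]


def apply_filters(query: str, entries: List[str], filters: List[Filter]) -> List[int]:
    """Run every filter on the full entry list and keep ids accepted by all of them."""
    included = set(range(len(entries)))
    for f in filters:
        # Each filter indexes into the original entries, so order does not matter
        included &= f(query, entries)
    return sorted(included)
```

The cost noted in the commit shows up here too: every filter scans all entries, even ones an earlier filter already rejected.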
Debanjum Singh Solanky	7dd20d764c	Pre-compute file to entry map in file filter to mark ids to include faster	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	2890b4cd44	Simplify extracting entries satisfying file filter	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	7606724dbc	Add file of each entry to entry dict in org_to_jsonl converter - This will help filter query to org content type using file filter - Do not explicitly specify items being extracted from json of each entry in text_search as all text search content types do not have file being set in jsonl converters	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	7e083d3e96	Cache results for file filters passed in query for faster filtering	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	f634399f23	Convert simple file filters with no path separator into regex - Specify just file name to get all notes associated with file at path - E.g. `query` with `file:"file1.org"` will return `entry1` if `entry1` is in `file1.org` at `~/notes/file1.org` - Test - Test converting simple file name filter to regex for path match - Test file filter with space in file name	2022-09-05 15:21:40 +03:00
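One way to turn a bare file name into a path-matching regex is to escape it and anchor it to the end of the path, right after a path separator or the start of the string. `file_filter_to_regex` is a hypothetical helper; treating terms that contain a separator as regexes as-is is an assumption of this sketch.

```python
import re


def file_filter_to_regex(term: str) -> str:
    """Convert a file filter term into a regex matched against full file paths.

    Hypothetical helper: a bare file name (no path separator) matches that name
    in any directory; terms containing a separator are returned unchanged.
    """
    if "/" not in term and "\\" not in term:
        # Escape regex metacharacters (e.g. the "." in ".org") and anchor the
        # name to the end of the path, after a separator or the path start
        return rf"(^|[/\\]){re.escape(term)}$"
    return term
```

Anchoring after a separator keeps `file1.org` from also matching `myfile1.org`.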
Debanjum Singh Solanky	092b9e329d	Setup Filters when configuring Text Search for each Search Type - Allows enabling different filters for different Text Search Types - Use FileFilter in Text Search on Org Files	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	1f9fd28b34	Create File Filter to filter files to query. Add tests for file filter	2022-09-05 01:09:20 +03:00
Debanjum Singh Solanky	e4418746f2	Create Abstract Base Class for Filters. Make Word, Date Filter Child of BaseFilter	2022-09-04 18:48:16 +03:00
Debanjum Singh Solanky	f930324350	Rename explicit filter to word filter to be more specific	2022-09-04 17:18:47 +03:00
Debanjum Singh Solanky	6087862521	Use LRU helper class for explicit filter cache	2022-09-04 16:42:28 +03:00
Debanjum Singh Solanky	8f3326c8d4	Create LRU helper class for caching	2022-09-04 16:31:46 +03:00
Debanjum Singh Solanky	191a656ed7	Use word to entry map, list comprehension to speed up explicit filter - Code Changes - Use list comprehension and `torch.index_select' methods - to speed selection of entries, embedding tensors satisfying filter - avoid deep copy of entries, embeddings - avoid updating existing lists (of entries, embeddings) - Use word to entry map and set operations to mark entries satisfying inclusion, exclusion filters - Results - Speed up explicit filtering by two orders of magnitude - Improve consistency of speed up across inclusion and exclusion filtering	2022-09-04 15:22:35 +03:00
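A word-to-entry map turns include/exclude filtering into set operations. This sketch assumes a query syntax of `+"word"` to require a word and `-"word"` to exclude it, and it keeps hyphens inside words; the function and regex names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Assumed query syntax: +"word" requires a word, -"word" excludes it
INCLUDE_RE = re.compile(r'\+"([^"]+)"')
EXCLUDE_RE = re.compile(r'-"([^"]+)"')


def build_word_index(entries: List[str]) -> Dict[str, Set[int]]:
    """Precompute a map from each lowercased word to the ids of entries containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for entry_id, entry in enumerate(entries):
        # Split on anything that is not a word character or a hyphen
        for word in re.split(r"[^\w-]+", entry.lower()):
            if word:
                index[word].add(entry_id)
    return index


def word_filter(query: str, index: Dict[str, Set[int]], n_entries: int) -> Set[int]:
    """Return ids of entries containing all required words and none of the blocked ones."""
    required = [w.lower() for w in INCLUDE_RE.findall(query)]
    blocked = [w.lower() for w in EXCLUDE_RE.findall(query)]
    ids = set(range(n_entries))
    for word in required:
        ids &= index.get(word, set())  # intersection: entry must contain the word
    for word in blocked:
        ids -= index.get(word, set())  # difference: entry must not contain the word
    return ids
```

Building the index once and answering queries with set intersection and difference avoids rescanning every entry's text on each query.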
Debanjum Singh Solanky	28d3dc1434	Deep copy entries, embeddings in filters. Defer till actual filtering - Only the filter knows when entries, embeddings are to be manipulated. So move the responsibility to deep copy before manipulating entries, embeddings to the filters - Create deep copy in filters. Avoids creating deep copy of entries, embeddings when filter results are being loaded from cache etc	2022-09-04 02:38:57 +03:00
Debanjum Singh Solanky	3308e68edf	Cache explicitly filtered entries, embeddings by required, blocked words	2022-09-04 02:38:57 +03:00
Debanjum Singh Solanky	cdcee89ae5	Wrap words in quotes to trigger explicit filter from query - Do not run the more expensive explicit filter until the word to be filtered is completed by user. This requires an end sequence marker to identify end of explicit word filter to trigger filtering - Space isn't a good enough delimiter as the explicit filter could be at the end of the query, in which case there is no trailing space	2022-09-04 02:38:57 +03:00
Debanjum Singh Solanky	8d9f507df3	Load entries_by_word_set from file only once on first load of explicit filter	2022-09-04 00:37:37 +03:00
Debanjum Singh Solanky	858d86075b	Use regexes to check if any explicit filters in query. Test can_filter	2022-09-03 23:47:28 +03:00
Debanjum Singh Solanky	546fad570d	Use regex to extract include, exclude filter words from query	2022-09-03 23:41:43 +03:00
Debanjum Singh Solanky	ffb8e3988e	Use Python Logging Framework to Time Performance of Explicit Filter	2022-09-03 22:24:10 +03:00
Debanjum Singh Solanky	c7de57b8ea	Pre-compute entry word sets to improve explicit filter query performance	2022-09-03 16:16:31 +03:00
Debanjum Singh Solanky	094bd18e57	Use python standard logging framework for app logs - Stop passing verbose flag around app methods - Minor remap of verbosity levels to match python logging framework levels - verbose = 0 maps to logging.WARN - verbose = 1 maps to logging.INFO - verbose >=2 maps to logging.DEBUG - Minor clean-up of app: unused modules, conversation file opening	2022-09-03 14:43:32 +03:00
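The verbosity remap listed in this commit fits in a small function. The function name is hypothetical. It returns `logging.WARNING`, of which `logging.WARN` is a deprecated alias with the same value.

```python
import logging


def verbosity_to_log_level(verbose: int) -> int:
    """Map the -v count to a standard logging level, following this commit's remap."""
    if verbose <= 0:
        return logging.WARNING  # logging.WARN is a deprecated alias of WARNING
    if verbose == 1:
        return logging.INFO
    return logging.DEBUG  # verbose >= 2
```

For example, `logging.basicConfig(level=verbosity_to_log_level(2))` would enable debug output for `-vv`.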
Debanjum Singh Solanky	d0531c3064	Update URL QueryParam when Type set in Dropdown on Web Interface - This also pushes the updated URL state to history - Allows jumping back to the web interface after clicking on an image and having the type set to image search - Previously type would get reset to the default search type on jumping back	2022-08-28 12:22:22 +03:00
Debanjum Singh Solanky	2eae32d743	Time, Log Image Search Performance	2022-08-28 00:28:46 +03:00
Debanjum Singh Solanky	c3ca99841b	Scale down images to generate image embeddings faster, with less memory - CLIP doesn't need full size images for generating embeddings with decent search results. The sentence transformers docs use images scaled to 640px width - Benefits - Normalize image sizes - Increase image embeddings generation speed - Decrease memory usage while generating embeddings from images	2022-08-24 14:09:02 +03:00
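The downscaling step only needs the target size computed from the original dimensions while preserving the aspect ratio. This sketch assumes a 640px maximum width (the width mentioned above) and never upscales. `scaled_size` is a hypothetical helper.

```python
from typing import Tuple


def scaled_size(width: int, height: int, max_width: int = 640) -> Tuple[int, int]:
    """Return (width, height) shrunk to at most max_width wide, keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # never upscale small images
    scale = max_width / width
    return max_width, max(1, round(height * scale))  # keep at least 1px tall
```

With Pillow, `image.resize(scaled_size(*image.size))` would apply it, since `Image.size` is a `(width, height)` tuple.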
Debanjum Singh Solanky	ea4fdd9134	Fix logic to ignore notes with no body. Add tests to prevent regression - Notes with empty newlines in body were not being ignored - Add regression tests to avoid above regression in org_to_jsonl conversion	2022-08-21 19:41:40 +03:00
Debanjum	144986ebfd	Fix, Improve Desktop GUI Splash Screen and Main Window - `5e6625a` Fix file browser to not add empty line when no file/dir selected - `8098b8c` Bring main window to Top when open from System Tray - `1c122a8` Place window near top so buttons are not hidden by OS bottom bar - `dfe2546` Set Khoj Icon on Main Desktop Window - `1b1f8f9` Move Splash screen text below icon. Set the text color to black - `450f644` Fix path to remove shared libraries when packaging the Windows app	2022-08-20 23:19:01 +00:00
Debanjum Singh Solanky	5e6625ac68	Fix file browser to not add empty line when no file/dir selected - When no file selected in file browser an empty line/entry gets added to input entries list - Bug got introduced due to insufficient update on change to add instead of insert - Update is_none_or_empty helper method to also check for empty string	2022-08-21 02:03:28 +03:00
Debanjum Singh Solanky	8098b8c3a8	Bring Configure Window to Top when Opened from System Tray - Previously the window could get hidden behind other app windows when user clicked configure from the system tray	2022-08-20 23:38:43 +03:00
Debanjum Singh Solanky	1c122a8a91	Place window near top so buttons are not hidden by OS bottom bar	2022-08-20 22:38:06 +03:00
Debanjum Singh Solanky	dfe2546c04	Set Khoj Icon on Main Desktop Window	2022-08-20 20:36:15 +03:00
Debanjum Singh Solanky	82d2891765	Do not pass ML compute `device' around as argument to search funcs - It is a non-user configurable, app state that is set on app start - Reduce passing unneeded arguments around. Just set device where required by looking for ML compute device in global state	2022-08-20 14:44:53 +03:00
Debanjum Singh Solanky	acc9091260	Use MPS on Apple Mac M1 to GPU accelerate Encode, Query Performance - Note: Support for MPS in Pytorch is currently in v1.13.0 nightly builds - Users will have to wait for PyTorch MPS support to land in stable builds - Until then the code can be tweaked and tested to make use of the GPU acceleration on newer Macs	2022-08-20 14:44:06 +03:00
Debanjum Singh Solanky	7de9c58a1c	Load models, corpus embeddings onto GPU device for text search, if available - Pass device to load models onto from app state. - SentenceTransformer models accept device to load models onto during initialization - Pass device to load corpus embeddings onto from app state	2022-08-20 14:04:18 +03:00
Debanjum Singh Solanky	dc8dcc94a6	Bump Khoj.el package version to 0.1.6	2022-08-19 20:48:42 +03:00
Debanjum Singh Solanky	ffbf15eff8	Add helper function to identify when app running as pyinstaller app Useful for when want the app to behave differently in pyinstaller app scenario with frozen python. And in development scenarios	2022-08-19 19:17:54 +03:00
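Detecting a PyInstaller bundle usually relies on the attributes PyInstaller's bootloader sets on `sys`: `sys.frozen` and `sys._MEIPASS`. The helper name below is hypothetical; this is a sketch of the common check, not necessarily Khoj's implementation.

```python
import sys


def is_pyinstaller_app() -> bool:
    """True when running from a PyInstaller bundle (frozen Python), False in development."""
    # PyInstaller's bootloader sets sys.frozen and sys._MEIPASS in bundled apps
    return bool(getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"))
```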
Debanjum Singh Solanky	6c5c1c33c1	Turn off Tokenizers Parallelism. Khoj doesn't support it right now - Forking and multiprocess are problematic in frozen python scenarios. This will cause issues when running App packaged by pyinstaller	2022-08-19 19:17:54 +03:00
Debanjum Singh Solanky	d4072974d7	Use of XMP metadata in Khoj Image Search is broken. Disable by default - CLIP Image score and XMP metadata score are not combining well. When combined they give nonsensical results. Enable only once we figure out how best to combine the two. - Show scores with higher precision for image search - Image search scores seem to mostly be between 0.2 - 0.3 for some reason - Higher precision scores make it easier to understand the quality of returned results perceived by the model itself	2022-08-19 19:17:28 +03:00
Debanjum Singh Solanky	7c4417126c	Append files, directories selected by user to config in Desktop GUI - Allows adding multiple image directories via GUI - Allow adding multiple files in different directories via GUI - Previously users couldn't add multiple directories via GUI. They'd have to manually append to input field if multiple files, directories - To clear/overwrite is much easier. The user can just select text to delete in input area	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	00ddcfdac8	Use .ico icon when packaging for Windows (and Linux) using PyInstaller	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	60dacf3f2c	Show splash screen on app start. Only supported on Windows, Linux	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	0079c13bf7	Set input-directories in config for image search type on Desktop GUI - Issue Fix configuring image search from Desktop GUI. It was broken before. The Desktop GUI was updating input-files field under content-type > image. This field is not used for image search. So image search couldn't be configured from the Desktop GUI - Fix - Set input-directories when field of search type image is set from GUI - Otherwise set input-files field in config	2022-08-18 18:29:55 +03:00
Debanjum Singh Solanky	c4fd661909	Move the experimental /chat API to under /beta/chat	2022-08-16 16:36:15 +03:00
Debanjum Singh Solanky	b8913476ba	Fix if condition in router to trigger markdown search	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	9bc4fd539e	Set Web Interface URL from loaded state in Desktop GUIs. Not hard-coded	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	7f479b0104	Improve Displaying Error to User on Khoj window in Desktop GUI - Show a helpful error message in the GUI to the user, instead of crashing, if loading config fails, e.g. if file wasn't found - Collate GUI errors into an ErrorType enum class - Remove previous error messages before showing the new one	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	873bb9dd97	Do not force the Khoj window to always be on top. It's needlessly annoying	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	67ab40bb01	Regenerate embeddings every time user clicks configure in Desktop GUI. Previously, if the embeddings were already there, only the khoj.yml config file would get updated. The embeddings would remain stale. 1. This results in a stale app state where the config doesn't match the embeddings 2. Currently the user cannot update their config from the config screen. They'd have to use a combination of the config screen and the web interface > regenerate button to trigger it, or delete their ~/.khoj dir. This commit should resolve the above issues	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	2647e6bab4	Display re-ranked results triggered via keybinding in khoj.el - Prevent immediate overwrite of re-ranked results by incremental-search without rerank triggered via post-command-hook. - This triggers right after the re-ranked results are rendered, so the user never ends up seeing them	2022-08-15 18:41:12 +03:00
Debanjum Singh Solanky	a91d2df300	Simplify Emacs interface to only rerank results on explicit command	2022-08-15 06:20:13 +03:00
Debanjum Singh Solanky	e846829a2e	Reset Khoj.el version to align with Khoj package version	2022-08-15 06:20:13 +03:00
Debanjum Singh Solanky	fed0b591af	Package Khoj as Debian app in Github Release Workflow	2022-08-14 05:07:58 +03:00
Debanjum Singh Solanky	541e03da3d	Make khoj.el pass checkdoc, package-lint, flycheck checks - Add docstrings, mention args in them. Make docstrings crisper - prefix funcs, variables with khoj-- - Require Emacs >27.1 for json-parse-buffer - Use lexical binding - Add quickstart docs to elisp file itself - Bump version of khoj.el	2022-08-13 21:37:41 +03:00
Debanjum Singh Solanky	3300378804	Minimal formatting to render beancount results legibly on web interface	2022-08-13 05:03:45 +03:00
Debanjum Singh Solanky	a0759dd923	Convert Configure Screen into the Main Application Window - What - Convert the config screen into the main application window with configuration as just one of the functions it provides - Rename config screen to main window to match new designation - Why - System Tray isn't available everywhere (e.g. Linux) - This requires moving functionality into a normal window for cross-compat	2022-08-13 02:05:52 +03:00
Debanjum Singh Solanky	684f497abe	Handle no System Tray on Linux (Gnome) - What - On Linux - Show Configure Screen, even if not first run experience - Do not show system tray on Linux - Quit app on closing Configure Screen - On Windows, Mac - Show Configure screen only if first run experience - Show system tray always - Do not quit app on closing Configure Screen - Why - Configure screen is the only GUI element on Linux. So closing it should close the application - On Windows, Mac the system tray exists, so app should not be closed on closing configure screen	2022-08-13 01:00:20 +03:00
Debanjum Singh Solanky	c2815c5d09	Open Search from Khoj Configure Screen - Start evolving configure screen away from just being a configure screen - Update Window Title to just say Khoj - Allow Opening Web Interface to Search from Khoj configure screen - Rename "Start" Button to more accurate "Configure" - Disable Search button on first run and while configuring app	2022-08-13 00:43:49 +03:00
Debanjum Singh Solanky	28a91ad1fd	Deep copy the default_config constant to prevent it being overwritten - Issue - In the previous form, updates to self.current_config would update default_config, as Python only does a shallow copy - So self.current_config is just referencing the values of default_config - Hence updates to current_config update the default_config values too - This is not what we want - Fix - Deep copy the default_config values. Now updates to self.current_config don't affect the default_config	2022-08-12 23:54:16 +03:00
Debanjum Singh Solanky	62ac41ce3b	Reload settings in a separate thread to not freeze Config Screen - Generating embeddings takes time - If the user enables a content type and clicks start, the app starts to generate embeddings when loading the new settings - Run this function in a separate thread to keep config screen responsive - But disable start button to prevent re-entrant threads - Also show a minimal visual indication that the app is saving state	2022-08-12 23:34:00 +03:00
Debanjum Singh Solanky	927547d0af	Update Title of Configure Screen to follow "<Screen> - App" pattern	2022-08-12 22:53:10 +03:00
Debanjum Singh Solanky	32ac1ea1b6	Allow user to quit application from the terminal via SIGINT. Call the Python interpreter at a regular interval to handle any interrupt signals. Create a custom handler to terminate the server and application	2022-08-12 21:11:58 +03:00
Debanjum Singh Solanky	43301d488a	Increase Width of Configure Screen	2022-08-12 18:34:47 +03:00
Debanjum Singh Solanky	9baea9c9fd	Let Input Fields Wrap. Adjust Height based on Text in Field - Convert Input Fields into PlainTextEdit - Display Each Selected File on a Separate Line in Field - Set Height of FileBrowser Input Field based on Number of Lines/Files	2022-08-12 18:33:56 +03:00
Debanjum Singh Solanky	b7b96110e9	Rename FileBrowser Button Text to "Select" instead of "Add"	2022-08-12 17:08:40 +03:00
Debanjum Singh Solanky	a1c58a9470	Create, Use a Labelled Text Field for the Conversation Input Field - This fixes the field expanding when configure screen is expanded - Allows for reusability of the labelled text field - Simplifies the logic to save settings for conversation processor	2022-08-12 16:59:15 +03:00
Debanjum Singh Solanky	fa7e36cada	Rename external .js files to .min.js to mark them as vendored - Excludes them from GitHub language stats. See linguist/vendor.yml for exclusion rules - Marks them as external for Khoj developers too	2022-08-12 04:08:50 +03:00
Debanjum Singh Solanky	110e3df0b7	Set default config in the constant module. Use from there to configure app - Avoid having to pass the khoj_sample.yml data file into pip, native apps - Packaging data files into Python packages is annoying. - There's `MANIFEST.in`, `data_files` and `package_data` in setup.py - Bdist, wheel, generated source tarball use different sets of these fields and put the data files in different locations - Rather, just code the default config into a constant. This also avoids pointless file reads	2022-08-12 02:18:46 +03:00
Debanjum Singh Solanky	fad2f3a2e7	Resolve config_file to an absolute path right at the start, when parsing args in the CLI - Assume path is absolute in yaml util module while saving, loading file - This follows the same convention as jsonl, which just operates on the passed file path, assuming it is of appropriate form. Responsibility to put it in appropriate form is on the caller, for now	2022-08-12 01:34:08 +03:00
Debanjum Singh Solanky	44fe70513a	Handle situation where default config directory or file does not exist - Include khoj_sample.yml in pip package to load default config from - Create khoj config directory if it doesn't exist - Load config from khoj_sample.yml if khoj.yml config doesn't exist	2022-08-12 01:17:34 +03:00
Debanjum Singh Solanky	41520e1608	Improve Docstring for Configure Screen and System Tray class, funcs	2022-08-11 23:36:02 +03:00
Debanjum Singh Solanky	a748acfeeb	Merge branch 'master' of github.com:debanjum/khoj into create-native-gui Conflicts: - src/main.py - router functions have moved to router - move logic to handle null query perf timer variables into router.py - set main.py to current branch, not master	2022-08-11 21:09:42 +03:00
Debanjum Singh Solanky	6af2d6bb6d	Add Flag to Start App without Native GUI	2022-08-11 20:59:57 +03:00
Debanjum Singh Solanky	b74ca1def6	Wrap error message instead of expanding screen to show message	2022-08-11 20:51:56 +03:00
Debanjum Singh Solanky	2646fa825b	Get Files from File input line to match user expectation - If a user manually edits the input file lines, clicking start should use that. Currently it just looks at the files selected last via file browser - We want to allow users to manually enter file paths in the field, which is why the field hasn't been set to read-only	2022-08-11 20:48:45 +03:00
Debanjum Singh Solanky	dad9133598	Split save_settings method into smaller methods for modularization	2022-08-11 20:00:52 +03:00
Debanjum Singh Solanky	56ba91fec8	Remove unused methods in file browser widget. Improve name of existing	2022-08-11 19:46:09 +03:00
Debanjum Singh Solanky	fd4e41495c	Use appropriate label for directory input types to minimize confusion	2022-08-11 19:45:19 +03:00
Debanjum Singh Solanky	c1e1466fb1	Validate new config before write. Show error if new config invalid	2022-08-11 19:18:22 +03:00
Debanjum Singh Solanky	1ff049599f	Show current config on config screen. Load default config if config unset - Track current (saved/loaded) config separate from the new config (to be written) when user clicks Start - Fall back to using default config when no config for the specific content type or processor is specified in khoj.yml - Earlier, we were only loading the default config on first run, not after - Create Child CheckBox, LineEdit classes for Processor Widgets - Create ProcessorType, similar to SearchType - Track ProcessorType the widgets are associated with - Simplify update, save, load of config based on type	2022-08-11 19:11:25 +03:00
Debanjum Singh Solanky	23e06f483d	Do not emit type tags when dumping config YAML to file	2022-08-11 19:08:36 +03:00
Debanjum Singh Solanky	678fb6a3c7	Add Settings Panel for Conversation Settings to Config Screen	2022-08-11 04:52:40 +03:00
Debanjum Singh Solanky	c1fcf44405	Initialize Settings on Config Screen with Existing Settings from File	2022-08-11 04:51:33 +03:00
Debanjum Singh Solanky	3cec6229ad	Hot swap backend config via config screen start button click - Update the configuration used by the backend while the app is running - Trigger after user hits start button with their config. The config gets written to khoj.yml file first, then the updated config is loaded into memory	2022-08-11 00:32:11 +03:00
Debanjum Singh Solanky	f7fdf8d8ce	Refactor app start to start server even if backend not configured - Decouple configuring backend from starting server. Backend search and processors can be configured after the backend server has started - Set global state in main instead of in configure_server method. This allows the app to start even if configure_server exits early in the first run scenario, where no config available to configure server - Now start server, even if no config, before GUI started in main - This refactor of app startup flow will allow users to configure backend using the configure screen after server start	2022-08-11 00:13:14 +03:00
Debanjum Singh Solanky	34018c7d4b	Store args passed from commandline at app start in global app state	2022-08-11 00:11:35 +03:00
Debanjum Singh Solanky	cc6ef0f450	Save configure screen settings to app config yaml on clicking Start	2022-08-10 23:10:39 +03:00
Debanjum Singh Solanky	dae65c5b6b	Create child class of Qt CheckBox to track search type it enables/disables	2022-08-10 22:44:37 +03:00
Debanjum Singh Solanky	f42f54019b	Type parent_layout passed as arguments to ConfigureScreen methods	2022-08-10 22:43:20 +03:00
Debanjum Singh Solanky	f63f11186f	Pass config file for app to configure screen	2022-08-10 22:42:32 +03:00
Debanjum Singh Solanky	82a7059b6a	Only setup conversation processor if it has configuration set	2022-08-10 22:34:03 +03:00
Debanjum Singh Solanky	9628ca073c	Extract conversation processor from config into separate function - Only pass processor config arg required by configure_processor. Not the unused full config object - Type arguments passed to methods that configure processors - Import json for use by conversation processor to load logs	2022-08-10 22:33:33 +03:00
Debanjum Singh Solanky	62eb66b8ca	Rename load_config_from_file to more descriptive parse_config_from_file	2022-08-10 22:28:51 +03:00
Debanjum Singh Solanky	328cc00439	Create global constant to store app root directory	2022-08-10 20:09:03 +03:00
Debanjum Singh Solanky	d2c7b28172	Extract code to load config from YAML file into new utils.yaml module	2022-08-10 20:07:44 +03:00
Debanjum Singh Solanky	150ae19660	Indent Timestamps, Drawers at Body Level in OrgNode Entry Representation	2022-08-10 18:55:37 +03:00
Debanjum Singh Solanky	fd31d339c1	Remove spurious space in Entries without Todo in OrgNode Entry Repr	2022-08-10 13:48:44 +03:00
Debanjum Singh Solanky	eddf88f818	Move org buffer customization settings to tail of khoj.el results buffer - Results get priority screen real estate - Allows quick speed key based traversal of results as cursor on switching to buffer is at top level heading - E.g. C-x o n n o 2 jumps to entry in actual file of second result - Unlike before, when it was at the #+STARTUP org buffer customization settings	2022-08-10 12:57:37 +03:00
Debanjum Singh Solanky	daef276fd1	Add files for each search type. Extract config on clicking start - Only allow adding files with appropriate file extension for each search type - e.g. .org for org-mode search, directory for image search - Extract file paths added to config and enablement state of each search type - This extracted state will be used to populate the khoj.yml config file	2022-08-10 03:27:22 +03:00
Debanjum Singh Solanky	d74134e6cc	Reuse Single Method to Create Setting Panels for each Search Type	2022-08-09 23:50:43 +03:00
Debanjum Singh Solanky	509d52e2cd	Toggle Editability instead of Visibility of Per Search Type Settings - Simplifies the configure screen layout and allows it to be of constant width - It was buggy, the configure screen would dynamically expand but not restore back to original size on disabling search type after enable	2022-08-09 23:34:54 +03:00
Debanjum Singh Solanky	3c788f1d29	Rename configure window to more generic configure screen	2022-08-09 22:44:05 +03:00
Debanjum Singh Solanky	c50ab7c3ad	Split config settings GUI into functions. Convert Config Window to Dialog	2022-08-09 22:36:41 +03:00
Debanjum Singh Solanky	664713b24e	Extract Qt GUI code from main.py into separate interface/desktop dir	2022-08-09 22:12:29 +03:00
Debanjum Singh Solanky	84c1fc701d	Fix query timing variables from being referenced before assignment	2022-08-09 21:06:37 +03:00
Debanjum Singh Solanky	57026b802c	Set size of rendered images using user customizable vars	2022-08-09 21:06:37 +03:00
Debanjum Singh Solanky	0a758c9f0f	By default, wait for 2 seconds before initiating rerank in khoj.el - Subjectively, the previous default seems too aggressive based on usage. It doesn't give the user time to think and type their query	2022-08-09 21:06:30 +03:00
Debanjum Singh Solanky	f01fb16ebb	Use single hyphen in name of user configurable variables in khoj.el - Follow convention, two hyphens indicate variable private to library - Defcustom are user configurable variables. So they should have a single hyphen - Use khoj-results-count variable directly in code	2022-08-09 20:49:34 +03:00
Debanjum Singh Solanky	cd59982c9c	Add Qt Button to save Khoj configuration in Khoj Configuration Window	2022-08-09 20:42:44 +03:00
Debanjum Singh Solanky	2c77caf06c	Group ledger, org setting widgets into child Qt widgets of config window	2022-08-09 20:42:44 +03:00
Debanjum Singh Solanky	027da719aa	Open Configure Window on First Run or from System Tray - Trigger FRE if no config loaded. Open Configure Window automatically - Else user can manually open config window from App on System Tray	2022-08-09 17:05:27 +03:00
Debanjum Singh Solanky	a588a8e21f	Make config_file an optional argument. It can be generated on FRE - Make config_file an optional arg. It defaults to default khoj config dir - Return args.config as None if no config_file explicitly passed by user - Parent can use args.config = None as signal to trigger first run experience	2022-08-09 17:02:02 +03:00
Debanjum Singh Solanky	21af122447	Clean up unused methods, module imports. Add comments	2022-08-09 16:59:38 +03:00
Debanjum Singh Solanky	80fa9fde6a	Quit GUI via SysTray instead of sys.exit to cleanly terminate server	2022-08-08 23:49:26 +03:00
Debanjum Singh Solanky	e5691f9d1d	PyInstaller Spec to Wrap Khoj into a Basic Native App - Verified functionality on MacOS - Add ICNS Icon to use as MacOS App Icon - Spec generated by PyInstaller: ```sh pyinstaller \ src/main.py \ --windowed \ --onefile \ --name "Khoj" \ --target-arch arm64 \ -i src/interface/web/assets/icons/favicon.icns \ --add-data "src/interface/web:src/interface/web" \ --copy-metadata tqdm \ --copy-metadata regex \ --copy-metadata requests \ --copy-metadata packaging \ --copy-metadata filelock \ --copy-metadata numpy \ --copy-metadata tokenizers ```	2022-08-08 23:23:02 +03:00
Debanjum Singh Solanky	ef009323e7	Use sys.exit to quit via system tray. Fix pip install cmd in Readme	2022-08-08 21:42:36 +03:00
Debanjum Singh Solanky	eacd95bebd	Start Creating Native Configure Page using PyQt	2022-08-08 18:31:47 +03:00
Debanjum Singh Solanky	dddc57e132	Rename get-enabled-search-types to get-enabled-content-types as more appropriate	2022-08-07 18:53:14 +03:00
Debanjum Singh Solanky	127c6e78df	Only show keybindings for enabled search types in simple info menu too. Convert the khoj--keybindings-info-message into a func. Dynamically generate info menu. Show keybindings for enabled search types only	2022-08-07 18:40:35 +03:00
Debanjum Singh Solanky	d08c25b62b	Make default search type used in the Emacs interface configurable	2022-08-07 18:24:53 +03:00
Debanjum Singh Solanky	5a10c47499	Allow setting music as search type in khoj.el. Had forgotten to include it earlier	2022-08-07 18:24:53 +03:00
Debanjum Singh Solanky	ebee716026	Only show keybindings reference for enabled search types in khoj.el	2022-08-07 18:24:53 +03:00
Debanjum Singh Solanky	6dc9801f45	Get Khoj search-types enabled by user in Emacs	2022-08-07 18:24:53 +03:00
Debanjum Singh Solanky	f3c1512c38	Fix to let user start entering query right after initiating khoj on Emacs - Fix regression since moving to use `which-key-show-full-keymap` - The above function reads user keypress, so eats up 1 keypress before starting to enter query - No way to pass no-paging config via the external function to the internally used which-key--show-keymap function that does allow setting no-paging to not read user keypress - So use the internal function instead and set no-paging arg to t	2022-08-07 15:57:08 +03:00
Debanjum Singh Solanky	e95686c89c	Show complete Khoj keybindings when initiate search in Emacs - The keybindings to select search types was previously confusing as it only highlighted the final symbol to press (the C-x was shown but it wasn't made apparent that it had to be pressed before) - Previously some keybindings unrelated to khoj were also being shown in the which-key popup. Now only the khoj keybindings are visible	2022-08-06 16:36:57 +03:00
Debanjum Singh Solanky	4696eadc02	Fix definition of khoj--search-<content-type> functions in khoj.el	2022-08-06 15:19:01 +03:00
Debanjum Singh Solanky	c5bf051a29	Rename initialize_{search,processor,server} to configure_{search,processor,server} - Search is being reconfigured multiple times in /regenerate and /reload. More appropriate name is configure_ rather than initialize_ for it - Standardize name of methods under configure.py	2022-08-06 03:23:02 +03:00
Debanjum Singh Solanky	7b04978f52	Put global state variables into separate state module - Variables storing app, device state aren't constants. Do not mix with actual constants like empty_escape_sequence, web_directory	2022-08-06 03:13:18 +03:00
Debanjum Singh Solanky	b04c84721b	Extract configure and routers from main.py into separate modules - Main.py was becoming too big to manage. It had both controllers/routers and component configurations (search, processors) in it - Now that the native app GUI code is also getting added to the main path, good time to split/modularize/clean main.py - Put global state into a separate file to share across modules	2022-08-06 02:39:18 +03:00
Debanjum Singh Solanky	083fefdd07	Create Native Menu Bar with PyQt to open Search, Config webpages - Run FastAPI server in a separate thread. - This allows starting both the server and gui in parallel - Create System Tray for Khoj - Contains menu items that open search or config pages in browser - Rearrange code to have only the code required to start Backend and GUI in the run() method - Move the backend setup code into a separate method	2022-08-06 01:00:25 +03:00
Debanjum Singh Solanky	9fa3345000	Show available Khoj keybindings to customize search using which-key Fallback to showing simple khoj keybindings info message in echo area when which-key not available	2022-08-05 20:24:29 +03:00
Debanjum Singh Solanky	6a8b2a6936	Do not run incremental search when query is empty	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	609cd6e8bb	Show keybindings to set khoj search type in echo area to assist user	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	48e4a983c5	Allow switching search type in the middle of querying Khoj on Emacs - More generally, this allows configuring the khoj search anytime while in khoj minibuffer window - Earlier could only configure search type at the start of the search	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	48c33b93cc	Generalize khoj keymap to func that can update existing keybindings	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	19c4701f3f	Default to ledger search from files with .beancount extensions	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	cc9a395e0a	Keep name of buffer for Khoj results in a variable	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	0a5c6d067a	Do not prompt user to set search type before querying Khoj via Emacs - What - Default to last used search type, when no search type specified - Allow user to change search type before they enter query (and after they've called khoj), if they want - Why - Reduce time from intent to results by using reasonable defaults - Make interactions smoother, more intuitive	2022-08-05 19:35:38 +03:00
Debanjum Singh Solanky	24ccba74d4	Put type dropdown, regenerate button on same row. Regain screen space	2022-08-05 06:17:43 +03:00
Debanjum Singh Solanky	017e287b8a	Remove redundant query as title in results section - Regain screen real-estate - Remove unused parameters, html being returned by org.js	2022-08-05 06:17:25 +03:00
Debanjum Singh Solanky	06afeec7e2	Hide stars of org entry results on Emacs to reduce visual clutter. They've all been normalized to the same level and hence don't convey much information. So this is a good opportunity to reduce non-useful visual clutter	2022-08-05 05:27:57 +03:00
Saba	d1fe6353b5	Check whether processor_config exists during shutdown event	2022-08-04 21:57:36 -04:00
Debanjum Singh Solanky	4d4d2ff921	Ensure all org entries are unfolded in results buffer on Emacs	2022-08-05 04:54:29 +03:00
Debanjum Singh Solanky	49ef741d4b	Prevent Zoom on Input in Web Interface. Document Pip upgrade in Readme - Name /Reload API Controller Reload	2022-08-05 03:51:34 +03:00
Debanjum Singh Solanky	675e821d95	Make embeddings, jsonl paths absolute. Create directories if non-existent	2022-08-05 02:57:59 +03:00
Debanjum Singh Solanky	d5b43eb836	Use input filter in image search setup. Input filter wasn't used earlier	2022-08-05 02:40:03 +03:00
Debanjum Singh Solanky	ca5a8bd113	Make config file a positional argument, as it is required - Test invalid config file path throws. Remove redundant cli test - Simplify cli parser code - Do not need to explicitly check if args.config_file set. argparser checks for positional arguments automatically - Use standard semantics for cli args - All positional args are required. Non positional args are optional - Improve command line --help description	2022-08-05 01:09:40 +03:00
Debanjum Singh Solanky	1374065092	Mark all required fields for config. Throw if no input_* field specified - Add custom validator to throw if neither input_filter nor input_<files|directories> are specified - Set field expecting paths to type Path - Now that default_config isn't used in code, we can update fields in rawconfig to specify whether they're required or not. This lets pydantic validate config file and throw appropriate error	2022-08-05 01:08:48 +03:00
Debanjum Singh Solanky	f78d6ae754	Create khoj_sample file with all configurable fields in one place - Reason - Simplifies code. No merge_dict required - 1 place for user to see all configurables, defaults and required values - Details - Remove default_config from code. Set defaults in khoj_sample.yml itself - Keep fields that must be set by the user empty in khoj_sample.yml - Set defaults for fields not requiring configuration by user	2022-08-05 01:08:33 +03:00
Debanjum Singh Solanky	3abf3e5ee0	Update merge_dicts to recursively merge the dictionaries. Previously it was only merging dictionaries at the first/top level	2022-08-04 22:46:20 +03:00
Debanjum Singh Solanky	61c26ba611	Only show large Khoj favicon on web interface - Do not want browsers to use the small, grainy favicons - Firefox for Android does use the bigger icon, when it's the only one available - Update svg to match the 144x144 ratio just for consistency	2022-08-04 14:33:29 +03:00
Debanjum Singh Solanky	1649fa644c	Autofocus on Query field in Web Interface. Improve time to query	2022-08-04 05:23:19 +03:00
Debanjum Singh Solanky	71fcb1087f	Add icons for web interface to render on more browsers and as PWA Safari, Firefox for Android etc don't support SVG Favicons yet	2022-08-03 18:52:41 +03:00
Debanjum Singh Solanky	5b6b7ec123	Delete khoj network connections on incremental search teardown on Emacs interface. Currently we only get into this state when debug breakpoints on the backend are keeping the connection open and the user exits khoj search from Emacs. This results in a number of open connections that slow khoj down.	2022-08-03 18:52:41 +03:00
Debanjum Singh Solanky	555c1088cc	Cache queries in /search controller using LRU cache - Most concretely right now, it eliminates the re-rank latency hit when the user hits enter after a re-rank has already been done on user idle in the Emacs interface - Improves search latency of (incremental) search	2022-08-03 18:52:41 +03:00
Debanjum Singh Solanky	38df727ef4	Fix escape sequence usage in strings. Remove unneeded import of os. Rename /config API method to config to match its purpose. The name UI is too generic anyway, and not what it is doing	2022-08-03 18:51:55 +03:00
Debanjum Singh Solanky	f642450ed9	Disable Incremental Search for Images on Web Bug introduced in commit `da118b3fed`	2022-08-03 11:52:51 +03:00
Debanjum Singh Solanky	b9e6273644	Include interfaces in pip package. Fix paths to web interface in app	2022-08-03 00:02:39 +03:00
Debanjum Singh Solanky	1b55462fb0	Convert search_filter, conversation dir to proper modules Add __init__.py files to their directories	2022-08-02 20:23:42 +03:00
Debanjum Singh Solanky	5108d45951	Wrap application startup steps into a method	2022-08-02 20:13:14 +03:00
Debanjum Singh Solanky	0ebfbb43ce	Nest org, md results at level 2 on Emacs interface. Improve readability - Makes it easier to fold/unfold, traverse and read results - This 2 level nesting is already being used on the web interface - Previously we were using the original nesting depth of the entry. This was aimed at providing more of the original context of the results. But currently this additional information does not add enough to justify the decreased legibility of the results	2022-08-01 04:01:18 +03:00
Debanjum Singh Solanky	1201bfddf3	Simplify name of config css from config-style.css to config.css	2022-08-01 01:34:00 +03:00
Debanjum Singh Solanky	075dba5d64	Use Khoj Title, Favicon in Config Page for Consistency	2022-08-01 01:27:14 +03:00
Debanjum Singh Solanky	56a4429f01	Move web interface to configure application into src/interface/web directory - Improve code layout by ensuring all web interface specific code is under the src/interface/web directory - Rename config API to more specific /config instead of /ui - Rename config data GET, POST api to /config/data instead of /config	2022-08-01 00:53:42 +03:00
Debanjum	bb2ccec1ca	Populate type dropdown on the web interface with only enabled search types - Previously we were statically populating types dropdown field in the web interface with all available search types - This change populates the type dropdown field with only search types that are enabled/configured - It queries the `/config` backend API to see which of the available search types are configured	2022-08-01 00:20:45 +03:00
Debanjum Singh Solanky	8b6058c879	Fix instantiating type field with value from URL query parameter - Populate via `.then` after enabled search types in dropdown are populated - Call to `/config` API is async and will usually complete after the value of type field is set from url - So value of type field would earlier be overridden when search types dropdown is populated after the call to `/config` API completes	2022-08-01 00:04:50 +03:00
Debanjum Singh Solanky	be253bab39	Populate type dropdown with only enabled search types in web interface - Get /config API and check config for which available search types are populated. This gives us the list of enabled search types - Dynamically populate search type field with enabled search types only	2022-07-31 23:42:00 +03:00
Debanjum Singh Solanky	0abd40aeb7	Only set query field when appropriate query param passed via URL - Setting query value to default option when query param wasn't passed via URL was overriding placeholder text in query field - We wanted placeholder text in field, not the query field to actually be populated by placeholder text - This clears field when user starts typing query into the query field, instead of them having to manually delete the default text populated	2022-07-31 22:29:23 +03:00
Debanjum Singh Solanky	17c38b526a	Default config for each search type to None - Setting up default compressed-jsonl, embeddings-file was only required for org search_type, while org-files and org-filter were allowed to be passed as command line argument - This avoided having to set compressed-jsonl and embeddings-file via command line argument as well for org search type - Now that all search types are only configurable via config file, we can default all search types to None. The default config for the rest of the search types wasn't being used anyway	2022-07-31 22:23:57 +03:00
Debanjum Singh Solanky	b83021a723	Improve code readability of merge_dicts helper method	2022-07-31 22:07:56 +03:00
Debanjum Singh Solanky	38aede68f2	Only configure org via config file for consistency across search types - Previously org-files were configurable via cmdline args, whereas none of the other search types were - This is an artifact of how the application grew - It can be removed for better consistency and equal preference given to all search types	2022-07-31 22:02:03 +03:00
Saba	b55159f5bd	Fix URL for khoj.el quelpa setup instructions	2022-07-29 23:01:04 -04:00
Debanjum Singh Solanky	da118b3fed	Simplify incremental search function used in web interface Re-rank isn't passed to image search API in search function. So don't need to check type in incremental_search function too	2022-07-29 23:18:01 +04:00
Debanjum Singh Solanky	3079614981	Allow set up of search form via query params in web interface - Default search type to org, instead of images	2022-07-29 23:13:26 +04:00
Debanjum Singh Solanky	02ca2c05a1	Add Eagle Icon for Khoj to Web, Emacs Interfaces and Readme	2022-07-29 17:50:29 +04:00
Debanjum Singh Solanky	78314263a0	Add Table of Contents, Features, Performance Details to Readme	2022-07-29 17:08:17 +04:00
Debanjum Singh Solanky	ed181f47c9	Prettify rendering of org music results on Khoj web interface	2022-07-29 04:28:22 +04:00
Debanjum Singh Solanky	7e5291a38e	Make org result headings the same level. Improve spacing of results. Having org-mode result headings change size based on their depth in the source document makes for a confusing UI experience. Improve font-size, line-spacing and margins of results to make the delineation between entries, and the differentiation between an entry heading and its body, easier to visually infer. Do not use white-space: pre-line. Improves rendering of Markdown results	2022-07-29 01:55:46 +04:00
Debanjum Singh Solanky	4d5183063c	Create images directory if it doesn't exist, to store image search results	2022-07-28 21:30:31 +04:00
Debanjum Singh Solanky	a9bc17a6b0	Prettify Render of Markdown Results in Web Interface	2022-07-28 20:56:37 +04:00
Debanjum Singh Solanky	a6ae74f52e	Move JS files like org.js into a separate assets/ directory	2022-07-28 20:46:48 +04:00
Debanjum Singh Solanky	a12eaa4ce0	Move Khoj image results into a child images/ directory	2022-07-28 20:45:12 +04:00
Debanjum	a71253e137	Support Incremental Search on Web Interface ## Support Incremental Search on Khoj Web Interface - Use default, fast path to query /search API while user is typing - Upgrade to cross-encoder re-ranked results once user hits enter on search box ## Improve Render of Org Results on Web Interface - We were previously just wrapping results from /search API into a pre-formatted div field. This was not easy to read - Use [org.js](https://mooz.github.io/org-js/) to render results from Khoj `/search` API as proper HTML - Improve org.js to render all task states, stylize task tags and make org-mode results look more like original content Closes #42 #41	2022-07-28 09:31:57 -07:00
Debanjum Singh Solanky	e8029bf415	Extract and Highlight org-mode tags in HTML render of search results	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	c6c248df26	Improve styling of org-mode results to original alignment, line breaks	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	9f59897eeb	Highlight all org-mode task states in HTML. Not just TODO, DONE. - Make logic to extract, mark todo state in org.js more generic - Add default todo state styling to html	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	f040b3f65c	Stylize TODO/DONE states with CSS	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	581b6097c7	Clean Results. Remove TOC, Heading Number and Property Drawers	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	965a93a2f2	Add Basic HTML Rendering of Org-Mode Results	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	1da44d4dfe	Add Incremental Search to Khoj Web Interface	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	af1dd31401	Do not pass verbose argument to image_search.query() as not supported	2022-07-28 19:52:58 +04:00
Debanjum Singh Solanky	80ac10835c	Rerank results on normal minibuffer exit In current state: - Rerank results: - If user idles while entering query OR - exits normally - Do not rerank results: - If user exits abnormally, e.g via C-g from query	2022-07-28 03:37:16 +04:00
Debanjum Singh Solanky	1b759597df	Make incremental search more robust. Follow standard user expectations - Rename functions to more standard, descriptive names - Keep known, required code for incremental search - E.g Do not set buffer local flag in hooks on minibuffer setup - Only query when user in khoj minibuffer - Use active-minibuffer-window and track khoj minibuffer - (minibuffer-prompt) is not useful for our use-case here - (For now) Run re-rank only if user idle while querying - Do not run rerank on teardown/completion - The reranking lag (~2s) is annoying; hit enter, wait to see results - Also triggered when user exits abnormally, so C-g also results in rerank which is even more annoying - Emacs will still hang if re-ranking gets triggered on idle but that's better than always getting triggered. And better than not having mechanism to get results re-ranked via cross-encoder at all	2022-07-28 02:52:27 +04:00
Debanjum Singh Solanky	9a6eee31be	Make number of results to get from Khoj API customizable in khoj.el	2022-07-27 18:55:18 +04:00
Debanjum Singh Solanky	9302b45fe0	Use khoj-incremental as the main khoj func. Rename khoj to khoj-simple - Update khoj-simple to work with cross-encoder re-ranked results like before - Increment major version as incremental search considered a breaking change and a major update to search capability	2022-07-27 18:18:17 +04:00
Debanjum Singh Solanky	09727ac3be	Make bi-encoder return fewer results to reduce cross-encoder latency	2022-07-27 07:26:02 +04:00
Debanjum Singh Solanky	9ab3edf6d6	Re-rank incremental search results using cross-encoder if user idle This provides a relatively smooth mechanism - to improve relevance of results on idle - while providing the rapid, incremental results while typing	2022-07-27 07:25:42 +04:00
Debanjum Singh Solanky	ad242cafa7	Support querying all text search types in incremental search - Before incremental search was hard-coded to only query org	2022-07-27 07:25:42 +04:00
Debanjum Singh Solanky	bfcb962cbe	Use post-command-hook to only query on user input - Hooking into after-change-functions results in system logs triggering query	2022-07-27 07:25:42 +04:00
Debanjum Singh Solanky	0d49398954	Reuse code to query api, render results. Formalize method, arg names	2022-07-27 07:25:42 +04:00
Debanjum Singh Solanky	fd1963d781	Implement Basic Incremental Search Interface in Emacs for Org Mode Notes	2022-07-27 03:05:00 +04:00
Debanjum Singh Solanky	3fa7d8f03a	Skeleton to allow incremental search on Khoj via Emacs	2022-07-27 02:48:27 +04:00
Debanjum Singh Solanky	1168244c92	Make cross-encoder re-rank results if query param set on /search API - Improve search speed by ~10x Tested on corpus of 125K lines, 12.5K entries - Allow cross-encoder to re-rank results by setting &?r=true when querying /search API - It's an optional param that defaults to False - Earlier all results were re-ranked by cross-encoder - Making this configurable allows for much faster results, if desired but for lower accuracy	2022-07-26 22:56:36 +04:00
Debanjum Singh Solanky	b1e64fd4a8	Improve search speed. Only apply filter if filter keywords in query - Formalize filters into class with can_filter() and filter() methods - Use can_filter() method to decide whether to apply filter and create deep copies of entries and embeddings for it - Improve search speed for queries with no filters as deep copying entries, embeddings takes the most time after cross-encoder scoring when calling the /search API Earlier we would create deep copies of entries, embeddings even if the query did not contain any filter keywords	2022-07-26 22:47:26 +04:00
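A minimal sketch of the can_filter()/filter() split described above. The "+word" / "-word" query syntax, the WordFilter class and the apply_filters helper are hypothetical names chosen for illustration, not Khoj's actual filter code. The point it shows is that the cheap can_filter() check gates the expensive work, so queries without filter terms pass the original entries and embeddings through untouched.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical filter syntax for this sketch only: "+word" requires a word, "-word" excludes it.
FILTER_TERM = re.compile(r'(?:^|\s)([+-])(\w+)')


class WordFilter:
    def can_filter(self, query: str) -> bool:
        # Cheap regex check, so queries without filter terms skip the costly filtering path.
        return FILTER_TERM.search(query) is not None

    def filter(self, query: str, entries: List[str], embeddings: List[List[float]]) -> Tuple[str, List[str], List[List[float]]]:
        terms = FILTER_TERM.findall(query)
        required = {word.lower() for sign, word in terms if sign == '+'}
        blocked = {word.lower() for sign, word in terms if sign == '-'}
        # Remove filter terms so only the natural language part of the query gets embedded.
        clean_query = FILTER_TERM.sub(' ', query).strip()

        kept = []
        for i, entry in enumerate(entries):
            words = set(re.findall(r'\w+', entry.lower()))
            if required <= words and not (blocked & words):
                kept.append(i)
        # entries[i] and embeddings[i] describe the same item, so filter both by index.
        return clean_query, [entries[i] for i in kept], [embeddings[i] for i in kept]


def apply_filters(filters: Sequence[WordFilter], query: str, entries: List[str], embeddings: List[List[float]]):
    for search_filter in filters:
        # Only build new filtered lists when the query actually uses this filter.
        if search_filter.can_filter(query):
            query, entries, embeddings = search_filter.filter(query, entries, embeddings)
    return query, entries, embeddings
```

For a query like "trips +beach -rain" the sketch searches for "trips" over entries that contain "beach" and do not contain "rain"; for "surfing" it returns the original lists as-is.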
Debanjum Singh Solanky	f094c86204	Trace query response performance and display timings in verbose mode	2022-07-26 21:03:53 +04:00
Debanjum Singh Solanky	65fea7681a	Rename notes search type to org search, now that markdown notes supported	2022-07-21 22:09:44 +04:00
Debanjum Singh Solanky	4c24202e42	Update documentation. Simplify, reflect current capabilities	2022-07-21 22:09:44 +04:00
Debanjum Singh Solanky	d4d7dbaca6	Support Natural Search on Markdown Files - Reason: Allow natural search on markdown based notes, documentation, websites etc - Details: - Create markdown processor to extract Markdown entries (identified by Heading) into standard jsonl format required by text_search - Update API, Configs to support interfacing with new markdown type - Update Emacs, Web clients to support interfacing with new markdown type via API - Update Readme to mention markdown is also supported Closes #35	2022-07-21 22:07:05 +04:00
Debanjum Singh Solanky	0602d018c0	Merge Symmetric, Asymmetric Search Types into a single Text Search Type - The code for both the text search types were mostly the same It was earlier done this way for expedience while experimenting - The minor differences were reconciled and merged into a single text_search type - This simplifies the app and makes it easier to process other text types	2022-07-21 21:19:52 +04:00
Debanjum Singh Solanky	0917f1574d	Consolidate jsonl helper methods in a single file under utils module	2022-07-21 03:30:13 +04:00
Debanjum Singh Solanky	de726c4b6c	Minor fixes to unused installer utility script	2022-07-21 03:30:13 +04:00
Debanjum Singh Solanky	5aad297286	Reuse logic to extract entries across symmetric, asymmetric search Now that the logic to compile entries is in the processor layer, the extract_entries method is standard across (text) search_types Extract the load_jsonl method as a utility helper method. Use it in (a)symmetric search types	2022-07-21 02:53:18 +04:00
Debanjum Singh Solanky	e220ecc00b	Generate compiled form of each transaction directly in the beancount processor - The logic for compiling a beancount entry (for later encoding) now completely resides in the beancount-to-jsonl processor layer - This allows symmetric search to be generic and not be aware of beancount specific properties that were extracted by the beancount-to-jsonl processor layer - Now symmetric search just expects the jsonl to (at least) have the 'compiled' and 'raw' keys for each entry. What original text the entry was compiled from is irrelevant to it. The original text could be location, transaction, chat etc, it doesn't have to care	2022-07-21 02:43:28 +04:00
Debanjum Singh Solanky	06cf425314	Generate compiled form of each entry directly in the org-mode processor - The logic for compiling an org-mode entry (for later encoding) now completely resides in the org-to-jsonl processor layer - This allows asymmetric search to be generic and not be aware of org-mode specific properties that were extracted by the org-to-jsonl processor layer - Now asymmetric search just expects the jsonl to (at least) have the 'compiled' and 'raw' keys for each entry. What original text the entry was compiled from is irrelevant to it. The original text could be mail, chat, markdown, org-mode etc, it doesn't have to care	2022-07-21 02:08:02 +04:00
Debanjum Singh Solanky	4ead79d272	Make Notes Search Natural Language Date Aware - Pass Scheduled, Closed Dates of Entries to Include in Embeddings - The (new?) model seems to understand dates. So can give more relevant entries if date in natural language mentioned in query - E.g "Went Surfing with Friends" vs "Went Surfing with Friends in 1984" will give different results, with the second prioritizing entries with closed, scheduled dates from 1984	2022-07-21 01:06:49 +04:00
Debanjum Singh Solanky	d50bfb5188	Parse Logbook Entries in the OrgNode parser for Org-Mode. Update tests	2022-07-21 00:15:30 +04:00
Debanjum Singh Solanky	70e70d4b15	Rename 'embed' key to more generic 'compiled' for jsonl extracted results - While it's true those strings are going to be used to generate embeddings, the more generic term allows them to be used elsewhere as well - Their main property is that they are processed, compiled for usage by semantic search - Unlike the 'raw' string which contains the external representation of the data, as is	2022-07-20 20:35:50 +04:00
Debanjum Singh Solanky	c1369233db	Consistently use "entry", "score" in json response for all search types - Had already made some progress on this earlier by updating the image search responses. But needed to update the text search responses to use lowercase entry and score - Update khoj.el to consume the updated json response keys for text search	2022-07-20 20:33:27 +04:00
Debanjum Singh Solanky	d68a9dc445	Sort extracted images before computing their embeddings - Image order returned by glob is OS dependent - This prevented sharing image embeddings across machines running different OS - A stable sort order for processed images allows sharing embeddings across machines. - Use case: A more powerful, always on machine actually computes the image embeddings regularly The client machine just loads these periodically to provide semantic search functionality	2022-07-20 03:51:27 +04:00
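A small sketch of the stable-ordering fix, assuming images are found with glob over one or more directories. The function name get_image_files and the "*.jpg" default pattern are hypothetical. Because the embedding file stores vectors by position, sorting the collected paths is what keeps position i pointing at the same image on every OS.

```python
import glob
import os
from typing import List


def get_image_files(image_dirs: List[str], pattern: str = "*.jpg") -> List[str]:
    image_files = []
    for image_dir in image_dirs:
        search_path = os.path.join(os.path.abspath(os.path.expanduser(image_dir)), pattern)
        image_files.extend(glob.glob(search_path))
    # glob's ordering depends on the OS and filesystem; sorting makes the i-th
    # embedding map to the same image on every machine.
    return sorted(image_files)
```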
Debanjum Singh Solanky	c4c7f38b15	Fix extracting image names from multiple image directories	2022-07-20 03:40:49 +04:00
Debanjum Singh Solanky	bdc1b9f2bb	Resolve edge case errors in encoding image metadata - Handle case where current image batch smaller than batch_size - Handle case where no XMP metadata for current image - return empty strings in such a scenario instead of ". "	2022-07-20 02:58:43 +04:00
Debanjum Singh Solanky	2a5445216c	Image input directory not required by collate result as image_name already absolute path	2022-07-20 02:56:23 +04:00
Debanjum Singh Solanky	6c9ffdba57	Allow indexing multiple image directories for image search	2022-07-20 02:56:01 +04:00
Debanjum Singh Solanky	70221bb038	Allow filtering transactions by date in symmetric ledger	2022-07-19 20:58:24 +04:00
Debanjum Singh Solanky	b673d26a12	Extract Entries in a standardized format across text search types Issue: - Had different schema of extracted entries for symmetric_ledger vs asymmetric - Entry extraction for asymmetric was dirty, relying on cryptic indices to store raw entry vs cleaned entry meant to be passed to embeddings - This was pushing the load of figuring out what property to extract from each entry to downstream processes like the filters - This limited the filters to only work for asymmetric search, not for symmetric_ledger - Fix - Use consistent format for extracted entries { 'embed': entry_string_meant_to_be_passed_to_model_and_get_embeddings, 'raw' : raw_entry_string_meant_to_be_passed_to_use } - Result - Now filters can be applied across search types, and the specific field they should be applied on can be configured by each search type	2022-07-19 20:52:25 +04:00
Debanjum Singh Solanky	e66cd5bf59	Only extract transactions from Beancount - Earlier was extracting all entries starting with dates but the other type of entries like account open/close, asserts etc aren't useful for querying	2022-07-19 19:50:58 +04:00
Debanjum Singh Solanky	732b2d287f	Give the project a short, less generic name. Rename it to Khoj - Semantic Search was just a placeholder used to test the idea out Didn't want to get into naming at that point of time	2022-07-19 18:26:16 +04:00
Debanjum Singh Solanky	989526ae54	Use a more accurate model for symmetric semantic search - The all-MiniLM-L6-v2 is more accurate - The exact previous model isn't benchmarked but based on the performance of the closest model to it. Seems like the new model may be similar in speed and size - On very preliminary evaluation of the model, the new model seems faster, with pretty decent results	2022-07-18 20:27:26 +04:00
Debanjum Singh Solanky	4a90972e38	Use a better model for asymmetric semantic search - The multi-qa-MiniLM-L6-cos-v1 is more extensively benchmarked[1] - It has the right mix of model query speed, size and performance on benchmarks - On hugging face it has way more downloads and likes than the msmarco model[2] - On very preliminary evaluation of the model - It doubles the encoding speed of all entries (down from ~8min to 4mins) - It gave more entries that stay relevant to the query (3/5 vs 1/5 earlier) [1]: https://www.sbert.net/docs/pretrained_models.html [2]: https://huggingface.co/sentence-transformers	2022-07-18 20:27:26 +04:00
Debanjum Singh Solanky	5e302dbcda	Fix using 1 column layout on small screens	2022-07-18 02:40:16 +04:00
Debanjum Singh Solanky	7d16b673b1	Use Single Column Layout for Small Screens on Web Interface	2022-07-18 02:08:52 +04:00
Debanjum Singh Solanky	31a221a76b	Auto focus cursor on query input box to simplify, speed interactions - Avoids having to click the query input box - Just open page, type whatever and hit enter to do image search - For other search types select appropriate type from dropdown	2022-07-16 19:39:15 +04:00
Debanjum Singh Solanky	06b0c720d6	Improve Rendering of Image Search Results in Emacs - Use shr to render image response from html in result buffer Earlier was using org-mode. But rendering HTML with shr seems cleaner - Use Headings to Add highlights - Use Random to Force fetch of Image. Similar to what was done for Web interface - Remove trailing elisp brackets from response - Show query match scores by image model for each image in results	2022-07-16 19:31:49 +04:00
Debanjum Singh Solanky	28ec9af589	Extract image URL location from response in elisp after API update	2022-07-16 18:43:55 +04:00
Debanjum Singh Solanky	47613cba1f	Improve Landing Page Look in General and Layout for Mobile - Ask for 6 Images to Fill Grid into 3x2 Layout - Submit Form on Hitting Enter	2022-07-16 16:55:13 +04:00
Debanjum Singh Solanky	cf207d6ebe	Add title, heading to the semantic search web interface	2022-07-16 03:44:29 +04:00
Debanjum Singh Solanky	e0d8398b27	Normalize metadata match score to work better with image match score - Metadata match scores were consistently higher than image match scores by a factor of ~3x. This was resulting in all results being from the metadata match with query and none from the image match with query. - Scaling the metadata match scores down by a scaling factor seems to more consistently give a blend of results from both image and metadata matches	2022-07-16 03:39:33 +04:00
Debanjum Singh Solanky	a3fc82817d	Log and continue on image metadata encoding error due to Tensor size mismatch	2022-07-16 03:39:19 +04:00
Debanjum Singh Solanky	f26d0ddbbd	Minor fix to asymmetric search when no entries returned	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	ca3f93e641	Add button on web interface to regenerate embeddings of specified type	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	231cc91e14	Force reload of images every time user clicks search button Adding a random, unused url param at the end of the img.src string fixes the issue. As the browser thinks it's a new image and doesn't use the image data that's already cached because of which it wasn't even making the fetch call for the image	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	a6aef62a99	Create Basic Landing Page to Query Semantic Search and Render Results - Allow viewing image results returned by Semantic Search. Until now there wasn't any interface within the app to view image search results. For text results, we at least had the emacs interface - This should help with debugging issues with image search too For text the Swagger interface was good enough	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	4e27ae0577	Ease access to image result for given query by image_search - Copy images to accessible directory - Return URL paths to them to ease access - This is to be used in the web interface to render image results directly in browser - Return image, metadata scores for each image in response as well This should help get a better sense of image scores along both XMP metadata and whole image axis	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	801e59a20d	Allow explicit filters when querying Ledger transactions	2022-07-15 23:41:54 +04:00
Debanjum Singh Solanky	0e979587e0	Add configurable filter support to Symmetric Ledger Search	2022-07-14 23:40:41 +04:00
Debanjum Singh Solanky	85077bc1d1	Handle unparseable date range passed via date filter in query - Do not reuse the same list - Just create new list, so only parsed data is in it	2022-07-14 22:47:23 +04:00
Debanjum Singh Solanky	a60de2c02b	Include date filter in asymmetric search on music as well	2022-07-14 22:37:17 +04:00
Debanjum Singh Solanky	c3b3e8959d	Put entry splitting regex in explicit filter into a variable for code readability	2022-07-14 22:00:10 +04:00
Debanjum Singh Solanky	3aac3c7d52	Run explicit filter on raw entry, add more terms to split entries by - Split on \t too: the last word in headings was suffixed by \t and so couldn't be filtered by - User interacts with raw entries, so run explicit filters on raw entry - For semantic search using the filtered entry is cleaner, still	2022-07-14 21:54:04 +04:00
Debanjum Singh Solanky	7640e2ab0c	Wrap attempt to extract dates from entry in try/catch - Not all YYYY-MM-DD strings in entry are necessarily dates	2022-07-14 21:38:00 +04:00
Debanjum Singh Solanky	9de2097182	Fix date filter usage with multi word queries. Simplify date regex	2022-07-14 21:34:33 +04:00
Debanjum Singh Solanky	dcb6fe479e	Fix date_filter query, entry in query range check. Add tests for it - Fix date_filter date_in_entry within query range check - Extracted_date_range is in [included_date, excluded_date) format - But check was checking for date_in_entry <= excluded_date - Fixed it to do date_in_entry < excluded_date - Fix removal of date filter from query - Add tests for date_filter	2022-07-14 20:01:35 +04:00
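The range check fixed above treats an extracted date range as half-open, [included_date, excluded_date). A minimal sketch of that check, with hypothetical helper names, shows why `<` rather than `<=` is needed: the range for a single day ends at the start of the next day, which must not match.

```python
from datetime import date, timedelta
from typing import Tuple


def in_date_range(entry_date: date, date_range: Tuple[date, date]) -> bool:
    start, end = date_range  # [start, end): start is included, end is excluded
    return start <= entry_date < end


def day_range(day: date) -> Tuple[date, date]:
    # A single day expressed as a half-open range ending at the next day.
    return day, day + timedelta(days=1)
```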
Debanjum Singh Solanky	011f81fac5	Fix date_filter to handle non overlapping date ranges	2022-07-14 18:53:38 +04:00
Debanjum Singh Solanky	70ac35b2a5	Compute Date Range to filter entries to, from Comparators, Dates in Query	2022-07-14 18:20:09 +04:00
Debanjum Singh Solanky	e6db3e3d00	Prefer Dates From Future only when specific words in date string - Default to looking at dates from past, as most notes are from past - Look for dates in future for cases where it's obvious query is for dates in the future but dateparser's parse doesn't parse it at all. E.g parse('5 months from now') returns nothing - Setting PREFER_DATES_FROM_FUTURE in this case and passing just parse('5 months') to dateparser.parse works as expected	2022-07-14 18:13:12 +04:00
Debanjum Singh Solanky	4a201d52af	Add, test date filter regex and date parsing to get natural date range	2022-07-14 16:47:32 +04:00
Debanjum Singh Solanky	b54588717f	Filter for entries with dates specified by user in query - Create Date filter - Users can pass dates in YYYY-MM-DD format in their query - Use it to filter asymmetric search to user specified dates	2022-07-14 00:51:02 +04:00
Debanjum Singh Solanky	b82aef26bf	Make filters to apply before semantic search configurable Details -- - The filters to apply are configured for each type in the search controller - Multiple filters can be applied on the query, entries etc before search - The asymmetric query method now just applies the passed filters to the query, entries and embeddings before semantic search is performed Reason -- This abstraction will simplify adding other pre-search filters. E.g datetime filter	2022-07-13 16:37:09 +04:00
Debanjum Singh Solanky	c92789d20a	Extract explicit pre-search filter function into a separate module Details -- - Move explicit_filters function into separate module under search_filter - Update signature of explicit filter to take and return query, entries, embeddings - Use this explicit_filter func from search_filters module in query Reason -- Abstraction will simplify adding other pre-search filters. E.g datetime filter	2022-07-13 16:20:04 +04:00
Debanjum Singh Solanky	6d7ab50113	Run Explicit Filter on Entries, Embeddings before Semantic Search for Query - Issue - Explicit filtering was earlier being done after search by bi-encoder but before re-ranking by cross-encoder - This was limiting the quality of results being returned. As the bi-encoder returned results which were going to be excluded. So the burden of improving those limited results post filtering was on the cross-encoder by re-ranking the remaining results based on query - Fix - Given the embeddings corresponding to an entry are at the same index in their respective lists. We can run the filter for blocked, required words before the search by the bi-encoder model. And limit entries, embeddings being considered for the current query - Result - Semantic search by the bi-encoder gets to return most relevant results for the query, knowing that the results aren't going to be filtered out after. So the cross-encoder shoulders less of the burden of improving results - Corollary - This pre-filtering technique allows us to apply other explicit filters on entries relevant for the current query - E.g limit search for entries within date/time specified in query	2022-07-12 18:25:42 +04:00
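A pure-Python sketch of filtering before the bi-encoder search, assuming entries and embeddings are parallel lists and using a plain dot product as a stand-in for the bi-encoder score. The names prefiltered_search and keep are hypothetical. Filtering first means every top_k slot goes to an entry that can actually be returned.

```python
from typing import Callable, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def prefiltered_search(query_embedding: List[float], entries: List[str], embeddings: List[List[float]],
                       keep: Callable[[str], bool], top_k: int = 3) -> List[Tuple[str, float]]:
    # entries[i] and embeddings[i] describe the same item, so one index list filters both.
    candidates = [i for i, entry in enumerate(entries) if keep(entry)]
    # Score only the surviving candidates, so no top_k slot goes to an entry that
    # would be filtered out afterwards.
    ranked = sorted(candidates, key=lambda i: dot(query_embedding, embeddings[i]), reverse=True)
    return [(entries[i], dot(query_embedding, embeddings[i])) for i in ranked[:top_k]]
```

With top_k=1, filtering after search could return nothing if the single best hit is excluded; filtering first returns the best remaining entry instead.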
Debanjum Singh Solanky	7677465f23	Fix passing of device to setup method in /reload, /regenerate API - Use local variable to pass device to asymmetric.setup method via /reload, /regenerate API - Set default argument to torch.device('cpu') instead of 'cpu' to be more formal	2022-06-30 01:32:56 +04:00
Debanjum Singh Solanky	eda4b65ddb	Improve Query Speed. Normalize Embeddings, Moving them to Cuda GPU - Move embeddings to CUDA GPU for compute, when available - Normalize embeddings and Use Dot Product instead of Cosine	2022-06-30 00:59:57 +04:00
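Why normalizing lets a dot product replace cosine similarity: once every embedding has unit length, the norms in the cosine denominator are 1, so the two scores are equal and the per-query norm computation disappears. The commit did this with torch tensors on CUDA; this pure-Python sketch only illustrates the arithmetic, and the helper names are hypothetical.

```python
import math
from typing import List


def normalize(vector: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    # Leave a zero vector as is to avoid dividing by zero.
    return list(vector) if norm == 0 else [x / norm for x in vector]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine recomputes both norms on every call.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

For a = [3, 4] and b = [4, 3], cosine(a, b) and dot(normalize(a), normalize(b)) both give 24/25 = 0.96 (up to float rounding).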
Debanjum Singh Solanky	b89fc2f4ac	Add /reload API to reload model embeddings and entries from file - The reload API adds the ability to separate out the loading of embeddings from file without having to restart app or (re-)generate embeddings - Before this the only way to load model from file was by restarting app - The other way to reload the model embeddings, by regenerating them, was too expensive for larger datasets - This unlocks at least 1 use-case, where - we regenerate model via an app instance running on a separate server and - just reload the generated embeddings on the client device - This allows us to offload the expensive embedding generation compute to a background server while letting - This avoids having to restart application on client device or be forced to generate embeddings on the client device itself - But it requires the model relevant files to be synced to the client device This can be done with any file syncing application like Syncthing - We can then call /regenerate on server and /reload on client on a regular schedule to keep our data up to date on semantic search	2022-06-29 23:47:17 +04:00
Debanjum Singh Solanky	f5d6d1e752	Tiny style fix to separate functions by 2 newlines	2022-06-29 23:47:17 +04:00
Debanjum Singh Solanky	85fbe1c42b	Normalize org notes path to be relative to home directory - This is still clunky but it should be commitable - General enough that it'll work even when a user's notes are not in the home directory - While solving for the special case where: - Notes are being processed on one machine and used on a different machine - But the notes directory is in the same location relative to home on both the machines	2022-06-28 19:16:11 +04:00
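One way to express the home-relative normalization, as a sketch with a hypothetical to_home_relative helper: paths under the home directory are rewritten with a "~/" prefix, so they stay valid on another machine with the same layout under its home directory, while paths outside home stay absolute.

```python
from pathlib import Path


def to_home_relative(path) -> str:
    resolved = Path(path).expanduser().resolve()
    home = Path.home().resolve()
    try:
        # Notes under home become "~/..." so the path is valid on any machine
        # that keeps the notes at the same place relative to its home directory.
        return "~/" + resolved.relative_to(home).as_posix()
    except ValueError:
        # Paths outside home are kept absolute.
        return str(resolved)
```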
Debanjum Singh Solanky	094eaf3fcc	Fix minor bugs in OrgNode parser - Bugs discovered from writing org-node tests	2022-06-17 19:14:54 +03:00
Debanjum Singh Solanky	36495038dd	Fix storing parsed CLOSED date in OrgNode The CLOSED date was getting parsed but not stored Adding setClosed at start also fixed the issue	2022-06-17 16:33:37 +03:00
Debanjum Singh Solanky	1c5754bf95	Simplify storing Tags in OrgNode object - Use Set for Tags instead of dictionary with empty keys - No Need to store First Tag separately - Remove properties methods associated with storing first tag separately - Simplify extraction of tags string in org_to_jsonl - Split notes_string creation into multiple f-string in separate line for code readability	2022-06-17 16:33:37 +03:00
Debanjum Singh Solanky	51a43245d3	Escape square brackets in file+heading based org-mode links	2022-06-17 16:20:19 +03:00
Debanjum Singh Solanky	04610f453a	Include scheduled date, deadline date and close date in repr of org node - Now that the times line is excluded from the raw body of node, show it in repr so user can see it for reference - But the model doesn't need to see it; it would only confuse its embeddings	2022-06-17 05:13:48 +03:00
Debanjum Singh Solanky	367d7377df	Ignore scheduled, closed, deadline time and logbook start, end in org node body - Gives cleaner embeddings for semantic search - Hopefully improves results and reduces size, compute	2022-06-17 05:13:09 +03:00
Debanjum Singh Solanky	b77ccadcba	Make property key regex more strict. Property key has to be alphanumeric	2022-06-17 05:13:09 +03:00
Debanjum Singh Solanky	ac9d746444	Fix Tags extraction in Org Node parser - Previous version required two tags at least to work, not sure why - Fixed it to extract all tags, even if only one tag in heading	2022-06-17 04:21:22 +03:00
Debanjum Singh Solanky	fb86be8cd9	Add ID, File+Heading based Links to Org-Mode Entries - Add links to property drawer - This ensures results returned by semantic search contain these links - This allows the user to jump to entry within original file for context - The ID, file+heading based links are more robust to find relevant entry in original file than the line no based link, as the user may edit the original files between embedding regenerations	2022-06-17 03:11:11 +03:00
Debanjum Singh Solanky	de23fc2051	Revert Add Scheduled, Deadline date to Model Embeddings for Date Aware Search Sentence Transformer MSMarco Model isn't date aware, so there's no use adding scheduled, deadline dates to model embeddings for consideration This reverts commit `a2a08d1354`.	2022-06-17 02:57:28 +03:00
Debanjum Singh Solanky	a2a08d1354	Add Scheduled, Deadline date to Model Embeddings for Date Aware Search	2022-06-17 02:55:27 +03:00
Debanjum Singh Solanky	cfbd5c4ecc	Update global model on regenerate via API	2022-06-17 00:49:06 +03:00
Debanjum Singh Solanky	c78bf84eef	Introduce search api endpoint that auto infers search type intent - Introduce prompt for GPT to automatically extract user's search intent - Expose new search api endpoint to use that to set SearchType being passed to search API - Currently meant as an experimental API to gauge usefulness, extendability. Evaluating for phone or voice use-case	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	8ef7917014	Fix json format passed in prompt to GPT	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	f57b7f65ea	Wrap prompts for GPT in triple quotes to improve prompt readability To improve prompt readability: - Remove newline escape sequence and use actual newline directly - This avoids one long line of text as prompt and - Remove escaping of double quotes	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	1eba7b1c6f	Use empty_escape_sequence constant to strip response text from gpt	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	1c3a1420f8	Update asymmetric extract_entries method to handle uncompressed jsonl This is similar to what was done for the symmetric extract_entries method earlier	2022-02-27 19:03:31 -05:00
Debanjum Singh Solanky	3d8a07f252	Extract empty line escape sequences var into constants file for reuse	2022-02-27 19:01:49 -05:00
Debanjum Singh Solanky	bb5d0d8908	Improve Semantic Search Buffer Names in Emacs - Allow multiple semantic searches buffers to exist simultaneously - Uniquify semantic search buffer names - Add query and search-type to semantic search buffer name for easier disambiguation and to help search and find the appropriate buffer	2022-02-26 18:30:14 -05:00
Debanjum Singh Solanky	b68558651b	Improve Extraction of Beancount Entries - Only extract entries starting with YYYY-MM-DD from Beancount - Strip Trailing Escape Sequences from Entries	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	b3ac2dd730	Improve Results Rendered on Emacs from Semantic Search on Ledger - Add search query to top of buffer as Beancount comment - Remove trailing ) from response - Separate entries by empty line - Load beancount-mode in semantic search on ledger buffer	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	502c68d4f8	Remove trailing escape sequence in ledger search response entries - Fix loading entries from jsonl in extract_entries method - Only extract Title from jsonl of each entry This is the only thing written to the jsonl for symmetric ledger - This fixes the trailing escape seq in loaded entries - Remove the need for semantic-search.el response reader to do pointless complicated cleanup - Make symmetric_ledger:extract_entries use beancount_to_jsonl:load_jsonl Both methods were doing similar work - Make load_jsonl handle loading entries from both gzip and uncompressed jsonl	2022-02-26 17:48:45 -05:00
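A sketch of a load_jsonl that reads both gzip-compressed and plain jsonl, assuming the ".gz" suffix marks compressed files. It mirrors the name in the message above but is not the project's actual implementation. Opening both in text mode lets one loop parse either, and stripping each line drops the trailing newline before JSON parsing.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, List


def load_jsonl(input_path) -> List[Dict[str, Any]]:
    path = Path(input_path)
    # Pick the opener from the file extension: a ".gz" suffix means gzip compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    entries = []
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()  # drop the trailing newline so it can't leak into entries
            if line:
                entries.append(json.loads(line))
    return entries
```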
Debanjum Singh Solanky	248aa632c0	Do not throw warning for beancount files with .beancount extension	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	76cd63f4bd	Fix count of processed jsonl entries shown to user by ledger processor Count lines not chars	2022-02-26 17:46:06 -05:00
Saba	33bc62dc19	Fix type of use_xmp_metadata to be bool, rather than str	2022-01-24 21:53:26 -05:00
Debanjum Singh Solanky	179153dc5a	Rename RawConfig Types for Consistency - Naming convention - [ContentType][ConfigType]Config - Where [ConfigType] ~ Content, Search, Processor - Where [ContentType] ~ Text, Image, Asymmetric, Symmetric, Conversation - Current Configs: - Content: - Org Notes - Org Music - Image - Ledger/Beancount - Search: - Asymmetric - Symmetric - Image - Processor: - Conversation	2022-01-14 20:54:38 -05:00
Debanjum Singh Solanky	c64e0c2965	Load model from HuggingFace if model_directory unset in config YAML - Do not save/load the model to/from disk when model_directory unset in config.yml - Add symmetric search default config to cli.py	2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky	510faa1904	Save Image Search Model to Disk	2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky	934ec233b0	Add Search Config for Symmetric Model. Save Model to Disk	2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky	b63026d97c	Save Asymmetric Search Model to Disk - Improve application load time - Remove dependence on internet to startup application and perform semantic search	2022-01-14 17:36:27 -05:00
Debanjum Singh Solanky	2e53fbc844	Fix the user intent extraction prompt for GPT. Clean up chatbot test	2022-01-12 10:36:01 -05:00
Debanjum Singh Solanky	ea28897cdd	Remove deprecated conversation_history field from config	2022-01-12 10:35:52 -05:00
Debanjum Singh Solanky	5a686b7be9	Add logs for chat bot in verbose mode	2022-01-12 10:35:52 -05:00
Debanjum Singh Solanky	6dc2a99d35	Merge branch 'master' of github.com:debanjum/semantic-search into add-summarize-capability-to-chat-bot - Fix openai_api_key being set in ConfigProcessorConfig - Merge addition of config UI and config instantiation updates	2021-12-20 13:30:42 +05:30
Debanjum Singh Solanky	65da7daf1f	Load, Save Conversation Session Summaries to Log. s/chat_log/chat_session Conversation logs structure now has session info too instead of just chat info Session info will allow loading past conversation summaries as context for AI in new conversations { "session": [ { "summary": <chat_session_summary>, "session-start": <session_start_index_in_chat_log>, "session-end": <session_end_index_in_chat_log> }], "chat": [ { "intent": <intent-object> "trigger-emotion": <emotion-triggered-by-message> "by": <AI|Human> "message": <chat_message> "created": <message_created_date> }] }	2021-12-15 10:17:07 +05:30
Saba	97a6dfaa1e	Use default value False for verbose parameter, and small changes Pass config as parameter to initialize_search, change name of API methods to handle config CRUD operations, and initialize config to FullConfig	2021-12-11 14:13:14 -05:00
Saba	9536358d34	Fix key error model_name issue by upgrading sentence-transformers version Refer to https://github.com/UKPLab/sentence-transformers/issues/1241 Also use verbose flag passed through function parameters in image_search	2021-12-11 11:58:19 -05:00
Saba	ce7a751e6b	Fix passing verbose flag down in symmetric_ledger.py	2021-12-11 11:36:32 -05:00
Saba	d65190c3ee	Update unit tests, files with removing model suffix to config types	2021-12-09 08:50:38 -05:00
Debanjum Singh Solanky	0ac1e5f372	Summarize chat logs and notes returned by semantic search via /chat API	2021-12-08 02:34:07 +05:30
Saba	76e9e9da2f	Update unit tests to use the new BaseModel types	2021-12-05 09:31:39 -05:00
Saba	9b16cdbb41	Use past tense for verbose log	2021-12-04 11:45:44 -05:00
Saba	10e4065e05	Consolidate the search config models and pass verbose as a top level flag	2021-12-04 11:43:48 -05:00
Saba	43e647835b	Append Model Suffix to config models	2021-12-04 10:51:21 -05:00
Saba	e068968b35	Update imports for raw config models in config.py	2021-12-04 10:44:55 -05:00
Saba	4d6284b0af	Remove Test suffix from Config models	2021-12-04 10:44:13 -05:00
Saba	7fcc8d2cef	Add null check for processor config	2021-12-04 10:11:00 -05:00
Saba	7ca4fc3453	Resolve merge conflicts with updated processor conversation data model	2021-11-28 16:22:52 -05:00
Saba	87a6c2d716	Use parse_obj instead of parse_raw as incoming data is in dict	2021-11-28 14:34:32 -05:00
Saba	5d50487d83	Linting New line at end of config.html Remove debug print statement	2021-11-28 13:32:56 -05:00
Saba	6f466c8d99	Use global config and add a regenerate button to the config ui	2021-11-28 13:28:22 -05:00
Saba	34d1e4199c	Use alias generator when deserializing the config file	2021-11-28 13:05:48 -05:00
Saba	19b81e82f0	Write back to the raw config.yml file on update	2021-11-28 12:34:40 -05:00
Saba	8837b02de6	dump updated config to a yaml file	2021-11-28 12:26:07 -05:00
Saba	5b80b87379	Streamline None checking in initialize_search	2021-11-28 12:05:04 -05:00
Saba	bf8ae31e6a	Streamline None checking in initialize_search	2021-11-28 11:59:45 -05:00
Saba	da52433d89	Update to re-use the raw config base models in config.py as well	2021-11-28 11:57:33 -05:00
Saba	6292fe4481	Update to re-use the raw config base models in config.py as well	2021-11-28 11:57:13 -05:00
Saba	311c4b7e7b	Working API request body parsing to /post config!	2021-11-28 11:16:33 -05:00
Saba	66183cc298	Working API request body parsing to /post config!	2021-11-28 11:12:26 -05:00
Debanjum Singh Solanky	5cd920544d	Add GPT method to summarize notes and chat logs	2021-11-28 13:08:05 +05:30
Debanjum Singh Solanky	1785047ea6	Improve understand primer and load understand response as dict	2021-11-28 13:04:16 +05:30
Saba	64645c3ac1	Begin type checking/input validation effort	2021-11-27 21:47:56 -05:00
Saba	9a0264b7fc	Add a dummy POST config endpoint, integrate with editable UI	2021-11-27 20:36:03 -05:00
Saba	f3b03ea5b7	Make raw data reactive to changes	2021-11-27 19:17:15 -05:00
Debanjum Singh Solanky	67c3cd7372	Wire up GPT understand method to /chat API. Log conversation metadata too	2021-11-28 00:04:39 +05:30
Saba	3db06eee3f	Basic example of serving config as JSON and retrieving on button click	2021-11-27 10:49:33 -05:00
Saba	3d4471e107	Merge branch 'master' of github.com:debanjum/semantic-search into saba/configui	2021-11-27 08:52:48 -05:00
Debanjum Singh Solanky	ccfb97e1a7	Wire up minimal conversation processor. Expose it over /chat API endpoint Ensure conversation history persists across application restart	2021-11-27 18:12:01 +05:30
Debanjum Singh Solanky	a99b4b3434	Make conversation processor configurable	2021-11-27 18:12:01 +05:30
Debanjum Singh Solanky	d4e1120b22	Add GPT based conversation processor to understand intent and converse with user - Allow conversing with user using GPT's contextually aware, generative capability - Extract metadata, user intent from user's messages using GPT's general understanding	2021-11-27 18:12:01 +05:30
Saba	baee52648d	Set up basic ui page with no functionality	2021-11-26 14:51:11 -05:00
debanjum	46661b3057	Ensure top_k never more than total entries to run symmetric search on	2021-11-16 11:32:21 -08:00
debanjum	8c858d1a94	Reduce symmetric search results for cross-encoder to re-rank to improve search speed	2021-11-16 11:31:19 -08:00
Debanjum Singh Solanky	f3fd5ae978	Improve code comments. Do not import unused modules in asymmetric search	2021-11-17 00:58:31 +05:30
Debanjum Singh Solanky	8cf2465e8e	Ensure top_k never more than total entries to search from	2021-11-17 00:56:31 +05:30
Debanjum Singh Solanky	4d37ace3d6	Reduce search results for cross-encoder to re-rank to improve search speed Search time on my notes reduced from 14s to 4s. Cross-encoder re-ranking step takes majority time, not the cosine similarity search	2021-11-17 00:50:28 +05:30
Debanjum Singh Solanky	1832e418e5	Use raw string for regex in orgnode to fix deprecation warning	2021-10-02 17:38:31 -07:00
Debanjum Singh Solanky	f59e321419	Update CLIP model load path	2021-10-02 16:50:06 -07:00
Debanjum Singh Solanky	c47a8cdf16	Allow configuring host, port or unix socket of server via CLI	2021-10-02 16:16:33 -07:00
Debanjum Singh Solanky	516f28b082	Merge branch 'master' of github.com:debanjum/semantic-search	2021-09-30 04:17:32 -07:00
Debanjum Singh Solanky	d2905c4be6	Move tests out to project root. Use absolute import in project tests/ directory in project root is more standard. Just had to use absolute path for internal module imports to get it to work	2021-09-30 04:12:14 -07:00
Debanjum Singh Solanky	58bb420f69	Fix image_metadata argument ordering bug. Add E2E image search test - Image search test seems a little flaky - Interchanged argument was causing inaccurate results earlier	2021-09-30 03:30:47 -07:00
Debanjum Singh Solanky	d5597442f4	Modularize Code. Wrap Search, Model Config in Classes. Add Tests Details - Rename method query_* to query in search_types for standardization - Wrapping Config code in classes simplified mocking test config - Reduce args being passed to a function by passing it as single argument wrapped in a class - Minimize setup in main.py:__main__. Put most of it into functions These functions can be mocked if required in tests later too Setup Flow: CLI_Args|Config_YAML -> (Text|Image)SearchConfig -> (Text|Image)SearchModel	2021-09-30 02:04:04 -07:00
Debanjum Singh Solanky	f4dd9cd117	Use type specific model for other search types too. Expose them via SearchModels - Wrap Image, Music, Ledger search into the type of SearchModel they use Similar to what was done for notes model by wrapping its config into an AsymmetricSearchModel. - Use the uber wrapper class to expose all type specific search models	2021-09-29 21:09:42 -07:00
Debanjum Singh Solanky	352d2930ee	Use multiple threads to generate model embeddings. Other minor formatting	2021-09-29 20:47:58 -07:00
Debanjum Singh Solanky	e22e0b41e3	Wrap asymmetric search model into SearchModels. Test notes search end-to-end - Wrap asymmetric search model parameters into AsymmetricSearchModel class - Create wrapper for all search type models. Put notes search model into it - Test notes search end-to-end from client API layer to results. Use model build on test data	2021-09-29 20:47:35 -07:00
Debanjum Singh Solanky	cde11a2331	Wrap search type enablement status in a search settings class - Cleaner, more idiomatic usage of a global variable - Simplifies mocking when testing client in pytest as setting wrapped in object rather than a simple type. So passed around by reference	2021-09-29 19:18:33 -07:00
Debanjum Singh Solanky	81ce0cacc3	Only allow supported search types to /search, /regenerate APIs - Use a SearchType to limit types that can be passed by user - FastAPI automatically validates type passed in query param - Available type options show up in Swagger UI, FastAPI docs - controller code looks neater instead of doing string comparisons for type - Test invalid, valid search types via pytest	2021-09-29 19:12:56 -07:00
Debanjum Singh Solanky	5db08c5293	Set query as heading of notes search results in Emacs Org buffer	2021-09-29 13:30:15 -07:00
Debanjum Singh Solanky	fdb60a8dcf	Set Query as Heading of Image Search Results Emacs Buffer	2021-09-16 12:30:06 -07:00
Debanjum Singh Solanky	169ddcc8c6	Make Using XMP Metadata to Enhance Image Search Optional, Configurable - Break the compute embeddings method into separate methods: compute_image_embeddings and compute_metadata_embeddings - If image_metadata_embeddings isn't defined, do not use it to enhance search results. Given image_metadata_embeddings wouldn't be defined if use_xmp_metadata is False, we can avoid unnecessary addition of args to query method	2021-09-16 12:01:05 -07:00
Debanjum Singh Solanky	a4a23d7a72	Batch encode XMP metadata from images too for image_search	2021-09-16 11:11:36 -07:00
Debanjum Singh Solanky	3afe054312	Make image batch size to encode configurable via config.yml	2021-09-16 10:52:31 -07:00
Debanjum Singh Solanky	41c328dae0	Batch encode images to keep memory consumption manageable - Issue: Process would get killed while encoding images for consuming too much memory - Fix: - Encode images in batches and append to image_embeddings - No need to use copy or deep_copy anymore with batch processing. It would earlier throw too many files open error Other Changes: - Use tqdm to see progress even when using batch - See progress bar of encoding independent of verbosity (for now)	2021-09-16 10:15:54 -07:00
Debanjum Singh Solanky	d8abbc0552	Use XMP metadata in images to improve image search - Details - The CLIP model can represent images, text in the same vector space - Enhance CLIP's image understanding by augmenting the plain image with its text-based metadata. Specifically with any subject, description XMP tags on the image - Improve results by combining plain image similarity score with metadata similarity scores for the highest ranked images - Minor Fixes - Convert verbose to integer from bool in image_search. It's already passed as integer from the main program entrypoint - Process images with ".jpeg" extensions too	2021-09-16 08:55:20 -07:00
Debanjum Singh Solanky	0e34c8f493	Allow semantic search on images from Emacs Images are rendered inline in a temporary org-mode buffer	2021-09-10 01:14:34 -07:00
Debanjum Singh Solanky	7d5514ecaa	Allow user to override inferred search type with other valid options	2021-09-10 00:58:24 -07:00
Debanjum Singh Solanky	3bdeeb1e19	Autoload main semantic-search function	2021-09-09 22:10:37 -07:00
Debanjum Singh Solanky	f4bde75249	Decouple results shown to user and text the model is trained on - Previously: The text the model was trained on was being used to re-create a semblance of the original org-mode entry. - Now: - Store raw entry as another key:value in each entry json too Only return actual raw org entries in results But create embeddings like before - Also add link to entry in file:<filename>::<line_number> form in property drawer of returned results This can be used to jump to actual entry in its original file	2021-08-29 06:06:54 -07:00
Debanjum Singh Solanky	7ee3007070	Get ID, QUERY, TYPE, CATEGORY properties from org property drawer when present	2021-08-29 06:06:28 -07:00
Debanjum Singh Solanky	0263d4d068	Enable semantic search for songs in org-music Org-Music: https://github.com/debanjum/org-music	2021-08-29 06:06:28 -07:00
Debanjum Singh Solanky	fd7888f3d4	Resolve relative file paths to config YAML file in cli.py	2021-08-29 03:03:37 -07:00
Debanjum Singh Solanky	fc531a1915	Resolve relative file paths to model embeddings in all search types	2021-08-28 22:26:12 -07:00
Debanjum Singh Solanky	4daeddbbda	Enable Semantic Search on Images	2021-08-22 21:42:37 -07:00
Debanjum Singh Solanky	fd217fe8b7	Enable Semantic Search for Beancount transactions	2021-08-22 21:36:06 -07:00
Debanjum Singh Solanky	97263b8209	Move CLI into a separate module. Move CLI tests into a separate file	2021-08-21 19:21:38 -07:00
Debanjum Singh Solanky	78a1f4ebb4	Use YAML file to allow user to configure application. Add tests - YAML Config - Can specify all params[1] earlier being passed via cmd args in config YAML - Can now also configure sentence-transformer models to use etc for search - [1] Config params - org files - compressed entries file config path - embeddings file config path - Include sample_config.yaml - Include sample .org file from this repos readmes - CLI - Configuration Priority: Config via cmd > Config via YAML > Default Config - Test CLI, include test config.yml for the tests - Set default type to None unless set via query param to API Run notes search if search_enabled, also if type is None (default) Prepares for running queries on all search types unless type specified in API query param - Update Readme	2021-08-21 19:07:39 -07:00
Debanjum Singh Solanky	bafc86d583	Add helpers to merge dictionaries and get keys deep inside a dictionary	2021-08-21 18:27:50 -07:00
Debanjum Singh Solanky	252266b62a	Pass type of item via regenerate API. Default type query param to None	2021-08-17 18:25:07 -07:00
Debanjum Singh Solanky	ff7207a6bd	Extract commandline arguments into separate testable method	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	a3a1100be9	Arrange modules in standardized ordering	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	569e30b1c8	Create a few basic tests	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	af9660f28e	Move application files under src directory. Update Readmes - Remove command for calling the asymmetric search script directly. It no longer works when called directly due to internal package import issues	2021-08-17 04:11:03 -07:00

3180 commits