Removing unused content types reduces the amount of Khoj code to manage
- 0f993b3 Drop support for Ledger as a separate content type
Khoj will soon get a generic text indexing content type via "Index plain text files" (#237).
That, along with a file filter, should suffice for searching through Ledger transactions
- c9db532 Remove unused org-music as an indexable content type from Khoj
Org-music was just a custom content type that worked with the org-music package.
It was mostly only useful to me.
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.
Having a specific content type for a niche use-case like Ledger isn't
useful. Removing unused content types reduces the Khoj code to manage.
Org-music was just a custom content type that worked with the org-music package.
It was mostly only useful to me.
Cleaning up that code reduces the number of content types for khoj to
manage.
- Add one-click disablement
- Remove fields that probably don't need to be edited (our implementation details)
- Add a green tick if a given field is configured
- In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network)
- Prevent re-indexing for Github data if this is a demo instance
- Fix up some issues with the CSS which made the settings page render small on mobile
- In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page
- Break out of rendering list if at end of org block in org.js
- This would previously hang rendering of results in the web interface
Should try to fix this upstream in org.js as well
- Previously Khoj could only support Python up to 3.10 due to PyTorch.
But lots of folks had python 3.11 installed by default on their machines.
This required installing python 3.10 and dealing with virtual envs.
With Torch >= 2.0.1 now able to support python 3.11, at least one
class of installation troubles for Khoj should drop. See
https://github.com/pytorch/pytorch/issues/86566 for reference
- Preliminary testing indicates using the new torch 2.x may reduce
search time by 25% (from 80ms to 60ms on Mac M1)
- Update docs to drop the mention that Python <= 3.10 is required
- Update Github test workflow to run khoj tests with python 3.11 too
- Use a requests session to reduce the overhead of setting up a new connection to the Github URL on each request
- Use the streaming feature of the REST API to reduce some of the memory footprint
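A minimal sketch of the two points above, assuming a hypothetical `get_file_contents` helper rather than khoj's actual Github processor:

```python
import requests

# Reuse one TCP connection for all Github API calls instead of reconnecting per request
session = requests.Session()

def get_file_contents(file_url: str) -> bytes:
    "Download a file via the Github REST API, streaming the body to limit memory use."
    response = session.get(file_url, stream=True)
    response.raise_for_status()
    content = b""
    # Accumulate the response in fixed-size chunks rather than loading it all at once
    for chunk in response.iter_content(chunk_size=8192):
        content += chunk
    return content
```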
- Set image_search.query to async to use it with multi-threading
This is the same as text_search.query being set to an async method
- Exit search early if no search_model is defined in state.model
- So when searching across content types (with content-type = "all")
org-mode results get rendered differently than markdown, PDF etc. results
- Set div class for each result separately instead of a single uber div
for styling. This allows styling div of each result based on the
content-type of that result
- No need to create placeholder "all" content type on web interface as
server is passing an all content type by itself
- Add cards to configure each of the Github repositories
- Fix a bug in the API which caused all other settings to be wiped when updating one of the content types
- Provide an error message to the user if they have a misconfiguration in their chat settings
- Add support for indexing org files as well as markdown files from the Github repository and update corresponding search view
- Support indexing a list of repositories
- Show success/failure status message much closer to the save button
Previously the status message was shown at the top of the page, which wasn't
always in view and wasn't easily seen
- Improve the status message to more clearly show next steps on success
If no content-type selected in transient menu option, khoj.el queries
khoj server without content-type parameter (t) set.
This results in search across all enabled asymmetric search text
content types
- Add new filter abstract method to remove filter terms from query
- Use the filter method to remove filter terms, encode this defiltered
query and pass it to the query methods of each search type
TODO: Encoding query is still taking 100-200 ms unlike before. Need to
investigate why
- Update API to return content from all enabled content types when type
is not set to specific type in HTTP request param
- To do this efficiently run the search queries in parallel threads
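A rough sketch of the defilter-then-parallel-search flow described above; the class and method names here (`BaseFilter.defilter`, `search_all`) are illustrative, not necessarily khoj's actual API:

```python
import re
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

class BaseFilter(ABC):
    @abstractmethod
    def defilter(self, query: str) -> str:
        "Strip this filter's terms from the query before the query is encoded."
        ...

class WordFilter(BaseFilter):
    'Illustrative filter for +"word" / -"word" terms.'
    word_regex = re.compile(r'\s?[+-]"[^"]+"')

    def defilter(self, query: str) -> str:
        return self.word_regex.sub("", query).strip()

def search_all(query: str, filters: list, search_funcs: list) -> list:
    "Defilter the query once, then run each content type's search in a parallel thread."
    for query_filter in filters:
        query = query_filter.defilter(query)
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(search, query) for search in search_funcs]
        return [hit for future in futures for hit in future.result()]
```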
- The default is 30, so the number of paginated requests required to get all
items (commits, files) will reduce by 67%
- No need to increase page size for the get tree Github API request from
`get_markdown_files'
The get tree Github API doesn't support pagination and returns up to 100K items
in a response. This should be way more than enough for our current
use-cases
- Previously "token" wasn't being prefixed to the PAT in the Authorization header
This resulted in the request being considered unauthenticated
- Unauthenticated requests to Github API are limited to 60 requests/hour
Authenticated requests to Github API are allowed 5000 requests/hour
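For reference, a hedged sketch of the authenticated, paginated call pattern described above (exact page size and endpoints in khoj's Github processor may differ; 100 is the Github API maximum per page):

```python
import requests

def get_commits(repo: str, pat_token: str) -> list:
    "List commits for a repo with authenticated, 100-per-page requests."
    commits, page = [], 1
    # The "token " prefix is required, else the request counts as unauthenticated
    headers = {"Authorization": f"token {pat_token}"}
    while True:
        response = requests.get(
            f"https://api.github.com/repos/{repo}/commits",
            headers=headers,
            params={"per_page": 100, "page": page},  # default per_page is 30
        )
        response.raise_for_status()
        page_of_commits = response.json()
        if not page_of_commits:
            break
        commits += page_of_commits
        page += 1
    return commits
```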
- Add a central configuration management page to make management of config details easier
- Add relevant api endpoints both for client and server to update/request data as necessary
- Attempt to update the favicon
The Llama_Hub Github plugin is fairly limited.
The Github REST API is well supported and can easily be extended to
index commit messages, issues, discussions, PRs etc.
- Make API endpoints on Khoj server accept `client` as request parameter
- Khoj API endpoints: /chat, /search, /update
- Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server
- Khoj clients: Emacs, Obsidian and Web
- Also log khoj server_version running to telemetry server
- This improves latency of @general chat by avoiding unnecessary
compute
- It also avoids passing references in API response when they haven't
been used to generate the chat response. So interfaces don't have to
add logic to not render them unnecessarily
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
- Ensure combined results are sorted by score across both types
- Jump to the PDF file when its PDF search result is selected from the modal
- Match argument names passed to khoj openai completion funcs with
arguments passed to langchain calls to OpenAI
- This simplifies the logic in the khoj openai completion funcs
- Fix bug where both LangChain and Khoj would retry requests 6 times each.
So a total of 12 requests at >1 minute intervals for each chat
response when the OpenAI API was down
- Retrying too many times when the API is failing doesn't help
- The earlier 60 second request timeout was spacing out the interval
between retries way too much. This slowed down chat response times
quite a bit when the API was flaky
- With these updates you'll know if a call to the chat API failed in under a
minute
- Use ChatModel and ChatOpenAI to call OpenAI chat model instead of
using OpenAI package directly
- This is being done as part of migration to rely on LangChain for
creating agents and managing their state
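A minimal sketch of calling the OpenAI chat model through LangChain's `ChatOpenAI`, as described above; the model name, temperature and retry settings are illustrative, not khoj's actual defaults:

```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

# Configure timeout and retries in one place so khoj and LangChain
# don't both retry the same failed request
chat = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.2,
    openai_api_key="sk-...",  # placeholder
    request_timeout=20,
    max_retries=1,
)

messages = [
    SystemMessage(content="You are Khoj, a friendly personal assistant."),
    HumanMessage(content="Write a haiku about spring. No preamble, just the haiku."),
]
print(chat(messages).content)
```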
- Khoj chat will now respond to general queries if:
1. no relevant reference notes available or
2. when explicitly induced by prefixing the chat message with "@general"
- Previously Khoj Chat would often refuse to respond to
general queries not answerable from reference notes or chat history
- Make chat quality tests more robust
- Add more equivalent chat response options refusing to answer
- Force haiku writing to not give any preamble, just the haiku
- Simplifies switching between different OpenAI chat models, e.g. GPT-4
- It was previously hard-coded to use gpt-3.5-turbo. Now it just
defaults to using gpt-3.5-turbo, unless the chat-model field under the
conversation processor is updated in khoj.yml
Otherwise, if the heading exceeds max_tokens, the search models will just see
a heading (with repeated filename) for each compiled entry and not the
actual content.
100 characters should be sufficient to include the filename (not path) and
entry heading. If longer, rather truncate it so the entry's unique text is
passed to the model for search context
Previously the filename was appended to the end of the compiled entry.
This didn't provide appropriately structured context
Test filename getting prepended as heading to compiled entry
All compiled snippets split by max tokens (apart from first) do not
get the heading as context.
This limits search context required to retrieve these continuation
entries
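A hedged sketch of the compile step described in the notes above; the exact heading format, truncation length and split logic in khoj's text processors may differ:

```python
def compile_entry(raw_entry: str, heading: str, filename: str, max_words: int = 256) -> list:
    "Prefix a truncated filename + heading line, then split the entry by max words."
    # Keep the prefix under ~100 characters so it doesn't crowd out the entry body
    prefix = f"{filename} {heading}"[:100]
    compiled = f"{prefix}\n{raw_entry}"
    words = compiled.split(" ")
    # Only the first snippet carries the heading; continuation snippets carry body text
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```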
- cl-push expects a generalized variable, else it throws a (setf quote)
undefined warning
- This results in the config call failing on calling khoj entrypoint
- Remove waiting for server message as it hides the messages from the
server
- Fix the nil messages that were being rendered by checking before
showing messages from the server
- Consistently prefix messages from khoj with khoj.el
Previously khoj.el was calling the server configure API even when the
config was the same as before.
This had broken the khoj search-as-you-type experience from Emacs.
Also show more details to the user about what in khoj is being configured.
Resolves #185, #199
- Issue
IndexName created from the Obsidian absolute vault path wasn't replacing
Windows path and drive separators with underscores. It was only
replacing Unix path separators
- Fix
Also replace windows drive and path separators with _ while creating
IndexName in Khoj Obsidian plugin
Makes it easier to tell which python the pip in use is associated
with. Easier to debug when users have different versions of python
installed (e.g. 3.10 and 3.11)
- Explicitly split entry string by space during split by max_tokens
- Prevent formatting of compiled entry from being lost
- The formatting itself contains useful information
No point in dropping the formatting unnecessarily,
even if (say) the current search models don't account for it (yet)
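To illustrate the whitespace point: splitting on any whitespace collapses the newlines that carry an entry's formatting, while splitting only on spaces keeps them attached to the surrounding words (a sketch, not khoj's exact code):

```python
entry = "* Heading\n- bullet one\n- bullet two"

# str.split() with no argument splits on all whitespace, so newlines are lost
lost = " ".join(entry.split())      # '* Heading - bullet one - bullet two'

# Splitting explicitly on spaces keeps the newlines attached to words
kept = " ".join(entry.split(" "))   # '* Heading\n- bullet one\n- bullet two'
```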
Append originating filename to compiled string of each entry for
better search quality by providing more context to model
Update markdown_to_jsonl tests to ensure the filename is being added
Resolves #142
This follows the expected behavior of Obsidian search modals,
e.g. Omnisearch and the default Obsidian search.
The note creation code is borrowed from Omnisearch.
Resolves #133
- Give space in the input field. Too narrow previously
- References should be indexed from 1 instead of 0
- Use Obsidian font size variables to scale fonts in chat appropriately
- Add message sender, date metadata as message footer
- Use css directly from Khoj Chat Web Interface.
- Modify it to work under an Obsidian modal
- So replace the html, body styling from the web interface with
styling for a new "khoj-chat" class attached to the contentEl of the modal
Converts paths to glob-style regexes that will index all org files
recursively under the specified list of paths
Should help setup for org-roam users from khoj.el
- khoj-auto-setup controls whether to automatically check for and
setup khoj server from within Emacs
- extract install, start, configure sequence into public, interactive
method. Allows calling khoj-setup during package load via init.el
- Fix: Do not attempt to configure or wait for server ready if
user has said no to auto-setup request
- Fix logic to mark server started vs ready
- Previously the started/running vs ready variables defs were getting
intertwined
- Server started indicates server bootup has been triggered
- Server ready indicates server API ready to accept requests
- If khoj server started outside emacs, khoj--server-ready should be set
to true by khoj--server-running method (instead of waiting for proc msg)
- If khoj server is unconfigured the /config/types endpoint wouldn't
return anything. Using config/data/default allows checking khoj server
running status without requiring it to be configured as well
If the config hasn't changed there'll be no update. If the config has
changed, indexing will get triggered asynchronously. But the user cannot
make queries till indexing is done
As it's easier to know when the server is ready to be configured
- Use process filter, sentinel to mark when khoj server is ready or not
- Display server messages for visibility into server boot-up process
- Wait until server ready to open khoj transient menu in Emacs
Until then khoj features wouldn't work anyway, so avoids confusion
- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
tenacity package. This is officially suggested and used by other
popular GPT based libraries
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
argument to generate_chatml_messages_with_context method
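A sketch of the helper shapes described above using the tenacity and tiktoken APIs; the exception list, backoff parameters and model name are illustrative:

```python
import openai
import tiktoken
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

@retry(
    retry=retry_if_exception_type((openai.error.APIError, openai.error.RateLimitError)),
    wait=wait_random_exponential(min=1, max=30),
    stop=stop_after_attempt(3),
    reraise=True,
)
def chat_completion(messages: list, model: str = "gpt-3.5-turbo") -> str:
    "Call the OpenAI chat API, retrying with exponential backoff on transient errors."
    response = openai.ChatCompletion.create(model=model, messages=messages)
    return response["choices"][0]["message"]["content"]

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    "Count tokens with tiktoken to decide how many conversation turns fit in the prompt."
    encoder = tiktoken.encoding_for_model(model)
    return len(encoder.encode(text))
```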
- Remove the need to split by magic string in emacs and chat interfaces
- Move the logic that compiles references into a context string for GPT down into the GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
- Add continuation suffix, ..., when reference definition truncated
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
- Improves color coding for readability
- Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
- Generalize the render-chat-response method to handle rendering
history or chat response from the chat API response
- Trigger rendering of khoj chat history if Khoj chat buffer not
created for this session yet
- Use org-insert-link method to improve link rendering robustness
The previous simple mechanism to create org-links would result in links
escaping out of formatting. Use a user-facing org-mode method to
remove/reduce the probability of this
- Replace newlines with space to render reference notes as links
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as a org-mode list in format:
- [sender-name]: *[message]*
- /[receive-date]/
- Add references as org links with context visible on hover,
but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
Use `-map-indexed' method from dash
- Reasons:
- GPT can extract date aware search queries with date filters
better than ChatGPT given the same prompt.
- Need quality more than cost savings for now.
- Need to figure out ways to improve the prompt for ChatGPT before using it
Update Search Actor prompt with answers, more precise primer and
two more examples for context
Mark the 3 chat quality tests using answer as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
- Keep inferred questions in logs
- Improve prompt to GPT to try use past questions as context
- Pass past user message and inferred questions as context to help GPT
extract complete questions
- This should improve search results quality
- Example Expected Inferred Questions from User Message using History:
1. "What is the name of Arun's daughter?"
=> "What is the name of Arun's daughter"
2. "Where does she study?" =>
=> "Where does Arun's daughter study?" OR
=> "Where does Arun's daughter, Reena study?"
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
Answer time range based questions
Limit search to specified timeframe in question using date filter
E.g "What national parks did I visit last year?" adds
dt>="2022-01-01" dt<"2023-01-01" to Khoj search
Note: Temperature set to 0. Message to search queries should be deterministic
Create Rubric to Test Chat Quality and Capabilities
### Issues
- Previously the improvements in quality of Khoj Chat on changes was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities
### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
- Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
- Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
- Measure quality at 2 separate layers
- **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
- **Chat Director**: This is the chat orchestration agent. It calls on the required chat actors, searches through the user provided knowledge base (i.e. notes, ledger, images) etc. to respond appropriately to the user's message. This is what the `/api/chat` API exposes.
- Mark desired but not currently available capabilities as expected to fail <br />
This still allows measuring the chat capability score/percentage, while only failing, on changes to chat, those capability tests that were passing before
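The expected-to-fail marking above maps naturally onto pytest's xfail marker; a sketch with a hypothetical fixture and test:

```python
import pytest

# Desired-but-missing capability: keep the test, mark it expected-to-fail,
# so the suite still measures a capability score without breaking on known gaps.
@pytest.mark.xfail(reason="Chat director cannot yet answer time-range questions")
def test_answer_time_range_question(chat_director):  # fixture name is hypothetical
    response = chat_director.chat("What national parks did I visit last year?")
    assert response  # expect a non-empty, correct answer once the capability lands
```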
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
answering
- Make GPT be more reliable in looking at past conversations for
forming response
# Improve Khoj Chat
## Main Changes
- Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost
- Improve Prompt to answer query using indexed notes
- Previously was asking GPT to summarize the notes
- Both the chat and answer API use this new prompt
- Support Multi-Turn conversations
- Pass previous messages and associated reference notes to ChatGPT for context
- Show note snippets referenced to generate response
- Allows fact-checking, getting details
- Simplify chat interface by using only single unified chat type for now
## Miscellaneous
- Replace summarize with answer API. Summarize via API not useful for now
- Only pass Khoj search results above a threshold confidence to GPT for context
- Allows Khoj to say don't know if it can't find answer to query from notes
- Allows relying on (only) conversation history to generate response in multi-turn conversation
- Move Chat API out of beta. Update Readme
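A hedged sketch of the multi-turn message construction against the ChatGPT chat-completions API described in this note; the prompt wording, variable names and model are illustrative:

```python
import openai

def converse(references: list, user_query: str, conversation_log: list) -> str:
    "Answer the user's query using note snippets and prior turns as context."
    system_prompt = "You are Khoj, a helpful personal assistant. Answer from the provided notes."
    context = "\n\n".join(references)  # compiled note snippets above the confidence threshold
    messages = [{"role": "system", "content": system_prompt}]
    for past_user_message, past_khoj_response in conversation_log:
        messages += [
            {"role": "user", "content": past_user_message},
            {"role": "assistant", "content": past_khoj_response},
        ]
    messages.append({"role": "user", "content": f"Notes:\n{context}\n\nQuestion: {user_query}"})
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    return response["choices"][0]["message"]["content"]
```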
GPT still mostly says I don't know when the answer is not in notes or chats.
But with this it's more inclined to answer general questions not in
chats or notes, while informing the user that the information is not from
existing chats or notes
- Chat uses the compiled form of search results, not the raw entries, to
provide context for chat. The compiled search result snippets
themselves are unique, and using multiple of them for context from
the same raw note is fine if they cross the score and rank thresholds.
This should improve the context provided for chat
- Also apply score_threshold, and no deduplication, to the answers API
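A small sketch of the threshold gating above; field names are assumed, and whether higher or lower scores are better depends on the search model in use (higher is assumed better here):

```python
def collect_context(hits: list, score_threshold: float = 0.6, max_hits: int = 2) -> list:
    "Pass only decent-quality hits, as compiled snippets, for chat context."
    decent_hits = [hit for hit in hits if hit["score"] >= score_threshold]
    return [hit["compiled"] for hit in decent_hits[:max_hits]]
```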
- Issue
The file path separators used by the khoj server and the Obsidian vault
were different on Windows
- Fix
Normalize the file path to use forward slash (/) to find the matching
note file in the Obsidian vault and jump to it
Resolves #177
Answer does not rely on past conversations, just the knowledge base.
It is meant for one off interactions, like search rather than a
continuing conversation like chat
For now it is only exposed via the API. Later it will be exposed in the
interfaces as well
Remove ability to select different chat types from the chat web
interface as there is only a single chat type
Stop appending answers to the conversation logs
- Only use decent quality search results, if any, as context
- Pass source results used by previous chat messages as context
- Loosen prompt to allow looking at previous chats and notes to answer
- Pass current date for context
- Make GPT provide reason when it can't answer the question. Gives
user context to tune their questions
- Set context by either including last 2 chat messages from active
session or past 2 conversation summaries from conversation logs
- Set personality in system message
- Place personality system message before last completed back & forth
This may stop ChatGPT forgetting its personality as conversation progresses given:
- The conditioning based on system role messages is light
- If system message is too far back in conversation history, the
model may forget its personality conditioning
- If the system message is at the end of the conversation, the model can think it's
the start of a new conversation
- Inserting the system message before the last completed back & forth should
prevent ChatGPT from assuming it's the start of a new conversation
while not losing personality conditioning from the system message
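A structural sketch of the message ordering described above; this is a simplification, not the actual signature of khoj's generate_chatml_messages_with_context:

```python
def generate_messages(user_query: str, personality: str, chat_history: list) -> list:
    "Order: older turns, personality system message, last completed back & forth, new query."
    older_turns, last_backnforth = chat_history[:-2], chat_history[-2:]
    messages = list(older_turns)
    # Keep the personality conditioning near the end of the conversation, but not at
    # the very end, so the model neither forgets it nor treats it as a fresh conversation
    messages.append({"role": "system", "content": personality})
    messages += last_backnforth
    messages.append({"role": "user", "content": user_query})
    return messages
```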
- Simplify the Khoj Chat API to, for now, just answer from the user's notes
instead of trying to infer other potential interaction types.
- This is the default expected behavior from the feature anyway
- Use the compiled text of the top 2 search results for context
- Benefits of using ChatGPT
- Better model
- 1/10th the price
- No hand rolled prompt required to make GPT provide more chatty,
assistant type responses
- Improve GPT prompt
- Make GPT answer users query based on provided notes instead
of summarizing the provided notes
- Make GPT be truthful using prompt and reduced temperature
- Use Official OpenAI Q&A prompt from cookbook as starting reference
- Replace summarize API with the improved answer API endpoint
- Default to answer type in chat web interface. The chat type is not
fit for default consumption yet
The previous behavior was resulting in a null reference error, as the key for
the core content/search type was not present in the current config.
Fall back to using the default config for the unconfigured core content type
instead
See #165 for details
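A sketch of the fallback pattern (the config layout and key names here are illustrative):

```python
def get_content_config(current_config: dict, default_config: dict, content_type: str) -> dict:
    "Use the user's config for a core content type if present, else the default config."
    configured = (current_config.get("content-type") or {}).get(content_type)
    return configured if configured else default_config["content-type"][content_type]
```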
- Use emojis to make info logs easier to read
- Inform when khoj is ready to use
- Provide information on what khoj is doing while starting up
- Inform when content/search types and processors are setup
- Inform when models are being loaded from the web as this step can
take time
- Convert all other info logs to be only shown in verbose mode
- Text before headings was not being indexed due to buggy orgnode
parsing logic
- Resolved indexing intro text from files with and without headings in
them
- Ensure intro text node has heading set to all title lines collected
from the file
Resolves #165
- Test the /config/types API for the no plugin configured, only plugin configured,
and no content configured scenarios
- Do not throw null reference exception while configuring search types
when no plugin configured
- Do not throw null reference exception on calling /config/types API
when no plugin configured
Resolves bug introduced by #173
Repro:
1. Open khoj server with `khoj` on first run
2. Install/enable Khoj Obsidian plugin (to configure khoj server)
3. Restart khoj server with `khoj`
Bug:
- Unconfigured processor and search_types are instantiated as None in
self.current_config
- While creating the desktop GUI, these null configs are attempted to
be accessed as valid dictionaries for creating their GUI panels
- This results in the null ref errors
Fix:
Use default config to create their GUI elements for unconfigured
search and processor types
Resolves #167
- Previously the API was returning all core content types even if they had not been
set up
- Add test to validate only configured content types are returned by
the api/config/types API endpoint
- Remove the need for interfaces to downcase content types returned by the API
before using the type in search and other API endpoints
- Fix to check for search_type.name in plugin keys instead of value
Configure app routes after configuring server.
Import API routers after search type is dynamically populated.
Allow API to recognize the dynamically populated plugin search types
as valid type query param.
Enable searching for plugin type content.
- Remove unneeded type ignore for mps with the latest mypy
- Stop excluding PyQT desktop GUI code from MyPy checks
- Do not warn about unused ignores. Some issue with mypy giving
different errors in different environments (venv, system and pre-commit)
- Use Rich to render uvicorn, fastAPI logs as well
The previous CustomFormatter only worked on khoj logs
- Improve rendering stacktrace on errors using Rich
- What
- The Emacs and Obsidian interfaces stay in their original
directories under src/
- src/khoj now only contains code meant for pypi packaging
- Benefits
- This avoids having to update khoj MELPA, Obsidian plugin config as
the Emacs, Obsidian code is under their original directories
- It separates the code in src/khoj meant for python packaging from
code for external interfaces like Emacs and Obsidian
- Why
The khoj pypi packages should be installed in the `khoj' directory.
Previously they were being installed into the `src' directory, which is a
generic top level directory name that is discouraged from being used
- Changes
- move src/* to src/khoj/*
- update `setup.py' to `find_packages' in `src' instead of project root
- rename imports to form `from khoj.*' in complete project
- update `constants.web_directory' path to use `khoj' directory
- rename root logger to `khoj' in `main.py'
- fix image_search tests to use the newly rename `khoj' logger
- update config, docs, workflows to reference new path `src/khoj'
- By default the obsidian plugin automatically configures the khoj
backend to index the current vault
- For more complex scenarios, users can manage their ~/.khoj/khoj.yml
manually by toggling the auto-configure setting off in the khoj
plugin settings
Resolves #156
- Background
1. Obsidian stores markdown notes as utf8[1]
2. By default, the python `open' command uses the OS locale encoding[2]
This was causing the `UnicodeDecodeError: <locale_encoding> codec can't decode byte' error
- Fix
- Read markdown files as utf8
The Obsidian plugin is the main use-case for markdown files in
khoj currently and that stores md files as utf8.
Do not assume utf8 for other content types like org-mode, beancount for now.
- Fail if error in reading file as utf8, instead of ignoring errors.
Would rather have user realize that their files are not going to
get indexed correctly.
[1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3
[2]: https://docs.python.org/3/library/functions.html#open
The Khoj plugin page from within Obsidian isn't recognized. It seems it
needs an uppercase README file only. So it doesn't show the Khoj
readme from within Obsidian itself.
- Update khoj.el test to reflect updated rendering logic
- Move the ledger render function before the image render function to group functions
with similar logic closer together
Split find file, jump to file code to make onChooseSuggestion more readable
- Use find, instead of using return in forEach to get first match
- Move the jump to file+heading code out from forEach
Do not reference global app object from child objects and funcs
directly.
It is only available for debugging purposes and access to it may be
dropped in the future.
Previously no query syntax helpers, like the "file:" prefix, were used
before checking if the query contains a file path.
This made image search queries brittle to misinterpretation and led to
pointless checking
Add test to verify search by image at file works as expected
- Support querying with text surrounding point in any text buffer
Previously could only find items similar to org entry at point
- Find similar items of specified content type indexed on khoj
Previously only looked for similar org entries indexed on khoj
Now uses the content-type configured in khoj transient menu to find
items of the specified content type
- Details
- Generalize the get-current-org-entry-text func to get text for any
outline section
- Strip leading whitespace from the query text as well
- Create method to get current paragraph text from non-outline mode
buffers
- Update transient, find-similar funcs to pass, use content-type
configured in khoj transient menu
- Generalize query title creation logic to remove markdown headings
prefix (#) apart from org heading prefix (*) as well
- Update last used khoj content-type and results from the
find-similar and update funcs for later reuse
- Jump to top of results buffer after results rendered
Enable searching for notes similar to the current note being viewed
## Main Changes
- 39a18e2 Extend search modal to search for similar notes
- Hide input field on init, Trigger search on opening modal when in similar notes mode
- Set input to contents of current markdown file and get notes similar to it
- Re-rank, by default, when searching for similar notes
- Filter out current note from similar note search results
- 0bed410 Only show `Find Similar Note' command in Editor
- Hide input field on init, Trigger search on opening modal in similar notes mode
- Set input to current markdown file and get similar notes to it
- Enable rerank when searching for similar notes
- Filter out current note from similar note search results
- Screenshot querying "Setup Editor" on test vault with Khoj Readmes
- New features showcase:
- information keybindings, rerank keybinding at bottom of modal
- fixed top level headings in search results
- search results snipped if greater than N words
- Previously top level headings would get stripped of the
space between the heading text and the prefix # symbols. That is,
`# Top Level Heading' would get converted to `#Top Level Heading'
- This would mess up their rendering as a heading in search results
- Add unit tests to text_to_jsonl processors to prevent regression
Provides a more consistent rendering of results in modal.
Makes it easier to see more results in modal.
To see complete entry, user can always just jump to entry from modal
### Overview
- Provide a chat interface to engage with and ask questions of your notes
- Simplify interacting with the beta `chat` and `summarize` APIs
### Use
- Open `<khoj-url>/chat`, by default at http://localhost:8000/chat?type=summarize
- Type your queries, see summarized response by Khoj from your notes
**Note**:
- **You will need to add an API key from OpenAI to your khoj.yml**
- **Your query and top note from search result will be sent to OpenAI for processing**
## Details
- 177756b Show chat history on loading chat page on web interface
- d8ee0f0 Save chat history to disk for persistence, seeing chat logs
- 5294693 Style chat messages as speech bubbles
- d170747 Add khoj web interface and chat styling to new chat page on khoj web
- de6c146 Implement functional, unstyled chat page for khoj web interface
- The previous mechanism to trigger saving on the shutdown event did not work
- Use a scheduler to persist chat sessions to disk at a 5 minute interval
- This improves the time granularity and gives a fixed interval for saving chat logs
- It may lose ~5 minutes of chat history until a mechanism to also
write on shutdown is found/resolved
- Create conversation directory if it doesn't exist before attempting write
- Reset chat_session after writing it to disk
- Wrap messages into speech bubbles
- Color messages by khoj blue, sender grey
- Add those standard protrusions to the speech bubbles for fun
- Align bubbles left or right based on sender
- messages by khoj are left aligned, messages by self are right aligned
- Put message metadata like sender and time under speech bubble
- use data-* attributes and the ::after CSS pseudo-element for this
- Update renderMessage func to accept time param, remove unused type_ param
Not all notes are in the past. Notes can be about stuff in the future.
Casting them to past tense gives the impression that they've already
happened / been done.
- Changes
- Use blue color for khoj heading font
- This fixes the title color issue
- Update background to lighter shade
- This fixes the body text color issue
- Update colors for todo, done, miscellaneous todo state, tag color
- This does not fix the color contrast issue but seems like an acceptable solution
- Using white text rather than black text on the blue background looks
better, even though black text on a blue background passes the
WCAG acceptable contrast score
- For details see blog post:
https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/
- Add a border to tags to give them a tag-pill look and differentiate them
from todo states
- Buttons and inputs
- Change background color of input fields like type dropdown,
update button and results count counter, to match background
color of page
- Add shadow on hover over button, dropdowns
Resolves #111
- Ensure message input box sticks to bottom of screen
- Ensure chat logs div is scrollable when logs become longer than screen
Do not make the whole page scroll, just the chat logs body div
Uses longest file path match to find markdown file in vault
corresponding to file of search result returned by Khoj
Allow jumping to search result from khoj plugin modal on Android too
The previous mechanism of manually triggering the getSuggestions,
renderSuggestions flow was corrupting traversal and opening of
reranked search results in KhojModal
Emulate the event that would anyway trigger the get & render of results in the
modal. This lets obsidian core handle the flow without digging too
deep into obsidian core's handling of the flow. Lowers the chance of
breakage
We need the index file paths to make sense on the khoj backend server
Having the path of the index on the backend relative to the current vault directory
on the frontend ignores the fact that the frontend may be on a different
machine than the khoj backend server
Using unique index name per vault allows switching vaults without
overwriting indices of other vaults created on khoj backend when khoj
obsidian plugin is loaded on opening a different vault
- Overview
Limits using Khoj to a single vault at a time. This is
automatically configured to the most recently opened vault.
Once directory filters are supported on the backend, the plugin will be
updated to index multiple vaults but search only the current vault from the
current vault's khoj obsidian plugin
- Code Details
- Remove setting to configure Vault directory from Khoj Obsidian plugin
- Automatically configure Khoj to index only current Vault.
- Overwrites any previous vaults that were intended to be indexed by
Khoj backend
- Force update of index after configuring vault
- Why
It's not helpful for now and can lead to more problems and confusion.
Once directory filters are supported on the backend, this can be revisited.
- Previously the backend was just throwing a backend error.
The frontend calling the /update API wasn't getting notified
- Now the frontend can react appropriately and make the issue
visible to the user
- Only show notification on plugin load and failure.
- In settings page, set current backend status at top of pane instead
of showing notification
Notice bubbles cluttered the UI while typing updates to settings
- Show a notification once the index is updated via the settings pane button click
There was no notification on index update, which usually takes time
on the backend
- Display warning at top of khoj obsidian plugin settings
- Make search command available only if connected to backend
- Show warning notice on clicking khoj search ribbon button
- Call saveData after configureKhojBackend to ensure the
connectedToBackend setting is saved after being (potentially) updated
in the configureKhojBackend function
- Previously the plugin would not load if it could not connect to the Khoj backend
- Silently failing to load with no reason provided is not helpful
- Load plugin to allow user to fix the Khoj URL in their plugin setting
- Show reason for khoj plugin not working. More helpful than failing silently
Use the timer context manager in all places where code was being timed
- Benefits
- Deduplicate timing code scattered across codebase.
- Provides single place to manage perf timing code
- Use consistent timing log patterns
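The timer context manager referred to above could look roughly like this; khoj's actual helper may differ in details:

```python
import logging
import time

class timer:
    "Context manager that logs the wall-clock time taken by the wrapped block."
    def __init__(self, message: str, logger: logging.Logger):
        self.message = message
        self.logger = logger

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exception_info):
        elapsed = time.perf_counter() - self.start
        self.logger.debug(f"{self.message}: {elapsed:.3f} seconds")

# Usage: one consistent timing log pattern wherever code needs to be timed
logger = logging.getLogger(__name__)
with timer("Rank and sort results", logger):
    time.sleep(0.01)  # stand-in for the timed work
```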
The query method had become too big.
Extract out filter, score, sort and deduplicate logic used by
text_search.query into separate methods.
This should improve the readability of the code.
- Changes
- Fix method signatures of BaseFilter subclasses.
Else typing information isn't translating to them
- Explicitly pass `entries: list[Entry]' as arg to `load' method
- Fix type of `raw_entries' arg to `apply' method
to list[Entry] from list[str]
- Rename `raw_entries' arg to `apply' method to `entries'
- Fix `raw_query' arg used in `apply' method of subclasses to `query'
- Set type of entries, corpus_embeddings in TextSearchModel
- Verification
Ran `mypy --config-file .mypy.ini src' to verify typing
- `torch.Tensor' is apparently a legacy tensor constructor
- Using that to create tensor on MPS devices throws error:
RuntimeError: legacy constructor expects device type: cpu but device type: mps was passed
- `torch.tensor' can handle creating tensors on Mac GPU (MPS) fine
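The distinction in code, runnable on a Mac with an MPS-enabled PyTorch build:

```python
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# torch.tensor (lowercase) infers dtype and accepts any device, including MPS
embeddings = torch.tensor([[0.1, 0.2], [0.3, 0.4]], device=device)

# torch.Tensor (uppercase) is the legacy constructor and raises the RuntimeError
# quoted above when asked to construct directly on an MPS device:
# legacy = torch.Tensor([[0.1, 0.2]], device=device)
```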
This is unlike the more general chat API that combines summarization
of top search result and conversing with the OpenAI model
This should give faster summary results, as no intent categorization
API call is required
- Use latest davinci model for tests
- Wrap prompt in triple quotes to improve legibility
- `understand' method returns dictionary instead of string. Fix its test
- Fix prompt for new model to pass `chat_with_history' test
- Default to using `text-davinci-003' if conversation model not
explicitly configured by user. Stop using the older `davinci' and
`davinci-instruct' models
- Use `model' instead of `engine' as parameter.
Usage of `engine' parameter in OpenAI API is deprecated
- Init processor before search to instantiate `openai_api_key'
from `khoj.yml'. The key is used to configure search with openai models
- To use OpenAI models for search in Khoj
- Set `encoder' to name of an OpenAI model. E.g text-embedding-ada-002
- Set `encoder-type' in `khoj.yml' to `src.utils.models.OpenAI'
- Set `model-directory' to `null', as online model cannot be stored on disk
Long words (>500 characters) provide less useful context to models.
Dropping very long words allows models to create better embeddings by
passing more of the useful context from the entry to the model
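A one-line sketch of that filter; the 500-character cutoff is the one mentioned above:

```python
def drop_long_words(entry_text: str, max_word_length: int = 500) -> str:
    "Drop very long 'words' (e.g. URLs, base64 blobs) that add little embedding context."
    return " ".join(word for word in entry_text.split(" ") if len(word) <= max_word_length)
```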
- Previously `model_type' was set in the setup of each `search_type'
- All encoders were of type `SentenceTransformer'
- All cross_encoders were of type `CrossEncoder'
- Now `encoder-type' can be configured via the new `encoder_type' field
in `TextSearchConfig' under `search-type` in `khoj.yml`.
- All the specified `encoder-type' class needs is an `encode' method
that takes entries and returns embedding vectors
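Under this scheme any class exposing an `encode` method can act as the encoder. A hedged sketch of an OpenAI-backed one using the pre-1.0 `openai` embeddings API; khoj's real `src.utils.models.OpenAI` likely differs:

```python
import openai
import torch

class OpenAI:
    "Minimal encoder: all it needs is an encode() mapping entries to embedding vectors."
    def __init__(self, model_name: str = "text-embedding-ada-002", api_key: str = None):
        self.model_name = model_name
        openai.api_key = api_key

    def encode(self, entries: list, convert_to_tensor: bool = True):
        response = openai.Embedding.create(input=entries, model=self.model_name)
        embeddings = [item["embedding"] for item in response["data"]]
        return torch.tensor(embeddings) if convert_to_tensor else embeddings
```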
- Ensure all tensors are on MPS device before doing operations across them
- Background
- GPU is used by default for Khoj on MacOS now
- Needed PyTorch > 1.13.0 on Macs to use GPU, which we do now
- MPS should speed up search and indexing on MacOS
Fix usage warning for unescaped single quote in `khoj.el' docstring.
Converts usage of '<text>' into `<text>' to use the correct quote forms in generated docs
⛔ Warning (comp): khoj.el:119:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:120:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:121:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:168:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
- Features
- Search using Khoj from within the Obsidian app
Allow Natural language search on your (markdown) notes in Obsidian Vault
- Show search results as rendered (instead of raw) Markdown
Improve legibility of the results
- Jump to selected note from search result in Khoj search modal
Simplify seeing result within its original note context
- Automatically configure khoj to index markdown files in current vault
Reduce khoj setup steps for plugin users by using reasonable defaults
- Code updates the markdown config in khoj.yml and triggers index update
- It can be configured by user in khoj plugin settings, if required
- Add Demo and detailed Readme for the Obsidian plugin
Ease setup and usage. Give context about capabilities
- Miscellaneous
- Trying to keep a mono repo until the Khoj project is mature enough,
to reduce the maintenance burden
This can ease configuring khoj from the different interfaces
- Don't need to know all the (default) config used by khoj.
- Just get default config by calling the above API endpoint.
- Then modify desired portions and call POST /api/config/data to
configure khoj.
- Start khoj server (in non-GUI mode) without needing config file
already instantiated.
- But throw warning to configure khoj to use it
- This allows plugins to configure the app via the /config/data APIs
- To be used by the Khoj obsidian plugin to configure markdown content
in khoj
- Poll scheduler every minute using threading.Timer
- Use a 60 second polling interval to avoid fork bombing
- Schedule next via the same poll scheduler
- Allow clean program interrupt by running scheduler in daemon mode
- There are 3 paths to updating/setting the index (stored in state.model)
- App start
- API
- Scheduler
- Put all updates to the index behind a lock, as there are multiple update
paths that could (potentially) run at the same time (via API or Scheduler)
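A rough sketch of the polling and locking described above; use of the `schedule` package and the job shown are assumptions, not necessarily khoj's exact setup:

```python
import threading

import schedule  # lightweight job scheduling library

index_lock = threading.Lock()

def update_index():
    "All paths that rebuild the index (app start, API, scheduler) must take this lock."
    with index_lock:
        pass  # stand-in for regenerating the search index in state.model

def poll_scheduler():
    "Run any due jobs, then re-arm a daemon timer; a 60s interval avoids fork bombing."
    schedule.run_pending()
    timer = threading.Timer(60.0, poll_scheduler)
    timer.daemon = True  # lets the program exit cleanly on interrupt
    timer.start()

schedule.every(1).hours.do(update_index)  # example job; the real schedule may differ
poll_scheduler()
```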
- Remove property drawer from test entry for max_words splitting test
- Property drawer is not required for the test
- Keep minimal test case to reduce chance for confusion
- Required because entries are now split by the max_word count supported
by the ML models
- This would now result in potentially duplicate hits and entries being
returned to the user
- Do deduplication after ranking to get the top ranked deduplicated
results
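A sketch of post-ranking deduplication; it assumes each hit keeps a reference to its originating raw entry:

```python
def deduplicate_results(ranked_hits: list, max_results: int) -> list:
    "Keep only the best-ranked hit per raw entry, since max-word splits can duplicate entries."
    seen_raw_entries, deduped = set(), []
    for hit in ranked_hits:  # assumed already sorted best-first
        if hit["raw_entry"] not in seen_raw_entries:
            seen_raw_entries.add(hit["raw_entry"])
            deduped.append(hit)
    return deduped[:max_results]
```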