sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-28 18:03:01 +01:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	45f461d175	Keep search results passed to GPT as context in conversation logs This will be useful to 1. Show source references used to arrive at answer 2. Carry out multi-turn conversations	2023-03-05 16:00:19 -06:00
Debanjum Singh Solanky	7cad1c9428	Only use past chat message, not session summaries as chat context Passing only chat messages for current active, and summaries for past session isn't currently as useful	2023-03-05 16:00:18 -06:00
Debanjum Singh Solanky	ad1f1cf620	Improve and simplify Khoj Chat using ChatGPT - Set context by either including last 2 chat messages from active session or past 2 conversation summaries from conversation logs - Set personality in system message - Place personality system message before last completed back & forth This may stop ChatGPT forgetting its personality as conversation progresses given: - The conditioning based on system role messages is light - If system message is too far back in conversation history, the model may forget its personality conditioning - If system message is at end of conversation, the model can think it's the start of a new conversation - Inserting the system message before last completed back & forth should prevent ChatGPT from assuming it's the start of a new conversation while not losing personality conditioning from the system message - Simplify the Khoj Chat API to, for now, just answer from the user's notes instead of trying to infer other potential interaction types. - This is the default expected behavior from the feature anyway - Use the compiled text of the top 2 search results for context - Benefits of using ChatGPT - Better model - 1/10th the price - No hand rolled prompt required to make GPT provide more chatty, assistant type responses	2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky	9d42b5d60d	Use multiple compiled search results for more relevant context to GPT Increase temperature to allow GPT to collect answer across multiple notes	2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky	c3b624e351	Introduce improved answer API and prompt. Use by default in chat web interface - Improve GPT prompt - Make GPT answer users query based on provided notes instead of summarizing the provided notes - Make GPT be truthful using prompt and reduced temperature - Use Official OpenAI Q&A prompt from cookbook as starting reference - Replace summarize API with the improved answer API endpoint - Default to answer type in chat web interface. The chat type is not fit for default consumption yet	2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky	7184508784	Mention Python and Pip need to be installed in Main and Emacs Readme	2023-03-02 21:28:54 -06:00
Debanjum Singh Solanky	211e460398	Output date filter from cache log at debug level. Remove unused imports Other logs not directly useful to user have already been converted to debug log levels in `1ae4016`. Just forgot to convert this log line too	2023-03-02 15:41:32 -06:00
Debanjum Singh Solanky	b6dbe4dd1d	Do not try to retrieve an unconfigured core content type in Config GUI Previous behavior was resulting in a null reference error, as the key for the core content/search type was not present in current config. Fallback to using default config for unconfigured core content type instead. See #165 for details	2023-03-02 11:09:31 -06:00
Debanjum Singh Solanky	1ae40163a9	Show user friendly information logs by default for context - Use emojis to make info logs easier to read - Inform when khoj is ready to use - Provide information on what khoj is doing while starting up - Inform when content/search types and processors are setup - Inform when models are being loaded from the web as this step can take time - Convert all other info logs to be only shown in verbose mode	2023-03-01 16:39:07 -06:00
Debanjum Singh Solanky	fe03ba3dce	Index intro text before headings in org files - Text before headings was not being indexed due to buggy orgnode parsing logic - Resolved indexing intro text from files with and without headings in them - Ensure intro text node has heading set to all title lines collected from the file Resolves #165	2023-03-01 12:11:33 -06:00
Debanjum Singh Solanky	7ad251b8ef	Log and Continue on OSError while collating dates for date filters Log to understand if error, date can be handled better Mitigates #172	2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky	2bed4c3b50	Fix configuring search types & /config/types API when no plugin configured - Test /config/types API when no plugin configured, only plugin configured and no content configured scenarios - Do not throw null reference exception while configuring search types when no plugin configured - Do not throw null reference exception on calling /config/types API when no plugin configured Resolves bug introduced by #173	2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky	8914dbd073	Fix creating GUI panels for unconfigured search, processor types Repro: 1. Open khoj server with `khoj` on first run 2. Install/enable Khoj Obsidian plugin (to configure khoj server) 3. Restart khoj server with `khoj` Bug: - Unconfigured processor and search_types are instantiated as None in self.current_config - While creating the desktop GUI, these null configs are attempted to be accessed as valid dictionaries for creating their GUI panels - This results in the null ref errors Fix: Use default config to create their GUI elements for unconfigured search and processor types Resolves #167	2023-03-01 01:20:58 -06:00
Debanjum Singh Solanky	b09350c052	Fix to return only enabled content types via the new config/types API - Previously was returning all core content types even if they had not been set up - Add test to validate only configured content types are returned by the api/config/types API endpoint	2023-02-28 22:08:26 -06:00
Debanjum Singh Solanky	b177adf3a7	Return value of search_type in /config/type API endpoint - Remove need for interfaces to downcase content types returned by API before using the type in search and other API endpoint - Fix to check for search_type.name in plugin keys instead of value	2023-02-28 21:49:26 -06:00
Debanjum Singh Solanky	88344f9ed2	Improve rendering search results of plugin content types on web interface Render only the entry from plugin search response instead of raw json Use the results-ledger styling for results-plugin styling	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	c2814fce58	Improve rendering search results of plugin content types in khoj.el Render only the entry from plugin search response instead of raw json	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	f3f24387ec	Use new config/types API to set enabled content types on web interface	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	1e43f1a12e	Use new config/types API to set enabled content types in khoj.el menu	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	9d38eadd42	Return enabled content types via api/config/types API endpoint Simplifies dynamically populating enabled content types for interfaces	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	68bd5d9ebc	Configure API routes after set up search types while configuring server Configure app routes after configuring server. Import API routers after search type is dynamically populated. Allow API to recognize the dynamically populated plugin search types as valid type query param. Enable searching for plugin type content.	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	d91c7e2761	Search for plugin content via the search API	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	47b58a2a4d	Configure, use dynamically instantiated SearchType enum on app start The SearchType is now dynamically populated with core and configured plugin types Use the new dynamic SearchType enum from state.py across codebase	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	ab0d3a08e2	Index configured plugins on app start and via update API endpoint	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	55a032e8c4	Add processor to index entries from jsonl files for plugins - Read, merge entries from input jsonl files and filters - Mark new, modified entries for update	2023-02-24 02:54:12 -06:00
Debanjum Singh Solanky	fcbbe8c759	Read content plugin configs from Khoj config YAML Configure external text content plugins via the Khoj YAML Reuse existing TextContentConfig definition for external text content plugins	2023-02-23 23:57:32 -06:00
Debanjum Singh Solanky	61b6ee2857	Use helper script to bump khoj pre-release versions	2023-02-17 20:31:51 -06:00
Debanjum Singh Solanky	053d6141f3	Ignore ts typing error, Fix SPDX license identifier in Obsidian plugin	2023-02-17 18:19:01 -06:00
Debanjum Singh Solanky	36be3c4b8f	Fix or ignore MyPy issues in PyQt desktop GUI code - Remove unneeded type ignore for mps with the latest mypy - Stop excluding PyQT desktop GUI code from MyPy checks - Do not warn about unused ignores. Some issue with mypy giving different errors in different environments (venv, system and pre-commit)	2023-02-17 16:13:05 -06:00
Debanjum Singh Solanky	051f0e3fb5	Add, configure and run pre-commit locally and in test workflow	2023-02-17 13:31:36 -06:00
Debanjum Singh Solanky	5e83baab21	Use Black to format Khoj server code and tests	2023-02-17 11:55:17 -06:00
Debanjum Singh Solanky	8b293edd7c	Move mypy config into pyproject.toml. Ignore 2 remaining mypy issues	2023-02-16 03:33:08 -06:00
Debanjum Singh Solanky	c641eb4ad6	Improve rendering log and error stacktraces using the Rich package - Use Rich to render uvicorn, fastAPI logs as well The previous CustomFormatter only worked on khoj logs - Improve rendering stacktrace on errors using Rich	2023-02-15 16:19:32 -06:00
Debanjum Singh Solanky	bc7477ea3e	Move Emacs, Obsidian plugin code out from under src/khoj directory - What - The Emacs and Obsidian interfaces stay in their original directories under src/ - src/khoj now only contains code meant for pypi packaging - Benefits - This avoids having to update khoj MELPA, Obsidian plugin config as the Emacs, Obsidian code is under their original directories - It separates the code in src/khoj meant for python packaging from code for external interfaces like Emacs and Obsidian	2023-02-14 15:44:22 -06:00
Debanjum Singh Solanky	25a749ca1d	Use the src/ layout to fix packaging Khoj for PyPi - Why The khoj pypi packages should be installed in `khoj' directory. Previously it was being installed into `src' directory, which is a generic top level directory name that is discouraged from being used - Changes - move src/* to src/khoj/* - update `setup.py' to `find_packages' in `src' instead of project root - rename imports to form `from khoj.*' in complete project - update `constants.web_directory' path to use `khoj' directory - rename root logger to `khoj' in `main.py' - fix image_search tests to use the newly renamed `khoj' logger - update config, docs, workflows to reference new path `src/khoj'	2023-02-14 15:19:06 -06:00
Debanjum	84322b2a45	Demo using Search in Khoj Obsidian Plugin	2023-02-14 08:43:50 -08:00
Debanjum Singh Solanky	a4dcb20622	Add setting to toggle auto configuring of khoj backend from Obsidian - By default the obsidian plugin automatically configures the khoj backend to index the current vault - For more complex scenarios, users can manage their ~/.khoj/khoj.yml manually by toggling the auto-configure setting off in the khoj plugin settings Resolves #156	2023-02-13 20:15:28 -06:00
Debanjum Singh Solanky	24aa696ef5	Indicate indexing active on Update button in Obsidian plugin settings Use moon rotating through phases to indicate notes indexing in progress Resolves #129	2023-02-13 19:28:19 -06:00
Debanjum Singh Solanky	11517ba8eb	Encode jsonl data as utf8 for gzip write for consistent read/write encoding Should help with issue #89	2023-02-12 17:33:23 -06:00
Debanjum Singh Solanky	3ec41c4d64	Wrap lines for org, markdown results in khoj search results buffer	2023-02-12 07:33:50 -06:00
Debanjum Singh Solanky	9a013ec48f	Add more details to setup Khoj backend in Obsidian plugin readme	2023-02-12 07:31:13 -06:00
Jason Axelson	6d5930363a	Fix obsidian plugins doc link Also make it more obvious where the link is going, initially I thought the link was to another official khoj documentation site.	2023-02-10 07:11:21 -10:00
Debanjum Singh Solanky	215235efd2	Bump khoj pre-release version	2023-02-08 20:24:36 -03:00
Debanjum Singh Solanky	2445664d40	Deprioritize searching for Music content over other text content	2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky	2e052913b6	Search in first configured content type when no search type set Instead of searching through all configured content types but only returning results of the last configured content type	2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky	a26ab31d20	Allow chat with markdown notes if no org-mode content configured	2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky	fbb7747dcc	Read Markdown file as utf8 instead of the default encoding used by OS - Background 1. Obsidian stores markdown notes as utf8[1] 2. By default, the python `open' command uses the OS locale encoding[2] This was causing the `UnicodeDecodeError: <locale_encoding> codec can't decode byte' error - Fix - Read markdown files as utf8 The Obsidian plugin is the main use-case for markdown files in khoj currently and that stores md files as utf8. Do not assume utf8 for other content types like org-mode, beancount for now. - Fail if error in reading file as utf8, instead of ignoring errors. Would rather have user realize that their files are not going to get indexed correctly. [1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3 [2]: https://docs.python.org/3/library/functions.html#open	2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky	66dca6cf33	Add Docs to Search across Languages, Uninstall Khoj to Readme Add details and fixes to Obsidian, Main readme based on feedback, confusion from the Obsidian plugin announcement	2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky	cba9a6a703	Use List, Tuple, Set from typing to support Python 3.8 for khoj Before Python 3.9, you can't directly use list, tuple, set etc for type hinting Resolves #130	2023-02-06 01:23:52 -03:00
Debanjum Singh Solanky	f26cee604d	Update Khoj Plugin Install Instructions. Rename main Readme to README Khoj plugin page from within Obsidian isn't recognized. Seems like it needs an uppercase readme file only. So it doesn't show the Khoj readme from within Obsidian itself.	2023-01-27 20:01:31 -03:00
Debanjum Singh Solanky	2e13e15625	Ensure markdown entries in khoj.el results separated by empty line - Update khoj.el test to reflect updated rendering logic - Move ledger render function before image rendered to group functions with similar logic closer	2023-01-26 19:13:02 -03:00
Debanjum Singh Solanky	85ae46f429	Use thread_last to make results rendering funcs more readable in khoj.el	2023-01-26 18:59:44 -03:00
Debanjum Singh Solanky	b415f87093	Split code in onChooseSuggestion method to make it more readable Split find file, jump to file code to make onChooseSuggestion more readable - Use find, instead of using return in forEach to get first match - Move the jump to file+heading code out from forEach	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	37063f6a38	Truncate query to 8k chars for find similar notes from obsidian plugin Truncate current file data passed to khoj backend API via query string below default query size supported by popular servers	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	4456cf5c8f	No need to use then or finally in async functions after an await	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	4070be637c	Pass app object from plugin instance to child objects and functions Do not reference global app object from child objects and funcs directly. It is only available for debugging purposes and access to it may be dropped in the future.	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	c203c6a3fd	Use Sentence case for Find Similar Note command name in Khoj Obsidian	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	e18124ef6f	Add badge for tests and update project subtitle in khoj.el Readme	2023-01-23 20:52:03 -03:00
Debanjum Singh Solanky	86e808abfb	Test get-current-text helpers for Find Similar feature in khoj.el	2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky	be6acda212	Create khoj.el tests. Test rendering results of each content types	2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky	0d0bf3b5aa	Simplify get-current-text functions for Find Similar in khoj.el Use existing functions like `string-trim', `thing-at-point' and remove unneeded code from the two functions	2023-01-23 19:15:52 -03:00
Debanjum Singh Solanky	07e9e4ecc3	Get current paragraph text when point at start of paragraph in khoj.el Previously if cursor was at start of current paragraph, it would get text for the current and next paragraph, instead of just the current one	2023-01-23 18:05:54 -03:00
Debanjum Singh Solanky	a0b03c8bb1	Get current entry text when point at heading for Find Similar in khoj.el Previously if cursor was at heading of current entry, it would find entries similar to the previous outline heading, instead of the current one	2023-01-23 10:01:25 -03:00
Debanjum Singh Solanky	013c7c10a4	Bump khoj pre-release version	2023-01-22 18:45:56 -03:00
Debanjum Singh Solanky	ad3c9b5f44	Bump khoj version to 0.2.5 in preparation for release	2023-01-22 18:18:21 -03:00
Debanjum Singh Solanky	9ed056c7e7	Use consistent indentation in Khoj Emacs Readme	2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky	0980c6e87f	Update Emacs Usage section in Readme. Add find-similar, menu usage	2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky	6908b6eed3	Truncate image queries below max tokens length supported by ML model This would previously return the infamous tensor size mismatch error Verify this error is not raised since adding the query truncation logic	2023-01-21 14:11:00 -03:00
Debanjum Singh Solanky	3d9ed91e42	Search by image at path only if query of form "file:/path/to/image" Previously no query syntax helpers, like the "file:" prefix, were used before checking if query contains file path. This made query to image search brittle to misinterpretation and pointless checking Add test to verify search by image at file works as expected	2023-01-21 14:06:56 -03:00
Debanjum Singh Solanky	b7aa22a059	Change order of arg passed to query-api-and-render-results by importance	2023-01-20 22:13:24 -03:00
Debanjum Singh Solanky	936a88fa7e	Find items of specified type similar to current text item at point - Support querying with text surrounding point in any text buffer Previously could only find items similar to org entry at point - Find similar items of specified content type indexed on khoj Previously only looked for similar org entries indexed on khoj Now uses the content-type configured in khoj transient menu to find items of the specified content type - Details - Generalize the get-current-org-entry-text func to get text for any outline section - Replace leading whitespaces from query text as well - Create method to get current paragraph text from non-outline mode buffers - Update transient, find-similar funcs to pass, use content-type configured in khoj transient menu - Generalize query title creation logic to remove markdown headings prefix (#) apart from org heading prefix (*) as well - Update last used khoj content-type and results from the find-similar and update funcs for later reuse - Jump to top of results buffer after results rendered	2023-01-20 22:12:54 -03:00
Debanjum Singh Solanky	17aaadea1f	Find notes similar to current org entry at point	2023-01-20 05:14:54 -03:00
Debanjum Singh Solanky	44bbc0a417	Add section separators to khoj.el for easier code traversal	2023-01-19 23:36:54 -03:00
Debanjum Singh Solanky	48ad3c535e	Use default content types if fail to call backend on khoj.el load Do not want khoj.el to fail on init/load if khoj backend not running	2023-01-19 20:13:49 -03:00
Debanjum Singh Solanky	9f0bd0a361	Add Github workflow for khoj.el build and quality checks Add khoj.el build badge to khoj.el Readme	2023-01-19 20:13:19 -03:00
Debanjum Singh Solanky	0dd1cba272	Rename configuration sections in khoj.el transient menu	2023-01-19 03:03:08 -03:00
Debanjum Singh Solanky	5d0f369186	Add ability to quit khoj transient with standard q keybinding	2023-01-19 02:47:07 -03:00
Debanjum Singh Solanky	87c7cf4272	Use single khoj func as entrypoint. Group khoj.el code into sections - Give more relevant, specific name to khoj suffix commands - Remove `khoj-simple'. Have single `khoj' function for entrypoint	2023-01-19 02:38:19 -03:00
Debanjum Singh Solanky	9d64a009fd	Allow updating khoj content index from within khoj.el - Split transient config menu by type	2023-01-18 23:07:59 -03:00
Debanjum Singh Solanky	a8d0c7d905	Rename search type to more apt content type in khoj.el	2023-01-18 22:13:49 -03:00
Debanjum Singh Solanky	00daea16df	Allow setting default-search-type to image. Make docstrings compact	2023-01-18 22:01:17 -03:00
Debanjum Singh Solanky	216b17cfd0	Dynamically populate content type choices when khoj transient invoked	2023-01-18 22:00:56 -03:00
Debanjum Singh Solanky	5f446b1440	Convert main khoj.el entrypoint into transient menu for richer configuration	2023-01-18 21:50:07 -03:00
Debanjum Singh Solanky	5c07dcd219	Fix, update Obsidian Readme. Add Find Similar Notes to Implementation section	2023-01-18 00:22:26 -03:00
Debanjum	b7fc344be1	Search for Similar Notes from Obsidian Plugin Enable searching for notes similar to the current note being viewed ## Main Changes - `39a18e2` Extend search modal to search for similar notes - Hide input field on init, Trigger search on opening modal when in similar notes mode - Set input to contents of current markdown file and get notes similar to it - Re-rank, by default, when searching for similar notes - Filter out current note from similar note search results - `0bed410` Only show `Find Similar Note' command in Editor	2023-01-18 00:10:10 -03:00
Debanjum Singh Solanky	6119d0a69e	Add usage of "Find Similar Notes" command to the Khoj Obsidian Readme	2023-01-18 00:03:13 -03:00
Debanjum Singh Solanky	657e455785	Remove unused `onunload' method in main.ts of khoj obsidian plugin	2023-01-17 23:46:38 -03:00
Debanjum Singh Solanky	0bed410712	Limit Find Similar Note command to be triggered from Editor Fixup indentation and comments	2023-01-17 19:34:48 -03:00
Debanjum Singh Solanky	39a18e2080	Add ability to search for similar notes in Khoj Obsidian - Hide input field on init, Trigger search on opening modal in similar notes mode - Set input to current markdown file and get similar notes to it - Enable rerank when searching for similar notes - Filter out current note from similar note search results	2023-01-17 19:07:18 -03:00
Debanjum Singh Solanky	ffaef92476	Encode query string before passing as query param to search API	2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky	d5a7cc5b0f	Compact code to map results from search API into SearchResult objects Make code compact for readability Remove unneeded temporary variables and return statements	2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky	8ab7a26bde	Update Khoj on Obsidian screenshots in Main and Plugin Readme - Screenshot querying "Setup Editor" on test vault with Khoj Readmes - New features showcase: - information keybindings, rerank keybinding at bottom of modal - fixed top level headings in search results - search results snipped if greater than N words	2023-01-17 13:58:50 -03:00
Debanjum Singh Solanky	7b4f78776c	Fix extracting Markdown Entries with Top Level Headings - Previously top level headings would get stripped of the space between heading text and the prefix # symbols. That is, `# Top Level Heading' would get converted to `#Top Level Heading' - This would mess up their rendering as a heading in search results - Add unit tests to text_to_jsonl processors to prevent regression	2023-01-17 13:06:28 -03:00
Debanjum Singh Solanky	1a296518c5	Limit total words for each Search Result rendered in search modal Provides a more consistent rendering of results in modal. Makes it easier to see more results in modal. To see complete entry, user can always just jump to entry from modal	2023-01-17 13:06:14 -03:00
Debanjum Singh Solanky	e7b89f7fd0	Return compiled entry in additional details of /api/search response This can be used to identify the portion of raw entry to highlight and for passing to summarizer to stay within max_tokens limit supported by GPT models	2023-01-16 22:56:06 -03:00
Debanjum Singh Solanky	7071d081e9	Increase max_tokens returned by GPT summarizer. Remove default params	2023-01-16 22:55:36 -03:00
Debanjum Singh Solanky	3d9cdadbbb	Add codebase visualization of Khoj Obsidian to Khoj Obsidian Readme	2023-01-15 14:09:21 -03:00
Debanjum Singh Solanky	d02ba325aa	Handle empty chat history returned by API to chat.html on web interface	2023-01-15 13:51:16 -03:00
Debanjum	3f2ea039a7	Add Chat page to the Khoj Web Interface ### Overview - Provide a chat interface to engage with and inquire your notes - Simplify interacting with the beta `chat` and `summarize` APIs ### Use - Open `<khoj-url>/chat`, by default at http://localhost:8000/chat?type=summarize - Type your queries, see summarized response by Khoj from your notes Note: - You will need to add an API key from OpenAI to your khoj.yml - Your query and top note from search result will be sent to OpenAI for processing ## Details - `177756b` Show chat history on loading chat page on web interface - `d8ee0f0` Save chat history to disk for persistence, seeing chat logs - `5294693` Style chat messages as speech bubbles - `d170747` Add khoj web interface and chat styling to new chat page on khoj web - `de6c146` Implement functional, unstyled chat page for khoj web interface	2023-01-13 23:02:19 -03:00
Debanjum Singh Solanky	16d4560ff8	Comment css styling of chat page for later reference	2023-01-13 22:40:01 -03:00
Debanjum Singh Solanky	cfef346d03	Do not update query field to every chat message It doesn't work as well with chat, unlike for search page Use more appropriate thinking face emoji for you instead of surprise face	2023-01-13 22:24:26 -03:00
Debanjum Singh Solanky	177756be7e	Fetch chat history from backend and render it on chat page load	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	330febaa1a	Update conversation logs from /beta/summary API endpoint too	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	cb6f0b53c9	Make user_message_metadata arg to message_to_log in gpt.py optional - Use a default user_message_metadata if arg not set - Update conversation to use `by' as `you' and `khoj'	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	cc2456e411	Update /beta/chat API to return chat history if no query param passed	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	d8ee0f0e9a	Use scheduler to save chat history to disk every 5 minutes - The previous mechanism to trigger saving on shutdown event did not work - Use scheduler to persist chat sessions to disk at a 5 minute interval - This improves time granularity, fixed interval of saving chat logs - It may lose ~5 minutes of chat history until mechanism to also write on shutdown found/resolved - Create conversation directory if it doesn't exist before attempting write - Reset chat_session after writing it to disk	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	5294693e97	Style message as speech bubbles on chat page of web interface - Wrap messages into speech bubbles - Color messages by khoj blue, sender grey - Add those standard protrusions to the speech bubbles for fun - Align bubbles left or right based on sender - messages by khoj are left aligned, message by self are right aligned - Put message metadata like sender and time under speech bubble - use data-* attribute and ::after css pseudo-selector for this - Update renderMessage func to accept time param, remove unused type_ param	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	7723d656dc	Do not force GPT to summarize note using past tense Not all notes are in the past. Notes can be about stuff in the future. Casting them to past tense gives the impression that they've already happened / been done.	2023-01-13 13:10:35 -03:00
Debanjum Singh Solanky	2842e3a035	Automatically scroll to bottom of chat body on new messages	2023-01-13 13:09:51 -03:00
Debanjum Singh Solanky	34014635d0	Improve colors, fix contrast for accessibility on web interface - Changes - Use blue color for khoj heading font - This fixes the title color issue - Update background to lighter shade - This fixes the body text color issue - Update colors for todo, done, miscellaneous todo state, tag color - This does not fix the color contrast issue but seems like an acceptable solution - Using white text rather than black text on blue background better even though the black text on blue background passes the WCAG acceptable contrast score - For details see blog post: https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/ - Add border to tags to give them tag pills look and differentiate from todo states - Buttons and inputs - Change background color of input fields like type dropdown, update button and results count counter, to match background color of page - Add shadow on hover over button, dropdowns Resolves #111	2023-01-12 21:59:50 -03:00
Debanjum Singh Solanky	d170747ec2	Add khoj web interface & chat styling to new chat page on khoj web - Ensure message input box sticks to bottom of screen - Ensure chat logs div is scrollable when logs become longer than screen Do not make the whole page scroll, just the chat logs body div	2023-01-12 21:58:46 -03:00
Debanjum Singh Solanky	de6c146290	Implement functional, unstyled chat page for khoj web interface Expose it at /chat URL	2023-01-12 21:53:25 -03:00
Debanjum Singh Solanky	e6793816f9	Upgrade Khoj.el Readme. Add TOC, Screenshot, Features Sections - Update Query filter details	2023-01-12 02:14:02 -03:00
Debanjum Singh Solanky	26f791e9ad	Update Obsidian Plugin Readme. Add Khoj icon to Khoj Modal Placeholder text - Fold Query Filter, Demo Description - Add Limitations to Readme - Add Update index bullet to Troubleshooting Options	2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky	3e63af5c94	Constrain grid rows to fix layout of Khoj web interface on Chrome	2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky	50c797962c	Jump to Search Result from Khoj Modal even on Obsidian Android Uses longest file path match to find markdown file in vault corresponding to file of search result returned by Khoj Allow jumping to search result from khoj plugin modal on Android too	2023-01-11 19:44:11 -03:00
Debanjum Singh Solanky	51ea6d9c9b	Do not force index update when configure backend on plugin load - Backend can handle incremental updates - Avoid khoj usability delay by avoiding recompute every time vault is opened	2023-01-11 17:17:08 -03:00
Debanjum Singh Solanky	5996d47d7c	Trigger input event to Get, Render Reranked results from Khoj backend Previous mechanism of manually triggering getSuggestions, renderSuggestions flow was corrupting traversing and opening reranked search results in KhojModal Emulate event that would anyway trigger the get & render of results in modal. This lets obsidian core handle the flow without digging too deep into obsidian core's handling of the flow. Lowers the chance of breakage	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	1c813a6884	Convert results count setting to slider in plugin settings pane	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	4e1abd1b72	Disable update button while indexing vault in plugin settings	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	513c86c6a1	Set index file paths relative to current or default path on khoj backend We need the index file paths to make sense on the khoj backend server Having path of index on backend relative to current vault directory on frontend ignores the fact that the frontend may be on a different machine than the khoj backend server Using unique index name per vault allows switching vaults without overwriting indices of other vaults created on khoj backend when khoj obsidian plugin is loaded on opening a different vault	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	4407e23c19	Only index current vault on Khoj. Remove plugin setting to configure it - Overview Limits using Khoj with a single vault at a time. This is automatically configured to the most recently opened vault. Once directory filters are supported on backend, the plugin will be updated to index multiple vaults but search only current vault from current vault's khoj obsidian plugin - Code Details - Remove setting to configure Vault directory from Khoj Obsidian plugin - Automatically configure Khoj to index only current Vault. - Overwrites any previous vaults that were intended to be indexed by Khoj backend - Force update of index after configuring vault - Why It's not helpful for now and can lead to more problems, confusion. Once directory filters	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	86a1e43605	Return HTTP Exception on /api/update API call failure - Previously the backend was just throwing backend error. The frontend calling the /update API wasn't getting notified - Now the frontend can react appropriately and make the issue visible to the user	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	5af2b68e2b	Update plugin notifications for errors and success - Only show notification on plugin load and failure. - In settings page, set current backend status at top of pane instead of showing notification Notices bubbles cluttered the UI while typing updates to settings - Show notification once index updated via settings pane button click There was no notification on index updated, which usually takes time on the backend	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	853192932a	setCTA on Khoj Obsidian plugin button. Minor cleanup of space, tabs	2023-01-10 23:36:02 -03:00
Debanjum Singh Solanky	da49ea272c	Add placeholder text to modal in Khoj Obsidian plugin	2023-01-10 22:50:11 -03:00
Debanjum Singh Solanky	580f4aca23	Add hints to Modal for available Keybindings	2023-01-10 22:03:47 -03:00
Debanjum Singh Solanky	b52cd85c76	Allow Reranking Results using Keybinding from Khoj Search Modal	2023-01-10 21:59:38 -03:00
Debanjum Singh Solanky	7991ab7a86	Add button in Obsidian plugin settings to force re-indexing your vault	2023-01-10 19:49:12 -03:00
Debanjum Singh Solanky	f046a95f3d	Track connectedToBackend as a setting. Use it across obsidian plugin - Display warning at top of khoj obsidian plugin settings - Make search command available only if connected to backend - Show warning notice on clicking khoj search ribbon button - Call saveData after configureKhojBackend to ensure connectedToBackend setting saved after being (potentially) updated in configureKhojBackend function	2023-01-10 17:28:47 -03:00
Debanjum Singh Solanky	768e874185	Load obsidian plugin even if it fails to connect to backend but show warning - Previously the plugin would not load if it could not connect to Khoj backend - Silently failing to load with no reason provided is not helpful - Load plugin to allow user to fix the Khoj URL in their plugin setting - Show reason for khoj plugin not working. More helpful than failing silently	2023-01-10 17:20:02 -03:00
Debanjum Singh Solanky	aa22d83172	Create and use a context manager to time code Use the timer context manager in all places where code was being timed - Benefits - Deduplicate timing code scattered across codebase. - Provides single place to manage perf timing code - Use consistent timing log patterns	2023-01-09 19:48:16 -03:00
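A minimal sketch of the kind of timer context manager commit aa22d83172 describes. The name `timer', its arguments and the log format are assumptions for illustration, not the code in the repository.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def timer(message, log=logger):
    """Log how long the wrapped block took, prefixed with `message`.

    Hypothetical helper: any object with a `debug` method can be passed as `log`.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so timings are always logged
        elapsed = time.perf_counter() - start
        log.debug("%s: %.3f seconds", message, elapsed)


# Usage: one line replaces hand-written start/end timestamps around the block
with timer("Sorted 100k integers"):
    sorted(range(100_000, 0, -1))
```

Keeping the timing and the log pattern in one helper is what gives the deduplication and consistency the commit lists as benefits.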
Debanjum Singh Solanky	93f39dbd43	Add typing to text_search. Reformat code to set existing_embedding	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	db7483329c	Only import type hint packages for type checking. Avoids circular imports Use annotations from the __future__ package to avoid having to quote type hints. This import will not be required after Python 3.11	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	e5254a8e56	Create BaseEncoder class. Make OpenAI encoder its child. Use for typing - Set type of all bi_encoders to BaseEncoder - Make load_model return type Union of CrossEncoder and BaseEncoder	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	cf7400759b	Remove unused render_results method from text and image search It's a relic from when khoj was being used as a python module	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	afcfc3cd62	Split text_search.query logic into separate methods for modularity The query method had become too big. Extract out filter, score, sort and deduplicate logic used by text_search.query into separate methods. This should improve readability of code.	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	8dc6ee8b6c	Pass `model' arg to extract_search_type method from beta search API Issue caught by mypy	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	8498903641	Fix, add typing to Filter and TextSearchModel classes - Changes - Fix method signatures of BaseFilter subclasses. Else typing information isn't translating to them - Explicitly pass `entries: list[Entry]' as arg to `load' method - Fix type of `raw_entries' arg to `apply' method to list[Entry] from list[str] - Rename `raw_entries' arg to `apply' method to `entries' - Fix `raw_query' arg used in `apply' method of subclasses to `query' - Set type of entries, corpus_embeddings in TextSearchModel - Verification Ran `mypy --config-file .mypy.ini src' to verify typing	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	eace7c6215	Use torch.tensor as torch.Tensor cannot create tensor on MPS device - `torch.Tensor' is apparently a legacy tensor constructor - Using that to create tensor on MPS devices throws error: RuntimeError: legacy constructor expects device type: cpu but device type: mps was passed - `torch.tensor' can handle creating tensors on Mac GPU (MPS) fine	2023-01-09 19:47:19 -03:00
Debanjum Singh Solanky	9def3f8c6f	Add exception handling to beta APIs, in case OpenAI API call fails	2023-01-09 01:27:06 -03:00
Debanjum Singh Solanky	7b164de021	Add beta API to summarize top search result using an OpenAI model This is unlike the more general chat API that combines summarization of top search result and conversing with the OpenAI model This should give faster summary results. As no intent categorization API call required	2023-01-09 01:25:59 -03:00
Debanjum Singh Solanky	d36da46f7b	Truncate prompt to not exceed OpenAI prompt limit Truncate prompt containing the top retrieved entry to 500 words to avoid triggering the max_token limit error	2023-01-09 00:51:46 -03:00
Debanjum Singh Solanky	237123d18c	Fix tests for the conversation processor - Use latest davinci model for tests - Wrap prompt in triple quotes to improve legibility - `understand' method returns dictionary instead of string. Fix its test - Fix prompt for new model to pass `chat_with_history' test	2023-01-09 00:22:26 -03:00
Debanjum Singh Solanky	918af5e6f8	Make OpenAI conversation model configurable via khoj.yml - Default to using `text-davinci-003' if conversation model not explicitly configured by user. Stop using the older `davinci' and `davinci-instruct' models - Use `model' instead of `engine' as parameter. Usage of `engine' parameter in OpenAI API is deprecated	2023-01-09 00:17:51 -03:00
Debanjum Singh Solanky	74e779f8d0	Fix /beta/chat API to use Entry class instead of old dictionary pattern Search returns response of type SearchResponse instead of a dict now	2023-01-08 15:28:26 -03:00
Debanjum Singh Solanky	f2436039a0	Improve readability of GPT prompt strings in conversation processor	2023-01-08 15:27:41 -03:00
Debanjum Singh Solanky	6119005838	Improve comments, exceptions, typing and init of OpenAI model code	2023-01-08 00:36:18 -03:00
Debanjum Singh Solanky	c0ae8eee99	Allow using OpenAI models for search in Khoj - Init processor before search to instantiate `openai_api_key' from `khoj.yml'. The key is used to configure search with openai models - To use OpenAI models for search in Khoj - Set `encoder' to name of an OpenAI model. E.g text-embedding-ada-002 - Set `encoder-type' in `khoj.yml' to `src.utils.models.OpenAI' - Set `model-directory' to `null', as online model cannot be stored on disk	2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky	826f9dc054	Drop long words from compiled entries to be within max token limit of models Long words (>500 characters) provide less useful context to models. Dropping very long words allows models to create better embeddings by passing more of the useful context from the entry to the model	2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky	6a30a13326	Only create model directory if the optional field is set in SearchConfig	2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky	2fe37a090f	Make type of encoder to use for embeddings configurable via khoj.yml - Previously `model_type' was set in the setup of each `search_type' - All encoders were of type `SentenceTransformer' - All cross_encoders were of type `CrossEncoder' - Now `encoder-type' can be configured via the new `encoder_type' field in `TextSearchConfig' under `search-type` in `khoj.yml`. - All the specified `encoder-type' class needs is an `encode' method that takes entries and returns embedding vectors	2023-01-07 23:09:12 -03:00
Debanjum Singh Solanky	d55d7d53dc	Fix GPU usage by Khoj on Macs to speed up search and indexing - Ensure all tensors are on MPS device before doing operations across them - Background - GPU is used by default for Khoj on MacOS now - Needed PyTorch > 1.13.0 on Macs to use GPU, which we do now - MPS should speed up search and indexing on MacOS	2023-01-05 15:39:09 -03:00
Debanjum	abd035e2fa	Merge PR #112 to fix quote usage in khoj.el docstring from suliveevil/master Fix usage warning for unescaped single quote in `khoj.el' docstring. Converts usage of '<text>' into `<text>' to use the correct quote forms in generated docs	2023-01-05 13:24:11 -03:00
Debanjum Singh Solanky	e792523849	Bump version in metadata packages for khoj, khoj.el and obsidian plugin	2023-01-05 12:50:27 -03:00
suliveevil	b2812b409f	fix docstring usage warning ⛔ Warning (comp): khoj.el:119:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting) ⛔ Warning (comp): khoj.el:120:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting) ⛔ Warning (comp): khoj.el:121:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting) ⛔ Warning (comp): khoj.el:168:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)	2023-01-05 16:47:38 +08:00
Debanjum Singh Solanky	47015ee6cc	Fold Demo video descriptions, analysis by default in main Readme	2023-01-04 20:13:43 -03:00
Debanjum Singh Solanky	da17ff6ac8	Add Upgrade instructions for Khoj.el Readme. Fix version of khoj.el	2023-01-04 20:06:39 -03:00
Debanjum Singh Solanky	66ccd0c970	Create Obsidian plugin for Khoj - Features - Search using Khoj from within the Obsidian app Allow Natural language search on your (markdown) notes in Obsidian Vault - Show search results as rendered (instead of raw) Markdown Improve legibility of the results - Jump to selected note from search result in Khoj search modal Simplify seeing result within its original note context - Automatically configure khoj to index markdown files in current vault Reduce khoj setup steps for plugin users by using reasonable defaults - Code updates the markdown config in khoj.yml and triggers index update - It can be configured by user in khoj plugin settings, if required - Add Demo and detailed Readme for the Obsidian plugin Ease setup and usage. Give context about capabilities - Miscellaneous - Trying to keep a mono repo until the Khoj project is mature enough to reduce maintenance burden	2023-01-04 18:28:16 -03:00
Debanjum Singh Solanky	feddb6ce62	Add start_url to khoj webmanifest to show Khoj as PWA on Chrome	2023-01-04 13:37:56 -03:00
Debanjum Singh Solanky	3dee1aed9e	Create /config/data/default API endpoint to serve default khoj config This can ease configuring khoj from the different interfaces - Don't need to know all the (default) config used by khoj. - Just get default config by calling the above API endpoint. - Then modify desired portions and call POST /api/config/data to configure khoj.	2023-01-03 21:52:34 -03:00
Debanjum Singh Solanky	ce945f7a90	Configure processors too on calling /update API - Previously only search was being reconfigured - But Processors are configured on app start too - Match that behavior on calling /update API	2023-01-03 21:51:02 -03:00
Debanjum Singh Solanky	9d31988f42	Allow starting khoj in non-GUI mode without config file instantiated - Start khoj server (in non-GUI mode) without needing config file already instantiated. - But throw warning to configure khoj to use it - This allows plugins to configure the app via the /config/data APIs - To be used by the Khoj obsidian plugin to configure markdown content in khoj	2023-01-03 21:36:59 -03:00
Debanjum Singh Solanky	52664dd96c	Allow recursive glob pattern (**) to add files to search index - Simplify configuring files to index For Obsidian/Org-Roam type systems with lots of small files in khoj.yml using `input-filter'	2023-01-03 01:32:58 -03:00
Debanjum Singh Solanky	152e5f1661	Return the file of each search result in response - Useful for enabling jump to note functionality in interfaces - It will be used in the Khoj plugin for Obsidian	2023-01-03 01:25:34 -03:00
Debanjum Singh Solanky	c535953915	Update index automatically in non GUI mode too - Poll scheduler every minute using threading.Timer - Use 60 seconds polling interval to avoid fork bombing - Schedule next via the same poll scheduler - Allow clean program interrupt by running scheduler in daemon mode	2023-01-01 21:03:19 -03:00
Debanjum Singh Solanky	701d92e17b	Lock the index before updating it via API or Scheduler - There are 3 paths to updating/setting the index (stored in state.model) - App start - API - Scheduler - Put all updates to the index behind a lock, as multiple update paths could (potentially) run at the same time (via API or Scheduler)	2023-01-01 17:09:36 -03:00
Debanjum Singh Solanky	3b0783aab9	Automate updating embeddings, search index on an hourly schedule - Use the schedule pypi package - Use QTimer to poll schedule.run_pending() regularly for jobs to run	2023-01-01 17:09:36 -03:00
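A rough sketch of putting index updates behind a lock, as commit 701d92e17b describes. `index_lock', `state' and `update_index' are hypothetical names used here for illustration; only the idea of serializing the API and Scheduler update paths comes from the commit message.

```python
import threading

# Hypothetical stand-ins for the app's shared state and its lock
index_lock = threading.Lock()
state = {"model": None}


def update_index(build_index):
    """Build a new index and swap it in while holding the lock.

    `build_index` is any zero-argument callable returning the new index.
    Callers from the API and the scheduler both go through this function,
    so two updates can never run at the same time.
    """
    with index_lock:
        state["model"] = build_index()
        return state["model"]
```

A plain `threading.Lock' is enough in this sketch because no update path re-enters `update_index' while already holding the lock.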
Debanjum	06c25682c9	Split text entries by max tokens supported by ML models ### Background There is a limit to the maximum input tokens (words) that an ML model can encode into an embedding vector. For the models used for text search in khoj, a max token size of 256 words is appropriate [1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1#:~:text=model%20was%20just%20trained%20on%20input%20text%20up%20to%20250%20word%20pieces),[2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#:~:text=input%20text%20longer%20than%20256%20word%20pieces%20is%20truncated) ### Issue Until now entries exceeding max token size would silently get truncated during embedding generation. So the truncated portion of the entries would be ignored when matching queries with entries This would degrade the quality of the results ### Fix - `e057c8e` Add method to split entries by specified max tokens limit - Split entries by max tokens while converting [Org](https://github.com/debanjum/khoj/commit/c79919b), [Markdown](https://github.com/debanjum/khoj/commit/f209e30) and [Beancount](https://github.com/debanjum/khoj/commit/17fa123) entries to JSONL - `b283650` Deduplicate results for user query by raw text before returning results ### Results - The quality of the search results should improve - Relevant, long entries should show up in results more often	2022-12-26 18:23:43 +00:00
Debanjum Singh Solanky	17fa123b4e	Split entries by max tokens while converting Beancount entries To JSONL	2022-12-26 15:14:32 -03:00
Debanjum Singh Solanky	f209e30a3b	Split entries by max tokens while converting Markdown entries To JSONL	2022-12-26 13:14:15 -03:00
Debanjum Singh Solanky	24676f95d8	Fix comments, use minimal test case, regenerate test index, merge debug logs - Remove property drawer from test entry for max_words splitting test - Property drawer is not required for the test - Keep minimal test case to reduce chance for confusion	2022-12-25 22:33:04 -03:00
Debanjum Singh Solanky	b283650991	Deduplicate results for user query by raw text before returning results - Required because entries are now split by the max_word count supported by the ML models - This would now result in potentially duplicate hits, entries being returned to user - Do deduplication after ranking to get the top ranked deduplicated results	2022-12-25 21:36:15 -03:00
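To illustrate why commit b283650991 deduplicates after ranking, here is a small sketch. The `(raw_text, score)' tuple shape and the function name are assumptions, and a higher score is assumed to mean a better match.

```python
def deduplicate_ranked(hits):
    """Return hits sorted best-first, keeping one hit per distinct raw text.

    `hits` is a list of (raw_text, score) pairs. Sorting before deduplicating
    means the copy that survives is always the highest scoring one.
    """
    seen = set()
    unique = []
    for raw_text, score in sorted(hits, key=lambda hit: hit[1], reverse=True):
        if raw_text not in seen:
            seen.add(raw_text)
            unique.append((raw_text, score))
    return unique


# Two chunks of the same long entry matched the query with different scores
hits = [("long note", 0.2), ("other note", 0.9), ("long note", 0.7)]
print(deduplicate_ranked(hits))
```

Deduplicating before ranking could instead keep a lower scoring chunk of an entry and push that entry down the results.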
Debanjum Singh Solanky	53cd2e5605	Regenerate initial model in asymmetric reload test to reduce flakiness - Fix logger message when converting org node to entries - Remove unused import from conftest	2022-12-25 21:36:15 -03:00
Debanjum Singh Solanky	c79919bd68	Split entries by max tokens while converting Org entries To JSONL - Test usage of the entry splitting by max tokens in text search	2022-12-25 21:36:00 -03:00
Debanjum Singh Solanky	08dc5e3324	Update instructions in khoj.el to install it from MELPA stable - The instructions suggest installing khoj-assistant via pip install. This installs the latest tagged/release version of khoj - To match that version user should install khoj.el from MELPA stable instead of MELPA	2022-12-23 19:08:38 -03:00
Debanjum Singh Solanky	e057c8e208	Add method to split entries by specified max tokens limit - Issue ML Models truncate entries exceeding some max token limit. This lowers the quality of search results - Fix Split entries by max tokens before indexing. This should improve searching for content in longer entries. - Miscellaneous - Test method to split entries by max tokens	2022-12-23 16:24:04 -03:00
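A minimal sketch of splitting an entry by a max token limit, in the spirit of commit e057c8e208. It treats whitespace-separated words as tokens, which is a simplification, and the function name and default limit of 256 are assumptions drawn from the background given in the merge message, not the repository's implementation.

```python
def split_by_max_tokens(text, max_tokens=256):
    """Split `text` into chunks of at most `max_tokens` whitespace-separated words.

    Each chunk can then be embedded on its own, so no part of a long entry
    is silently truncated by the model.
    """
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


# A 5 word entry with a limit of 2 words becomes 3 chunks
print(split_by_max_tokens("one two three four five", max_tokens=2))
```

Because one entry can now yield several chunks, several hits may point at the same raw entry, which is why results need deduplicating before they are returned.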
Debanjum Singh Solanky	d3e175370f	Update readme to install khoj.el from MELPA stable unless using pre-release khoj Update readme to ask user to install khoj.el from MELPA when a pre-release version of the main khoj app is installed. Else install khoj.el from MELPA Stable	2022-12-20 23:29:22 -03:00
Debanjum Singh Solanky	cd463c5085	Update Khoj.el Install Instructions on Emacs	2022-12-20 11:06:33 -03:00
Debanjum Singh Solanky	23ca5a2d43	Improve (un-)quoting of funcs used in `khoj--get-enabled-content-types' - Based on melpa package feedback for khoj.el - Verified these changes don't affect behavior of the function	2022-12-19 18:02:23 -03:00
Debanjum Singh Solanky	5db3a67df5	Fix Khoj Emacs package URL in khoj.el	2022-12-14 22:49:19 -03:00
Debanjum Singh Solanky	abad6d5f44	Declare external khoj.el funcs. Remove undefined func warnings on install	2022-12-14 22:36:04 -03:00
Debanjum Singh Solanky	c52383b11c	Delete stale, unused installation helper script	2022-12-03 13:36:47 -03:00
Debanjum Singh Solanky	1990d09032	Bump khoj version in setup.py, khoj.el to 0.2.0	2022-12-02 14:58:54 -03:00
Debanjum Singh Solanky	a9cfd8b800	Extract hash func for incremental text indexing into separate method	2022-10-26 13:56:58 +05:30
Debanjum Singh Solanky	0de2ff9c97	Add __init__.py to routers directory to register it as a package	2022-10-25 20:40:40 +05:30
Debanjum Singh Solanky	55d2fea9be	Move Custom Formatter class for logger to util.helper module from main.py	2022-10-20 00:32:24 +05:30
Debanjum Singh Solanky	1c40f97114	Merge branch 'master' of github.com:debanjum/khoj into modularize-api-and-increase-typing - Conflicts: - src/interface/emacs/khoj.el Use our update to `config-url', use their `url-request-method'	2022-10-19 16:46:53 +05:30
Debanjum Singh Solanky	e1b5a87920	Rename Frontend Router to Web Client. Fix logger usage in routers - Use logger in api_beta router instead of print statements - Remove unused logger in web client router	2022-10-19 16:36:48 +05:30
Debanjum	4abd51cb04	Merge pull request #99 from telotortium/method Explicitly set `url-request-method' to GET in khoj.el	2022-10-19 10:31:37 +00:00
Debanjum Singh Solanky	c467df8fa3	Setup `mypy' for static type checking	2022-10-08 17:33:13 +03:00
Debanjum Singh Solanky	d292bdcc11	Do not version API. Premature given current state of the codebase - Reason - All clients that currently consume the API are part of Khoj - Any breaking API changes will be fixed in clients immediately - So decoupling client from API is not required - This removes the burden of maintaining multiple versions of the API	2022-10-08 16:32:46 +03:00
Debanjum Singh Solanky	7e9298f315	Use new Text Entry class to track text entries in Intermediate Format - Context - The app maintains all text content in a standard, intermediate format - The intermediate format was loaded, passed around as a dictionary for easier, faster updates to the intermediate format schema initially - The intermediate format is reasonably stable now, given its usage by all 3 text content types currently implemented - Changes - Concretize text entries into `Entries' class instead of using dictionaries - Code is updated to load, pass around entries as `Entries' objects instead of as dictionaries - `text_search' and `text_to_jsonl' methods are annotated with type hints for the new `Entries' type - Code and Tests referencing entries are updated to use class style access patterns instead of the previous dictionary access patterns - Move `mark_entries_for_update' method into `TextToJsonl' base class - This is a more natural location for the method as it is only (to be) used by `text_to_jsonl' classes - Avoid circular reference issues on importing `Entries' class	2022-10-08 12:06:05 +03:00
Debanjum Singh Solanky	99754970ab	Type the /search API response to better document the response schema - Both Text, Image Search were already giving list of entry, score - This change just concretizes that response shape and exposes it in the API documentation (i.e. OpenAPI, Swagger, ReDoc)	2022-10-08 12:06:05 +03:00
Debanjum Singh Solanky	0521ea10d6	Put image score breakdown under `additional' field in search response - Update web, emacs interfaces to consume the scores from new schema	2022-10-08 12:06:01 +03:00
Debanjum Singh Solanky	e42a38e825	Version Khoj API, Update frontends, tests and docs to reflect it - Split router.py into v1.0, beta and frontend (no-prefix) api modules under new router package. Version tag in main.py via prefix - Update frontends to use the versioned api endpoints - Update tests to work with versioned api endpoints - Update docs to mention, reference only versioned api endpoints	2022-09-28 20:08:38 +03:00
Robert Irelan	d25e1d8e86	fix: explicitly set url-request-method In my installation, it appears that `url-request-method` is sometimes set globally to POST. Need to explicitly set it to ensure that GET is always used as intended.	2022-09-19 15:46:46 -04:00
Debanjum Singh Solanky	ee65a4f2c7	Merge /reload, /regenerate into single /update API endpoint - Pass force=true to /update API to force regenerating index from scratch - Otherwise calls to the /update API endpoint will result in an incremental update to index	2022-09-16 00:53:19 +03:00
Debanjum Singh Solanky	02d944030f	Use Base TextToJsonl class to standardize <text>_to_jsonl processors - Start standardizing implementation of the `text_to_jsonl' processors - `text_to_jsonl' scripts already had a shared structure - This change starts to codify that implicit structure - Benefits - Ease adding more `text_to_jsonl' processors - Allow merging shared functionality - Help with type hinting - Drawbacks - Lower agility to change. But this was already an implicit issue as the text_to_jsonl processors got more deeply wired into the app	2022-09-16 00:53:11 +03:00
Debanjum Singh Solanky	c16ae9e344	Ignore "Legacy way to download model" warning for upstream dependency	2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky	3169e3b78e	Use ellipsis instead of pass in base filter abstract methods for aesthetic	2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky	bf1ae038cb	Get XMP metadata from image using Pillow. Remove ExifTool dependency - Pillow already supports reading XMP metadata from Images - Removes need to maintain my fork of unmaintained PyExiftool - This also removes dependency on system Exiftool package for XMP metadata extraction - Add test to verify XMP metadata extracted from test images - Remove references to Exiftool from Documentation	2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky	8f57a62675	Remove unused imports. Fix typing and indentation - Typing issues discovered using `mypy'. Fixed manually - Unused imports discovered and fixed using `autoflake' - Fix indentation in `org_to_jsonl' manually	2022-09-14 04:56:52 +03:00
Debanjum Singh Solanky	be57c711fd	Revert OrgNode.hasTag func to method instead of property as accepts argument	2022-09-14 04:56:48 +03:00
Debanjum Singh Solanky	0109c7bd91	Disable ability to call <text>_to_jsonl, <type>_search packages directly - This code is de-synced with expected args by above scripts - Better to remove unused capability that needlessly increases maintenance burden	2022-09-14 04:56:48 +03:00
Debanjum Singh Solanky	1680a617da	Reflect updates to query and results count in URL - Simplify tracking khoj query history, saving/sharing links - Do not execute search, when query only contains whitespaces - Prevents error when trying to process results of empty query	2022-09-13 23:39:24 +03:00
Debanjum Singh Solanky	34314e859a	Call /reload instead of /regenerate API to update index from web interface - As `/reload` updates index incrementally, it's relatively quick - This makes exposing `/reload` endpoint a better default to expose via the web interface than the `/regenerate' endpoint	2022-09-12 23:39:10 +03:00
Debanjum Singh Solanky	13b5d5082f	Create input field to set results count on the web interface Resolves #96	2022-09-12 23:24:46 +03:00
Debanjum Singh Solanky	1bfe9c4ef2	Handle filter only queries. Short-circuit and return filtered results - For queries with only filters in them short-circuit and return filtered results. No need to run semantic search, re-ranking. - Add client test for filter only query and quote query in client tests	2022-09-12 17:13:05 +03:00
Debanjum Singh Solanky	afc84de234	Make word filter regex explicit. Allow hyphen in word filters Helps with #88	2022-09-12 17:05:29 +03:00
Debanjum Singh Solanky	536f03af8f	Process text content files in sorted order for stable indexing - Image search already uses a sorted list of images to process - Prevents the index of entries from desyncing when entries, embeddings are generated by a separate server/app instance	2022-09-12 11:09:40 +03:00
Debanjum Singh Solanky	a701ad08b9	Support multiple input-filters to configure content to index via khoj.yml - Update existing code, tests to process input-filters as list instead of str - Test `text_to_jsonl' get files methods to work with combination of `input-files' and `input-filters' Resolves #84	2022-09-12 11:08:59 +03:00
Debanjum Singh Solanky	940c8fac8c	Use app LRU, not functools LRU decorator, to cache search results in router - Provides more control to invalidate cache on update to entries, embeddings - Allows logging when results are being returned from cache etc - FastAPI, Swagger API docs look better as the `search' controller not wrapped in generically named function when using functools LRU decorator	2022-09-12 09:38:48 +03:00
Debanjum Singh Solanky	c6fa09d8fc	Fix querying with include word filter from web interface - Not encoding the `query' string before querying the backend API with it was causing the "+" prefix for include word filter to be lost	2022-09-12 09:27:02 +03:00
Debanjum Singh Solanky	1502fbc9e9	Add index_heading_entries flag to default and sample khoj configs	2022-09-11 17:33:37 +03:00
Debanjum Singh Solanky	7216cdff58	Add Date, Word filter for Org-Music content	2022-09-11 17:29:34 +03:00
Debanjum Singh Solanky	9d369ae4df	Fix OrgNode render of entries with property drawers and empty body - Issue - Indent regex was previously catching escape sequences like newlines - This was resulting in entries with only escape sequences in body to be prepended to property drawers etc during rendering - Fix - Update indent regex to only look for spaces in each line - Only render body when body contains non-escape characters - Create test to prevent this regression from silently resurfacing	2022-09-11 16:09:19 +03:00
Debanjum Singh Solanky	253c9eae9a	Set index_heading_entries field in config to index entries with no body - Previously heading entries were not indexed to maintain search quality - But given that there are use-cases for indexing entries with no body - Add a configurable `index_heading_entries' field to index heading entries - This `TextContentConfig' field is currently only used for OrgMode content	2022-09-11 16:09:19 +03:00
Debanjum Singh Solanky	1d3b3d5f39	Convert field get/set methods in OrgNode class to @property - Use more descriptive variable names in OrgNode parser and class - Convert OrgNode fields to private/protected, use property methods to get/set them	2022-09-11 14:59:28 +03:00
Debanjum Singh Solanky	db37e38df7	Create OrgNode hasBody method. Use it in org_to_jsonl checks	2022-09-11 12:50:03 +03:00
Debanjum Singh Solanky	b4878d76ea	Extract entries from scratch when regenerate requested - Do not rely on previously extracted entries to find new entries in regenerate scenario	2022-09-11 12:50:03 +03:00
Debanjum Singh Solanky	52e3dd9835	Pass the whole TextContentConfig as argument to text_to_jsonl methods - Let the specific text_to_jsonl method decide which of the TextContentConfig fields it needs to convert <text> type to jsonl - This simplifies extending TextContentConfig for a specific type without modifying all text_to_jsonl methods - It keeps the number of args being passed to the `text_to_jsonl' methods in check	2022-09-11 12:49:56 +03:00
Debanjum Singh Solanky	e951ba37ad	Raise exception when org file not found - No need to catch the IOError in OrgNode	2022-09-11 01:09:24 +03:00
Debanjum Singh Solanky	2e1bbe0cac	Fix stripping empty escape sequences from strings - Fix log message on jsonl write	2022-09-10 23:57:05 +03:00
Debanjum Singh Solanky	a7cf6c8458	Use dictionary instead of list to track entry to file maps	2022-09-10 23:08:30 +03:00
Debanjum Singh Solanky	3e1323971b	Stack function calls in jsonl converters to avoid unneeded variables	2022-09-10 22:56:06 +03:00
Debanjum Singh Solanky	4eb84c7f51	Log performance metrics for beancount, markdown to jsonl conversion	2022-09-10 22:47:54 +03:00
Debanjum Singh Solanky	ebd5039bd1	Merge branch 'master' into support-incremental-updates-of-embeddings	2022-09-10 22:37:13 +03:00
Debanjum Singh Solanky	030fab9bb2	Support incremental update of Markdown entries, embeddings	2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky	91aac83c6a	Support incremental update of Beancount transactions, embeddings	2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky	b01b4d7daa	Extract logic to mark entries for embeddings update into helper function - This could be re-used by other text_to_jsonl converters like markdown, beancount	2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky	f97308bef2	Fix log message on writing JSONL data to file	2022-09-10 21:40:08 +03:00
Debanjum Singh Solanky	c17a0fd05b	Do not store word filters index to file. Not necessary for now - It's more of a hassle to not let word filter go stale on entry updates - Generating index on 120K lines of notes takes 1s. Loading from file takes 0.2s. For less content load time difference will be even smaller - Let go of startup time improvement for simplicity for now	2022-09-10 21:01:54 +03:00
Debanjum Singh Solanky	91d11ccb49	Only hash compiled entry to identify new/updated entries to update - Comparing compiled entries is the appropriately narrow target to identify entries that need to encode their embedding vectors. Given we pass the compiled form of the entry to the model for encoding - Hashing the whole entry along with it's raw form was resulting in a bunch of entries being marked for updated as LINE: <entry_line_no> is a string added to each entries raw format. - This results in an update to a single entry resulting in all entries below it in the file being marked for update (as all their line numbers have changed) - Log performance metrics for steps to convert org entries to jsonl	2022-09-10 21:01:44 +03:00
Debanjum Singh Solanky	b9a6e80629	Make OrgNode tags stable sorted to find new entries for incremental updates - Having Tags as sets was returning them in a different order every time - This resulted in spuriously identifying existing entries as new because their tags ordering changed - Converting tags to list fixes the issue and identifies updated new entries for incremental update correctly	2022-09-10 20:59:52 +03:00
Debanjum Singh Solanky	2f7a6af56a	Support incremental update of org-mode entries and embeddings - What - Hash the entries and compare to find new/updated entries - Reuse embeddings encoded for existing entries - Only encode embeddings for updated or new entries - Merge the existing and new entries and embeddings to get the updated entries, embeddings - Why - Given most note text entries are expected to be unchanged across time, reusing their earlier encoded embeddings should significantly speed up embeddings updates - Previously we were regenerating embeddings for all entries, even if they had existed in previous runs	2022-09-10 20:58:33 +03:00
Debanjum Singh Solanky	ec675d27d3	Suppress non-actionable HuggingFace FutureWarning shown on app start	2022-09-10 16:43:14 +03:00
Debanjum Singh Solanky	1ac6a71ff0	Add --version flag to show installed version of khoj	2022-09-10 16:40:19 +03:00
Debanjum Singh Solanky	976397bd82	Ignore empty #+TITLE, merge multiple #+TITLE for 0th level headings	2022-09-10 15:34:47 +03:00
Debanjum Singh Solanky	11917c6ddd	Do not normalize absolute filenames for creating links in OrgNode	2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky	07b98d35f1	Use filename or #+TITLE as heading for 0th level content in org files - Set LINE, SOURCE link properties in property drawer correctly for content which falls under no heading - See Issue #83 for more details	2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky	d6bd7bf3e1	Fix initializing OrgNode level to string to parse org files - Parsed `level` argument passed to OrgNode during init is expected to be a string, not an integer - This was resulting in app failure only when parsing org files with no headings, like in issue #83, as level is set to string of `*`s the moment a heading is found in the current file	2022-09-10 14:21:08 +03:00
Debanjum Singh Solanky	d835467f2c	Throw exception if no valid entries found in specified content files - Previously we were failing if no valid entries while computing embeddings. This was obscuring the actual issue of no valid entries found in the specified content files - Throwing an exception early with clear message when no entries found should clarify the issue to be fixed - See issue #83 for details	2022-09-10 14:20:10 +03:00
Debanjum Singh Solanky	e00bb53336	Init word filter dictionary with default value as set to simplify code	2022-09-10 12:19:09 +03:00
Debanjum Singh Solanky	4d776d9c7a	Bump khoj version to 0.1.9	2022-09-09 07:50:15 +03:00
Debanjum Singh Solanky	588f598949	Pass empty list of `input_files' to FileBrowser on first run - Default config has `input_files' set to None - This was being passed to `FileBrowser' on Initialization - But `FileBrowser' expects `content_files' of list type, not None - This resulted in an unexpected NoneType failure	2022-09-09 07:26:40 +03:00
Debanjum Singh Solanky	3ddffdfba4	Create config directory before setting up logging to file under it - The logging to file code expects the config directory to already be setup - But parent directory of config file was being set up later in code - This resulted in app start failing with ~/.khoj dir does not exist error	2022-09-09 07:21:42 +03:00
Debanjum Singh Solanky	762607fc9f	Log processed entries by org_to_jsonl only if verbosity > 2 Output too verbose for even debug mode logging. So gated behind -vvv	2022-09-06 23:03:29 +03:00
Debanjum Singh Solanky	490157cafa	Setup File Filter for Markdown and Ledger content types - Pass file associated with entries in markdown, beancount to json converters - Add File, Word, Date Filters to Ledger, Markdown Types - Word, Date Filters were accidentally removed from the above types yesterday - File Filter is the only filter that newly got added	2022-09-06 15:31:26 +03:00
Debanjum Singh Solanky	94cf3e97f3	Log app logs to file for posthoc debugging and performance analysis	2022-09-06 14:51:48 +03:00

830 commits
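
The incremental update commits in this listing (2f7a6af56a, 91d11ccb49, b01b4d7daa) describe hashing only the compiled form of each entry, reusing embeddings for unchanged entries, and encoding only new or updated ones. A minimal sketch of that idea follows. It is not the code from the repository: the function names `hash_entry`, `mark_entries_for_update` and `update_embeddings`, the `compiled` and `raw` dictionary keys, the choice of SHA-256, and the `encode` callable are all assumptions made for illustration.

```python
import hashlib


def hash_entry(compiled_text):
    """Hash only the compiled text, so line-number changes in the raw form are ignored."""
    return hashlib.sha256(compiled_text.encode("utf-8")).hexdigest()


def mark_entries_for_update(current_entries, previous_entries):
    """Pair each current entry with the index of a matching previous entry, or None.

    Entries are dicts with hypothetical 'compiled' and 'raw' keys.
    A None index means the entry is new or updated and needs encoding.
    """
    previous_by_hash = {
        hash_entry(entry["compiled"]): index
        for index, entry in enumerate(previous_entries)
    }
    return [
        (previous_by_hash.get(hash_entry(entry["compiled"])), entry)
        for entry in current_entries
    ]


def update_embeddings(marked_entries, previous_embeddings, encode):
    """Reuse embeddings of unchanged entries and encode only new or updated ones.

    `encode` is an assumed callable mapping a list of texts to a list of vectors.
    """
    texts_to_encode = [
        entry["compiled"] for index, entry in marked_entries if index is None
    ]
    new_vectors = iter(encode(texts_to_encode) if texts_to_encode else [])
    return [
        previous_embeddings[index] if index is not None else next(new_vectors)
        for index, _ in marked_entries
    ]
```

Because only the compiled text is hashed, inserting one entry at the top of a file shifts the `LINE:` value in every raw entry below it without marking those entries for re-encoding, which is the problem commit 91d11ccb49 describes.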