sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-04 21:03:01 +01:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	87975e589a	Fix passing auth token to Github API to increase rate limits by ~83x - Previously wasn't prefixing "token" to the PAT in the Auth header. This resulted in the request being considered unauthenticated - Unauthenticated requests to Github API are limited to 60 requests/hour. Authenticated requests to Github API are allowed 5000 requests/hour	2023-06-18 01:19:26 -07:00
Debanjum Singh Solanky	9c70af960c	Extract logic to get file content from Github into a separate method	2023-06-18 01:19:13 -07:00
Debanjum Singh Solanky	10d4c38ce9	Extract Wait for rate limit reset logic into a function for reuse	2023-06-18 01:06:46 -07:00
sabaimran	aad7f825e0	Remove music configuration	2023-06-17 21:23:56 -07:00
sabaimran	5f97afbfac	Ignore type checks from mypy in subindexed fields	2023-06-17 16:53:36 -07:00
sabaimran	c2d46de8bc	Add endpoint for regenerating directly from the config page and add music content-type	2023-06-17 15:47:33 -07:00
sabaimran	ded3100caf	Update the configuration page to make config management easier - Add a central configuration management page to make management of config details easier - Add relevant api endpoints both for client and server to update/request data as necessary - Attempt to update the favicon	2023-06-17 15:21:28 -07:00
Debanjum Singh Solanky	3f24e53b6e	Render URL as link in web interface if file param of result is a web link	2023-06-17 04:26:40 -07:00
Debanjum Singh Solanky	63ec84ad78	Store Github URL of Markdown files on Github in file jsonl param	2023-06-17 04:23:01 -07:00
Debanjum Singh Solanky	0c1c7583b5	Handle pagination, API rate limits. Get all commits from Github repo	2023-06-17 04:21:39 -07:00
Debanjum Singh Solanky	31d17d0b22	Index commits message from repository with the github plugin	2023-06-17 02:59:54 -07:00
Debanjum Singh Solanky	c29c141a7e	Use Github Rest API to index Markdown files in Github Repository The Llama_Hub Github plugin is fairly limited. The Github Rest API is well supported and can easily be extended to index commit messages, issues, discussions, PRs etc.	2023-06-17 02:16:13 -07:00
Saba	ac96f43b1b	Remove try-catch specific to Github plugin; consolidate GUI logic	2023-06-16 23:46:25 -07:00
Saba	019d3732de	Rename orgmode_search to org_search	2023-06-13 16:06:54 -07:00
Saba	08d79f5ba4	Unify types used in Github and other text-based configs. Fix typing issues	2023-06-13 15:52:36 -07:00
Saba	a6cd96a6a9	Add a Github plugin which can be used to read from a Github repository	2023-06-13 14:40:06 -07:00
Debanjum	c68cde4803	Log clients calling API endpoints on Khoj server - Make API endpoints on Khoj server accept `client` as request parameter - Khoj API endpoints: /chat, /search, /update - Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server - Khoj clients: Emacs, Obsidian and Web - Also log khoj server_version running to telemetry server	2023-06-09 18:36:49 +05:30
sabaimran	59fa48036f	Merge pull request #224 from debanjum/fix/message-exceeds-prompt-size Pass truncated message as string in ChatMessage when exceeding max prompt size	2023-06-08 17:32:53 -07:00
Debanjum Singh Solanky	139a3ba060	Update server to log new server version field to telemetry db	2023-06-08 14:14:21 +05:30
Saba	5d5ebcbf7c	Rename truncate messages method and update unit tests to simplify assertion logic	2023-06-06 23:25:43 -07:00
Saba	7119ed0849	Run pre-commit script	2023-06-05 19:29:23 -07:00
Saba	6212d7c2e8	Remove debug line	2023-06-05 19:00:25 -07:00
Saba	f65ff9815d	Move message truncation logic into a separate function. Add unit tests with factory boy.	2023-06-05 18:58:29 -07:00
Debanjum Singh Solanky	eb6175e9b0	Update description field in webmanifest of Khoj, Khoj Chat PWA	2023-06-06 01:53:42 +05:30
Debanjum Singh Solanky	bb2363f324	Set client request param when calling khoj server APIs from Web	2023-06-06 00:05:00 +05:30
Debanjum Singh Solanky	caab55fbdd	Set client request param when calling khoj server APIs from Obsidian	2023-06-06 00:04:46 +05:30
Debanjum Singh Solanky	de2494154f	Set client request param when calling khoj server APIs from Emacs	2023-06-06 00:02:10 +05:30
Debanjum Singh Solanky	168c11cea7	Make server API endpoints accept client as query param - The chat, search and update API will accept client as request param. - This will allow logging the client from which these APIs were called.	2023-06-05 23:57:08 +05:30
Debanjum Singh Solanky	8617cf1389	Push telemetry to Posthog to grok Khoj usage	2023-06-05 22:47:49 +05:30
Debanjum Singh Solanky	d13db2e666	Make old telemetry server forward requests to new server	2023-06-05 13:06:45 +05:30
Saba	5f4223efb4	Increase timeout to OpenAI call	2023-06-04 20:49:47 -07:00
Saba	0e63a90377	Fix the mechanism to retrieve the message content	2023-06-04 20:25:37 -07:00
Saba	f0efe0177e	Pass truncated message as string in ChatMessage when exceeding max prompt size	2023-06-04 19:33:46 -07:00
Saba	068ee0ac5e	Swap elif with else, as usage of this method does not use openai_api_key	2023-06-04 02:25:08 -07:00
Saba	6508379d7b	Use api_key keyword argument to set the openai_api_key parameter for GPT	2023-06-04 00:57:00 -07:00
Debanjum Singh Solanky	7af8a56434	Remove filename from reference before rendering references in khoj.el Fixes bug where actual reference heading in next line was jumping out of references footnote section	2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky	ec280067ef	Do not retrieve relevant notes when having a general chat with Khoj - This improves latency of @general chat by avoiding unnecessary compute - It also avoids passing references in API response when they haven't been used to generate the chat response. So interfaces don't have to add logic to not render them unnecessarily	2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky	90439a8db1	Update Khoj subtitle to AI personal assistant for your digital brain	2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky	e9ed7a19fd	Update search prompt to extract PDF search type. Fix extract_question prompt	2023-06-02 10:06:03 +05:30
Debanjum Singh Solanky	bbe3bf9733	Render PDF search results in Khoj Obsidian interface - Make plugin update khoj server config to index PDF files in vault too - Make Obsidian plugin update index for PDF files in vault too - Show PDF results in Khoj Search modal as well - Ensure combined results are sorted by score across both types - Jump to PDF file when its PDF search result is selected from modal	2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky	e3892945d4	Render PDF search results in Khoj.el Emacs interface	2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky	85144006a1	Render PDF search results in khoj web interface	2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky	acd14a5e41	Wire up PDF to jsonl processor to Khoj server layer (API, config) - Specify PDF content to index via khoj.yml - Index PDF content on app start, reconfigure - Expose PDF as a search type via API	2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky	286b500f66	Create PDF to JSONL processor using PyPDF and LangChain Switch `pydantic' to >= 1.9.1 else `langchain.document_loaders' starts throwing typing error for python 3.8, 3.9	2023-06-01 21:41:49 +05:30
Debanjum Singh Solanky	1b3effd8e6	Fork Markdown to JSONL processor as start template for PDF to Jsonl Processor	2023-06-01 09:13:31 +05:30
Debanjum Singh Solanky	1cd9ecd449	Truncate last message if still over max supported prompt size by model	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	ed4d0f9076	Simplify argument names used in khoj openai completion functions - Match argument names passed to khoj openai completion funcs with arguments passed to langchain calls to OpenAI - This simplifies the logic in the khoj openai completion funcs	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	703a7c89c0	Reduce retry count and request timeout for faster response or failure - Fix bug where both LangChain and Khoj retry requests 6 times each. So a total of 12 requests at >1minute intervals for each chat response in case of OpenAI API being down - Retrying too many times when the API is failing doesn't help - The earlier 60 second request timeout was spacing out the interval between retries way too much. This slowed down chat response times quite a bit when API was being flaky - With these updates you'll know if call to chat API failed in under a minute	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	18081b3bc6	Use LangChain to call GPT over API	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	277d2f5c96	Do not add "Notes:" suffix to chat messages when no notes retrieved This was causing spurious "Notes:" suffix being added to Khoj Chat in response	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	334be4e600	Use LangChain to call OpenAI for Khoj Chat - Use ChatModel and ChatOpenAI to call OpenAI chat model instead of using OpenAI package directly - This is being done as part of migration to rely on LangChain for creating agents and managing their state	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	efcf7d1508	Extract prompts as LangChain Prompt Templates into a separate module Improves code modularity, cleanliness. Reduces bloat in GPT.py module	2023-06-01 08:50:58 +05:30
Debanjum Singh Solanky	b484953bb3	Import app state correctly to generate embeddings with OpenAI model Resolves #216	2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky	a0d0dbaca7	Fix link to Khoj Obsidian Demo video in Readmes	2023-05-23 04:23:08 +05:30
Debanjum Singh Solanky	ebb5d7b8e5	Release Khoj version 0.6.2	2023-05-17 20:04:20 +05:30
Debanjum Singh Solanky	d02415edcc	Write generated server id to env file when env file does not contain it	2023-05-17 19:38:44 +05:30
Debanjum Singh Solanky	dc0626856e	Put the telemetry db in a separate directory by default	2023-05-17 18:58:47 +05:30
Debanjum Singh Solanky	e9f04dc644	Add dockerfile to containerize telemetry server	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	07b19964d4	Schedule jobs at (co-)prime intervals to reduce overlap in job runs	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	d42f0f5055	Add basic telemetry server for khoj	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	134cce9d32	Batch upload telemetry data at regular interval instead of while querying	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	3ede919c66	Log usage of /search, /chat, /update API endpoints to telemetry server	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	f2e89f6f46	Add khoj app helper methods to log app usage to a telemetry server	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	9ca61d62ff	Enable/disable logging telemetry by setting bool in khoj.yml config We log usage telemetry by default, unless setting explicitly set in khoj.yml	2023-05-15 23:26:38 +08:00
Debanjum Singh Solanky	131b8407b5	Allow Khoj Chat to respond to general queries not in reference notes - Khoj chat will now respond to general queries if: 1. no relevant reference notes available or 2. when explicitly induced by prefixing the chat message with "@general" - Previously Khoj Chat would a lot of times refuse to respond to general queries not answerable from reference notes or chat history - Make chat quality tests more robust - Add more equivalent chat response options refusing to answer - Force haiku writing to not give any preamble, just the haiku	2023-05-12 18:42:40 +08:00
Debanjum Singh Solanky	cc75f986b2	Test text search index only updates on changes to text content	2023-05-12 17:37:34 +08:00
Debanjum Singh Solanky	f9ccce430e	Allow configuring OpenAI chat model for Khoj chat - Simplifies switching between different OpenAI chat models. E.g GPT4 - It was previously hard-coded to use gpt-3.5-turbo. Now it just defaults to using gpt-3.5-turbo, unless chat-model field under conversation processor updated in khoj.yml	2023-05-03 23:01:13 +08:00
Debanjum Singh Solanky	6b535cc345	Snip prepended heading to avoid crossing model max_token limits Otherwise if heading > max_tokens then the search models will just see a heading (with repeated filename) for each compiled entry and not actual content. 100 characters should be sufficient to include filename (not path) and entry heading. If longer rather truncate to pass entry unique text to model for search context	2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky	02aeee60aa	Set filename as top heading of org entries for better search context Previously filename was only being appended to markdown entries. Test filename getting prepended to compiled entry as heading	2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky	94825a70b9	Set heading of md entries to improve search context for long entries Otherwise if a markdown entry is longer than max_tokens, the split entries (apart from first one) do not get their heading context set	2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky	5de04621b5	Set filename as top heading of md entries for better search context Previously filename was appended to the end of the compiled entry. This didn't provide appropriate structured context Test filename getting prepended as heading to compiled entry	2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky	0e3fb59e09	Entries with no md headings should not get heading prefix prepended Files with no headings would previously get their entry be prefixed with a markdown heading prefix (#)	2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky	45a991d75c	Prepend entry heading to all compiled org snippets to improve search context All compiled snippets split by max tokens (apart from first) do not get the heading as context. This limits search context required to retrieve these continuation entries	2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky	3386cc92b5	Fix khoj server config update in khoj.el by unquoting list to cl-push to - cl-push expects a generalized variable. Else throws (setf quote) undefined warning - This results in the config call failing on calling khoj entrypoint	2023-05-03 15:10:56 +08:00
Debanjum Singh Solanky	948a4274e4	Fix documentation strings and simplify not null checks	2023-05-02 21:47:50 +08:00
Debanjum Singh Solanky	731ef5688f	Use cl-pushnew to fix byte-compile errors with using add-to-list	2023-05-02 21:47:38 +08:00
Debanjum Singh Solanky	f046523b33	Improve khoj.el messages to convey state of khoj server - Remove waiting for server message as it hides the messages from the server - Fix the nil message that were being rendered, by checking before showing messages from server - Consistently prefix messages from khoj with khoj.el	2023-04-28 11:15:13 +08:00
Debanjum Singh Solanky	76df393eb5	Only call khoj server configure API from khoj.el when config updated Previously khoj.el was calling the server configure API even when config was same as before. This had broken the khoj search as you type experience from emacs Also show more details to user about what in khoj is being configured	2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky	ceae06ae9d	Fix khoj.el compilation warnings around unused variables	2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky	8269adf849	Refactor khoj-setup in khoj.el for readability. No functional change	2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky	865d12b6f2	Fix escaping quote in chat references to prevent it breaking out of html	2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky	26cb878327	Add Yarn lockfile for Khoj Obsidian	2023-04-18 00:57:11 +07:00
Debanjum Singh Solanky	e3180d63e6	Sync Khoj Obsidian Tagline with Khoj tagline	2023-04-18 00:56:50 +07:00
Debanjum Singh Solanky	62e6e09521	Release Khoj version 0.6.1	2023-04-17 23:31:35 +07:00
Debanjum Singh Solanky	b079fb31bc	Replace Windows path separators in indexName configured via Khoj Obsidian Resolves #185, #199 - Issue IndexName created from Obsidian Absolute Vault path wasn't replacing windows path, drive separators with underscore. It was only replacing unix path separators - Fix Also replace windows drive and path separators with _ while creating IndexName in Khoj Obsidian plugin	2023-04-17 16:55:33 +07:00
Debanjum Singh Solanky	d90df966a9	Make khoj logger use utf-8 encoding when writing to khoj log file Resolve logger error issue mentioned in #199	2023-04-17 16:55:07 +07:00
Debanjum Singh Solanky	dc3f399f91	Fix to get score associated with SearchResponse in result as string	2023-04-16 20:22:51 +07:00
Debanjum Singh Solanky	d5000c63e1	Update Readmes to use python -m pip install khoj-assistant Makes it easier to tell pip associated with which python is being used. Easier to debug when users have different versions of python installed (e.g 3.10 and 3.11)	2023-04-16 20:17:20 +07:00
Debanjum Singh Solanky	453c84ab79	Add Screenshots of Khoj Chat Interface on Emacs, Obsidian to Readmes	2023-04-07 23:19:47 +07:00
Debanjum Singh Solanky	35aa06067f	Release Khoj version 0.6.0 Upload styles.css via release workflow	2023-03-31 18:13:16 +07:00
Debanjum Singh Solanky	5673bd5b96	Keep original formatting in compiled text entry strings - Explicitly split entry string by space during split by max_tokens - Prevent formatting of compiled entry from being lost - The formatting itself contains useful information No point in dropping the formatting unnecessarily, even if (say) the current search models don't account for it (yet)	2023-03-30 14:02:46 +07:00
Debanjum Singh Solanky	a2ab68a7a2	Include filename of markdown entries for search indexing Append originating filename to compiled string of each entry for better search quality by providing more context to model Update markdown_to_jsonl tests to ensure filename being added Resolves #142	2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky	67129964a7	Create Note with Query as title from within Khoj Search Modal This follows expected behavior for Obsidian search modals E.g Omnisearch and default Obsidian search. The note creation code is borrowed from Omnisearch. Resolves #133	2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky	d3257cb24e	Style the search result. Use Obsidian theme colors and font-size Based on PR #135	2023-03-30 12:35:29 +07:00
Debanjum Singh Solanky	40091489c0	For each result: snip it by lines, show filename, remove frontmatter Based on PR #135 Resolves #134	2023-03-30 12:34:55 +07:00
Debanjum Singh Solanky	240db7b4f0	Add screenshot of Khoj chat on Obsidian to Readme. Fix links	2023-03-30 02:49:05 +07:00
Debanjum Singh Solanky	234be96e53	Fix processor key used to configure chat model in khoj obsidian	2023-03-30 01:47:09 +07:00
Debanjum Singh Solanky	c8c0cfd10e	Add Chat features, setup and usage to Khoj Obsidian plugin Readme	2023-03-30 00:32:24 +07:00
Debanjum Singh Solanky	7ecae224e7	Configure OpenAI API Key from the Khoj plugin setting in Obsidian	2023-03-29 23:54:08 +07:00
Debanjum Singh Solanky	3d616c8d65	Use Obsidian font sizes. Improve input field, reference indexing - Give space in the input field. Too narrow previously - References should be indexed from 1 instead of 0 - Use Obsidian font size variables to scale fonts in chat appropriately	2023-03-29 22:13:55 +07:00
Debanjum Singh Solanky	23bd737f6b	Use chat input element to send message on Enter. No send button required	2023-03-29 22:13:30 +07:00
Debanjum Singh Solanky	81e98c3079	Scroll to bottom of modal on open and message send	2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky	59ff1ae27f	Use obsidian theme colors for bg, text. Restrict css namespace via prefix	2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky	001ac7b5eb	Style Obsidian Chat Modal like Khoj Chat Web Interface - Add message sender, date metadata as message footer - Use css directly from Khoj Chat Web Interface. - Modify it to work under an Obsidian modal - So replace html, body styling from web interface to instead styling new "khoj-chat" class attached to contentEl of modal	2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky	112f388ada	Render references next to chat responses by khoj in chat modal	2023-03-28 18:11:03 +07:00
Debanjum Singh Solanky	1d3d949962	Render conversation logs on page load	2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky	cd46a17e5f	Add Khoj Chat Modal, Command in Khoj Obsidian to Chat using API	2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky	c0972e09e6	Rename KhojModal to KhojSearchModal, a more specific name for it In preparation to introduce Khoj chat in Obsidian	2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky	64fff1d372	Release Khoj version 0.5.0	2023-03-28 03:35:59 +07:00
Debanjum Singh Solanky	fc218508f9	Update khoj.el docs and Emacs Readme for chat, simplified setup	2023-03-27 22:02:47 +07:00
Debanjum Singh Solanky	83a7ccd729	Fix docstrings and method ordering in khoj.el	2023-03-27 18:33:09 +07:00
Debanjum Singh Solanky	5c2327ee4f	Configure org directories to index from khoj.el Converts paths to glob style regexes that will index all org files recursively under the specified list of path Should help setup for org-roam users from khoj.el	2023-03-27 18:30:53 +07:00
Debanjum Singh Solanky	6e8a40906d	Allow disabling automatic server setup. Fix server start vs ready logic - khoj-auto-setup controls whether to automatically check for and setup khoj server from within Emacs - extract install, start, configure sequence into public, interactive method. Allows calling khoj-setup during package load via init.el - Fix: Do not attempt to configure or wait for server ready if user has said no to auto-setup request - Fix logic to mark server started vs ready - Previously the started/running vs ready variables defs were getting intertwined - Server started indicates server bootup has been triggered - Server ready indicates server API ready to accept requests	2023-03-27 17:53:08 +07:00
Debanjum Singh Solanky	526a927bce	Fix org entry extraction test, variable prefixed with khoj in khoj.el Discovered via failing build and test workflows on Github	2023-03-27 16:44:50 +07:00
Debanjum Singh Solanky	7243059507	Track index update asynchronously via moon phase progressbar in khoj.el	2023-03-27 06:01:04 +07:00
Debanjum Singh Solanky	8a9055f918	Restrict server messages shown in echo area to main server files	2023-03-27 04:59:55 +07:00
Debanjum Singh Solanky	ae535a06eb	Configure Khoj chat using khoj.el by setting OpenAI API key in Emacs	2023-03-27 04:59:54 +07:00
Debanjum Singh Solanky	36b17d4ae0	Generalize the directory from config extraction elisp method	2023-03-27 03:44:03 +07:00
Debanjum Singh Solanky	924424c754	Throw actionable exceptions when content types or chat not configured	2023-03-27 02:47:44 +07:00
Debanjum Singh Solanky	359a2cacef	Fix khoj--server-running to work with unconfigured or external server - If khoj server started outside emacs, khoj--server-ready should be set to true by khoj--server-running method (instead of waiting for proc msg) - If khoj server is unconfigured the /config/types endpoint wouldn't return anything. Using config/data/default allows checking khoj server running status without requiring it to be configured as well	2023-03-27 02:45:59 +07:00
Debanjum Singh Solanky	d7fb9a596e	Auto configure server before loading khoj-menu If the config hasn't changed there'll be no update. If config has changed indexing will get triggered asynchronously. But user cannot make query till indexing done As easier to know when server ready to configure	2023-03-27 02:44:02 +07:00
Debanjum Singh Solanky	8a21aff438	Make khoj.el server start, stop, restart, setup methods interactive No need to erase temporary buffers before working on them	2023-03-27 01:53:15 +07:00
Debanjum Singh Solanky	cb40a96c85	Index configured org files from khoj.el - Set `khoj-org-files-index' to list of files to index - Defaults to indexing org-agenda-files - Uses khoj server api to configure org files to index	2023-03-27 01:05:26 +07:00
Debanjum Singh Solanky	50760acc37	Wait for Khoj server to get ready before opening khoj.el transient menu - Use process filter, sentinel to mark when khoj server is ready or not - Display server messages for visibility into server boot-up process - Wait until server ready to open khoj transient menu in Emacs Until then khoj features wouldn't work anyway, so avoids confusion	2023-03-26 13:00:01 +07:00
Debanjum Singh Solanky	82eb4bfd0d	Setup Khoj server on opening khoj from within Emacs - Create helper methods to check, stop, restart, setup khoj server - (Ask to) setup khoj server on calling khoj main entrypoint function	2023-03-26 10:12:06 +07:00
Debanjum Singh Solanky	99d19dcf43	Start Khoj server from Emacs using khoj.el	2023-03-26 09:38:46 +07:00
Debanjum Singh Solanky	c92d79118a	Install Khoj server from Emacs using khoj.el	2023-03-26 08:50:03 +07:00
Debanjum Singh Solanky	e281a498b4	Style Khoj search org buffer via elisp instead of in-buffer settings	2023-03-26 06:34:18 +07:00
Debanjum Singh Solanky	4f655d20ae	Style Khoj chat directly via elisp instead of via in-buffer settings	2023-03-26 06:03:30 +07:00
Debanjum Singh Solanky	f6ff7b1beb	Render footnote reference links as superscript for Khoj Chat on Emacs	2023-03-26 05:33:08 +07:00
Debanjum Singh Solanky	67c850a4ac	Add retry logic to OpenAI API queries to increase Chat tenacity - Move completion and chat_completion into helper methods under utils.py - Add retry with exponential backoff on OpenAI exceptions using tenacity package. This is officially suggested and used by other popular GPT based libraries	2023-03-26 05:12:35 +07:00
Debanjum Singh Solanky	ff846f05c5	Clean-up khoj.el based on linting helpers and manual review	2023-03-25 05:47:49 +07:00
Debanjum Singh Solanky	7e36f421f9	Truncate message logs to below max supported prompt size by model - Use tiktoken to count tokens for chat models - Make conversation turns to add to prompt configurable via method argument to generate_chatml_messages_with_context method	2023-03-25 05:13:56 +07:00
Debanjum Singh Solanky	4725416fbd	Use shortcut keybindings in buffer to ease sending messages to Khoj	2023-03-25 05:06:01 +07:00
Debanjum Singh Solanky	508b2176b7	Update Chat API, Logs, Interfaces to store, use references as list - Remove the need to split by magic string in emacs and chat interfaces - Move compiling references into string as context for GPT to GPT layer - Update setup in tests to use new style of setting references - Name first argument to converse as more appropriate "references"	2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky	b08745b541	Keep chat messages at 1 empty line visible distance in khoj.el - Clean redundant concat, format string - Improve variable name to emojified sender	2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky	27217a330d	Time chat API sub-components for performance analysis Time the search query extraction, search and response generation components	2023-03-24 20:39:41 +07:00
Debanjum Singh Solanky	5e9558d39d	Stylize references shown as footnote links in chat messages - Render references as superscript - Show reference definitions on hover over reference links to ease access - Truncate reference def shown on hover to 70 char - Add continuation suffix, ..., when reference definition truncated	2023-03-24 20:38:05 +07:00
Debanjum Singh Solanky	cf28f104c7	Register separate timestamps for user query and response by Khoj Chat	2023-03-24 18:31:58 +07:00
Debanjum Singh Solanky	93e2aff786	Add references as org footnotes instead of links	2023-03-24 18:31:42 +07:00
Debanjum Singh Solanky	d78454d4ad	Load Khoj Chat buffer before asking for query to provide context	2023-03-24 13:43:46 +07:00
Debanjum Singh Solanky	863933daaa	Resolve build issues found by melpazoid	2023-03-23 02:25:34 +04:00
Debanjum Singh Solanky	e9ca04af0d	Require dash, org to run ERT tests for khoj.el	2023-03-23 01:46:26 +04:00
Debanjum Singh Solanky	06df394d6c	Style chat messages as org-mode entries in Emacs - Style Message as Org Entries instead of List - Put khoj response as child of user query entry - Improves color coding for readability - Allows folding each back-n-forth - Put timestamp of message received into property drawer - Use standardized time format for new and old chat messages	2023-03-22 12:00:43 -06:00
Debanjum Singh Solanky	364e6c11af	Render chat history from API in chat buffer on first run - Generalize the render-chat-response method to handle rendering history or chat response from chat API response - Trigger rendering of khoj chat history if Khoj chat buffer not created for this session yet	2023-03-22 12:00:35 -06:00
Debanjum Singh Solanky	36b52fdd0a	Properly escape reference links before rendering - Use org-insert-link method to improve link rendering robustness Previous simple mechanism to create org-links would result in links escaping out of formatting. Use a user-facing org-mode method to remove/reduce probability of this - Replace newlines with space to render reference notes as links	2023-03-22 11:05:38 -06:00
Debanjum Singh Solanky	72f63a6ef7	Add basic chat interface for Khoj on Emacs - Query khoj chat API to get Khoj Chat response to user message - Render chat messages as a org-mode list in format: - [sender-name]: [message] - /[receive-date]/ - Add references as org links with context visible on hover, but no jump to note - Require dash library for khoj.el to simplify list manipulation. Use `-map-indexed' method from dash	2023-03-22 10:47:55 -06:00
Debanjum Singh Solanky	e4d67694e1	Add search to method, variable names meant for khoj search in khoj.el In preparation to introduce Khoj chat in Emacs	2023-03-21 21:44:11 -06:00
Debanjum Singh Solanky	2f6284872d	Mention Khoj needs Python version 3.10 or lower in docs	2023-03-20 15:18:19 -06:00
Debanjum Singh Solanky	601ff2541b	Revert to using GPT to extract search queries from users message - Reasons: - GPT can extract date aware search queries with date filters better than ChatGPT given the same prompt. - Need quality more than cost savings for now. - Need to figure ways to improve prompt for ChatGPT before using it	2023-03-18 17:56:13 -06:00
Debanjum Singh Solanky	e28526bbc9	Extract search queries from users message using ChatGPT as Search Actor - Reasons - ChatGPT should be better at following instructions than GPT - At 1/10th the cost, it's much cheaper than using older GPT models	2023-03-18 16:33:24 -06:00
Debanjum Singh Solanky	939d7731da	Fix-up Search Actor GPT's response for decoding it as valid JSON	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	f63fd0995e	Pass more search results as context to Chat Actor to improve inference	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	10836dedee	Search should return user message if GPT response is not valid JSON Previously would throw if GPT response is not valid JSON. Better to return original message to use for search instead	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	08f5fb315f	Add answers to context for Search Actor to generate relevant queries Update Search Actor prompt with answers, more precise primer and two more examples for context Mark the 3 chat quality tests using answer as context to generate queries as expected to pass. Verify that the 3 tests pass now, unlike before when the Search Actor did not have the answers for context	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	45cb510421	Loosen search results score threshold used by chat for more context	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	d871e04a81	Use past user messages, inferred questions as context to extract questions - Keep inferred questions in logs - Improve prompt to GPT to try to use past questions as context - Pass past user message and inferred questions as context to help GPT extract complete questions - This should improve search results quality - Example Expected Inferred Questions from User Message using History: 1. "What is the name of Arun's daughter?" => "What is the name of Arun's daughter" 2. "Where does she study?" => "Where does Arun's daughter study?" OR => "Where does Arun's daughter, Reena study?"	2023-03-18 16:30:50 -06:00
Debanjum Singh Solanky	1a5d1130f4	Generate search queries from message to answer users chat questions The Search Actor allows for 1. Looking up multiple pieces of information from the notes E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches 2. Allow date aware user queries in Khoj chat Answer time range based questions Limit search to specified timeframe in question using date filter E.g "What national parks did I visit last year?" adds dt>="2022-01-01" dt<"2023-01-01" to Khoj search Note: Temperature set to 0. Message to search queries should be deterministic	2023-03-18 16:28:51 -06:00
Debanjum	e75e13d788	Create Tests to Measure Chat Quality, Capabilities Create Rubric to Test Chat Quality and Capabilities ### Issues - Previously the improvements in quality of Khoj Chat on changes was uncertain - Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities ### Fix 1. Create an Evaluation Dataset to assess Chat Capabilities - Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋) - Add a few of Paul Graham's more personal essays. [Easy to get as markdown](https://github.com/ofou/graham-essays) 2. Write Unit Tests to Measure Chat Capabilities - Measure quality at 2 separate layers - Chat Actor: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py` - Chat Director: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the users message. This is what the `/api/chat` API exposes. - Mark desired but not currently available capabilities as expected to fail <br /> This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat	2023-03-16 11:30:52 -06:00
Debanjum Singh Solanky	7526a50dd4	Extract conversation processor utility funcs from gpt.py into utils.py	2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky	24ddebf3ce	Make converse prompt more precise. Fix default arg vals in gpt methods - Set conversation_log arg default to dict - Increase default temperature to 0.2 for a little creativity in answering - Make GPT be more reliable in looking at past conversations for forming response	2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky	8609e3129e	Fix, improve displaying chat messages, sources by Khoj in web interface Pretty print json in conversation logs	2023-03-14 11:24:47 -06:00
Debanjum	6c0e82b2d6	Merge Improve Khoj Chat PR #183 from debanjum/improve-chat-interface # Improve Khoj Chat ## Main Changes - Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost - Improve Prompt to answer query using indexed notes - Previously was asking GPT to summarize the notes - Both the chat and answer API use this new prompt - Support Multi-Turn conversations - Pass previous messages and associated reference notes to ChatGPT for context - Show note snippets referenced to generate response - Allows fact-checking, getting details - Simplify chat interface by using only single unified chat type for now ## Miscellaneous - Replace summarize with answer API. Summarize via API not useful for now - Only pass Khoj search results above a threshold confidence to GPT for context - Allows Khoj to say don't know if it can't find answer to query from notes - Allows relying on (only) conversation history to generate response in multi-turn conversation - Move Chat API out of beta. Update Readme	2023-03-10 19:03:44 -06:00
Debanjum Singh Solanky	cccd225247	Deduplicate and simplify logic to render chat message with reference	2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky	b9caad458e	Type score_threshold with union, not |, to support python <3.10	2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky	a71f168273	Move the chat API out of beta. Save chat sessions at 15min intervals	2023-03-10 17:20:52 -06:00
Debanjum Singh Solanky	8bb8824d0c	Bump khoj versions in obsidian, emacs files	2023-03-10 15:23:17 -06:00
Debanjum Singh Solanky	e16d0b6d7e	Open references notes used for chat on mobile too (by clicking) Requires clicking the reference as hover doesn't work on mobile	2023-03-09 17:13:07 -06:00
Debanjum Singh Solanky	c3c7b8a951	Make Khoj chat a separate Progressive Web App (PWA) for easier access	2023-03-09 13:45:06 -06:00
Debanjum Singh Solanky	3838f9d8e3	Remove explicitly asking GPT to say I don't know in prompt for now GPT still mostly says I don't know when answer not in notes or chats But with this it's more inclined to answer general questions not in chats or notes while informing user that the information is not from existing chats or notes	2023-03-09 12:11:44 -06:00
Debanjum Singh Solanky	f7b8cdd02e	Log prompts being passed to GPT for debugging	2023-03-08 19:17:52 -06:00
Debanjum Singh Solanky	2739a492b4	Log message metadata along with Khoj message instead of user message References should be attached to khoj chat message rather than the users message in the chat interface	2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky	87d1e1341d	Show reference notes used as response context in chat interface	2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky	280061e1fa	Do not deduplicate search results used for chat context - Chat uses compiled form of search results, not the raw entries to provide context for chat. The compiled snipped search results themselves are unique and using multiple of them for context from the same raw note is fine if they cross the score and rank thresholds This should improve the context provided for chat - Also apply score_threshold, no deduplication to the answers API	2023-03-06 23:51:31 -06:00
Debanjum Singh Solanky	672f61529e	Make getting deduped search results configurable via Search API	2023-03-06 23:48:46 -06:00
Debanjum Singh Solanky	4fb628975c	Fix jumping to note from Khoj Obsidian search modal result on Windows - Issue The file path separator by khoj server and the Obsidian vault were different on Windows - Fix Normalize file path to use forward slash(/) to find the matching note file in the Obsidian vault to jump to it Resolves #177	2023-03-05 21:07:54 -06:00
Debanjum Singh Solanky	b6cdc5c7cb	Do not expose answer API as a chat type in chat web interface or API Answer does not rely on past conversations, just the knowledge base. It is meant for one off interactions, like search rather than a continuing conversation like chat For now it is only exposed via API. Later it will be exposed in the interfaces as well Remove ability to select different chat types from the chat web interface as there is only a single chat type Stop appending answers to the conversation logs	2023-03-05 18:21:59 -06:00
Debanjum Singh Solanky	7f994274bb	Support multi-turn conversations in chat mode - Only use decent quality search results, if any, as context - Pass source results used by previous chat messages as context - Loosen prompt to allow looking at previous chats and notes to answer - Pass current date for context - Make GPT provide reason when it can't answer the question. Gives user context to tune their questions	2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky	d73042426d	Support filtering for results above threshold score in search API	2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky	45f461d175	Keep search results passed to GPT as context in conversation logs This will be useful to 1. Show source references used to arrive at answer 2. Carry out multi-turn conversations	2023-03-05 16:00:19 -06:00
Debanjum Singh Solanky	7cad1c9428	Only use past chat message, not session summaries as chat context Passing only chat messages for current active, and summaries for past session isn't currently as useful	2023-03-05 16:00:18 -06:00
Debanjum Singh Solanky	ad1f1cf620	Improve and simplify Khoj Chat using ChatGPT - Set context by either including last 2 chat messages from active session or past 2 conversation summaries from conversation logs - Set personality in system message - Place personality system message before last completed back & forth This may stop ChatGPT forgetting its personality as conversation progresses given: - The conditioning based on system role messages is light - If system message is too far back in conversation history, the model may forget its personality conditioning - If system message at end of conversation, the model can think it's the start of a new conversation - Inserting the system message before last completed back & forth should prevent ChatGPT from assuming it's the start of a new conversation while not losing personality conditioning from the system message - Simplify the Khoj Chat API to, for now, just answer from user's notes instead of trying to infer other potential interaction types. - This is the default expected behavior from the feature anyway - Use the compiled text of the top 2 search results for context - Benefits of using ChatGPT - Better model - 1/10th the price - No hand rolled prompt required to make GPT provide more chatty, assistant type responses	2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky	9d42b5d60d	Use multiple compiled search results for more relevant context to GPT Increase temperature to allow GPT to collect answer across multiple notes	2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky	c3b624e351	Introduce improved answer API and prompt. Use by default in chat web interface - Improve GPT prompt - Make GPT answer users query based on provided notes instead of summarizing the provided notes - Make GPT be truthful using prompt and reduced temperature - Use Official OpenAI Q&A prompt from cookbook as starting reference - Replace summarize API with the improved answer API endpoint - Default to answer type in chat web interface. The chat type is not fit for default consumption yet	2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky	7184508784	Mention Python and Pip need to be installed in Main and Emacs Readme	2023-03-02 21:28:54 -06:00
Debanjum Singh Solanky	211e460398	Output date filter from cache log at debug level. Remove unused imports Other logs not directly useful to user have already been converted to debug log levels in `1ae4016`. Just forgot to convert this log line too	2023-03-02 15:41:32 -06:00
Debanjum Singh Solanky	b6dbe4dd1d	Do not try retrieve an unconfigured core content type in Config GUI Previous behavior was resulting in a null reference error. As key for the core content/search type was not present in current config Fallback to using default config for unconfigured core content type instead See #165 for details	2023-03-02 11:09:31 -06:00
Debanjum Singh Solanky	1ae40163a9	Show user friendly information logs by default for context - Use emojis to make info logs easier to read - Inform when khoj is ready to use - Provide information on what khoj is doing while starting up - Inform when content/search types and processors are setup - Inform when models are being loaded from the web as this step can take time - Convert all other info logs to be only shown in verbose mode	2023-03-01 16:39:07 -06:00
Debanjum Singh Solanky	fe03ba3dce	Index intro text before headings in org files - Text before headings was not being indexed due to buggy orgnode parsing logic - Resolved indexing intro text from files with and without headings in them - Ensure intro text node has heading set to all title lines collected from the file Resolves #165	2023-03-01 12:11:33 -06:00
Debanjum Singh Solanky	7ad251b8ef	Log and Continue on OSError while collating dates for date filters Log to understand if error, date can be handled better Mitigates #172	2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky	2bed4c3b50	Fix configuring search types & /config/types API when no plugin configured - Test /config/types API when no plugin configured, only plugin configured and no content configured scenarios - Do not throw null reference exception while configuring search types when no plugin configured - Do not throw null reference exception on calling /config/types API when no plugin configured Resolves bug introduced by #173	2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky	8914dbd073	Fix creating GUI panels for unconfigured search, processor types Repro: 1. Open khoj server with `khoj` on first run 2. Install/enable Khoj Obsidian plugin (to configure khoj server) 3. Restart khoj server with `khoj` Bug: - Unconfigured processor and search_types are instantiated as None in self.current_config - While creating the desktop GUI, these null configs are attempted to be accessed as valid dictionaries for creating their GUI panels - This results in the null ref errors Fix: Use default config to create their GUI elements for unconfigured search and processor types Resolves #167	2023-03-01 01:20:58 -06:00
Debanjum Singh Solanky	b09350c052	Fix to return only enabled content types via the new config/types API - Previously was returning all core content types even if they had not been setup - Add test to validate only configured content types are returned by the api/config/types API endpoint	2023-02-28 22:08:26 -06:00
Debanjum Singh Solanky	b177adf3a7	Return value of search_type in /config/type API endpoint - Remove need for interfaces to downcase content types returned by API before using the type in search and other API endpoint - Fix to check for search_type.name in plugin keys instead of value	2023-02-28 21:49:26 -06:00
Debanjum Singh Solanky	88344f9ed2	Improve rendering search results of plugin content types on web interface Render only the entry from plugin search response instead of raw json Use the results-ledger styling for results-plugin styling	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	c2814fce58	Improve rendering search results of plugin content types in khoj.el Render only the entry from plugin search response instead of raw json	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	f3f24387ec	Use new config/types API to set enabled content types on web interface	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	1e43f1a12e	Use new config/types API to set enabled content types in khoj.el menu	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	9d38eadd42	Return enabled content types via api/config/types API endpoint Simplifies dynamically populating enabled content types for interfaces	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	68bd5d9ebc	Configure API routes after set up search types while configuring server Configure app routes after configuring server. Import API routers after search type is dynamically populated. Allow API to recognize the dynamically populated plugin search types as valid type query param. Enable searching for plugin type content.	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	d91c7e2761	Search for plugin content via the search API	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	47b58a2a4d	Configure, use dynamically instantiated SearchType enum on app start The SearchType is now dynamically populated with core and configured plugin types Use the new dynamic SearchType enum from state.py across codebase	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	ab0d3a08e2	Index configured plugins on app start and via update API endpoint	2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky	55a032e8c4	Add processor to index entries from jsonl files for plugins - Read, merge entries from input jsonl files and filters - Mark new, modified entries for update	2023-02-24 02:54:12 -06:00
Debanjum Singh Solanky	fcbbe8c759	Read content plugin configs from Khoj config YAML Configure external text content plugins via the Khoj YAML Reuse existing TextContentConfig definition for external text content plugins	2023-02-23 23:57:32 -06:00
Debanjum Singh Solanky	61b6ee2857	Use helper script to bump khoj pre-release versions	2023-02-17 20:31:51 -06:00
Debanjum Singh Solanky	053d6141f3	Ignore ts typing error, Fix SPDX license identifier in Obsidian plugin	2023-02-17 18:19:01 -06:00
Debanjum Singh Solanky	36be3c4b8f	Fix or ignore MyPy issues in PyQt desktop GUI code - Remove unneeded type ignore for mps with the latest mypy - Stop excluding PyQT desktop GUI code from MyPy checks - Do not warn about unused ignores. Some issue with mypy giving different errors in different environments (venv, system and pre-commit)	2023-02-17 16:13:05 -06:00
Debanjum Singh Solanky	051f0e3fb5	Add, configure and run pre-commit locally and in test workflow	2023-02-17 13:31:36 -06:00
Debanjum Singh Solanky	5e83baab21	Use Black to format Khoj server code and tests	2023-02-17 11:55:17 -06:00
Debanjum Singh Solanky	8b293edd7c	Move mypy config into pyproject.toml. Ignore 2 remaining mypy issues	2023-02-16 03:33:08 -06:00
Debanjum Singh Solanky	c641eb4ad6	Improve rendering log and error stacktraces using the Rich package - Use Rich to render uvicorn, fastAPI logs as well The previous CustomFormatter only worked on khoj logs - Improve rendering stacktrace on errors using Rich	2023-02-15 16:19:32 -06:00
Debanjum Singh Solanky	bc7477ea3e	Move Emacs, Obsidian plugin code out from under src/khoj directory - What - The Emacs and Obsidian interfaces stay in their original directories under src/ - src/khoj now only contains code meant for pypi packaging - Benefits - This avoids having to update khoj MELPA, Obsidian plugin config as the Emacs, Obsidian code is under their original directories - It separates the code in src/khoj meant for python packaging from code for external interfaces like Emacs and Obsidian	2023-02-14 15:44:22 -06:00
Debanjum Singh Solanky	25a749ca1d	Use the src/ layout to fix packaging Khoj for PyPi - Why The khoj pypi packages should be installed in `khoj' directory. Previously it was being installed into `src' directory, which is a generic top level directory name that is discouraged from being used - Changes - move src/* to src/khoj/* - update `setup.py' to `find_packages' in `src' instead of project root - rename imports to form `from khoj.*' in complete project - update `constants.web_directory' path to use `khoj' directory - rename root logger to `khoj' in `main.py' - fix image_search tests to use the newly renamed `khoj' logger - update config, docs, workflows to reference new path `src/khoj'	2023-02-14 15:19:06 -06:00
Debanjum	84322b2a45	Demo using Search in Khoj Obsidian Plugin	2023-02-14 08:43:50 -08:00
Debanjum Singh Solanky	a4dcb20622	Add setting to toggle auto configuring of khoj backend from Obsidian - By default the obsidian plugin automatically configures the khoj backend to index the current vault - For more complex scenarios, users can manage their ~/.khoj/khoj.yml manually by toggling the auto-configure setting off in the khoj plugin settings Resolves #156	2023-02-13 20:15:28 -06:00
Debanjum Singh Solanky	24aa696ef5	Indicate indexing active on Update button in Obsidian plugin settings Use moon rotating through phases to indicate notes indexing in progress Resolves #129	2023-02-13 19:28:19 -06:00
Debanjum Singh Solanky	11517ba8eb	Encode jsonl data as utf8 for gzip write for consistent read/write encoding Should help with issue #89	2023-02-12 17:33:23 -06:00
Debanjum Singh Solanky	3ec41c4d64	Wrap lines for org, markdown results in khoj search results buffer	2023-02-12 07:33:50 -06:00
Debanjum Singh Solanky	9a013ec48f	Add more details to setup Khoj backend in Obsidian plugin readme	2023-02-12 07:31:13 -06:00
Jason Axelson	6d5930363a	Fix obsidian plugins doc link Also make it more obvious where the link is going, initially I thought the link was to another official khoj documentation site.	2023-02-10 07:11:21 -10:00
Debanjum Singh Solanky	215235efd2	Bump khoj pre-release version	2023-02-08 20:24:36 -03:00
Debanjum Singh Solanky	2445664d40	Deprioritize searching for Music content over other text content	2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky	2e052913b6	Search in first configured content type when no search type set Instead of searching through all configured content types but only returning results of the last configured content type	2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky	a26ab31d20	Allow chat with markdown notes if no org-mode content configured	2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky	fbb7747dcc	Read Markdown file as utf8 instead of the default encoding used by OS - Background 1. Obsidian stores markdown notes as utf8[1] 2. By default, the python `open' command uses the OS locale encoding[2] This was causing the `UnicodeDecodeError: <locale_encoding> codec can't decode byte' error - Fix - Read markdown files as utf8 The Obsidian plugin is the main use-case for markdown files in khoj currently and that stores md files as utf8. Do not assume utf8 for other content types like org-mode, beancount for now. - Fail if error in reading file as utf8, instead of ignoring errors. Would rather have user realize that their files are not going to get indexed correctly. [1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3 [2]: https://docs.python.org/3/library/functions.html#open	2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky	66dca6cf33	Add Docs to Search across Languages, Uninstall Khoj to Readme Add details and fixes to Obsidian, Main readme based on feedback, confusion from the Obsidian plugin announcement	2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky	cba9a6a703	Use List, Tuple, Set from typing to support Python 3.8 for khoj Before Python 3.9, you can't directly use list, tuple, set etc for type hinting Resolves #130	2023-02-06 01:23:52 -03:00
Debanjum Singh Solanky	f26cee604d	Update Khoj Plugin Install Instructions. Rename main Readme to README Khoj plugin page from within Obsidian isn't recognized. Seems like it needs an uppercase readme file only. So it doesn't show the Khoj readme from within Obsidian itself.	2023-01-27 20:01:31 -03:00
Debanjum Singh Solanky	2e13e15625	Ensure markdown entries in khoj.el results separated by empty line - Update khoj.el test to reflect updated rendering logic - Move ledger render function before image rendered to group functions with similar logic closer	2023-01-26 19:13:02 -03:00
Debanjum Singh Solanky	85ae46f429	Use thread_last to make results rendering funcs more readable in khoj.el	2023-01-26 18:59:44 -03:00
Debanjum Singh Solanky	b415f87093	Split code in onChooseSuggestion method to make it more readable Split find file, jump to file code to make onChooseSuggestion more readable - Use find, instead of using return in forEach to get first match - Move the jump to file+heading code out from forEach	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	37063f6a38	Truncate query to 8k chars for find similar notes from obsidian plugin Truncate current file data passed to khoj backend API via query string below default query size supported by popular servers	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	4456cf5c8f	No need to use then or finally in async functions after an await	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	4070be637c	Pass app object from plugin instance to child objects and functions Do not reference global app object from child objects and funcs directly. It is only available for debugging purposes and access to it may be dropped in the future.	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	c203c6a3fd	Use Sentence case for Find Similar Note command name in Khoj Obsidian	2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky	e18124ef6f	Add badge for tests and update project subtitle in khoj.el Readme	2023-01-23 20:52:03 -03:00
Debanjum Singh Solanky	86e808abfb	Test get-current-text helpers for Find Similar feature in khoj.el	2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky	be6acda212	Create khoj.el tests. Test rendering results of each content types	2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky	0d0bf3b5aa	Simplify get-current-text functions for Find Similar in khoj.el Use existing functions like `string-trim', `thing-at-point' and remove unneeded code from the two functions	2023-01-23 19:15:52 -03:00
Debanjum Singh Solanky	07e9e4ecc3	Get current paragraph text when point at start of paragraph in khoj.el Previously if cursor was at start of current paragraph, it would get text for the current and next paragraph, instead of just the current one	2023-01-23 18:05:54 -03:00
Debanjum Singh Solanky	a0b03c8bb1	Get current entry text when point at heading for Find Similar in khoj.el Previously if cursor was at heading of current entry, it would find entries similar to the previous outline heading, instead of the current one	2023-01-23 10:01:25 -03:00
Debanjum Singh Solanky	013c7c10a4	Bump khoj pre-release version	2023-01-22 18:45:56 -03:00
Debanjum Singh Solanky	ad3c9b5f44	Bump khoj version to 0.2.5 in preparation for release	2023-01-22 18:18:21 -03:00
Debanjum Singh Solanky	9ed056c7e7	Use consistent indentation in Khoj Emacs Readme	2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky	0980c6e87f	Update Emacs Usage section in Readme. Add find-similar, menu usage	2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky	6908b6eed3	Truncate image queries below max tokens length supported by ML model This would previously return the infamous tensor size mismatch error Verify this error is not raised since adding the query truncation logic	2023-01-21 14:11:00 -03:00
Debanjum Singh Solanky	3d9ed91e42	Search by image at path only if query of form "file:/path/to/image" Previously no query syntax helpers, like the "file:" prefix, were used before checking if query contains file path. This made query to image search brittle to misinterpretation and pointless checking Add test to verify search by image at file works as expected	2023-01-21 14:06:56 -03:00
Debanjum Singh Solanky	b7aa22a059	Change order of arg passed to query-api-and-render-results by importance	2023-01-20 22:13:24 -03:00
Debanjum Singh Solanky	936a88fa7e	Find items of specified type similar to current text item at point - Support querying with text surrounding point in any text buffer Previously could only find items similar to org entry at point - Find similar items of specified content type indexed on khoj Previously only looked for similar org entries indexed on khoj Now uses the content-type configured in khoj transient menu to find items of the specified content type - Details - Generalize the get-current-org-entry-text func to get text for any outline section - Replace leading whitespaces from query text as well - Create method to get current paragraph text from non-outline mode buffers - Update transient, find-similar funcs to pass, use content-type configured in khoj transient menu - Generalize query title creation logic to remove markdown headings prefix (#) apart from org heading prefix (*) as well - Update last used khoj content-type and results from the find-similar and update funcs for later reuse - Jump to top of results buffer after results rendered	2023-01-20 22:12:54 -03:00
Debanjum Singh Solanky	17aaadea1f	Find notes similar to current org entry at point	2023-01-20 05:14:54 -03:00
Debanjum Singh Solanky	44bbc0a417	Add section separators to khoj.el for easier code traversal	2023-01-19 23:36:54 -03:00
Debanjum Singh Solanky	48ad3c535e	Use default content types if fail to call backend on khoj.el load Do not want khoj.el to fail on init/load if khoj backend not running	2023-01-19 20:13:49 -03:00
Debanjum Singh Solanky	9f0bd0a361	Add Github workflow for khoj.el build and quality checks Add khoj.el build badge to khoj.el Readme	2023-01-19 20:13:19 -03:00
Debanjum Singh Solanky	0dd1cba272	Rename configuration sections in khoj.el transient menu	2023-01-19 03:03:08 -03:00
Debanjum Singh Solanky	5d0f369186	Add ability to quit khoj transient with standard q keybinding	2023-01-19 02:47:07 -03:00
Debanjum Singh Solanky	87c7cf4272	Use single khoj func as entrypoint. Group khoj.el code into sections - Give more relevant, specific name to khoj suffix commands - Remove `khoj-simple'. Have single `khoj' function for entrypoint	2023-01-19 02:38:19 -03:00
Debanjum Singh Solanky	9d64a009fd	Allow updating khoj content index from within khoj.el - Split transient config menu by type	2023-01-18 23:07:59 -03:00
Debanjum Singh Solanky	a8d0c7d905	Rename search type to more apt content type in khoj.el	2023-01-18 22:13:49 -03:00
Debanjum Singh Solanky	00daea16df	Allow setting default-search-type to image. Make docstrings compact	2023-01-18 22:01:17 -03:00
Debanjum Singh Solanky	216b17cfd0	Dynamically populate content type choices when khoj transient invoked	2023-01-18 22:00:56 -03:00
Debanjum Singh Solanky	5f446b1440	Convert main khoj.el entrypoint into transient menu for richer configuration	2023-01-18 21:50:07 -03:00
Debanjum Singh Solanky	5c07dcd219	Fix, update Obsidian Readme. Add Find Similar Notes to Implementation section	2023-01-18 00:22:26 -03:00
Debanjum	b7fc344be1	Search for Similar Notes from Obsidian Plugin Enable searching for notes similar to the current note being viewed ## Main Changes - `39a18e2` Extend search modal to search for similar notes - Hide input field on init, Trigger search on opening modal when in similar notes mode - Set input to contents of current markdown file and get notes similar to it - Re-rank, by default, when searching for similar notes - Filter out current note from similar note search results - `0bed410` Only show `Find Similar Note' command in Editor	2023-01-18 00:10:10 -03:00
Debanjum Singh Solanky	6119d0a69e	Add usage of "Find Similar Notes" command to the Khoj Obsidian Readme	2023-01-18 00:03:13 -03:00
Debanjum Singh Solanky	657e455785	Remove unused `onunload' method in main.ts of khoj obsidian plugin	2023-01-17 23:46:38 -03:00
Debanjum Singh Solanky	0bed410712	Limit Find Similar Note command to be triggered from Editor Fixup indentation and comments	2023-01-17 19:34:48 -03:00
Debanjum Singh Solanky	39a18e2080	Add ability to search for similar notes in Khoj Obsidian - Hide input field on init, Trigger search on opening modal in similar notes mode - Set input to current markdown file and get similar notes to it - Enable rerank when searching for similar notes - Filter out current note from similar note search results	2023-01-17 19:07:18 -03:00
Debanjum Singh Solanky	ffaef92476	Encode query string before passing as query param to search API	2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky	d5a7cc5b0f	Compact code to map results from search API into SearchResult objects Make code compact for readability Remove unneeded temporary variables and return statements	2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky	8ab7a26bde	Update Khoj on Obsidian screenshots in Main and Plugin Readme - Screenshot querying "Setup Editor" on test vault with Khoj Readmes - New features showcase: - information keybindings, rerank keybinding at bottom of modal - fixed top level headings in search results - search results snipped if greater than N words	2023-01-17 13:58:50 -03:00
Debanjum Singh Solanky	7b4f78776c	Fix extracting Markdown Entries with Top Level Headings - Previously top level headings would get stripped of the space between heading text and the prefix # symbols. That is, `# Top Level Heading' would get converted to `#Top Level Heading' - This would mess up their rendering as a heading in search results - Add unit tests to text_to_jsonl processors to prevent regression	2023-01-17 13:06:28 -03:00
Debanjum Singh Solanky	1a296518c5	Limit total words for each Search Result rendered in search modal Provides a more consistent rendering of results in modal. Makes it easier to see more results in modal. To see complete entry, user can always just jump to entry from modal	2023-01-17 13:06:14 -03:00
Debanjum Singh Solanky	e7b89f7fd0	Return compiled entry in additional details of /api/search response This can be used to identify the portion of the raw entry to highlight and for passing to summarizer to stay within max_tokens limit supported by GPT models	2023-01-16 22:56:06 -03:00
Debanjum Singh Solanky	7071d081e9	Increase max_tokens returned by GPT summarizer. Remove default params	2023-01-16 22:55:36 -03:00
Debanjum Singh Solanky	3d9cdadbbb	Add codebase visualization of Khoj Obsidian to Khoj Obsidian Readme	2023-01-15 14:09:21 -03:00
Debanjum Singh Solanky	d02ba325aa	Handle empty chat history returned by API to chat.html on web interface	2023-01-15 13:51:16 -03:00
Debanjum	3f2ea039a7	Add Chat page to the Khoj Web Interface ### Overview - Provide a chat interface to engage with and inquire your notes - Simplify interacting with the beta `chat` and `summarize` APIs ### Use - Open `<khoj-url>/chat`, by default at http://localhost:8000/chat?type=summarize - Type your queries, see summarized response by Khoj from your notes Note: - You will need to add an API key from OpenAI to your khoj.yml - Your query and top note from search result will be sent to OpenAI for processing ## Details - `177756b` Show chat history on loading chat page on web interface - `d8ee0f0` Save chat history to disk for persistence, seeing chat logs - `5294693` Style chat messages as speech bubbles - `d170747` Add khoj web interface and chat styling to new chat page on khoj web - `de6c146` Implement functional, unstyled chat page for khoj web interface	2023-01-13 23:02:19 -03:00
Debanjum Singh Solanky	16d4560ff8	Comment css styling of chat page for later reference	2023-01-13 22:40:01 -03:00
Debanjum Singh Solanky	cfef346d03	Do not update query field to every chat message It doesn't work as well with chat, unlike for search page Use more appropriate thinking face emoji for you instead of surprise face	2023-01-13 22:24:26 -03:00
Debanjum Singh Solanky	177756be7e	Fetch chat history from backend and render it on chat page load	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	330febaa1a	Update conversation logs from /beta/summary API endpoint too	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	cb6f0b53c9	Make user_message_metadata arg to message_to_log in gpt.py optional - Use a default user_message_metadata if arg not set - Update conversation to use `by' as `you' and `khoj'	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	cc2456e411	Update /beta/chat API to return chat history if no query param passed	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	d8ee0f0e9a	Use scheduler to save chat history to disk every 5 minutes - The previous mechanism to trigger saving on shutdown event did not work - Use scheduler to persist chat sessions to disk at a 5 minute interval - This improves time granularity, fixed interval of saving chat logs - It may lose ~5 minutes of chat history until mechanism to also write on shutdown found/resolved - Create conversation directory if it doesn't exist before attempting write - Reset chat_session after writing it to disk	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	5294693e97	Style message as speech bubbles on chat page of web interface - Wrap messages into speech bubbles - Color messages by khoj blue, sender grey - Add those standard protrusions to the speech bubbles for fun - Align bubbles left or right based on sender - messages by khoj are left aligned, message by self are right aligned - Put message metadata like sender and time under speech bubble - use data-* attribute and ::after css pseudo-selector for this - Update renderMessage func to accept time param, remove unused type_ param	2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky	7723d656dc	Do not force GPT to summarize note using past tense Not all notes are in the past. Notes can be about stuff in the future. Casting them to past tense gives the impression that they've already happened / been done.	2023-01-13 13:10:35 -03:00
Debanjum Singh Solanky	2842e3a035	Automatically scroll to bottom of chat body on new messages	2023-01-13 13:09:51 -03:00
Debanjum Singh Solanky	34014635d0	Improve colors, fix contrast for accessibility on web interface - Changes - Use blue color for khoj heading font - This fixes the title color issue - Update background to lighter shade - This fixes the body text color issue - Update colors for todo, done, miscellaneous todo state, tag color - This does not fix the color contrast issue but seems like an acceptable solution - Using white text rather than black text on blue background is better even though the black text on blue background passes the WCAG acceptable contrast score - For details see blog post: https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/ - Add border to tags to give them tag pills look and differentiate from todo states - Buttons and inputs - Change background color of input fields like type dropdown, update button and results count counter, to match background color of page - Add shadow on hover over button, dropdowns Resolves #111	2023-01-12 21:59:50 -03:00
Debanjum Singh Solanky	d170747ec2	Add khoj web interface & chat styling to new chat page on khoj web - Ensure message input box sticks to bottom of screen - Ensure chat logs div is scrollable when logs become longer than screen Do not make the whole page scroll, just the chat logs body div	2023-01-12 21:58:46 -03:00
Debanjum Singh Solanky	de6c146290	Implement functional, unstyled chat page for khoj web interface Expose it at /chat URL	2023-01-12 21:53:25 -03:00
Debanjum Singh Solanky	e6793816f9	Upgrade Khoj.el Readme. Add TOC, Screenshot, Features Sections - Update Query filter details	2023-01-12 02:14:02 -03:00
Debanjum Singh Solanky	26f791e9ad	Update Obsidian Plugin Readme. Add Khoj icon to Khoj Modal Placeholder text - Fold Query Filter, Demo Description - Add Limitations to Readme - Add Update index bullet to Troubleshooting Options	2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky	3e63af5c94	Constrain grid rows to fix layout of Khoj web interface on Chrome	2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky	50c797962c	Jump to Search Result from Khoj Modal even on Obsidian Android Uses longest file path match to find markdown file in vault corresponding to file of search result returned by Khoj Allow jumping to search result from khoj plugin modal on Android too	2023-01-11 19:44:11 -03:00
Debanjum Singh Solanky	51ea6d9c9b	Do not force index update when configure backend on plugin load - Backend can handle incremental updates - Avoid khoj usability delay by avoiding recomputing the index every time the vault is opened	2023-01-11 17:17:08 -03:00
Debanjum Singh Solanky	5996d47d7c	Trigger input event to Get, Render Reranked results from Khoj backend Previous mechanism of manually triggering getSuggestions, renderSuggestions flow was corrupting traversing and opening reranked search results in KhojModal Emulate event that would anyway trigger the get & render of results in modal. This lets obsidian core handle the flow without digging too deep into obsidian core's handling of the flow. Lowers the chance of breakage	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	1c813a6884	Convert results count setting to slider in plugin settings pane	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	4e1abd1b72	Disable update button while indexing vault in plugin settings	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	513c86c6a1	Set index file paths relative to current or default path on khoj backend We need the index file paths to make sense on the khoj backend server Having path of index on backend relative to current vault directory on frontend ignores the fact that the frontend may be on a different machine than the khoj backend server Using unique index name per vault allows switching vaults without overwriting indices of other vaults created on khoj backend when khoj obsidian plugin is loaded on opening a different vault	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	4407e23c19	Only index current vault on Khoj. Remove plugin setting to configure it - Overview Limits using Khoj with a single vault at a time. This is automatically configured to the most recently opened vault. Once directory filters are supported on backend, the plugin will be updated to index multiple vaults but search only current vault from current vault's khoj obsidian plugin - Code Details - Remove setting to configure Vault directory from Khoj Obsidian plugin - Automatically configure Khoj to index only current Vault. - Overwrites any previous vaults that were intended to be indexed by Khoj backend - Force update of index after configuring vault - Why It's not helpful for now and can lead to more problems, confusion. Once directory filters	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	86a1e43605	Return HTTP Exception on /api/update API call failure - Previously the backend was just throwing backend error. The frontend calling the /update API wasn't getting notified - Now the frontend can react appropriately and make the issue visible to the user	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	5af2b68e2b	Update plugin notifications for errors and success - Only show notification on plugin load and failure. - In settings page, set current backend status at top of pane instead of showing notification Notices bubbles cluttered the UI while typing updates to settings - Show notification once index updated via settings pane button click There was no notification on index updated, which usually takes time on the backend	2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky	853192932a	setCTA on Khoj Obsidian plugin button. Minor cleanup of space, tabs	2023-01-10 23:36:02 -03:00
Debanjum Singh Solanky	da49ea272c	Add placeholder text to modal in Khoj Obsidian plugin	2023-01-10 22:50:11 -03:00
Debanjum Singh Solanky	580f4aca23	Add hints to Modal for available Keybindings	2023-01-10 22:03:47 -03:00
Debanjum Singh Solanky	b52cd85c76	Allow Reranking Results using Keybinding from Khoj Search Modal	2023-01-10 21:59:38 -03:00
Debanjum Singh Solanky	7991ab7a86	Add button in Obsidian plugin settings to force re-indexing your vault	2023-01-10 19:49:12 -03:00
Debanjum Singh Solanky	f046a95f3d	Track connectedToBackend as a setting. Use it across obsidian plugin - Display warning at top of khoj obsidian plugin settings - Make search command available only if connected to backend - Show warning notice on clicking khoj search ribbon button - Call saveData after configureKhojBackend to ensure connectedToBackend setting saved after being (potentially) updated in configureKhojBackend function	2023-01-10 17:28:47 -03:00
Debanjum Singh Solanky	768e874185	Load obsidian plugin even if fail to connect to backend but show warning - Previously the plugin would not load if cannot connect to Khoj backend - Silently failing to load with no reason provided is not helpful - Load plugin to allow user to fix the Khoj URL in their plugin setting - Show reason for khoj plugin not working. More helpful than failing silently	2023-01-10 17:20:02 -03:00
Debanjum Singh Solanky	aa22d83172	Create and use a context manager to time code Use the timer context manager in all places where code was being timed - Benefits - Deduplicate timing code scattered across codebase. - Provides single place to manage perf timing code - Use consistent timing log patterns	2023-01-09 19:48:16 -03:00
Debanjum Singh Solanky	93f39dbd43	Add typing to text_search. Reformat code to set existing_embedding	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	db7483329c	Only import type hint packages for type checking. Avoids circular imports Use annotations from the __future__ package to avoid having to quote type hints. This import will not be required after Python 3.11	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	e5254a8e56	Create BaseEncoder class. Make OpenAI encoder its child. Use for typing - Set type of all bi_encoders to BaseEncoder - Make load_model return type Union of CrossEncoder and BaseEncoder	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	cf7400759b	Remove unused render_results method from text and image search It's a relic from when khoj was being used as a python module	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	afcfc3cd62	Split text_search.query logic into separate methods for modularity The query method had become too big. Extract out filter, score, sort and deduplicate logic used by text_search.query into separate methods. This should improve readability of code.	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	8dc6ee8b6c	Pass `model' arg to extract_search_type method from beta search API Issue caught by mypy	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	8498903641	Fix, add typing to Filter and TextSearchModel classes - Changes - Fix method signatures of BaseFilter subclasses. Else typing information isn't translating to them - Explicitly pass `entries: list[Entry]' as arg to `load' method - Fix type of `raw_entries' arg to `apply' method to list[Entry] from list[str] - Rename `raw_entries' arg to `apply' method to `entries' - Fix `raw_query' arg used in `apply' method of subclasses to `query' - Set type of entries, corpus_embeddings in TextSearchModel - Verification Ran `mypy --config-file .mypy.ini src' to verify typing	2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky	eace7c6215	Use torch.tensor as torch.Tensor cannot create tensor on MPS device - `torch.Tensor' is apparently a legacy tensor constructor - Using that to create tensor on MPS devices throws error: RuntimeError: legacy constructor expects device type: cpu but device type: mps was passed - `torch.tensor' can handle creating tensors on Mac GPU (MPS) fine	2023-01-09 19:47:19 -03:00
Debanjum Singh Solanky	9def3f8c6f	Add exception handling to beta APIs, in case OpenAI API call fails	2023-01-09 01:27:06 -03:00
Debanjum Singh Solanky	7b164de021	Add beta API to summarize top search result using an OpenAI model This is unlike the more general chat API that combines summarization of top search result and conversing with the OpenAI model This should give faster summary results. As no intent categorization API call required	2023-01-09 01:25:59 -03:00
Debanjum Singh Solanky	d36da46f7b	Truncate prompt to not exceed OpenAI prompt limit Truncate prompt containing the top retrieved entry to 500 words to avoid triggering the max_token limit error	2023-01-09 00:51:46 -03:00
Debanjum Singh Solanky	237123d18c	Fix tests for the conversation processor - Use latest davinci model for tests - Wrap prompt in triple quotes to improve legibility - `understand' method returns dictionary instead of string. Fix its test - Fix prompt for new model to pass `chat_with_history' test	2023-01-09 00:22:26 -03:00
Debanjum Singh Solanky	918af5e6f8	Make OpenAI conversation model configurable via khoj.yml - Default to using `text-davinci-003' if conversation model not explicitly configured by user. Stop using the older `davinci' and `davinci-instruct' models - Use `model' instead of `engine' as parameter. Usage of `engine' parameter in OpenAI API is deprecated	2023-01-09 00:17:51 -03:00
Debanjum Singh Solanky	74e779f8d0	Fix /beta/chat API to use Entry class instead of old dictionary pattern Search returns response of type SearchResponse instead of a dict now	2023-01-08 15:28:26 -03:00
Debanjum Singh Solanky	f2436039a0	Improve readability of GPT prompt strings in conversation processor	2023-01-08 15:27:41 -03:00
Debanjum Singh Solanky	6119005838	Improve comments, exceptions, typing and init of OpenAI model code	2023-01-08 00:36:18 -03:00
Debanjum Singh Solanky	c0ae8eee99	Allow using OpenAI models for search in Khoj - Init processor before search to instantiate `openai_api_key' from `khoj.yml'. The key is used to configure search with openai models - To use OpenAI models for search in Khoj - Set `encoder' to name of an OpenAI model. E.g text-embedding-ada-002 - Set `encoder-type' in `khoj.yml' to `src.utils.models.OpenAI' - Set `model-directory' to `null', as online model cannot be stored on disk	2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky	826f9dc054	Drop long words from compiled entries to be within max token limit of models Long words (>500 characters) provide less useful context to models. Dropping very long words allow models to create better embeddings by passing more of the useful context from the entry to the model	2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky	6a30a13326	Only create model directory if the optional field is set in SearchConfig	2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky	2fe37a090f	Make type of encoder to use for embeddings configurable via khoj.yml - Previously `model_type' was set in the setup of each `search_type' - All encoders were of type `SentenceTransformer' - All cross_encoders were of type `CrossEncoder' - Now `encoder-type' can be configured via the new `encoder_type' field in `TextSearchConfig' under `search-type` in `khoj.yml`. - All the specified `encoder-type' class needs is an `encode' method that takes entries and returns embedding vectors	2023-01-07 23:09:12 -03:00
Debanjum Singh Solanky	d55d7d53dc	Fix GPU usage by Khoj on Macs to speed up search and indexing - Ensure all tensors are on MPS device before doing operations across them - Background - GPU is used by default for Khoj on MacOS now - Needed PyTorch > 1.13.0 on Macs to use GPU, which we do now - MPS should speed up search and indexing on MacOS	2023-01-05 15:39:09 -03:00
Debanjum	abd035e2fa	Merge PR #112 to fix quote usage in khoj.el docstring from suliveevil/master Fix usage warning for unescaped single quote in `khoj.el' docstring. Converts usage of '<text>' into `<text>' to use the correct quote forms in generated docs	2023-01-05 13:24:11 -03:00
Debanjum Singh Solanky	e792523849	Bump version in metadata packages for khoj, khoj.el and obsidian plugin	2023-01-05 12:50:27 -03:00
suliveevil	b2812b409f	fix docstring usage warning ⛔ Warning (comp): khoj.el:119:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting) ⛔ Warning (comp): khoj.el:120:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting) ⛔ Warning (comp): khoj.el:121:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting) ⛔ Warning (comp): khoj.el:168:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)	2023-01-05 16:47:38 +08:00
Debanjum Singh Solanky	47015ee6cc	Fold Demo video descriptions, analysis by default in main Readme	2023-01-04 20:13:43 -03:00
Debanjum Singh Solanky	da17ff6ac8	Add Upgrade instructions for Khoj.el Readme. Fix version of khoj.el	2023-01-04 20:06:39 -03:00
Debanjum Singh Solanky	66ccd0c970	Create Obsidian plugin for Khoj - Features - Search using Khoj from within the Obsidian app Allow Natural language search on your (markdown) notes in Obsidian Vault - Show search results as rendered (instead of raw) Markdown Improve legibility of the results - Jump to selected note from search result in Khoj search modal Simplify seeing result within its original note context - Automatically configure khoj to index markdown files in current vault Reduce khoj setup steps for plugin users by using reasonable defaults - Code updates the markdown config in khoj.yml and triggers index update - It can be configured by user in khoj plugin settings, if required - Add Demo and detailed Readme for the Obsidian plugin Ease setup and usage. Give context about capabilities - Miscellaneous - Trying to keep a mono repo until the Khoj project is mature enough to reduce maintenance burden	2023-01-04 18:28:16 -03:00
Debanjum Singh Solanky	feddb6ce62	Add start_url to khoj webmanifest to show Khoj as PWA on Chrome	2023-01-04 13:37:56 -03:00
Debanjum Singh Solanky	3dee1aed9e	Create /config/data/default API endpoint to serve default khoj config This can ease configuring khoj from the different interfaces - Don't need to know all the (default) config used by khoj. - Just get default config by calling the above API endpoint. - Then modify desired portions and call POST /api/config/data to configure khoj.	2023-01-03 21:52:34 -03:00
Debanjum Singh Solanky	ce945f7a90	Configure processors too on calling /update API - Previously only search was being reconfigured - But Processors are configured on app start too - Match that behavior on calling /update API	2023-01-03 21:51:02 -03:00
Debanjum Singh Solanky	9d31988f42	Allow starting khoj in non-GUI mode without config file instantiated - Start khoj server (in non-GUI mode) without needing config file already instantiated. - But throw warning to configure khoj to use it - This allows plugins to configure the app via the /config/data APIs - To be used by the Khoj obsidian plugin to configure markdown content in khoj	2023-01-03 21:36:59 -03:00
Debanjum Singh Solanky	52664dd96c	Allow recursive glob pattern (**) to add files to search index - Simplify configuring files to index For Obsidian/Org-Roam type systems with lots of small files in khoj.yml using `input-filter'	2023-01-03 01:32:58 -03:00
Debanjum Singh Solanky	152e5f1661	Return the file of each search result in response - Useful for enabling jump to note functionality in interfaces - It will be used in the Khoj plugin for Obsidian	2023-01-03 01:25:34 -03:00
Debanjum Singh Solanky	c535953915	Update index automatically in non GUI mode too - Poll scheduler every minute using threading.Timer - Use 60 seconds polling interval to avoid fork bombing - Schedule next via the same poll scheduler - Allow clean program interrupt by running scheduler in daemon mode	2023-01-01 21:03:19 -03:00
Debanjum Singh Solanky	701d92e17b	Lock the index before updating it via API or Scheduler - There are 3 paths to updating/setting the index (stored in state.model) - App start - API - Scheduler - Put all updates to the index behind a lock, as multiple update paths could (potentially) run at the same time (via API or Scheduler)	2023-01-01 17:09:36 -03:00
Debanjum Singh Solanky	3b0783aab9	Automate updating embeddings, search index on an hourly schedule - Use the schedule pypi package - Use QTimer to poll schedule.run_pending() regularly for jobs to run	2023-01-01 17:09:36 -03:00
Debanjum	06c25682c9	Split text entries by max tokens supported by ML models ### Background There is a limit to the maximum input tokens (words) that an ML model can encode into an embedding vector. For the models used for text search in khoj, a max token size of 256 words is appropriate [1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1#:~:text=model%20was%20just%20trained%20on%20input%20text%20up%20to%20250%20word%20pieces),[2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#:~:text=input%20text%20longer%20than%20256%20word%20pieces%20is%20truncated) ### Issue Until now entries exceeding max token size would silently get truncated during embedding generation. So the truncated portion of the entries would be ignored when matching queries with entries This would degrade the quality of the results ### Fix - `e057c8e` Add method to split entries by specified max tokens limit - Split entries by max tokens while converting [Org](https://github.com/debanjum/khoj/commit/c79919b), [Markdown](https://github.com/debanjum/khoj/commit/f209e30) and [Beancount](https://github.com/debanjum/khoj/commit/17fa123) entries to JSONL - `b283650` Deduplicate results for user query by raw text before returning results ### Results - The quality of the search results should improve - Relevant, long entries should show up in results more often	2022-12-26 18:23:43 +00:00
Debanjum Singh Solanky	17fa123b4e	Split entries by max tokens while converting Beancount entries To JSONL	2022-12-26 15:14:32 -03:00
Debanjum Singh Solanky	f209e30a3b	Split entries by max tokens while converting Markdown entries To JSONL	2022-12-26 13:14:15 -03:00
Debanjum Singh Solanky	24676f95d8	Fix comments, use minimal test case, regenerate test index, merge debug logs - Remove property drawer from test entry for max_words splitting test - Property drawer is not required for the test - Keep minimal test case to reduce chance for confusion	2022-12-25 22:33:04 -03:00
Debanjum Singh Solanky	b283650991	Deduplicate results for user query by raw text before returning results - Required because entries are now split by the max_word count supported by the ML models - This would now result in potentially duplicate hits, entries being returned to user - Do deduplication after ranking to get the top ranked deduplicated results	2022-12-25 21:36:15 -03:00
Debanjum Singh Solanky	53cd2e5605	Regenerate initial model in asymmetric reload test to reduce flakiness - Fix logger message when converting org node to entries - Remove unused import from conftest	2022-12-25 21:36:15 -03:00
Debanjum Singh Solanky	c79919bd68	Split entries by max tokens while converting Org entries To JSONL - Test usage of the entry splitting by max tokens in text search	2022-12-25 21:36:00 -03:00
Debanjum Singh Solanky	08dc5e3324	Update instructions in khoj.el to install it from MELPA stable - The instructions suggest installing khoj-assistant via pip install. This installs the latest tagged/release version of khoj - To match that version user should install khoj.el from MELPA stable instead of MELPA	2022-12-23 19:08:38 -03:00
Debanjum Singh Solanky	e057c8e208	Add method to split entries by specified max tokens limit - Issue ML Models truncate entries exceeding some max token limit. This lowers the quality of search results - Fix Split entries by max tokens before indexing. This should improve searching for content in longer entries. - Miscellaneous - Test method to split entries by max tokens	2022-12-23 16:24:04 -03:00
Debanjum Singh Solanky	d3e175370f	Update readme to install khoj.el from MELPA stable unless using pre-release khoj Update readme to ask user to install khoj.el from MELPA when a pre-release version of the main khoj app is installed. Else install khoj.el from MELPA Stable	2022-12-20 23:29:22 -03:00
Debanjum Singh Solanky	cd463c5085	Update Khoj.el Install Instructions on Emacs	2022-12-20 11:06:33 -03:00
Debanjum Singh Solanky	23ca5a2d43	Improve (un-)quoting of funcs used in `khoj--get-enabled-content-types' - Based on melpa package feedback for khoj.el - Verified these changes don't affect behavior of the function	2022-12-19 18:02:23 -03:00
Debanjum Singh Solanky	5db3a67df5	Fix Khoj Emacs package URL in khoj.el	2022-12-14 22:49:19 -03:00
Debanjum Singh Solanky	abad6d5f44	Declare external khoj.el funcs. Remove undefined func warnings on install	2022-12-14 22:36:04 -03:00
Debanjum Singh Solanky	c52383b11c	Delete stale, unused installation helper script	2022-12-03 13:36:47 -03:00
Debanjum Singh Solanky	1990d09032	Bump khoj version in setup.py, khoj.el to 0.2.0	2022-12-02 14:58:54 -03:00
Debanjum Singh Solanky	a9cfd8b800	Extract hash func for incremental text indexing into separate method	2022-10-26 13:56:58 +05:30
Debanjum Singh Solanky	0de2ff9c97	Add __init__.py to routers directory to register it as a package	2022-10-25 20:40:40 +05:30
Debanjum Singh Solanky	55d2fea9be	Move Custom Formatter class for logger to util.helper module from main.py	2022-10-20 00:32:24 +05:30
Debanjum Singh Solanky	1c40f97114	Merge branch 'master' of github.com:debanjum/khoj into modularize-api-and-increase-typing - Conflicts: - src/interface/emacs/khoj.el Use our update to `config-url', use their `url-request-method'	2022-10-19 16:46:53 +05:30
Debanjum Singh Solanky	e1b5a87920	Rename Frontend Router to Web Client. Fix logger usage in routers - Use logger in api_beta router instead of print statements - Remove unused logger in web client router	2022-10-19 16:36:48 +05:30
Debanjum	4abd51cb04	Merge pull request #99 from telotortium/method Explicitly set `url-request-method' to GET in khoj.el	2022-10-19 10:31:37 +00:00
Debanjum Singh Solanky	c467df8fa3	Setup `mypy' for static type checking	2022-10-08 17:33:13 +03:00
Debanjum Singh Solanky	d292bdcc11	Do not version API. Premature given current state of the codebase - Reason - All clients that currently consume the API are part of Khoj - Any breaking API changes will be fixed in clients immediately - So decoupling client from API is not required - This removes the burden of maintaining multiple versions of the API	2022-10-08 16:32:46 +03:00
Debanjum Singh Solanky	7e9298f315	Use new Text Entry class to track text entries in Intermediate Format - Context - The app maintains all text content in a standard, intermediate format - The intermediate format was loaded, passed around as a dictionary for easier, faster updates to the intermediate format schema initially - The intermediate format is reasonably stable now, given its usage by all 3 text content types currently implemented - Changes - Concretize text entries into `Entries' class instead of using dictionaries - Code is updated to load, pass around entries as `Entries' objects instead of as dictionaries - `text_search' and `text_to_jsonl' methods are annotated with type hints for the new `Entries' type - Code and Tests referencing entries are updated to use class style access patterns instead of the previous dictionary access patterns - Move `mark_entries_for_update' method into `TextToJsonl' base class - This is a more natural location for the method as it is only (to be) used by `text_to_jsonl' classes - Avoid circular reference issues on importing `Entries' class	2022-10-08 12:06:05 +03:00
Debanjum Singh Solanky	99754970ab	Type the /search API response to better document the response schema - Both Text, Image Search were already giving list of entry, score - This change just concretizes this change and exposes this in the API documentation (i.e OpenAPI, Swagger, Redocs)	2022-10-08 12:06:05 +03:00
Debanjum Singh Solanky	0521ea10d6	Put image score breakdown under `additional' field in search response - Update web, emacs interfaces to consume the scores from new schema	2022-10-08 12:06:01 +03:00
Debanjum Singh Solanky	e42a38e825	Version Khoj API, Update frontends, tests and docs to reflect it - Split router.py into v1.0, beta and frontend (no-prefix) api modules under new router package. Version tag in main.py via prefix - Update frontends to use the versioned api endpoints - Update tests to work with versioned api endpoints - Update docs to mention, reference only versioned api endpoints	2022-09-28 20:08:38 +03:00
Robert Irelan	d25e1d8e86	fix: explicitly set url-request-method In my installation, it appears that `url-request-method` is sometimes set globally to POST. Need to explicitly set it to ensure that GET is always used as intended.	2022-09-19 15:46:46 -04:00
Debanjum Singh Solanky	ee65a4f2c7	Merge /reload, /regenerate into single /update API endpoint - Pass force=true to /update API to force regenerating index from scratch - Otherwise calls to the /update API endpoint will result in an incremental update to index	2022-09-16 00:53:19 +03:00
Debanjum Singh Solanky	02d944030f	Use Base TextToJsonl class to standardize <text>_to_jsonl processors - Start standardizing implementation of the `text_to_jsonl' processors - `text_to_jsonl' scripts already had a shared structure - This change starts to codify that implicit structure - Benefits - Ease adding more `text_to_jsonl' processors - Allow merging shared functionality - Help with type hinting - Drawbacks - Lower agility to change. But this was already an implicit issue as the text_to_jsonl processors got more deeply wired into the app	2022-09-16 00:53:11 +03:00
Debanjum Singh Solanky	c16ae9e344	Ignore "Legacy way to download model" warning for upstream dependency	2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky	3169e3b78e	Use ellipsis instead of pass in base filter abstract methods for aesthetic	2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky	bf1ae038cb	Get XMP metadata from image using Pillow. Remove ExifTool dependency - Pillow already supports reading XMP metadata from Images - Removes need to maintain my fork of unmaintained PyExiftool - This also removes dependency on system Exiftool package for XMP metadata extraction - Add test to verify XMP metadata extracted from test images - Remove references to Exiftool from Documentation	2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky	8f57a62675	Remove unused imports. Fix typing and indentation - Typing issues discovered using `mypy'. Fixed manually - Unused imports discovered and fixed using `autoflake' - Fix indentation in `org_to_jsonl' manually	2022-09-14 04:56:52 +03:00
Debanjum Singh Solanky	be57c711fd	Revert OrgNode.hasTag func to method instead of property as accepts argument	2022-09-14 04:56:48 +03:00
Debanjum Singh Solanky	0109c7bd91	Disable ability to call <text>_to_jsonl, <type>_search packages directly - This code is de-synced with expected args by above scripts - Better to remove unused capability that needlessly increases maintenance burden	2022-09-14 04:56:48 +03:00
Debanjum Singh Solanky	1680a617da	Reflect updates to query and results count in URL - Simplify tracking khoj query history, saving/sharing links - Do not execute search, when query only contains whitespaces - Prevents error when trying to process results of empty query	2022-09-13 23:39:24 +03:00
Debanjum Singh Solanky	34314e859a	Call /reload instead of /regenerate API to update index from web interface - As `/reload` updates index incrementally, it's relatively quick - This makes exposing `/reload` endpoint a better default to expose via the web interface than the `/regenerate' endpoint	2022-09-12 23:39:10 +03:00
Debanjum Singh Solanky	13b5d5082f	Create input field to set results count on the web interface Resolves #96	2022-09-12 23:24:46 +03:00
Debanjum Singh Solanky	1bfe9c4ef2	Handle filter only queries. Short-circuit and return filtered results - For queries with only filters in them short-circuit and return filtered results. No need to run semantic search, re-ranking. - Add client test for filter only query and quote query in client tests	2022-09-12 17:13:05 +03:00
Debanjum Singh Solanky	afc84de234	Make word filter regex explicit. Allow hyphen in word filters Helps with #88	2022-09-12 17:05:29 +03:00
Debanjum Singh Solanky	536f03af8f	Process text content files in sorted order for stable indexing - Image search already uses a sorted list of images to process - Prevents index of entries to desync when entries, embeddings generated by a separate server/app instance	2022-09-12 11:09:40 +03:00
Debanjum Singh Solanky	a701ad08b9	Support multiple input-filters to configure content to index via khoj.yml - Update existing code, tests to process input-filters as list instead of str - Test `text_to_jsonl' get files methods to work with combination of `input-files' and `input-filters' Resolves #84	2022-09-12 11:08:59 +03:00
Debanjum Singh Solanky	940c8fac8c	Use app LRU, not functools LRU decorator, to cache search results in router - Provides more control to invalidate cache on update to entries, embeddings - Allows logging when results are being returned from cache etc - FastAPI, Swagger API docs look better as the `search' controller not wrapped in generically named function when using functools LRU decorator	2022-09-12 09:38:48 +03:00
Debanjum Singh Solanky	c6fa09d8fc	Fix querying with include word filter from web interface - Not encoding the `query' string before querying the backend API with it was causing the "+" prefix for include word filter to be lost	2022-09-12 09:27:02 +03:00
Debanjum Singh Solanky	1502fbc9e9	Add index_heading_entries flag to default and sample khoj configs	2022-09-11 17:33:37 +03:00
Debanjum Singh Solanky	7216cdff58	Add Date, Word filter for Org-Music content	2022-09-11 17:29:34 +03:00
Debanjum Singh Solanky	9d369ae4df	Fix OrgNode render of entries with property drawers and empty body - Issue - Indent regex was previously catching escape sequences like newlines - This was resulting in entries with only escape sequences in body to be prepended to property drawers etc during rendering - Fix - Update indent regex to only look for spaces in each line - Only render body when body contains non-escape characters - Create test to prevent this regression from silently resurfacing	2022-09-11 16:09:19 +03:00
Debanjum Singh Solanky	253c9eae9a	Set index_heading_entries field in config to index entries with no body - Previously heading entries were not indexed to maintain search quality - But given that there are use-cases for indexing entries with no body - Add a configurable `index_heading_entries' field to index heading entries - This `TextContentConfig' field is currently only used for OrgMode content	2022-09-11 16:09:19 +03:00
Debanjum Singh Solanky	1d3b3d5f39	Convert field get/set methods in OrgNode class to @property - Use more descriptive variable names in OrgNode parser and class - Convert OrgNode fields to private/protected, use property methods to get/set them	2022-09-11 14:59:28 +03:00
Debanjum Singh Solanky	db37e38df7	Create OrgNode hasBody method. Use it in org_to_jsonl checks	2022-09-11 12:50:03 +03:00
Debanjum Singh Solanky	b4878d76ea	Extract entries from scratch when regenerate requested - Do not rely on previously extracted entries to find new entries in regenerate scenario	2022-09-11 12:50:03 +03:00
Debanjum Singh Solanky	52e3dd9835	Pass the whole TextContentConfig as argument to text_to_jsonl methods - Let the specific text_to_jsonl method decide which of the TextContentConfig fields it needs to convert <text> type to jsonl - This simplifies extending TextContentConfig for a specific type without modifying all text_to_jsonl methods - It keeps the number of args being passed to the `text_to_jsonl' methods in check	2022-09-11 12:49:56 +03:00
Debanjum Singh Solanky	e951ba37ad	Raise exception when org file not found - No need to catch the IOError in OrgNode	2022-09-11 01:09:24 +03:00
Debanjum Singh Solanky	2e1bbe0cac	Fix stripping empty escape sequences from strings - Fix log message on jsonl write	2022-09-10 23:57:05 +03:00
Debanjum Singh Solanky	a7cf6c8458	Use dictionary instead of list to track entry to file maps	2022-09-10 23:08:30 +03:00
Debanjum Singh Solanky	3e1323971b	Stack function calls in jsonl converters to avoid unneeded variables	2022-09-10 22:56:06 +03:00
Debanjum Singh Solanky	4eb84c7f51	Log performance metrics for beancount, markdown to jsonl conversion	2022-09-10 22:47:54 +03:00
Debanjum Singh Solanky	ebd5039bd1	Merge branch 'master' into support-incremental-updates-of-embeddings	2022-09-10 22:37:13 +03:00
Debanjum Singh Solanky	030fab9bb2	Support incremental update of Markdown entries, embeddings	2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky	91aac83c6a	Support incremental update of Beancount transactions, embeddings	2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky	b01b4d7daa	Extract logic to mark entries for embeddings update into helper function - This could be re-used by other text_to_jsonl converters like markdown, beancount	2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky	f97308bef2	Fix log message on writing JSONL data to file	2022-09-10 21:40:08 +03:00
Debanjum Singh Solanky	c17a0fd05b	Do not store word filters index to file. Not necessary for now - It's more of a hassle to not let word filter go stale on entry updates - Generating index on 120K lines of notes takes 1s. Loading from file takes 0.2s. For less content load time difference will be even smaller - Let go of startup time improvement for simplicity for now	2022-09-10 21:01:54 +03:00
Debanjum Singh Solanky	91d11ccb49	Only hash compiled entry to identify new/updated entries to update - Comparing compiled entries is the appropriately narrow target to identify entries that need to encode their embedding vectors, given we pass the compiled form of the entry to the model for encoding - Hashing the whole entry along with its raw form was resulting in a bunch of entries being marked for update, as LINE: <entry_line_no> is a string added to each entry's raw format. - This results in an update to a single entry resulting in all entries below it in the file being marked for update (as all their line numbers have changed) - Log performance metrics for steps to convert org entries to jsonl	2022-09-10 21:01:44 +03:00
Debanjum Singh Solanky	b9a6e80629	Make OrgNode tags stable sorted to find new entries for incremental updates - Having Tags as sets was returning them in a different order every time - This resulted in spuriously identifying existing entries as new because their tags ordering changed - Converting tags to list fixes the issue and identifies updated new entries for incremental update correctly	2022-09-10 20:59:52 +03:00
Debanjum Singh Solanky	2f7a6af56a	Support incremental update of org-mode entries and embeddings - What - Hash the entries and compare to find new/updated entries - Reuse embeddings encoded for existing entries - Only encode embeddings for updated or new entries - Merge the existing and new entries and embeddings to get the updated entries, embeddings - Why - Given most note text entries are expected to be unchanged across time. Reusing their earlier encoded embeddings should significantly speed up embeddings updates - Previously we were regenerating embeddings for all entries, even if they had existed in previous runs	2022-09-10 20:58:33 +03:00
Debanjum Singh Solanky	ec675d27d3	Suppress non-actionable HuggingFace FutureWarning shown on app start	2022-09-10 16:43:14 +03:00
Debanjum Singh Solanky	1ac6a71ff0	Add --version flag to show installed version of khoj	2022-09-10 16:40:19 +03:00
Debanjum Singh Solanky	976397bd82	Ignore empty #+TITLE, merge multiple #+TITLE for 0th level headings	2022-09-10 15:34:47 +03:00
Debanjum Singh Solanky	11917c6ddd	Do not normalize absolute filenames for creating links in OrgNode	2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky	07b98d35f1	Use filename or #+TITLE as heading for 0th level content in org files - Set LINE, SOURCE link properties in property drawer correctly for content which falls under no heading - See Issue #83 for more details	2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky	d6bd7bf3e1	Fix initializing OrgNode level to string to parse org files - Parsed `level` argument passed to OrgNode during init is expected to be a string, not an integer - This was resulting in app failure only when parsing org files with no headings, like in issue #83, as level is set to string of `*`s the moment a heading is found in the current file	2022-09-10 14:21:08 +03:00
Debanjum Singh Solanky	d835467f2c	Throw exception if no valid entries found in specified content files - Previously we were failing if no valid entries while computing embeddings. This was obscuring the actual issue of no valid entries found in the specified content files - Throwing an exception early with clear message when no entries found should clarify the issue to be fixed - See issue #83 for details	2022-09-10 14:20:10 +03:00
Debanjum Singh Solanky	e00bb53336	Init word filter dictionary with default value as set to simplify code	2022-09-10 12:19:09 +03:00
Debanjum Singh Solanky	4d776d9c7a	Bump khoj version to 0.1.9	2022-09-09 07:50:15 +03:00
Debanjum Singh Solanky	588f598949	Pass empty list of `input_files' to FileBrowser on first run - Default config has `input_files' set to None - This was being passed to `FileBrowser' on Initialization - But `FileBrowser' expects `content_files' of list type, not None - This resulted in an unexpected NoneType failure	2022-09-09 07:26:40 +03:00
Debanjum Singh Solanky	3ddffdfba4	Create config directory before setting up logging to file under it - The logging to file code expects the config directory to already be setup - But parent directory of config file was being set up later in code - This resulted in app start failing with ~/.khoj dir does not exist error	2022-09-09 07:21:42 +03:00
Debanjum Singh Solanky	762607fc9f	Log processed entries by org_to_jsonl only if verbosity > 2 Output too verbose for even debug mode logging. So gated behind -vvv	2022-09-06 23:03:29 +03:00
Debanjum Singh Solanky	490157cafa	Setup File Filter for Markdown and Ledger content types - Pass file associated with entries in markdown, beancount to json converters - Add File, Word, Date Filters to Ledger, Markdown Types - Word, Date Filters were accidentally removed from the above types yesterday - File Filter is the only filter that newly got added	2022-09-06 15:31:26 +03:00
Debanjum Singh Solanky	94cf3e97f3	Log app logs to file for posthoc debugging and performance analysis	2022-09-06 14:51:48 +03:00
Debanjum Singh Solanky	3707a4cdd4	Improve date filter perf. Precompute date to entry map, Cache results - Precompute date to entry map - Cache results for faster recall - Log performance timers in date filter	2022-09-05 18:21:29 +03:00
Debanjum Singh Solanky	31503e7afd	Do not pass embeddings as argument to filter.apply method	2022-09-05 15:46:54 +03:00
Debanjum Singh Solanky	965bd052f1	Make search filters return entry ids satisfying filter - Filter entries, embeddings by ids satisfying all filters in query func, after each filter has returned entry ids satisfying their individual acceptance criteria - Previously each filter would return a filtered list of entries. Each filter would be applied on entries filtered by previous filters. This made the filtering order dependent - Benefits - Filters can be applied independent of their order of execution - Precomputed indexes for each filter is not in danger of running into index out of bound errors, as filters run on original entries instead of on entries filtered by filters that have run before it - Extract entries satisfying filter only once instead of doing this for each filter - Costs - Each filter has to process all entries even if previous filters may have already marked them as non-satisfactory	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	7dd20d764c	Pre-compute file to entry map in file filter to mark ids to include faster	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	2890b4cd44	Simplify extracting entries satisfying file filter	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	7606724dbc	Add file of each entry to entry dict in org_to_jsonl converter - This will help filter query to org content type using file filter - Do not explicitly specify items being extracted from json of each entry in text_search as all text search content types do not have file being set in jsonl converters	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	7e083d3e96	Cache results for file filters passed in query for faster filtering	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	f634399f23	Convert simple file filters with no path separator into regex - Specify just file name to get all notes associated with file at path - E.g `query` with `file:"file1.org"` will return `entry1` if `entry1` is in `file1.org` at `~/notes/file1.org` - Test - Test converting simple file name filter to regex for path match - Test file filter with space in file name	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	092b9e329d	Setup Filters when configuring Text Search for each Search Type - Allows enabling different filters for different Text Search Types - Use FileFilter in Text Search on Org Files	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	1f9fd28b34	Create File Filter to filter files to query. Add tests for file filter	2022-09-05 01:09:20 +03:00
Debanjum Singh Solanky	e4418746f2	Create Abstract Base Class for Filters. Make Word, Date Filter Child of BaseFilter	2022-09-04 18:48:16 +03:00
Debanjum Singh Solanky	f930324350	Rename explicit filter to word filter to be more specific	2022-09-04 17:18:47 +03:00
Debanjum Singh Solanky	6087862521	Use LRU helper class for explicit filter cache	2022-09-04 16:42:28 +03:00
Debanjum Singh Solanky	8f3326c8d4	Create LRU helper class for caching	2022-09-04 16:31:46 +03:00
Debanjum Singh Solanky	191a656ed7	Use word to entry map, list comprehension to speed up explicit filter - Code Changes - Use list comprehension and `torch.index_select' methods - to speed selection of entries, embedding tensors satisfying filter - avoid deep copy of entries, embeddings - avoid updating existing lists (of entries, embeddings) - Use word to entry map and set operations to mark entries satisfying inclusion, exclusion filters - Results - Speed up explicit filtering by two orders of magnitude - Improve consistency of speed up across inclusion and exclusion filtering	2022-09-04 15:22:35 +03:00
Debanjum Singh Solanky	28d3dc1434	Deep copy entries, embeddings in filters. Defer till actual filtering - Only the filter knows when entries, embeddings are to be manipulated. So move the responsibility to deep copy before manipulating entries, embeddings to the filters - Create deep copy in filters. Avoids creating deep copy of entries, embeddings when filter results are being loaded from cache etc	2022-09-04 02:38:57 +03:00
Debanjum Singh Solanky	3308e68edf	Cache explicitly filtered entries, embeddings by required, blocked words	2022-09-04 02:38:57 +03:00
Debanjum Singh Solanky	cdcee89ae5	Wrap words in quotes to trigger explicit filter from query - Do not run the more expensive explicit filter until the word to be filtered is completed by user. This requires an end sequence marker to identify end of explicit word filter to trigger filtering - Space isn't a good enough delimiter as the explicit filter could be at the end of the query in which case no space	2022-09-04 02:38:57 +03:00
Debanjum Singh Solanky	8d9f507df3	Load entries_by_word_set from file only once on first load of explicit filter	2022-09-04 00:37:37 +03:00
Debanjum Singh Solanky	858d86075b	Use regexes to check if any explicit filters in query. Test can_filter	2022-09-03 23:47:28 +03:00
Debanjum Singh Solanky	546fad570d	Use regex to extract include, exclude filter words from query	2022-09-03 23:41:43 +03:00
Debanjum Singh Solanky	ffb8e3988e	Use Python Logging Framework to Time Performance of Explicit Filter	2022-09-03 22:24:10 +03:00
Debanjum Singh Solanky	c7de57b8ea	Pre-compute entry word sets to improve explicit filter query performance	2022-09-03 16:16:31 +03:00
Debanjum Singh Solanky	094bd18e57	Use python standard logging framework for app logs - Stop passing verbose flag around app methods - Minor remap of verbosity levels to match python logging framework levels - verbose = 0 maps to logging.WARN - verbose = 1 maps to logging.INFO - verbose >=2 maps to logging.DEBUG - Minor clean-up of app: unused modules, conversation file opening	2022-09-03 14:43:32 +03:00
Debanjum Singh Solanky	d0531c3064	Update URL QueryParam when Type set in Dropdown on Web Interface - This also pushes the updated URL state to history - Allows jumping back to the web interface after clicking on an image and having the type set to image search - Previously type would get reset to the default search type on jumping back	2022-08-28 12:22:22 +03:00
Debanjum Singh Solanky	2eae32d743	Time, Log Image Search Performance	2022-08-28 00:28:46 +03:00
Debanjum Singh Solanky	c3ca99841b	Scale down images to generate image embeddings faster, with less memory - CLIP doesn't need full size images for generating embeddings with decent search results. The sentence transformers docs use images scaled to 640px width - Benefits - Normalize image sizes - Increase image embeddings generation speed - Decrease memory usage while generating embeddings from images	2022-08-24 14:09:02 +03:00
Debanjum Singh Solanky	ea4fdd9134	Fix logic to ignore notes with no body. Add tests to prevent regression - Notes with empty newlines in body were not being ignored - Add regression tests to avoid above regression in org_to_jsonl conversion	2022-08-21 19:41:40 +03:00
Debanjum	144986ebfd	Fix, Improve Desktop GUI Splash Screen and Main Window - `5e6625a` Fix file browser to not add empty line when no file/dir selected - `8098b8c` Bring main window to Top when open from System Tray - `1c122a8` Place window near top so buttons are not hidden by OS bottom bar - `dfe2546` Set Khoj Icon on Main Desktop Window - `1b1f8f9` Move Splash screen text below icon. Set the text color to black - `450f644` Fix path to remove shared libraries when packaging the Windows app	2022-08-20 23:19:01 +00:00
Debanjum Singh Solanky	5e6625ac68	Fix file browser to not add empty line when no file/dir selected - When no file selected in file browser an empty line/entry gets added to input entries list - Bug got introduced due to insufficient update on change to add instead of insert - Update is_none_or_empty helper method to also check for empty string	2022-08-21 02:03:28 +03:00
Debanjum Singh Solanky	8098b8c3a8	Bring Configure Window to Top when Opened from System Tray - Previously the window could get hidden behind other app windows when user clicked configure from the system tray	2022-08-20 23:38:43 +03:00
Debanjum Singh Solanky	1c122a8a91	Place window near top so buttons are not hidden by OS bottom bar	2022-08-20 22:38:06 +03:00
Debanjum Singh Solanky	dfe2546c04	Set Khoj Icon on Main Desktop Window	2022-08-20 20:36:15 +03:00
Debanjum Singh Solanky	82d2891765	Do not pass ML compute `device' around as argument to search funcs - It is a non-user configurable, app state that is set on app start - Reduce passing unneeded arguments around. Just set device where required by looking for ML compute device in global state	2022-08-20 14:44:53 +03:00
Debanjum Singh Solanky	acc9091260	Use MPS on Apple Mac M1 to GPU accelerate Encode, Query Performance - Note: Support for MPS in Pytorch is currently in v1.13.0 nightly builds - Users will have to wait for PyTorch MPS support to land in stable builds - Until then the code can be tweaked and tested to make use of the GPU acceleration on newer Macs	2022-08-20 14:44:06 +03:00
Debanjum Singh Solanky	7de9c58a1c	Load models, corpus embeddings onto GPU device for text search, if available - Pass device to load models onto from app state. - SentenceTransformer models accept device to load models onto during initialization - Pass device to load corpus embeddings onto from app state	2022-08-20 14:04:18 +03:00
Debanjum Singh Solanky	dc8dcc94a6	Bump Khoj.el package version to 0.1.6	2022-08-19 20:48:42 +03:00
Debanjum Singh Solanky	ffbf15eff8	Add helper function to identify when app running as pyinstaller app Useful when we want the app to behave differently in the pyinstaller app scenario with frozen python than in development scenarios	2022-08-19 19:17:54 +03:00
Debanjum Singh Solanky	6c5c1c33c1	Turn off Tokenizers Parallelism. Khoj doesn't support it right now - Forking and multiprocess are problematic in frozen python scenarios. This will cause issues when running App packaged by pyinstaller	2022-08-19 19:17:54 +03:00
Debanjum Singh Solanky	d4072974d7	Use of XMP metadata in Khoj Image Search is broken. Disable by default - CLIP Image score and XMP metadata score are not combining well. When combined they give nonsensical results. Enable only once we figure out how best to combine the two. - Show scores with higher precision for image search - Image search scores seem to mostly be between 0.2 - 0.3 for some reason - Higher precision scores make it easier to understand the quality of returned results perceived by the model itself	2022-08-19 19:17:28 +03:00
Debanjum Singh Solanky	7c4417126c	Append files, directories selected by user to config in Desktop GUI - Allows adding multiple image directories via GUI - Allow adding multiple files in different directories via GUI - Previously users couldn't add multiple directories via GUI They'd have to manually append to input field if multiple files, directories - To clear/overwrite is much easier. The user can just select text to delete in input area	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	00ddcfdac8	Use .ico icon when packaging for Windows (and Linux) using PyInstaller	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	60dacf3f2c	Show splash screen on app start. Only supported on Windows, Linux	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	0079c13bf7	Set input-directories in config for image search type on Desktop GUI - Issue Fix configuring image search from Desktop GUI. It was broken before. The Desktop GUI was updating input-files field under content-type > image. This field is not used for image search. So image search couldn't be configured from the Desktop GUI - Fix - Set input-directories when field of search type image is set from GUI - Otherwise set input-files field in config	2022-08-18 18:29:55 +03:00
Debanjum Singh Solanky	c4fd661909	Move the experimental /chat API to under /beta/chat	2022-08-16 16:36:15 +03:00
Debanjum Singh Solanky	b8913476ba	Fix if condition in router to trigger markdown search	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	9bc4fd539e	Set Web Interface URL from loaded state in Desktop GUIs. Not hard-coded	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	7f479b0104	Improve Displaying Error to User on Khoj window in Desktop GUI - Show a helpful error message in the GUI to the user, instead of crashing if loading config fails, e.g. if file wasn't found - Collate GUI errors into an ErrorType enum class - Remove previous error messages before showing the new one	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	873bb9dd97	Do not force the Khoj window to always be on top. It's needlessly annoying	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	67ab40bb01	Regenerate embeddings every time user clicks configure in Desktop GUI Previously if the embeddings were already there only the khoj.yml config file would get updated. The embeddings would remain old. 1. This results in a stale app state where the config doesn't match the embeddings 2. Currently the user cannot update their config from the config screen. They'd have to use a combination of config screen and web interface>regenerate button to trigger it or delete their ~/.khoj dir This commit should resolve the above issues	2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky	2647e6bab4	Display re-ranked results triggered via keybinding in khoj.el - Prevent immediate overwrite of re-ranked results by incremental-search without rerank triggered via post-command-hook. - This triggers right after the reranking results are rendered, so user never ends up seeing them	2022-08-15 18:41:12 +03:00
Debanjum Singh Solanky	a91d2df300	Simplify Emacs interface to only rerank results on explicit command	2022-08-15 06:20:13 +03:00
Debanjum Singh Solanky	e846829a2e	Reset Khoj.el version to align with Khoj package version	2022-08-15 06:20:13 +03:00
Debanjum Singh Solanky	fed0b591af	Package Khoj as Debian app in Github Release Workflow	2022-08-14 05:07:58 +03:00
Debanjum Singh Solanky	541e03da3d	Make khoj.el pass checkdoc, package-lint, flycheck checks - Add docstrings, mention args in them. Make docstring crisper - prefix funcs, variables with khoj-- - Require emacs >27.1 for json-parse-buffer - Use lexical binding - Add quickstart docs to elisp file itself - Bump version of khoj.el	2022-08-13 21:37:41 +03:00
Debanjum Singh Solanky	3300378804	Minimal formatting to render beancount results legibly on web interface	2022-08-13 05:03:45 +03:00
Debanjum Singh Solanky	a0759dd923	Convert Configure Screen into the Main Application Window - What - Convert the config screen into the main application window with configuration as just one of the functionality it provides - Rename config screen to main window to match new designation - Why - System Tray isn't available everywhere (e.g Linux) - This requires moving functionality into a normal window for cross-compat	2022-08-13 02:05:52 +03:00
Debanjum Singh Solanky	684f497abe	Handle no System Tray on Linux (Gnome) - What - On Linux - Show Configure Screen, even if not first run experience - Do not show system tray on Linux - Quit app on closing Configure Screen - On Windows, Mac - Show Configure screen only if first run experience - Show system tray always - Do not quit app on closing Configure Screen - Why - Configure screen is the only GUI element on Linux. So closing it should close the application - On Windows, Mac the system tray exists, so app should not be closed on closing configure screen	2022-08-13 01:00:20 +03:00
Debanjum Singh Solanky	c2815c5d09	Open Search from Khoj Configure Screen - Start evolving configure screen away from just being a configure screen - Update Window Title to just say Khoj - Allow Opening Web Interface to Search from Khoj configure screen - Rename "Start" Button to more accurate "Configure" - Disable Search button on first run and while configuring app	2022-08-13 00:43:49 +03:00
Debanjum Singh Solanky	28a91ad1fd	Deep copy the default_config constant to prevent it being overwritten - Issue - In the previous form, updates to self.current_config would update default_config as python does a shallow copy - So self.current_config is just referencing the values of default_config - Hence updates to current_config updates the default_config values too - This is not what we want - Fix - Deep copy the default_config values. Now updates to self.current_config wouldn't affect the default_config	2022-08-12 23:54:16 +03:00
Debanjum Singh Solanky	62ac41ce3b	Reload settings in a separate thread to not freeze Config Screen - Generating embeddings takes time - If user enables a content type and clicks start. The app starts to generate embeddings when loading the new settings - Run this function in a separate thread to keep config screen responsive - But disable start button to prevent re-entrant threads - Also show a minimal visual indication that the app is saving state	2022-08-12 23:34:00 +03:00
Debanjum Singh Solanky	927547d0af	Update Title of Configure Screen to follow "<Screen> - App" pattern	2022-08-12 22:53:10 +03:00
Debanjum Singh Solanky	32ac1ea1b6	Allow user to quit application from the terminal via SIGINT Call python interpreter at regular interval to handle any interrupt signals. create custom handler to terminate server and application	2022-08-12 21:11:58 +03:00
Debanjum Singh Solanky	43301d488a	Increase Width of Configure Screen	2022-08-12 18:34:47 +03:00
Debanjum Singh Solanky	9baea9c9fd	Let Input Fields Wrap. Adjust Height based on Text in Field - Convert Input Fields into PlainTextEdit - Display Each Selected File on a Separate Line in Field - Set Height of FileBrowser Input Field based on Number of Lines/Files	2022-08-12 18:33:56 +03:00
Debanjum Singh Solanky	b7b96110e9	Rename FileBrowser Button Text to "Select" instead of "Add"	2022-08-12 17:08:40 +03:00
Debanjum Singh Solanky	a1c58a9470	Create, Use a Labelled Text Field for the Conversation Input Field - This fixes the field expanding when configure screen is expanded - Allows for reusability of the labelled text field - Simplifies the logic to save settings for conversation processor	2022-08-12 16:59:15 +03:00
Debanjum Singh Solanky	fa7e36cada	Rename external .js files to .min.js to mark them as vendored - Excludes from Github language stats. See linguists/vendor.yml for exclusion rules - Signifies them as external for Khoj developers too	2022-08-12 04:08:50 +03:00
Debanjum Singh Solanky	110e3df0b7	Set default config in the constant module. Use from there to configure app - Avoid having to pass the khoj_sample.yml data file into pip, native apps - Packaging data files into python packages is annoying. - There's `MANIFEST.in`, `data_files` and `package_data` in setup.py - Bdist, wheel, generated source tarball use different set of these fields and put the data files in different locations - Rather just code the default config into a constant. Avoid pointless file reads as well this way	2022-08-12 02:18:46 +03:00
Debanjum Singh Solanky	fad2f3a2e7	Resolve config_file to absolute right at start on parsing args in cli - Assume path is absolute in yaml util module while saving, loading file - This follows same convention as jsonl. Which just operates on passed file path, assuming it is of appropriate form. Responsibility to put it in appropriate form is on the caller, for now	2022-08-12 01:34:08 +03:00
Debanjum Singh Solanky	44fe70513a	Handle situation where default config directory or file does not exist - Include khoj_sample.yml in pip package to load default config from - Create khoj config directory if it doesn't exist - Load config from khoj_sample.yml if khoj.yml config doesn't exist	2022-08-12 01:17:34 +03:00
Debanjum Singh Solanky	41520e1608	Improve Docstring for Configure Screen and System Tray class, funcs	2022-08-11 23:36:02 +03:00
Debanjum Singh Solanky	a748acfeeb	Merge branch 'master' of github.com:debanjum/khoj into create-native-gui Conflicts: - src/main.py - router functions have moved to router - move logic to handle null query perf timer variables into router.py - set main.py to current branch, not master	2022-08-11 21:09:42 +03:00
Debanjum Singh Solanky	6af2d6bb6d	Add Flag to Start App without Native GUI	2022-08-11 20:59:57 +03:00
Debanjum Singh Solanky	b74ca1def6	Wrap error message instead of expanding screen to show message	2022-08-11 20:51:56 +03:00
Debanjum Singh Solanky	2646fa825b	Get Files from File input line to match user expectation - If a user manually edits the input file lines, clicking start should use that. Currently it just looks at the files selected last via file browser - We want to allow users to manually enter file paths in field. Which is why the field hasn't been set to read-only	2022-08-11 20:48:45 +03:00
Debanjum Singh Solanky	dad9133598	Split save_settings method into smaller methods for modularization	2022-08-11 20:00:52 +03:00
Debanjum Singh Solanky	56ba91fec8	Remove unused methods in file browser widget. Improve name of existing	2022-08-11 19:46:09 +03:00
Debanjum Singh Solanky	fd4e41495c	Use appropriate label for directory input types to minimize confusion	2022-08-11 19:45:19 +03:00
Debanjum Singh Solanky	c1e1466fb1	Validate new config before write. Show error if new config invalid	2022-08-11 19:18:22 +03:00
Debanjum Singh Solanky	1ff049599f	Show current config on config screen. Load default config if config unset - Track current (saved/loaded) config separate from the new config (to be written) when user clicks Start - Fallback to using default config when no config for the specific content type or processor is specified in khoj.yml - Earlier were only loading default config on first run, not after - Create Child CheckBox, LineEdit classes for Processor Widgets - Create ProcessorType, similar to SearchType - Track ProcessorType the widgets are associated with - Simplify update, save, load of config based on type	2022-08-11 19:11:25 +03:00
Debanjum Singh Solanky	23e06f483d	Do not emit type tags when dumping config YAML to file	2022-08-11 19:08:36 +03:00
Debanjum Singh Solanky	678fb6a3c7	Add Settings Panel for Conversation Settings to Config Screen	2022-08-11 04:52:40 +03:00
Debanjum Singh Solanky	c1fcf44405	Initialize Settings on Config Screen with Existing Settings from File	2022-08-11 04:51:33 +03:00
Debanjum Singh Solanky	3cec6229ad	Hot swap backend config via config screen start button click - Update configuration to use by the backend, while app is running - Trigger after user hits start button with their config. The config gets written to khoj.yml file first, then the updated config is loaded onto memory	2022-08-11 00:32:11 +03:00
Debanjum Singh Solanky	f7fdf8d8ce	Refactor app start to start server even if backend not configured - Decouple configuring backend from starting server. Backend search and processors can be configured after the backend server has started - Set global state in main instead of in configure_server method. This allows the app to start even if configure_server exits early in the first run scenario, where no config available to configure server - Now start server, even if no config, before GUI started in main - This refactor of app startup flow will allow users to configure backend using the configure screen after server start	2022-08-11 00:13:14 +03:00
Debanjum Singh Solanky	34018c7d4b	Store args passed from commandline at app start in global app state	2022-08-11 00:11:35 +03:00
Debanjum Singh Solanky	cc6ef0f450	Save configure screen settings to app config yaml on clicking Start	2022-08-10 23:10:39 +03:00
Debanjum Singh Solanky	dae65c5b6b	Create child class of Qt CheckBox to track search type it enables/disables	2022-08-10 22:44:37 +03:00
Debanjum Singh Solanky	f42f54019b	Type parent_layout passed as arguments to ConfigureScreen methods	2022-08-10 22:43:20 +03:00
Debanjum Singh Solanky	f63f11186f	Pass config file for app to configure screen	2022-08-10 22:42:32 +03:00
Debanjum Singh Solanky	82a7059b6a	Only setup conversation processor if it has configuration set	2022-08-10 22:34:03 +03:00
Debanjum Singh Solanky	9628ca073c	Extract conversation processor from config into separate function - Only pass processor config arg required by configure_processor. Not the unused full config object - Type arguments passed to methods configure processors - Import json for use by conversation processor to load logs	2022-08-10 22:33:33 +03:00
Debanjum Singh Solanky	62eb66b8ca	Rename load_config_from_file to more descriptive parse_config_from_file	2022-08-10 22:28:51 +03:00
Debanjum Singh Solanky	328cc00439	Create global constant to store app root directory	2022-08-10 20:09:03 +03:00
Debanjum Singh Solanky	d2c7b28172	Extract code to load config from YAML file into new utils.yaml module	2022-08-10 20:07:44 +03:00
Debanjum Singh Solanky	150ae19660	Indent Timestamps, Drawers at Body Level in OrgNode Entry Representation	2022-08-10 18:55:37 +03:00
Debanjum Singh Solanky	fd31d339c1	Remove spurious space in Entries without Todo in OrgNode Entry Repr	2022-08-10 13:48:44 +03:00
Debanjum Singh Solanky	eddf88f818	Org buffer customization settings to tail of khoj.el results buffer - Results get priority screen real estate - Allows quick speed key based traversal of results as cursor on switching to buffer is at top level heading - E.g C-x o n n o 2 jumps to entry in actual file of second result - Unlike before when it is at the #+STARTUP org buffer customization settings	2022-08-10 12:57:37 +03:00
Debanjum Singh Solanky	daef276fd1	Add files for each search type. Extract config on clicking start - Only allow adding files with appropriate file extension for each search type - e.g .org for org-mode search, directory for image search - Extract file paths added to config and enablement state of each search type - This extracted state will be used to populate the khoj.yml config file	2022-08-10 03:27:22 +03:00
Debanjum Singh Solanky	d74134e6cc	Reuse Single Method to Create Setting Panels for each Search Type	2022-08-09 23:50:43 +03:00
Debanjum Singh Solanky	509d52e2cd	Toggle Editability instead of Visibility of Per Search Type Settings - Simplifies the configure screen layout and allows it to be of constant width - It was buggy, the configure screen would dynamically expand but not restore back to original size on disabling search type after enable	2022-08-09 23:34:54 +03:00
Debanjum Singh Solanky	3c788f1d29	Rename configure window to more generic configure screen	2022-08-09 22:44:05 +03:00
Debanjum Singh Solanky	c50ab7c3ad	Split config settings GUI into functions. Convert Config Window to Dialog	2022-08-09 22:36:41 +03:00
Debanjum Singh Solanky	664713b24e	Extract Qt GUI code from main.py into separate interface/desktop dir	2022-08-09 22:12:29 +03:00
Debanjum Singh Solanky	84c1fc701d	Fix query timing variables from being referenced before assignment	2022-08-09 21:06:37 +03:00
Debanjum Singh Solanky	57026b802c	Set size of rendered images using user customizable vars	2022-08-09 21:06:37 +03:00
Debanjum Singh Solanky	0a758c9f0f	By default, wait for 2 seconds before initiating rerank in khoj.el - Subjectively, previous default seems too aggressive based on usage. Doesn't give time for user to think and type their query	2022-08-09 21:06:30 +03:00
Debanjum Singh Solanky	f01fb16ebb	Use single hyphen in name of user configurable variables in khoj.el - Follow convention, two hyphens indicate variable private to library - Defcustom are user configurable variables. So they should have single - - Use khoj-results-count variable directly in code	2022-08-09 20:49:34 +03:00
Debanjum Singh Solanky	cd59982c9c	Add Qt Button to save Khoj configuration in Khoj Configuration Window	2022-08-09 20:42:44 +03:00
Debanjum Singh Solanky	2c77caf06c	Group ledger, org setting widgets into child Qt widgets of config window	2022-08-09 20:42:44 +03:00
Debanjum Singh Solanky	027da719aa	Open Configure Window on First Run or from System Tray - Trigger FRE if no config loaded. Open Configure Window automatically - Else user can manually open config window from App on System Tray	2022-08-09 17:05:27 +03:00
Debanjum Singh Solanky	a588a8e21f	Make config_file an optional argument. It can be generated on FRE - Make config_file an optional arg. It defaults to default khoj config dir - Return args.config as None if no config_file explicitly passed by user - Parent can use args.config = None as signal to trigger first run experience	2022-08-09 17:02:02 +03:00
Debanjum Singh Solanky	21af122447	Clean up unused methods, module imports. Add comments	2022-08-09 16:59:38 +03:00
Debanjum Singh Solanky	80fa9fde6a	Quit GUI via SysTray instead of sys.exit to cleanly terminate server	2022-08-08 23:49:26 +03:00
Debanjum Singh Solanky	e5691f9d1d	PyInstaller Spec to Wrap Khoj into a Basic Native App - Verified functionality on MacOS - Add ICNS Icon to use as MacOS App Icon - Spec generated by PyInstaller: ```sh pyinstaller \ src/main.py \ --windowed \ --onefile \ --name "Khoj" \ --target-arch arm64 \ -i src/interface/web/assets/icons/favicon.icns \ --add-data "src/interface/web:src/interface/web" \ --copy-metadata tqdm \ --copy-metadata regex \ --copy-metadata requests \ --copy-metadata packaging \ --copy-metadata filelock \ --copy-metadata numpy \ --copy-metadata tokenizers ```	2022-08-08 23:23:02 +03:00
Debanjum Singh Solanky	ef009323e7	Use sys.exit to quit via system tray. Fix pip install cmd in Readme	2022-08-08 21:42:36 +03:00
Debanjum Singh Solanky	eacd95bebd	Start Creating Native Configure Page using PyQt	2022-08-08 18:31:47 +03:00
Debanjum Singh Solanky	dddc57e132	Rename get-enabled-search-types to get-enabled-content-types as more appropriate	2022-08-07 18:53:14 +03:00
Debanjum Singh Solanky	127c6e78df	Only show keybindings for enabled search types in simple info menu too Convert the khoj--keybindings-info-message into a func Dynamically generate info menu Show keybindings for enabled search types only	2022-08-07 18:40:35 +03:00
Debanjum Singh Solanky	d08c25b62b	Make default search type used in the Emacs interface configurable	2022-08-07 18:24:53 +03:00
Debanjum Singh Solanky	5a10c47499	Allow setting music as search type in khoj.el. Had forgotten to include it earlier	2022-08-07 18:24:53 +03:00
Debanjum Singh Solanky	ebee716026	Only show keybindings reference for enabled search types in khoj.el	2022-08-07 18:24:53 +03:00
Debanjum Singh Solanky	6dc9801f45	Get Khoj search-types enabled by user in Emacs	2022-08-07 18:24:53 +03:00
Debanjum Singh Solanky	f3c1512c38	Fix to let user start entering query right after initiating khoj on emacs - Fix regression since moving to use `which-key-show-full-keymap` - The above function reads user keypress, so eats up 1 keypress before starting to enter query - No way to pass no-paging config via the external function to the internally used which-key--show-keymap function that does allow setting no-paging to not read user keypress - So use the internal function instead and set no-paging arg to t	2022-08-07 15:57:08 +03:00
Debanjum Singh Solanky	e95686c89c	Show complete Khoj keybindings when initiate search in Emacs - The keybindings to select search types was previously confusing as it only highlighted the final symbol to press (the C-x was shown but it wasn't made apparent that it had to be pressed before) - Previously some keybindings unrelated to khoj were also being shown in the which-key popup. Now only the khoj keybindings are visible	2022-08-06 16:36:57 +03:00
Debanjum Singh Solanky	4696eadc02	Fix definition of khoj--search-<content-type> functions in khoj.el	2022-08-06 15:19:01 +03:00
Debanjum Singh Solanky	c5bf051a29	Rename initialize_{search,processor,server} to configure_{search,processor,server} - Search is being reconfigured multiple times in /regenerate and /reload. More appropriate name is configure_ rather than initialize_ for it - Standardize name of methods under configure.py	2022-08-06 03:23:02 +03:00
Debanjum Singh Solanky	7b04978f52	Put global state variables into separate state module - Variables storing app, device state aren't constants. Do not mix with actual constants like empty_escape_sequence, web_directory	2022-08-06 03:13:18 +03:00
Debanjum Singh Solanky	b04c84721b	Extract configure and routers from main.py into separate modules - Main.py was becoming too big to manage. It had both controllers/routers and component configurations (search, processors) in it - Now that the native app GUI code is also getting added to the main path, good time to split/modularize/clean main.py - Put global state into a separate file to share across modules	2022-08-06 02:39:18 +03:00
Debanjum Singh Solanky	083fefdd07	Create Native Menu Bar with PyQt to open Search, Config webpages - Run FastAPI server in a separate thread. - This allows starting both the server and gui in parallel - Create System Tray for Khoj - Contains menu items that open search or config pages in browser - Rearrange code to have only the code required to start Backend and GUI in the run() method - Move the backend setup code into a separate method	2022-08-06 01:00:25 +03:00
Debanjum Singh Solanky	9fa3345000	Show available Khoj keybindings to customize search using which-key Fallback to showing simple khoj keybindings info message in echo area when which-key not available	2022-08-05 20:24:29 +03:00
Debanjum Singh Solanky	6a8b2a6936	Do not run incremental search when query is empty	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	609cd6e8bb	Show keybindings to set khoj search type in echo area to assist user	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	48e4a983c5	Allow switching search type in the middle of querying Khoj on Emacs - More generally, this allows configuring the khoj search anytime while in khoj minibuffer window - Earlier could only configure search type at the start of the search	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	48c33b93cc	Generalize khoj keymap to func that can update existing keybindings	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	19c4701f3f	Default to ledger search from files with .beancount extensions	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	cc9a395e0a	Keep name of buffer for Khoj results in a variable	2022-08-05 19:35:42 +03:00
Debanjum Singh Solanky	0a5c6d067a	Do not prompt user to set search type before querying Khoj via Emacs - What - Default to last used search type, when no search type specified - Allow user to change search type before they enter query (and after they've called khoj), if they want - Why - Reduce time from intent to results by using reasonable defaults - Make interactions smoother, more intuitive	2022-08-05 19:35:38 +03:00
Debanjum Singh Solanky	24ccba74d4	Put type dropdown, regenerate button on same row. Regain screen space	2022-08-05 06:17:43 +03:00
Debanjum Singh Solanky	017e287b8a	Remove redundant query as title in results section - Regain screen real-estate - Remove unused parameters, html being returned by org.js	2022-08-05 06:17:25 +03:00
Debanjum Singh Solanky	06afeec7e2	Hide stars of org entry results on Emacs to reduce visual clutter. They've all been normalized to the same level and hence don't hold much data. So good opportunity to reduce non-useful visual clutter	2022-08-05 05:27:57 +03:00
Saba	d1fe6353b5	Check whether processor_config exists during shutdown event	2022-08-04 21:57:36 -04:00
Debanjum Singh Solanky	4d4d2ff921	Ensure all org entries are unfolded in results buffer on Emacs	2022-08-05 04:54:29 +03:00
Debanjum Singh Solanky	49ef741d4b	Prevent Zoom on Input in Web Interface. Document Pip upgrade in Readme - Name /Reload API Controller Reload	2022-08-05 03:51:34 +03:00
Debanjum Singh Solanky	675e821d95	Make embeddings, jsonl paths absolute. Create directories if non-existent	2022-08-05 02:57:59 +03:00
Debanjum Singh Solanky	d5b43eb836	Use input filter in image search setup. Input filter wasn't used earlier	2022-08-05 02:40:03 +03:00
Debanjum Singh Solanky	ca5a8bd113	Make config file a positional argument, as it is required - Test invalid config file path throws. Remove redundant cli test - Simplify cli parser code - Do not need to explicitly check if args.config_file set. argparser checks for positional arguments automatically - Use standard semantics for cli args - All positional args are required. Non positional args are optional - Improve command line --help description	2022-08-05 01:09:40 +03:00
Debanjum Singh Solanky	1374065092	Mark all required fields for config. Throw if no input_* field specified - Add custom validator to throw if neither input_filter or input_<files|directories> are specified - Set field expecting paths to type Path - Now that default_config isn't used in code. We can update fields in rawconfig to specify whether they're required or not. This lets pydantic validate config file and throw appropriate error	2022-08-05 01:08:48 +03:00
Debanjum Singh Solanky	f78d6ae754	Create khoj_sample file with all configurable fields in one place - Reason - Simplifies code. No merge_dict required - 1 place for user to see all configurables, defaults and required values - Details - Remove default_config from code. Set defaults in khoj_sample.yml itself - Keep fields required to be set by user as empty in khoj_sample to YAML - Set defaults for fields not requiring configuration by user	2022-08-05 01:08:33 +03:00
Debanjum Singh Solanky	3abf3e5ee0	Update merge_dicts to recursively merge the dictionaries Previously it was only merging dictionary at the first/top level	2022-08-04 22:46:20 +03:00
Debanjum Singh Solanky	61c26ba611	Only show large Khoj favicon on web interface - Do not want browsers to use the small, grainy favicons - Firefox for Android does use the bigger icon, when it's the only one available - Update svg to match the 144x144 ratio just for consistency	2022-08-04 14:33:29 +03:00
Debanjum Singh Solanky	1649fa644c	Autofocus on Query field in Web Interface. Improve time to query	2022-08-04 05:23:19 +03:00
Debanjum Singh Solanky	71fcb1087f	Add icons for web interface to render on more browsers and as PWA Safari, Firefox for Android etc don't support SVG Favicons yet	2022-08-03 18:52:41 +03:00
Debanjum Singh Solanky	5b6b7ec123	Delete khoj network connections on incremental search teardown on Emacs interface Currently only get into this state when debug breakpoints on backend are keeping the connection open and user exits khoj search from Emacs Results in a number of open connections that slow khoj down.	2022-08-03 18:52:41 +03:00
Debanjum Singh Solanky	555c1088cc	Cache queries in /search controller using LRU cache - Most concretely right now, it eliminates the re-rank latency hit on re-rank triggered on user hitting enter after re-rank is already done on user idle in the emacs interface - Improves search latency of (incremental) search	2022-08-03 18:52:41 +03:00
Debanjum Singh Solanky	38df727ef4	Fix escape sequence usage in strings. Remove unneeded import of os. Rename /config API method to config to match its purpose. UI is anyway too generic, and not what it is doing	2022-08-03 18:51:55 +03:00
Debanjum Singh Solanky	f642450ed9	Disable Incremental Search for Images on Web Bug introduced in commit `da118b3fed`	2022-08-03 11:52:51 +03:00
Debanjum Singh Solanky	b9e6273644	Include interfaces in pip package. Fix paths to web interface in app	2022-08-03 00:02:39 +03:00
Debanjum Singh Solanky	1b55462fb0	Convert search_filter, conversation dir to proper modules Add __init__.py files to their directories	2022-08-02 20:23:42 +03:00
Debanjum Singh Solanky	5108d45951	Wrap application startup steps into a method	2022-08-02 20:13:14 +03:00
Debanjum Singh Solanky	0ebfbb43ce	Nest org, md results at level 2 on Emacs interface. Improve readability - Makes it easier to fold/unfold, traverse and read results - This 2 level nesting is already being used on the web interface - Previously we were using the original nesting depth of the entry. This was aimed at providing more of the original context of the results. But currently this additional information does not provide as much, for the decreased legibility of the results	2022-08-01 04:01:18 +03:00
Debanjum Singh Solanky	1201bfddf3	Simplify name of config css from config-style.css to config.css	2022-08-01 01:34:00 +03:00
Debanjum Singh Solanky	075dba5d64	Use Khoj Title, Favicon in Config Page for Consistency	2022-08-01 01:27:14 +03:00
Debanjum Singh Solanky	56a4429f01	Move web interface to configure application into src/interface/web directory - Improve code layout by ensuring all web interface specific code under the src/interface/web directory - Rename config API to more specific /config instead of /ui - Rename config data GET, POST api to /config/data instead of /config	2022-08-01 00:53:42 +03:00
Debanjum	bb2ccec1ca	Populate type dropdown on the web interface with only enabled search types - Previously we were statically populating types dropdown field in the web interface with all available search types - This change populates the type dropdown field with only search types that are enabled/configured - It queries the `/config` backend API to see which of the available search types are configured	2022-08-01 00:20:45 +03:00
Debanjum Singh Solanky	8b6058c879	Fix instantiating type field with value from URL query parameter - Populate via `.then` after enabled search types in dropdown are populated - Call to `/config` API is async and will usually complete after the value of type field is set from url - So value of type field would earlier be overridden when search types dropdown is populated after the call to `/config` API completes	2022-08-01 00:04:50 +03:00
Debanjum Singh Solanky	be253bab39	Populate type dropdown with only enabled search types in web interface - Get /config API and check config for which available search types are populated. This gives us the list of enabled search types - Dynamically populate search type field with enabled search types only	2022-07-31 23:42:00 +03:00
Debanjum Singh Solanky	0abd40aeb7	Only set query field when appropriate query param passed via URL - Setting query value to default option when query param wasn't passed via URL was overriding placeholder text in query field - We wanted placeholder text in field, not the query field to actually be populated by placeholder text - This clears field when user starts typing query into the query field, instead of them having to manually delete the default text populated	2022-07-31 22:29:23 +03:00
Debanjum Singh Solanky	17c38b526a	Default config for each search type to None - Setting up default compressed-jsonl, embeddings-file was only required for org search_type, while org-files and org-filter were allowed to be passed as command line argument - This avoided having to set compressed-jsonl and embeddings-file via command line argument as well for org search type - Now that all search types are only configurable via config file, we can default all search types to None. The default config for the rest of the search types wasn't being used anyway	2022-07-31 22:23:57 +03:00
Debanjum Singh Solanky	b83021a723	Improve code readability of merge_dicts helper method	2022-07-31 22:07:56 +03:00
Debanjum Singh Solanky	38aede68f2	Only configure org via config file for consistency across search types - Previously org-files were configurable via cmdline args, whereas none of the other search types are - This is an artifact of how the application grew - It can be removed for better consistency and equal preference given all search types	2022-07-31 22:02:03 +03:00
Saba	b55159f5bd	Fix URL for khoj.el quelpa setup instructions	2022-07-29 23:01:04 -04:00
Debanjum Singh Solanky	da118b3fed	Simplify incremental search function used in web interface Re-rank isn't passed to image search API in search function. So don't need to check type in incremental_search function too	2022-07-29 23:18:01 +04:00
Debanjum Singh Solanky	3079614981	Allow set up of search form via query params in web interface - Default search type to org, instead of images	2022-07-29 23:13:26 +04:00
Debanjum Singh Solanky	02ca2c05a1	Add Eagle Icon for Khoj to Web, Emacs Interfaces and Readme	2022-07-29 17:50:29 +04:00
Debanjum Singh Solanky	78314263a0	Add Table of Contents, Features, Performance Details to Readme	2022-07-29 17:08:17 +04:00
Debanjum Singh Solanky	ed181f47c9	Prettify rendering of org music results on Khoj web interface	2022-07-29 04:28:22 +04:00
Debanjum Singh Solanky	7e5291a38e	Make org result headings at same level. Improve spacing of results. Having org-mode result headings change size based on their depth in the source document makes for a confusing UI experience. Improve font-size, line-spacing and margins of results to make delineation between entries, and differentiating between entry heading and its body easier to visually infer. Do not white-space: pre-line. Improves rendering of Markdown results	2022-07-29 01:55:46 +04:00
Debanjum Singh Solanky	4d5183063c	Create images directory if doesn't exist, to store image search results	2022-07-28 21:30:31 +04:00
Debanjum Singh Solanky	a9bc17a6b0	Prettify Render of Markdown Results in Web Interface	2022-07-28 20:56:37 +04:00
Debanjum Singh Solanky	a6ae74f52e	Move JS files like org.js into a separate assets/ directory	2022-07-28 20:46:48 +04:00
Debanjum Singh Solanky	a12eaa4ce0	Move Khoj image results into a child images/ directory	2022-07-28 20:45:12 +04:00
Debanjum	a71253e137	Support Incremental Search on Web Interface ## Support Incremental Search on Khoj Web Interface - Use default, fast path to query /search API while user is typing - Upgrade to cross-encoder re-ranked results once user hits enter on search box ## Improve Render of Org Results on Web Interface - We were previously just wrapping results from /search API into a pre formatted div field. This was not easy to read - Use [org.js](https://mooz.github.io/org-js/) to render results from Khoj `/search` API as proper HTML - Improve org.js to render all task states, stylize task tags and make org-mode results look more like original content Closes #42 #41	2022-07-28 09:31:57 -07:00
Debanjum Singh Solanky	e8029bf415	Extract and Highlight org-mode tags in HTML render of search results	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	c6c248df26	Improve styling of org-mode results to original alignment, line breaks	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	9f59897eeb	Highlight all org-mode task states in HTML. Not just TODO, DONE. - Make logic to extract, mark todo state in org.js more generic - Add default todo state styling to html	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	f040b3f65c	Stylize TODO/DONE states with CSS	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	581b6097c7	Clean Results. Remove TOC, Heading Number and Property Drawers	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	965a93a2f2	Add Basic HTML Rendering of Org-Mode Results	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	1da44d4dfe	Add Incremental Search to Khoj Web Interface	2022-07-28 19:55:15 +04:00
Debanjum Singh Solanky	af1dd31401	Do not pass verbose argument to image_search.query() as not supported	2022-07-28 19:52:58 +04:00
Debanjum Singh Solanky	80ac10835c	Rerank results on normal minibuffer exit In current state: - Rerank results: - If user idles while entering query OR - exits normally - Do not rerank results: - If user exits abnormally, e.g via C-g from query	2022-07-28 03:37:16 +04:00
Debanjum Singh Solanky	1b759597df	Make incremental search more robust. Follow standard user expectations - Rename functions to more standard, descriptive names - Keep known, required code for incremental search - E.g Do not set buffer local flag in hooks on minibuffer setup - Only query when user in khoj minibuffer - Use active-minibuffer-window and track khoj minibuffer - (minibuffer-prompt) is not useful for our use-case here - (For now) Run re-rank only if user idle while querying - Do not run rerank on teardown/completion - The reranking lag (~2s) is annoying; hit enter, wait to see results - Also triggered when user exits abnormally, so C-g also results in rerank which is even more annoying - Emacs will still hang if re-ranking gets triggered on idle but that's better than always getting triggered. And better than not having mechanism to get results re-ranked via cross-encoder at all	2022-07-28 02:52:27 +04:00
Debanjum Singh Solanky	9a6eee31be	Make number of results to get from Khoj API customizable in khoj.el	2022-07-27 18:55:18 +04:00
Debanjum Singh Solanky	9302b45fe0	Use khoj-incremental as the main khoj func. Rename khoj to khoj-simple - Update khoj-simple to work cross-encoder re-ranked results like before - Increment major version as incremental search considered a breaking change and a major update to search capability	2022-07-27 18:18:17 +04:00
Debanjum Singh Solanky	09727ac3be	Make bi-encoder return fewer results to reduce cross-encoder latency	2022-07-27 07:26:02 +04:00
Debanjum Singh Solanky	9ab3edf6d6	Re-rank incremental search results using cross-encoder if user idle This provides a relatively smooth mechanism - to improve relevance of results on idle - while providing the rapid, incremental results while typing	2022-07-27 07:25:42 +04:00
Debanjum Singh Solanky	ad242cafa7	Support querying all text search types in incremental search - Before incremental search was hard-coded to only query org	2022-07-27 07:25:42 +04:00
Debanjum Singh Solanky	bfcb962cbe	Use post-command-hook to only query on user input - Hooking into after-change-functions results in system logs triggering query	2022-07-27 07:25:42 +04:00
Debanjum Singh Solanky	0d49398954	Reuse code to query api, render results. Formalize method, arg names	2022-07-27 07:25:42 +04:00
Debanjum Singh Solanky	fd1963d781	Implement Basic Incremental Search Interface in Emacs for Org Mode Notes	2022-07-27 03:05:00 +04:00
Debanjum Singh Solanky	3fa7d8f03a	Skeleton to allow incremental search on Khoj via Emacs	2022-07-27 02:48:27 +04:00
Debanjum Singh Solanky	1168244c92	Make cross-encoder re-rank results if query param set on /search API - Improve search speed by ~10x. Tested on corpus of 125K lines, 12.5K entries - Allow cross-encoder to re-rank results by setting &?r=true when querying /search API - It's an optional param that defaults to False - Earlier all results were re-ranked by cross-encoder - Making this configurable allows for much faster results, if desired, but at lower accuracy	2022-07-26 22:56:36 +04:00
Debanjum Singh Solanky	b1e64fd4a8	Improve search speed. Only apply filter if filter keywords in query - Formalize filters into class with can_filter() and filter() methods - Use can_filter() method to decide whether to apply filter and create deep copies of entries and embeddings for it - Improve search speed for queries with no filters as deep copying entries, embeddings takes the most time after cross-encoder scoring when calling the /search API. Earlier we would create deep copies of entries, embeddings even if the query did not contain any filter keywords	2022-07-26 22:47:26 +04:00
Debanjum Singh Solanky	f094c86204	Trace query response performance and display timings in verbose mode	2022-07-26 21:03:53 +04:00
Debanjum Singh Solanky	65fea7681a	Rename notes search type to org search, now that markdown notes supported	2022-07-21 22:09:44 +04:00
Debanjum Singh Solanky	4c24202e42	Update documentation. Simplify, reflect current capabilities	2022-07-21 22:09:44 +04:00
Debanjum Singh Solanky	d4d7dbaca6	Support Natural Search on Markdown Files - Reason: Allow natural search on markdown based notes, documentation, websites etc - Details: - Create markdown processor to extract Markdown entries (identified by Heading) into standard jsonl format required by text_search - Update API, Configs to support interfacing with new markdown type - Update Emacs, Web clients to support interfacing with new markdown type via API - Update Readme to mention markdown is also supported Closes #35	2022-07-21 22:07:05 +04:00
Debanjum Singh Solanky	0602d018c0	Merge Symmetric, Asymmetric Search Types into a single Text Search Type - The code for both the text search types was mostly the same. It was earlier done this way for expedience while experimenting - The minor differences were reconciled and merged into a single text_search type - This simplifies the app and makes it easier to process other text types	2022-07-21 21:19:52 +04:00
Debanjum Singh Solanky	0917f1574d	Consolidate jsonl helper methods in a single file under utils module	2022-07-21 03:30:13 +04:00
Debanjum Singh Solanky	de726c4b6c	Minor fixes to unused installer utility script	2022-07-21 03:30:13 +04:00
Debanjum Singh Solanky	5aad297286	Reuse logic to extract entries across symmetric, asymmetric search Now that the logic to compile entries is in the processor layer, the extract_entries method is standard across (text) search_types Extract the load_jsonl method as a utility helper method. Use it in (a)symmetric search types	2022-07-21 02:53:18 +04:00
Debanjum Singh Solanky	e220ecc00b	Generate compiled form of each transaction directly in the beancount processor - The logic for compiling a beancount entry (for later encoding) now completely resides in the beancount-to-jsonl processor layer - This allows symmetric search to be generic and not be aware of beancount specific properties that were extracted by the beancount-to-jsonl processor layer - Now symmetric search just expects the jsonl to (at least) have the 'compiled' and 'raw' keys for each entry. What original text the entry was compiled from is irrelevant to it. The original text could be location, transaction, chat etc, it doesn't have to care	2022-07-21 02:43:28 +04:00
Debanjum Singh Solanky	06cf425314	Generate compiled form of each entry directly in the org-mode processor - The logic for compiling an org-mode entry (for later encoding) now completely resides in the org-to-jsonl processor layer - This allows asymmetric search to be generic and not be aware of org-mode specific properties that were extracted by the org-to-jsonl processor layer - Now asymmetric search just expects the jsonl to (at least) have the 'compiled' and 'raw' keys for each entry. What original text the entry was compiled from is irrelevant to it. The original text could be mail, chat, markdown, org-mode etc, it doesn't have to care	2022-07-21 02:08:02 +04:00
Debanjum Singh Solanky	4ead79d272	Make Notes Search Natural Language Date Aware - Pass Scheduled, Closed Dates of Entries to Include in Embeddings - The (new?) model seems to understand dates. So can give more relevant entries if date in natural language mentioned in query - E.g "Went Surfing with Friends" vs "Went Surfing with Friends in 1984" will give different results, with the second prioritizing entries mentioning any entries with closed, scheduled dates from 1984	2022-07-21 01:06:49 +04:00
Debanjum Singh Solanky	d50bfb5188	Parse Logbook Entries in the OrgNode parser for Org-Mode. Update tests	2022-07-21 00:15:30 +04:00
Debanjum Singh Solanky	70e70d4b15	Rename 'embed' key to more generic 'compiled' for jsonl extracted results - While it's true those strings are going to be used to generate embeddings, the more generic term allows them to be used elsewhere as well - Their main property is that they are processed, compiled for usage by semantic search - Unlike the 'raw' string which contains the external representation of the data, as is	2022-07-20 20:35:50 +04:00
Debanjum Singh Solanky	c1369233db	Consistently use "entry", "score" in json response for all search types - Had already made some progress on this earlier by updating the image search responses. But needed to update the text search responses to use lowercase entry and score - Update khoj.el to consume the updated json response keys for text search	2022-07-20 20:33:27 +04:00
Debanjum Singh Solanky	d68a9dc445	Sort extracted images before computing their embeddings - Image order returned by glob is OS dependent - This prevented sharing image embeddings across machines running different OS - A stable sort order for processed images allows sharing embeddings across machines. - Use case: A more powerful, always on machine actually computes the image embeddings regularly The client machine just loads these periodically to provide semantic search functionality	2022-07-20 03:51:27 +04:00
Debanjum Singh Solanky	c4c7f38b15	Fix extracting image names from multiple image directories	2022-07-20 03:40:49 +04:00
Debanjum Singh Solanky	bdc1b9f2bb	Resolve edge case errors in encoding image metadata - Handle case where current image batch smaller than batch_size - Handle case where no XMP metadata for current image - return empty strings in such a scenario instead of ". "	2022-07-20 02:58:43 +04:00
Debanjum Singh Solanky	2a5445216c	Image input directory not required by collate result as image_name already absolute path	2022-07-20 02:56:23 +04:00
Debanjum Singh Solanky	6c9ffdba57	Allow indexing multiple image directories for image search	2022-07-20 02:56:01 +04:00
Debanjum Singh Solanky	70221bb038	Allow filtering transactions by date in symmetric ledger	2022-07-19 20:58:24 +04:00
Debanjum Singh Solanky	b673d26a12	Extract Entries in a standardized format across text search types Issue: - Had different schema of extracted entries for symmetric_ledger vs asymmetric - Entry extraction for asymmetric was dirty, relying on cryptic indices to store raw entry vs cleaned entry meant to be passed to embeddings - This was pushing the load of figuring out what property to extract from each entry to downstream processes like the filters - This limited the filters to only work for asymmetric search, not for symmetric_ledger - Fix - Use consistent format for extracted entries { 'embed': entry_string_meant_to_be_passed_to_model_and_get_embeddings, 'raw' : raw_entry_string_meant_to_be_passed_to_use } - Result - Now filters can be applied across search types, and the specific field they should be applied on can be configured by each search type	2022-07-19 20:52:25 +04:00
Debanjum Singh Solanky	e66cd5bf59	Only extract transactions from Beancount - Earlier was extracting all entries starting with dates but the other type of entries like account open/close, asserts etc aren't useful for querying	2022-07-19 19:50:58 +04:00
Debanjum Singh Solanky	732b2d287f	Give the project a short, less generic name. Rename it to Khoj - Semantic Search was just a placeholder used to test the idea out Didn't want to get into naming at that point of time	2022-07-19 18:26:16 +04:00
Debanjum Singh Solanky	989526ae54	Use a more accurate model for symmetric semantic search - The all-MiniLM-L6-v2 is more accurate - The exact previous model isn't benchmarked but based on the performance of the closest model to it. Seems like the new model may be similar in speed and size - On very preliminary evaluation of the model, the new model seems faster, with pretty decent results	2022-07-18 20:27:26 +04:00
Debanjum Singh Solanky	4a90972e38	Use a better model for asymmetric semantic search - The multi-qa-MiniLM-L6-cos-v1 is more extensively benchmarked[1] - It has the right mix of model query speed, size and performance on benchmarks - On hugging face it has way more downloads and likes than the msmarco model[2] - On very preliminary evaluation of the model - It doubles the encoding speed of all entries (down from ~8min to 4mins) - It gave more entries that stay relevant to the query (3/5 vs 1/5 earlier) [1]: https://www.sbert.net/docs/pretrained_models.html [2]: https://huggingface.co/sentence-transformers	2022-07-18 20:27:26 +04:00
Debanjum Singh Solanky	5e302dbcda	Fix using 1 column layout on small screens	2022-07-18 02:40:16 +04:00
Debanjum Singh Solanky	7d16b673b1	Use Single Column Layout for Small Screens on Web Interface	2022-07-18 02:08:52 +04:00
Debanjum Singh Solanky	31a221a76b	Auto focus cursor on query input box to simplify, speed interactions - Avoids having to click the query input box - Just open page, type whatever and hit enter to do image search - For other search types select appropriate type from dropdown	2022-07-16 19:39:15 +04:00
Debanjum Singh Solanky	06b0c720d6	Improve Rendering of Image Search Results in Emacs - Use shr to render image response from html in result buffer Earlier was using org-mode. But rendering HTML with shr seems cleaner - Use Headings to Add highlights - Use Random to Force fetch of Image. Similar to what was done for Web interface - Remove trailing elisp brackets from response - Show query match scores by image model for each image in results	2022-07-16 19:31:49 +04:00
Debanjum Singh Solanky	28ec9af589	Extract image URL location from response in elisp after API update	2022-07-16 18:43:55 +04:00
Debanjum Singh Solanky	47613cba1f	Improve Landing Page Look in General and Layout for Mobile - Ask for 6 Images to Fill Grid into 3x2 Layout - Submit Form on Hitting Enter	2022-07-16 16:55:13 +04:00
Debanjum Singh Solanky	cf207d6ebe	Add title, heading to the semantic search web interface	2022-07-16 03:44:29 +04:00
Debanjum Singh Solanky	e0d8398b27	Normalize metadata match score to work better with image match score - Metadata match scores were consistently higher by a factor of ~3x wrt image match scores. This was resulting in all results being from the metadata match with query and none from the image match with query. - Scaling the metadata match scores down by a scaling factor seems to more consistently give a blend of results from both image and metadata matches	2022-07-16 03:39:33 +04:00
Debanjum Singh Solanky	a3fc82817d	Log and continue on image metadata encoding error due to Tensor size mismatch	2022-07-16 03:39:19 +04:00
Debanjum Singh Solanky	f26d0ddbbd	Minor fix to asymmetric search when no entries returned	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	ca3f93e641	Add button on web interface to regenerate embeddings of specified type	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	231cc91e14	Force reload of images every time user clicks search button Adding a random, unused url param at the end of the img.src string fixes the issue. As the browser thinks it's a new image and doesn't use the image data that's already cached because of which it wasn't even making the fetch call for the image	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	a6aef62a99	Create Basic Landing Page to Query Semantic Search and Render Results - Allow viewing image results returned by Semantic Search. Until now there wasn't any interface within the app to view image search results. For text results, we at least had the emacs interface - This should help with debugging issues with image search too For text the Swagger interface was good enough	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	4e27ae0577	Ease access to image result for given query by image_search - Copy images to accessible directory - Return URL paths to them to ease access - This is to be used in the web interface to render image results directly in browser - Return image, metadata scores for each image in response as well This should help get a better sense of image scores along both XMP metadata and whole image axis	2022-07-16 03:36:19 +04:00
Debanjum Singh Solanky	801e59a20d	Allow explicit filters when querying Ledger transactions	2022-07-15 23:41:54 +04:00
Debanjum Singh Solanky	0e979587e0	Add configurable filter support to Symmetric Ledger Search	2022-07-14 23:40:41 +04:00
Debanjum Singh Solanky	85077bc1d1	Handle unparseable date range passed via date filter in query - Do not reuse the same list - Just create new list, so only parsed data is in it	2022-07-14 22:47:23 +04:00
Debanjum Singh Solanky	a60de2c02b	Include date filter in asymmetric search on music as well	2022-07-14 22:37:17 +04:00
Debanjum Singh Solanky	c3b3e8959d	Put entry splitting regex in explicit filter into a variable for code readability	2022-07-14 22:00:10 +04:00
Debanjum Singh Solanky	3aac3c7d52	Run explicit filter on raw entry, add more terms to split entries by - With \t Last Word in Headings was suffixed by \t and so couldn't be filtered by - User interacts with raw entries, so run explicit filters on raw entry - For semantic search using the filtered entry is cleaner, still	2022-07-14 21:54:04 +04:00
Debanjum Singh Solanky	7640e2ab0c	Wrap attempt to extract dates from entry in try/catch - Not all YYYY-MM-DD strings in entry are necessarily dates	2022-07-14 21:38:00 +04:00
Debanjum Singh Solanky	9de2097182	Fix date filter usage with multi word queries. Simplify date regex	2022-07-14 21:34:33 +04:00
Debanjum Singh Solanky	dcb6fe479e	Fix date_filter query, entry in query range check. Add tests for it - Fix date_filter date_in_entry within query range check - Extracted_date_range is in [included_date, excluded_date) format - But check was checking for date_in_entry <= excluded_date - Fixed it to do date_in_entry < excluded_date - Fix removal of date filter from query - Add tests for date_filter	2022-07-14 20:01:35 +04:00
Debanjum Singh Solanky	011f81fac5	Fix date_filter to handle non overlapping date ranges	2022-07-14 18:53:38 +04:00
Debanjum Singh Solanky	70ac35b2a5	Compute Date Range to filter entries to, from Comparators, Dates in Query	2022-07-14 18:20:09 +04:00
Debanjum Singh Solanky	e6db3e3d00	Prefer Dates From Future only when specific words in date string - Default to looking at dates from past, as most notes are from past - Look for dates in future for cases where it's obvious query is for dates in the future but dateparser's parse doesn't parse it at all. E.g parse('5 months from now') returns nothing - Setting PREFER_DATES_FROM_FUTURE in this case and passing just parse('5 months') to dateparser.parse works as expected	2022-07-14 18:13:12 +04:00
Debanjum Singh Solanky	4a201d52af	Add, test date filter regex and date parsing to get natural date range	2022-07-14 16:47:32 +04:00
Debanjum Singh Solanky	b54588717f	Filter for entries with dates specified by user in query - Create Date filter - Users can pass dates in YYYY-MM-DD format in their query - Use it to filter asymmetric search to user specified dates	2022-07-14 00:51:02 +04:00
Debanjum Singh Solanky	b82aef26bf	Make filters to apply before semantic search configurable Details -- - The filters to apply are configured for each type in the search controller - Multiple filters can be applied on the query, entries etc before search - The asymmetric query method now just applies the passed filters to the query, entries and embeddings before semantic search is performed Reason -- This abstraction will simplify adding other pre-search filters. E.g datetime filter	2022-07-13 16:37:09 +04:00
Debanjum Singh Solanky	c92789d20a	Extract explicit pre-search filter function into a separate module Details -- - Move explicit_filters function into separate module under search_filter - Update signature of explicit filter to take and return query, entries, embeddings - Use this explicit_filter func from search_filters module in query Reason -- Abstraction will simplify adding other pre-search filters. E.g datetime filter	2022-07-13 16:20:04 +04:00
Debanjum Singh Solanky	6d7ab50113	Run Explicit Filter on Entries, Embeddings before Semantic Search for Query - Issue - Explicit filtering was earlier being done after search by bi-encoder but before re-ranking by cross-encoder - This was limiting the quality of results being returned. As the bi-encoder returned results which were going to be excluded. So the burden of improving those limited results post filtering was on the cross-encoder by re-ranking the remaining results based on query - Fix - Given the embeddings corresponding to an entry are at the same index in their respective lists. We can run the filter for blocked, required words before the search by the bi-encoder model. And limit entries, embeddings being considered for the current query - Result - Semantic search by the bi-encoder gets to return most relevant results for the query, knowing that the results aren't going to be filtered out after. So the cross-encoder shoulders less of the burden of improving results - Corollary - This pre-filtering technique allows us to apply other explicit filters on entries relevant for the current query - E.g limit search for entries within date/time specified in query	2022-07-12 18:25:42 +04:00
Debanjum Singh Solanky	7677465f23	Fix passing of device to setup method in /reload, /regenerate API - Use local variable to pass device to asymmetric.setup method via /reload, /regenerate API - Set default argument to torch.device('cpu') instead of 'cpu' to be more formal	2022-06-30 01:32:56 +04:00
Debanjum Singh Solanky	eda4b65ddb	Improve Query Speed. Normalize Embeddings, Moving them to Cuda GPU - Move embeddings to CUDA GPU for compute, when available - Normalize embeddings and Use Dot Product instead of Cosine	2022-06-30 00:59:57 +04:00
Debanjum Singh Solanky	b89fc2f4ac	Add /reload API to reload model embeddings and entries from file - The reload API adds the ability to separate out the loading of embeddings from file without having to restart app or (re-)generate embeddings - Before this the only way to load model from file was by restarting app - The other way to reload the model embeddings by regenerating them was too expensive for larger datasets - This unlocks at least 1 use-case, where - we regenerate model via an app instance running on a separate server and - just reload the generated embeddings on the client device - This allows us to offload the expensive embedding generation compute to a background server while letting - This avoids having to (re-)restart application on client device or be forced to generate embeddings on the client device itself - But it requires the model relevant files to be synced to the client device This can be done with any file syncing application like Syncthing - We can then call /regenerate on server and /reload client on a regular schedule to keep our data up to date on semantic search	2022-06-29 23:47:17 +04:00
Debanjum Singh Solanky	f5d6d1e752	Tiny style fix to separate functions by 2 newlines	2022-06-29 23:47:17 +04:00
Debanjum Singh Solanky	85fbe1c42b	Normalize org notes path to be relative to home directory - This is still clunky but it should be committable - General enough that it'll work even when a user's notes are not in the home directory - While solving for the special case where: - Notes are being processed on a different machine and used on a different machine - But the notes directory is in the same location relative to home on both the machines	2022-06-28 19:16:11 +04:00
Debanjum Singh Solanky	094eaf3fcc	Fix minor bugs in OrgNode parser - Bugs discovered from writing org-node tests	2022-06-17 19:14:54 +03:00
Debanjum Singh Solanky	36495038dd	Fix storing parsed CLOSED date in OrgNode The CLOSED date was getting parsed but not stored Adding setClosed at start also fixed the issue	2022-06-17 16:33:37 +03:00
Debanjum Singh Solanky	1c5754bf95	Simplify storing Tags in OrgNode object - Use Set for Tags instead of dictionary with empty keys - No Need to store First Tag separately - Remove properties methods associated with storing first tag separately - Simplify extraction of tags string in org_to_jsonl - Split notes_string creation into multiple f-string in separate line for code readability	2022-06-17 16:33:37 +03:00
Debanjum Singh Solanky	51a43245d3	Escape square brackets in file+heading based org-mode links	2022-06-17 16:20:19 +03:00
Debanjum Singh Solanky	04610f453a	Include scheduled date, deadline date and close date in repr of org node - Now that excluding the times line from the raw body of node, show it in repr so user can see it for reference - But the model doesn't need to see it for its embeddings to be confused by	2022-06-17 05:13:48 +03:00
Debanjum Singh Solanky	367d7377df	Ignore scheduled, closed, deadline time and logbook start, end in org node body - Gives cleaner embeddings for semantic search - Hopefully improves results and reduces size, compute	2022-06-17 05:13:09 +03:00
Debanjum Singh Solanky	b77ccadcba	Make property key regex more strict. Property key has to be alphanumeric	2022-06-17 05:13:09 +03:00
Debanjum Singh Solanky	ac9d746444	Fix Tags extraction in Org Node parser - Previous version required two tags at least to work, not sure why - Fixed it to extract all tags, even if only one tag in heading	2022-06-17 04:21:22 +03:00
Debanjum Singh Solanky	fb86be8cd9	Add ID, File+Heading based Links to Org-Mode Entries - Add links to property drawer - This ensures results returned by semantic search contain these links - This allows the user to jump to entry within original file for context - The ID, file+heading based links are more robust to find relevant entry in original file than the line no based link, as edits being done by user to original files between embedding regenerations	2022-06-17 03:11:11 +03:00
Debanjum Singh Solanky	de23fc2051	Revert Add Scheduled, Deadline date to Model Embeddings for Date Aware Search Sentence Transformer MSMarco Model isn't date aware So no use of adding scheduled, deadline dates to model embeddings for consideration This reverts commit `a2a08d1354`.	2022-06-17 02:57:28 +03:00
Debanjum Singh Solanky	a2a08d1354	Add Scheduled, Deadline date to Model Embeddings for Date Aware Search	2022-06-17 02:55:27 +03:00
Debanjum Singh Solanky	cfbd5c4ecc	Update global model on regenerate via API	2022-06-17 00:49:06 +03:00
Debanjum Singh Solanky	c78bf84eef	Introduce search api endpoint that auto infers search type intent - Introduce prompt for GPT to automatically extract user's search intent - Expose new search api endpoint to use that to set SearchType being passed to search API - Currently meant as an experimental API to gauge usefulness, extendability. Evaluating for phone or voice use-case	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	8ef7917014	Fix json format passed in prompt to GPT	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	f57b7f65ea	Wrap prompts for GPT in triple quotes to improve prompt readability To improve prompt readability: - Remove newline escape sequence and use actual newline directly - This avoids one long line of text as prompt and - Remove escaping of double quotes	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	1eba7b1c6f	Use empty_escape_sequence constant to strip response text from gpt	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	1c3a1420f8	Update asymmetric extract_entries method to handle uncompressed jsonl This is similar to what was done for the symmetric extract_entries method earlier	2022-02-27 19:03:31 -05:00
Debanjum Singh Solanky	3d8a07f252	Extract empty line escape sequences var into constants file for reuse	2022-02-27 19:01:49 -05:00
Debanjum Singh Solanky	bb5d0d8908	Improve Semantic Search Buffer Names in Emacs - Allow multiple semantic searches buffers to exist simultaneously - Uniquify semantic search buffer names - Add query and search-type to semantic search buffer name for easier disambiguation, search and find appropriate	2022-02-26 18:30:14 -05:00
Debanjum Singh Solanky	b68558651b	Improve Extraction of Beancount Entries - Only extract entries starting with YYYY-MM-DD from Beancount - Strip Trailing Escape Sequences from Entries	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	b3ac2dd730	Improve Results Rendered on Emacs from Semantic Search on Ledger - Add search query to top of buffer as Beancount comment - Remove trailing ) from response - Separate entries by empty line - Load beancount-mode in semantic search on ledger buffer	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	502c68d4f8	Remove trailing escape sequence in ledger search response entries - Fix loading entries from jsonl in extract_entries method - Only extract Title from jsonl of each entry This is the only thing written to the jsonl for symmetric ledger - This fixes the trailing escape seq in loaded entries - Remove the need for semantic-search.el response reader to do pointless complicated cleanup - Make symmetric_ledger:extract_entries use beancount_to_jsonl:load_jsonl Both methods were doing similar work - Make load_jsonl handle loading entries from both gzip and uncompressed jsonl	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	248aa632c0	Do not throw warning for beancount files with .beancount extension	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	76cd63f4bd	Fix count of processed jsonl entries shown to user by ledger processor Count lines not chars	2022-02-26 17:46:06 -05:00
Saba	33bc62dc19	Fix type of use_xmp_metadata to be bool, rather than str	2022-01-24 21:53:26 -05:00
Debanjum Singh Solanky	179153dc5a	Rename RawConfig Types for Consistency - Naming convention - [ContentType][ConfigType]Config - Where [ConfigType] ~ Content, Search, Processor - Where [ContentType] ~ Text, Image, Asymmetric, Symmetric, Conversation - Current Configs: - Content: - Org Notes - Org Music - Image - Ledger/Beancount - Search: - Asymmetric - Symmetric - Image - Processor: - Conversation	2022-01-14 20:54:38 -05:00
Debanjum Singh Solanky	c64e0c2965	Load model from HuggingFace if model_directory unset in config YAML - Do not save/load the model to/from disk when model_directory unset in config.yml - Add symmetric search default config to cli.py	2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky	510faa1904	Save Image Search Model to Disk	2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky	934ec233b0	Add Search Config for Symmetric Model. Save Model to Disk	2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky	b63026d97c	Save Asymmetric Search Model to Disk - Improve application load time - Remove dependence on internet to startup application and perform semantic search	2022-01-14 17:36:27 -05:00
Debanjum Singh Solanky	2e53fbc844	Fix the user intent extraction prompt for GPT. Clean up chatbot test	2022-01-12 10:36:01 -05:00
Debanjum Singh Solanky	ea28897cdd	Remove deprecated conversation_history field from config	2022-01-12 10:35:52 -05:00
Debanjum Singh Solanky	5a686b7be9	Add logs for chat bot in verbose mode	2022-01-12 10:35:52 -05:00
Debanjum Singh Solanky	6dc2a99d35	Merge branch 'master' of github.com:debanjum/semantic-search into add-summarize-capability-to-chat-bot - Fix openai_api_key being set in ConfigProcessorConfig - Merge addition of config UI and config instantiation updates	2021-12-20 13:30:42 +05:30
Debanjum Singh Solanky	65da7daf1f	Load, Save Conversation Session Summaries to Log. s/chat_log/chat_session Conversation logs structure now has session info too instead of just chat info Session info will allow loading past conversation summaries as context for AI in new conversations { "session": [ { "summary": <chat_session_summary>, "session-start": <session_start_index_in_chat_log>, "session-end": <session_end_index_in_chat_log> }], "chat": [ { "intent": <intent-object> "trigger-emotion": <emotion-triggered-by-message> "by": <AI\|Human> "message": <chat_message> "created": <message_created_date> }] }	2021-12-15 10:17:07 +05:30
Saba	97a6dfaa1e	Use default value False for verbose parameter, and small changes Pass config as parameter to initialize_search, change name of API methods to handle config CRUD operations, and initialize config to FullConfig	2021-12-11 14:13:14 -05:00
Saba	9536358d34	Fix key error model_name issue by upgrade sentence-transformers version Refer to https://github.com/UKPLab/sentence-transformers/issues/1241 Also use verbose flag passed through function parameters in image_search	2021-12-11 11:58:19 -05:00
Saba	ce7a751e6b	Fix passing verbose flag down in symmetric_ledger.py	2021-12-11 11:36:32 -05:00
Saba	d65190c3ee	Update unit tests, files with removing model suffix to config types	2021-12-09 08:50:38 -05:00
Debanjum Singh Solanky	0ac1e5f372	Summarize chat logs and notes returned by semantic search via /chat API	2021-12-08 02:34:07 +05:30
Saba	76e9e9da2f	Update unit tests to use the new BaseModel types	2021-12-05 09:31:39 -05:00
Saba	9b16cdbb41	Use past tense for verbose log	2021-12-04 11:45:44 -05:00
Saba	10e4065e05	Consolidate the search config models and pass verbose as a top level flag	2021-12-04 11:43:48 -05:00
Saba	43e647835b	Append Model Suffixed to config models	2021-12-04 10:51:21 -05:00
Saba	e068968b35	Update imports for raw config models in config.py	2021-12-04 10:44:55 -05:00
Saba	4d6284b0af	Remove Test suffix from Config models	2021-12-04 10:44:13 -05:00
Saba	7fcc8d2cef	Add null check for processor config	2021-12-04 10:11:00 -05:00
Saba	7ca4fc3453	Resolve merge conflicts with updated processor conversation data model	2021-11-28 16:22:52 -05:00
Saba	87a6c2d716	Use parse_obj instead of parse_raw as incoming data is in dict	2021-11-28 14:34:32 -05:00
Saba	5d50487d83	Linting New line at end of config.html Remove debug print statement	2021-11-28 13:32:56 -05:00
Saba	6f466c8d99	Use global config and add a regenerate button to the config ui	2021-11-28 13:28:22 -05:00
Saba	34d1e4199c	Use alias generator when deserializing the config file	2021-11-28 13:05:48 -05:00
Saba	19b81e82f0	Write back to the raw config.yml file on update	2021-11-28 12:34:40 -05:00
Saba	8837b02de6	dump updated config to a yaml file	2021-11-28 12:26:07 -05:00
Saba	5b80b87379	Streamline None checking in initialize_search	2021-11-28 12:05:04 -05:00
Saba	bf8ae31e6a	Streamline None checking in initialize_search	2021-11-28 11:59:45 -05:00
Saba	da52433d89	Update to re-use the raw config base models in config.py as well	2021-11-28 11:57:33 -05:00
Saba	6292fe4481	Update to re-use the raw config base models in config.py as well	2021-11-28 11:57:13 -05:00
Saba	311c4b7e7b	Working API request body parsing to /post config!	2021-11-28 11:16:33 -05:00
Saba	66183cc298	Working API request body parsing to /post config!	2021-11-28 11:12:26 -05:00
Debanjum Singh Solanky	5cd920544d	Add GPT method to summarize notes and chat logs	2021-11-28 13:08:05 +05:30
Debanjum Singh Solanky	1785047ea6	Improve understand primer and load understand response as dict	2021-11-28 13:04:16 +05:30
Saba	64645c3ac1	Begin type checking/input validation effort	2021-11-27 21:47:56 -05:00
Saba	9a0264b7fc	Add a dummy POST config endpoint, integrate with editable UI	2021-11-27 20:36:03 -05:00
Saba	f3b03ea5b7	Make raw data reactive to changes	2021-11-27 19:17:15 -05:00
Debanjum Singh Solanky	67c3cd7372	Wire up GPT understand method to /chat API. Log conversation metadata too	2021-11-28 00:04:39 +05:30
Saba	3db06eee3f	Basic example of serving config as JSON and retrieving on button click	2021-11-27 10:49:33 -05:00
Saba	3d4471e107	Merge branch 'master' of github.com:debanjum/semantic-search into saba/configui	2021-11-27 08:52:48 -05:00
Debanjum Singh Solanky	ccfb97e1a7	Wire up minimal conversation processor. Expose it over /chat API endpoint Ensure conversation history persists across application restart	2021-11-27 18:12:01 +05:30
Debanjum Singh Solanky	a99b4b3434	Make conversation processor configurable	2021-11-27 18:12:01 +05:30
Debanjum Singh Solanky	d4e1120b22	Add GPT based conversation processor to understand intent and converse with user - Allow conversing with user using GPT's contextually aware, generative capability - Extract metadata, user intent from user's messages using GPT's general understanding	2021-11-27 18:12:01 +05:30
Saba	baee52648d	Set up basic ui page with no functionality	2021-11-26 14:51:11 -05:00
debanjum	46661b3057	Ensure top_k never more than total entries to run symmetric search on	2021-11-16 11:32:21 -08:00
debanjum	8c858d1a94	Reduce symmetric search results for cross-encoder to re-rank to improve search speed	2021-11-16 11:31:19 -08:00
Debanjum Singh Solanky	f3fd5ae978	Improve code comments. Do not import unused modules in asymmetric search	2021-11-17 00:58:31 +05:30
Debanjum Singh Solanky	8cf2465e8e	Ensure top_k never more than total entries to search from	2021-11-17 00:56:31 +05:30
Debanjum Singh Solanky	4d37ace3d6	Reduce search results for cross-encoder to re-rank to improve search speed Search time on my notes reduced from 14s to 4s. Cross-encoder re-ranking step takes majority time, not the cosine similarity search	2021-11-17 00:50:28 +05:30
Debanjum Singh Solanky	1832e418e5	Use raw string for regex in orgnode to fix deprecation warning	2021-10-02 17:38:31 -07:00
Debanjum Singh Solanky	f59e321419	Update CLIP model load path	2021-10-02 16:50:06 -07:00
Debanjum Singh Solanky	c47a8cdf16	Allow configuring host, port or unix socket of server via CLI	2021-10-02 16:16:33 -07:00
Debanjum Singh Solanky	516f28b082	Merge branch 'master' of github.com:debanjum/semantic-search	2021-09-30 04:17:32 -07:00
Debanjum Singh Solanky	d2905c4be6	Move tests out to project root. Use absolute import in project tests/ directory in project root is more standard. Just had to use absolute path for internal module imports to get it to work	2021-09-30 04:12:14 -07:00
Debanjum Singh Solanky	58bb420f69	Fix image_metadata argument ordering bug. Add E2E image search test - Image search test seems a little flaky - Interchanged argument was causing inaccurate results earlier	2021-09-30 03:30:47 -07:00
Debanjum Singh Solanky	d5597442f4	Modularize Code. Wrap Search, Model Config in Classes. Add Tests Details - Rename method query_* to query in search_types for standardization - Wrapping Config code in classes simplified mocking test config - Reduce args being passed to a function by passing it as single argument wrapped in a class - Minimize setup in main.py:__main__. Put most of it into functions These functions can be mocked if required in tests later too Setup Flow: CLI_Args\|Config_YAML -> (Text\|Image)SearchConfig -> (Text\|Image)SearchModel	2021-09-30 02:04:04 -07:00
Debanjum Singh Solanky	f4dd9cd117	Use type specific model for other search types too. Expose them via SearchModels - Wrap Image, Music, Ledger search into the type of SearchModel they use Similar to what was done for notes model by wrapping its config into an AsymmetricSearchModel. - Use the uber wrapper class to expose all type specific search models	2021-09-29 21:09:42 -07:00
Debanjum Singh Solanky	352d2930ee	Use multiple threads to generate model embeddings. Other minor formatting	2021-09-29 20:47:58 -07:00
Debanjum Singh Solanky	e22e0b41e3	Wrap asymmetric search model into SearchModels. Test notes search end-to-end - Wrap asymmetric search model parameters into AsymmetricSearchModel class - Create wrapper for all search type models. Put notes search model into it - Test notes search end-to-end from client API layer to results. Use model built on test data	2021-09-29 20:47:35 -07:00
Debanjum Singh Solanky	cde11a2331	Wrap search type enablement status in a search settings class - Cleaner, more idiomatic usage of a global variable - Simplifies mocking when testing client in pytest as setting wrapped in object rather than a simple type. So passed around by reference	2021-09-29 19:18:33 -07:00
Debanjum Singh Solanky	81ce0cacc3	Only allow supported search types to /search, /regenerate APIs - Use a SearchType to limit types that can be passed by user - FastAPI automatically validates type passed in query param - Available type options show up in Swagger UI, FastAPI docs - controller code looks neater instead of doing string comparisons for type - Test invalid, valid search types via pytest	2021-09-29 19:12:56 -07:00
Debanjum Singh Solanky	5db08c5293	Set query as heading of notes search results in Emacs Org buffer	2021-09-29 13:30:15 -07:00
Debanjum Singh Solanky	fdb60a8dcf	Set Query as Heading of Image Search Results Emacs Buffer	2021-09-16 12:30:06 -07:00
Debanjum Singh Solanky	169ddcc8c6	Make Using XMP Metadata to Enhance Image Search Optional, Configurable - Break the compute embeddings method into separate methods: compute_image_embeddings and compute_metadata_embeddings - If image_metadata_embeddings isn't defined, do not use it to enhance search results. Given image_metadata_embeddings wouldn't be defined if use_xmp_metadata is False, we can avoid unnecessary addition of args to query method	2021-09-16 12:01:05 -07:00
Debanjum Singh Solanky	a4a23d7a72	Batch encode XMP metadata from images too for image_search	2021-09-16 11:11:36 -07:00
Debanjum Singh Solanky	3afe054312	Make image batch size to encode configurable via config.yml	2021-09-16 10:52:31 -07:00
Debanjum Singh Solanky	41c328dae0	Batch encode images to keep memory consumption manageable - Issue: Process would get killed while encoding images for consuming too much memory - Fix: - Encode images in batches and append to image_embeddings - No need to use copy or deep_copy anymore with batch processing. It would earlier throw too many files open error Other Changes: - Use tqdm to see progress even when using batch - See progress bar of encoding independent of verbosity (for now)	2021-09-16 10:15:54 -07:00
Debanjum Singh Solanky	d8abbc0552	Use XMP metadata in images to improve image search - Details - The CLIP model can represent images, text in the same vector space - Enhance CLIP's image understanding by augmenting the plain image with its text based metadata. Specifically with any subject, description XMP tags on the image - Improve results by combining plain image similarity score with metadata similarity scores for the highest ranked images - Minor Fixes - Convert verbose to integer from bool in image_search. It's already passed as integer from the main program entrypoint - Process images with ".jpeg" extensions too	2021-09-16 08:55:20 -07:00
Debanjum Singh Solanky	0e34c8f493	Allow semantic search on images from Emacs Images are rendered inline a temporary org-mode buffer	2021-09-10 01:14:34 -07:00
Debanjum Singh Solanky	7d5514ecaa	Allow user to override inferred search type with other valid options	2021-09-10 00:58:24 -07:00
Debanjum Singh Solanky	3bdeeb1e19	Autoload main semantic-search function	2021-09-09 22:10:37 -07:00
Debanjum Singh Solanky	f4bde75249	Decouple results shown to user and text the model is trained on - Previously: The text the model was trained on was being used to re-create a semblance of the original org-mode entry. - Now: - Store raw entry as another key:value in each entry json too Only return actual raw org entries in results But create embeddings like before - Also add link to entry in file:<filename>::<line_number> form in property drawer of returned results This can be used to jump to actual entry in its original file	2021-08-29 06:06:54 -07:00
Debanjum Singh Solanky	7ee3007070	Get ID, QUERY, TYPE, CATEGORY properties from org property drawer when present	2021-08-29 06:06:28 -07:00
Debanjum Singh Solanky	0263d4d068	Enable semantic search for songs in org-music Org-Music: https://github.com/debanjum/org-music	2021-08-29 06:06:28 -07:00
Debanjum Singh Solanky	fd7888f3d4	Resolve relative file paths to config YAML file in cli.py	2021-08-29 03:03:37 -07:00
Debanjum Singh Solanky	fc531a1915	Resolve relative file paths to model embeddings in all search types	2021-08-28 22:26:12 -07:00
Debanjum Singh Solanky	4daeddbbda	Enable Semantic Search on Images	2021-08-22 21:42:37 -07:00
Debanjum Singh Solanky	fd217fe8b7	Enable Semantic Search for Beancount transactions	2021-08-22 21:36:06 -07:00
Debanjum Singh Solanky	97263b8209	Move CLI into a separate module. Move CLI tests into a separate file	2021-08-21 19:21:38 -07:00
Debanjum Singh Solanky	78a1f4ebb4	Use YAML file to allow user to configure application. Add tests - YAML Config - Can specify all params[1] earlier being passed via cmd args in config YAML - Can now also configure sentence-transformer models to use etc for search - [1] Config params - org files - compressed entries file config path - embeddings file config path - Include sample_config.yaml - Include sample .org file from this repo's readmes - CLI - Configuration Priority: Config via cmd > Config via YAML > Default Config - Test CLI, include test config.yml for the tests - Set default type to None unless set via query param to API Run notes search if search_enabled, also if type is None (default) Prepares for running queries on all search types unless type specified in API query param - Update Readme	2021-08-21 19:07:39 -07:00
Debanjum Singh Solanky	bafc86d583	Add helpers to merge dictionaries and get keys deep inside a dictionary	2021-08-21 18:27:50 -07:00
Debanjum Singh Solanky	252266b62a	Pass type of item via regenerate API. Default type query param to None	2021-08-17 18:25:07 -07:00
Debanjum Singh Solanky	ff7207a6bd	Extract commandline arguments into separate testable method	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	a3a1100be9	Arrange modules in standardized ordering	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	569e30b1c8	Create a few basic tests	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	af9660f28e	Move application files under src directory. Update Readmes - Remove the command for calling the asymmetric search script directly. It doesn't work anymore on calling directly due to internal package import issues	2021-08-17 04:11:03 -07:00

3059 commits