sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-27 17:35:07 +01:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	04610f453a	Include scheduled date, deadline date and close date in repr of org node - Now that the times line is excluded from the raw body of the node, show it in repr so the user can see it for reference - But the model doesn't need to see it, as it would only confuse its embeddings	2022-06-17 05:13:48 +03:00
Debanjum Singh Solanky	367d7377df	Ignore scheduled, closed, deadline time and logbook start, end in org node body - Gives cleaner embeddings for semantic search - Hopefully improves results and reduces size, compute	2022-06-17 05:13:09 +03:00
Debanjum Singh Solanky	b77ccadcba	Make property key regex more strict. Property key has to be alphanumeric	2022-06-17 05:13:09 +03:00
Debanjum Singh Solanky	ac9d746444	Fix Tags extraction in Org Node parser - Previous version required two tags at least to work, not sure why - Fixed it to extract all tags, even if only one tag in heading	2022-06-17 04:21:22 +03:00
Debanjum Singh Solanky	fb86be8cd9	Add ID, File+Heading based Links to Org-Mode Entries - Add links to property drawer - This ensures results returned by semantic search contain these links - This allows the user to jump to entry within original file for context - The ID, file+heading based links are more robust to find relevant entry in original file than the line no based link, as the user may edit the original files between embedding regenerations	2022-06-17 03:11:11 +03:00
Debanjum Singh Solanky	de23fc2051	Revert Add Scheduled, Deadline date to Model Embeddings for Date Aware Search Sentence Transformer MSMarco Model isn't date aware So no use of adding scheduled, deadline dates to model embeddings for consideration This reverts commit `a2a08d1354`.	2022-06-17 02:57:28 +03:00
Debanjum Singh Solanky	a2a08d1354	Add Scheduled, Deadline date to Model Embeddings for Date Aware Search	2022-06-17 02:55:27 +03:00
Debanjum Singh Solanky	cfbd5c4ecc	Update global model on regenerate via API	2022-06-17 00:49:06 +03:00
Debanjum	35117af322	Show Demo of Semantic Search in Readme Merge pull request #27 from debanjum/debanjum/add-demo	2022-05-14 01:32:18 -07:00
Debanjum Singh Solanky	2eab256af9	Delete markdown file. It helped upload the demo video to Github	2022-05-14 04:30:20 -04:00
Debanjum Singh Solanky	96c588b7bc	Add demo of semantic search to repository	2022-05-14 04:29:25 -04:00
Debanjum	19f8f85333	Show Demo of Semantic Search in Readme - Use Markdown file to help upload demo to Github - Use generated link from upload into Readme org file	2022-05-14 01:29:13 -07:00
Debanjum Singh Solanky	031d6bddb4	Delete markdown file. It helped upload the demo video to Github	2022-05-14 04:25:17 -04:00
Debanjum Singh Solanky	c78bf84eef	Introduce search api endpoint that auto infers search type intent - Introduce prompt for GPT to automatically extract user's search intent - Expose new search api endpoint to use that to set SearchType being passed to search API - Currently meant as an experimental API to gauge usefulness, extendability. Evaluating for phone or voice use-case	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	8ef7917014	Fix json format passed in prompt to GPT	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	f57b7f65ea	Wrap prompts for GPT in triple quotes to improve prompt readability To improve prompt readability: - Remove newline escape sequence and use actual newline directly - This avoids one long line of text as prompt and - Remove escaping of double quotes	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	1eba7b1c6f	Use empty_escape_sequence constant to strip response text from gpt	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	37bfc956c9	Update Readme Local Development Section	2022-02-27 23:16:58 -05:00
Debanjum Singh Solanky	1c3a1420f8	Update asymmetric extract_entries method to handle uncompressed jsonl This is similar to what was done for the symmetric extract_entries method earlier	2022-02-27 19:03:31 -05:00
Debanjum Singh Solanky	3d8a07f252	Extract empty line escape sequences var into constants file for reuse	2022-02-27 19:01:49 -05:00
Debanjum Singh Solanky	624a3faf92	Update Readme. Improve Organization, Reduce Staleness	2022-02-26 19:04:49 -05:00
Debanjum Singh Solanky	bb5d0d8908	Improve Semantic Search Buffer Names in Emacs - Allow multiple semantic searches buffers to exist simultaneously - Uniquify semantic search buffer names - Add query and search-type to semantic search buffer name for easier disambiguation, search and find appropriate	2022-02-26 18:30:14 -05:00
Debanjum	6a84ca965a	Merge pull request #25 from debanjum/users/debanjum/improve-semantic-search-on-ledger Improve Extraction and Rendering of Semantic Search on Ledger	2022-02-26 15:18:22 -08:00
Debanjum Singh Solanky	b68558651b	Improve Extraction of Beancount Entries - Only extract entries starting with YYYY-MM-DD from Beancount - Strip Trailing Escape Sequences from Entries	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	b3ac2dd730	Improve Results Rendered on Emacs from Semantic Search on Ledger - Add search query to top of buffer as Beancount comment - Remove trailing ) from response - Separate entries by empty line - Load beancount-mode in semantic search on ledger buffer	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	502c68d4f8	Remove trailing escape sequence in ledger search response entries - Fix loading entries from jsonl in extract_entries method - Only extract Title from jsonl of each entry This is the only thing written to the jsonl for symmetric ledger - This fixes the trailing escape seq in loaded entries - Remove the need for semantic-search.el response reader to do pointless complicated cleanup - Make symmetric_ledger:extract_entries use beancount_to_jsonl:load_jsonl Both methods were doing similar work - Make load_jsonl handle loading entries from both gzip and uncompressed jsonl	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	248aa632c0	Do not throw warning for beancount files with .beancount extension	2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky	76cd63f4bd	Fix count of processed jsonl entries shown to user by ledger processor Count lines not chars	2022-02-26 17:46:06 -05:00
Debanjum Singh Solanky	f08591c880	Set PORT arg when building docker image in the build workflow	2022-01-29 18:11:47 -05:00
Debanjum Singh Solanky	359f25b0a4	Rename publish workflow to build. Add badge to the workflow on Readme	2022-01-29 18:11:47 -05:00
Debanjum Singh Solanky	4add348e3c	Remove context from path to Dockerfile in Github build, push action	2022-01-29 17:16:12 -05:00
Debanjum Singh Solanky	859258864c	Update Readme badge post rename of build.yml to test.yml	2022-01-29 17:10:43 -05:00
Debanjum Singh Solanky	fa685dc37f	Create Github workflow to build, publish docker container to registry - Rename the build workflow to test workflow	2022-01-29 17:08:19 -05:00
Debanjum Singh Solanky	78b76d65a0	Minor fix to notes jsonl file extension in sample_config.yml	2022-01-29 04:13:36 -05:00
Debanjum Singh Solanky	7c773d29ef	Update github workflow to use environment.yml under config/ directory	2022-01-29 03:43:34 -05:00
Debanjum Singh Solanky	c31abad0a6	Mount embeddings to /data/embeddings for directory naming consistency - Keeps directory paths consistent between host and container volumes - Consistency simplifies documentation and updates required to setup sample_config.yml for local installation	2022-01-29 03:24:02 -05:00
Debanjum Singh Solanky	b0067fc32e	Store docker, conda, semantic-search configuration in a config directory - Improves organization of config files required for application - Declutters the application root directory from configs	2022-01-29 02:41:11 -05:00
Debanjum Singh Solanky	79c2224eaa	Improve test data organization and update corresponding conftests - Put test data for each content type into separate directories - Makes config.yml for docker and local host consistent - Prepending tests to /data in sample_config.yml makes application run on local host using test data - Allows mounting separate volume for each content type in docker-compose - Ignore gitignore to only add tests content, not generated models or embeddings	2022-01-29 02:03:17 -05:00
Debanjum Singh Solanky	3e889760c7	Merge sample_config, docker_sample_config yml into a single sample_config.yml - Update readme to indicate how to update the new sample_config to run on test data	2022-01-29 01:32:12 -05:00
Debanjum Singh Solanky	2bc2780501	Mention the experimental /chat API interacts with OpenAI's API	2022-01-29 00:11:40 -05:00
Debanjum Singh Solanky	6ed667aed0	Add Troubleshooting Section, Minor Fixes to Readme	2022-01-29 00:11:40 -05:00
Debanjum	d943d2be80	Merge pull request #21 from debanjum/saba/dockerize Add Docker support to semantic-search	2022-01-28 20:27:40 -08:00
Saba	1ba7fa66e5	Update README and default folders in docker_sample_config.yml - Add instruction to using Docker with README - Use the ./tests/data folder in docker_sample_config.yml so it can work right away for users	2022-01-28 23:20:50 -05:00
Saba	52e701b3c2	Simplify Dockerfile by removing multibuild - Install exiftool dependency directly in the miniconda image	2022-01-24 21:54:10 -05:00
Saba	33bc62dc19	Fix type of use_xmp_metadata to be bool, rather than str	2022-01-24 21:53:26 -05:00
Saba	9fb410fc25	Clean up docker_sample_config.yml - Uncomment other search types - Explain the file prefixes behavior and how it interfaces with the docker image	2022-01-24 14:11:38 -05:00
Saba	9802023c79	Clean up docker-compose - Mount the local directory to /app - Reformat the file paths to generically indicate what their purpose is - Add comments to assist users who want to modify properties themselves	2022-01-24 14:10:18 -05:00
Saba	4ae8c15170	Clean the Dockerfile - Use /app as the working directory - Clarify comment to explain why the ENTRYPOINT is constructed as it is - Move explanations for the argument to docker-compose, where it's set - Copy required artifacts from the first build image into the subsequent one (exiftool)	2022-01-24 14:08:55 -05:00
Saba	66d08ab5df	Rename web to server in docker-compose.yml	2022-01-24 00:14:01 -05:00
Saba	77fa8718d9	Working example with docker-compose Still need quite a bit of clean-up, but this adds a working docker-compose + Dockerfile setup	2022-01-23 23:44:38 -05:00

2339 commits