sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-30 02:43:01 +01:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	81ce0cacc3	Only allow supported search types to /search, /regenerate APIs - Use a SearchType to limit types that can be passed by user - FastAPI automatically validates type passed in query param - Available type options show up in Swagger UI, FastAPI docs - controller code looks neater instead of doing string comparisons for type - Test invalid, valid search types via pytest	2021-09-29 19:12:56 -07:00
Debanjum Singh Solanky	150593c776	Update Readme. Acknowledge PyExifTool and Minor Fixes	2021-09-16 12:39:42 -07:00
Debanjum Singh Solanky	fdb60a8dcf	Set Query as Heading of Image Search Results Emacs Buffer	2021-09-16 12:30:06 -07:00
Debanjum Singh Solanky	169ddcc8c6	Make Using XMP Metadata to Enhance Image Search Optional, Configurable - Break the compute embeddings method into separate methods: compute_image_embeddings and compute_metadata_embeddings - If image_metadata_embeddings isn't defined, do not use it to enhance search results. Given image_metadata_embeddings wouldn't be defined if use_xmp_metadata is False, we can avoid unnecessary addition of args to query method	2021-09-16 12:01:05 -07:00
Debanjum Singh Solanky	a4a23d7a72	Batch encode XMP metadata from images too for image_search	2021-09-16 11:11:36 -07:00
Debanjum Singh Solanky	3afe054312	Make image batch size to encode configurable via config.yml	2021-09-16 10:52:31 -07:00
Debanjum Singh Solanky	41c328dae0	Batch encode images to keep memory consumption manageable - Issue: Process would get killed for consuming too much memory while encoding images - Fix: - Encode images in batches and append to image_embeddings - No need to use copy or deep_copy anymore with batch processing. They earlier threw a "too many open files" error Other Changes: - Use tqdm to see progress even when using batches - See progress bar of encoding independent of verbosity (for now)	2021-09-16 10:15:54 -07:00
Debanjum Singh Solanky	d8abbc0552	Use XMP metadata in images to improve image search - Details - The CLIP model can represent images, text in the same vector space - Enhance CLIP's image understanding by augmenting the plain image with its text-based metadata. Specifically with any subject, description XMP tags on the image - Improve results by combining plain image similarity score with metadata similarity scores for the highest ranked images - Minor Fixes - Convert verbose to integer from bool in image_search. It's already passed as integer from the main program entrypoint - Process images with ".jpeg" extensions too	2021-09-16 08:55:20 -07:00
Debanjum Singh Solanky	0e34c8f493	Allow semantic search on images from Emacs Images are rendered inline in a temporary org-mode buffer	2021-09-10 01:14:34 -07:00
Debanjum Singh Solanky	7d5514ecaa	Allow user to override inferred search type with other valid options	2021-09-10 00:58:24 -07:00
Debanjum Singh Solanky	3bdeeb1e19	Autoload main semantic-search function	2021-09-09 22:10:37 -07:00
Debanjum Singh Solanky	f4bde75249	Decouple results shown to user and text the model is trained on - Previously: The text the model was trained on was being used to re-create a semblance of the original org-mode entry. - Now: - Store raw entry as another key:value in each entry json too Only return actual raw org entries in results But create embeddings like before - Also add link to entry in file:<filename>::<line_number> form in property drawer of returned results This can be used to jump to actual entry in its original file	2021-08-29 06:06:54 -07:00
Debanjum Singh Solanky	7ee3007070	Get ID, QUERY, TYPE, CATEGORY properties from org property drawer when present	2021-08-29 06:06:28 -07:00
Debanjum Singh Solanky	0263d4d068	Enable semantic search for songs in org-music Org-Music: https://github.com/debanjum/org-music	2021-08-29 06:06:28 -07:00
Debanjum Singh Solanky	fd7888f3d4	Resolve relative file paths to config YAML file in cli.py	2021-08-29 03:03:37 -07:00
Debanjum Singh Solanky	fc531a1915	Resolve relative file paths to model embeddings in all search types	2021-08-28 22:26:12 -07:00
Debanjum Singh Solanky	74faa34bee	Update sample config to add minimal config for ledger, image search	2021-08-22 21:54:49 -07:00
Debanjum Singh Solanky	8dec58b12a	Update Readme to state can now query beancount transactions, images	2021-08-22 21:50:27 -07:00
Debanjum Singh Solanky	4daeddbbda	Enable Semantic Search on Images	2021-08-22 21:42:37 -07:00
Debanjum Singh Solanky	fd217fe8b7	Enable Semantic Search for Beancount transactions	2021-08-22 21:36:06 -07:00
Debanjum Singh Solanky	97263b8209	Move CLI into a separate module. Move CLI tests into a separate file	2021-08-21 19:21:38 -07:00
Debanjum Singh Solanky	78a1f4ebb4	Use YAML file to allow user to configure application. Add tests - YAML Config - Can specify all params[1] earlier being passed via cmd args in config YAML - Can now also configure sentence-transformer models to use etc for search - [1] Config params - org files - compressed entries file config path - embeddings file config path - Include sample_config.yaml - Include sample .org file from this repos readmes - CLI - Configuration Priority: Config via cmd > Config via YAML > Default Config - Test CLI, include test config.yml for the tests - Set default type to None unless set via query param to API Run notes search if search_enabled, also if type is None (default) Prepares for running queries on all search types unless type specified in API query param - Update Readme	2021-08-21 19:07:39 -07:00
Debanjum Singh Solanky	bafc86d583	Add helpers to merge dictionaries and get keys deep inside a dictionary	2021-08-21 18:27:50 -07:00
Debanjum Singh Solanky	eddbc67358	Document how to install latest version in Readme	2021-08-17 18:27:10 -07:00
Debanjum Singh Solanky	252266b62a	Pass type of item via regenerate API. Default type query param to None	2021-08-17 18:25:07 -07:00
Debanjum Singh Solanky	ff7207a6bd	Extract commandline arguments into separate testable method	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	a3a1100be9	Arrange modules in standardized ordering	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	569e30b1c8	Create a few basic tests	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	af9660f28e	Move application files under src directory. Update Readmes - Remove command to call asymmetric search script directly. It doesn't work anymore when called directly due to internal package import issues	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	c35c6fb0b3	Reuse asymmetric.setup & input validation from asymmetric & org_to_jsonl Create asymmetric.setup method to - initialize model - generate compressed jsonl - compute embeddings put input_files, input_file_filter validation in org_to_jsonl for reuse in main.py, asymmetric.py	2021-08-17 00:45:40 -07:00
Debanjum Singh Solanky	02a84df37a	Update state vars after regeneration. Minimize time app is in inconsistent state	2021-08-16 23:47:33 -07:00
Debanjum Singh Solanky	0509854e14	Replace README.md with README.org. Can be used as notes for testing	2021-08-16 20:00:05 -07:00
Debanjum Singh Solanky	79aff85fcb	Update Readme. No separate SETUP step required. Simpler RUN step - Setup now happens on first run of application - Embeddings can now be regenerated without killing app by calling API	2021-08-16 19:24:04 -07:00
Debanjum Singh Solanky	95bf26a7f2	Set verbosity commandline parameters default value to 0	2021-08-16 19:16:29 -07:00
Debanjum Singh Solanky	04a9a6d62f	Expose API endpoint to (re-)generate embeddings from latest notes - Provides mechanism to update notes from within application - Instead of having to pass the same arguments multiple times Pass them once (or rely on defaults when possible) and let app keep state and location of intermediary files - Allows user to not have to deal with the internals of the application - E.g. user doesn't have to specify the jsonl.gz or embeddings file path The app will still put those files in a default location - The user doesn't have to run the generation from the commandline as a separate step	2021-08-16 18:52:38 -07:00
Debanjum Singh Solanky	1c00c33e73	Improve debug output from org_to_jsonl.py script	2021-08-16 18:50:29 -07:00
Debanjum Singh Solanky	2a57156428	Fix org_to_jsonl. Use passed args not global variables in methods. Fix orgnode import	2021-08-16 17:37:44 -07:00
Debanjum Singh Solanky	66238004d8	Use verbosity level instead of bool across application For consistent, more granular verbosity controls across app Allows user to increase verbosity by passing -vvv flags to main.py	2021-08-16 17:15:41 -07:00
Debanjum Singh Solanky	adbf157deb	Remove usage of the closure to search_notes as it's not required	2021-08-16 16:52:48 -07:00
Debanjum Singh Solanky	649e5d1327	Allow reuse of get_absolute_path, is_none_or_empty methods - Move them to utils.helper.py for reuse - Import those modules where required - Delete duplicate methods defined in org_to_jsonl.py, asymmetric.py	2021-08-16 16:33:43 -07:00
Debanjum Singh Solanky	9703afb814	Rename search_types to search_type to standardize to singular naming Using singular names for other directories in application already - processor instead of processors - interface instead of interfaces	2021-08-16 16:31:30 -07:00
Debanjum Singh Solanky	19d6678eb1	Allow importing org-to-jsonl as module for reuse To allow importing org-to-jsonl as module - Wrap code in __main__ into an org-to-jsonl method - Rename processor/org-mode to processor/org_mode - Add __init__.py to processor directory	2021-08-16 16:31:30 -07:00
Debanjum Singh Solanky	5f8221f77e	Remove unused verbose argument to collate_results method	2021-08-16 13:54:41 -07:00
Debanjum Singh Solanky	85bf15628d	Use better cmdline argument names. Drop unneeded no-compress argument Can infer whether to compress via the output_file suffix	2021-08-16 13:49:39 -07:00
Debanjum Singh Solanky	d9f60c00bf	Warn if any input files to org-to-json are potentially non org-mode files That is, if the file paths in the input set don't end with .org	2021-08-16 13:49:39 -07:00
Debanjum Singh Solanky	3aa0c30fee	Use absolute file path to open files in org-to-jsonl.py, asymmetric.py Exit script if neither org_files nor org_file_filter is present	2021-08-16 13:49:39 -07:00
Debanjum Singh Solanky	e773611558	Remove unused jsonl_file argument from convert_org_entries_to_jsonl	2021-08-16 13:49:35 -07:00
Debanjum Singh Solanky	8b29e272d3	Standardize interface, better default args for org-to-json.py script - Remove non-standard, unnecessary argument for org-directory Pass path of each file in org-files and org-files-filter argument directly - Allow shorthand -i, -o for input files, output files - Default to compress, unless user explicitly specifies not to	2021-08-16 11:29:08 -07:00
Debanjum Singh Solanky	7547e90745	Minor doc updates after merging emacs package with main repository	2021-08-16 02:02:26 -07:00
Debanjum Singh Solanky	ec157ea0ff	Add Emacs interface to semantic-search directly to main repository Too much overhead to maintain multiple repositories, especially when the Emacs library for semantic-search is a single file. Import Readme from the emacs-semantic-search repository too	2021-08-16 01:27:46 -07:00


75 commits