sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-25 00:15:07 +01:00

Author	SHA1	Message	Date
Saba	64645c3ac1	Begin type checking/input validation effort	2021-11-27 21:47:56 -05:00
Saba	9a0264b7fc	Add a dummy POST config endpoint, integrate with editable UI	2021-11-27 20:36:03 -05:00
Saba	f3b03ea5b7	Make raw data reactive to changes	2021-11-27 19:17:15 -05:00
Debanjum Singh Solanky	67c3cd7372	Wire up GPT understand method to /chat API. Log conversation metadata too	2021-11-28 00:04:39 +05:30
Saba	3db06eee3f	Basic example of serving config as JSON and retrieving on button click	2021-11-27 10:49:33 -05:00
Saba	3d4471e107	Merge branch 'master' of github.com:debanjum/semantic-search into saba/configui	2021-11-27 08:52:48 -05:00
Debanjum Singh Solanky	ccfb97e1a7	Wire up minimal conversation processor. Expose it over /chat API endpoint. Ensure conversation history persists across application restart	2021-11-27 18:12:01 +05:30
Debanjum Singh Solanky	a99b4b3434	Make conversation processor configurable	2021-11-27 18:12:01 +05:30
Debanjum Singh Solanky	d4e1120b22	Add GPT based conversation processor to understand intent and converse with user - Allow conversing with user using GPT's contextually aware, generative capability - Extract metadata, user intent from user's messages using GPT's general understanding	2021-11-27 18:12:01 +05:30
Saba	baee52648d	Set up basic ui page with no functionality	2021-11-26 14:51:11 -05:00
debanjum	46661b3057	Ensure top_k never more than total entries to run symmetric search on	2021-11-16 11:32:21 -08:00
debanjum	8c858d1a94	Reduce symmetric search results for cross-encoder to re-rank to improve search speed	2021-11-16 11:31:19 -08:00
Debanjum Singh Solanky	f3fd5ae978	Improve code comments. Do not import unused modules in asymmetric search	2021-11-17 00:58:31 +05:30
Debanjum Singh Solanky	8cf2465e8e	Ensure top_k never more than total entries to search from	2021-11-17 00:56:31 +05:30
Debanjum Singh Solanky	4d37ace3d6	Reduce search results for cross-encoder to re-rank to improve search speed. Search time on my notes reduced from 14s to 4s. Cross-encoder re-ranking step takes majority of the time, not the cosine similarity search	2021-11-17 00:50:28 +05:30
Debanjum Singh Solanky	1832e418e5	Use raw string for regex in orgnode to fix deprecation warning	2021-10-02 17:38:31 -07:00
Debanjum Singh Solanky	f59e321419	Update CLIP model load path	2021-10-02 16:50:06 -07:00
Debanjum Singh Solanky	c47a8cdf16	Allow configuring host, port or unix socket of server via CLI	2021-10-02 16:16:33 -07:00
Debanjum Singh Solanky	516f28b082	Merge branch 'master' of github.com:debanjum/semantic-search	2021-09-30 04:17:32 -07:00
Debanjum Singh Solanky	d2905c4be6	Move tests out to project root. Use absolute import in project. A tests/ directory in project root is more standard. Just had to use absolute path for internal module imports to get it to work	2021-09-30 04:12:14 -07:00
Debanjum Singh Solanky	58bb420f69	Fix image_metadata argument ordering bug. Add E2E image search test - Image search test seems a little flaky - Interchanged argument was causing inaccurate results earlier	2021-09-30 03:30:47 -07:00
Debanjum Singh Solanky	d5597442f4	Modularize Code. Wrap Search, Model Config in Classes. Add Tests. Details - Rename method query_* to query in search_types for standardization - Wrapping Config code in classes simplified mocking test config - Reduce args being passed to a function by passing them as a single argument wrapped in a class - Minimize setup in main.py:__main__. Put most of it into functions. These functions can be mocked if required in tests later too. Setup Flow: CLI_Args\|Config_YAML -> (Text\|Image)SearchConfig -> (Text\|Image)SearchModel	2021-09-30 02:04:04 -07:00
Debanjum Singh Solanky	f4dd9cd117	Use type specific model for other search types too. Expose them via SearchModels - Wrap Image, Music, Ledger search into the type of SearchModel they use. Similar to what was done for notes model by wrapping its config into an AsymmetricSearchModel. - Use the uber wrapper class to expose all type specific search models	2021-09-29 21:09:42 -07:00
Debanjum Singh Solanky	352d2930ee	Use multiple threads to generate model embeddings. Other minor formatting	2021-09-29 20:47:58 -07:00
Debanjum Singh Solanky	e22e0b41e3	Wrap asymmetric search model into SearchModels. Test notes search end-to-end - Wrap asymmetric search model parameters into AsymmetricSearchModel class - Create wrapper for all search type models. Put notes search model into it - Test notes search end-to-end from client API layer to results. Use model built on test data	2021-09-29 20:47:35 -07:00
Debanjum Singh Solanky	cde11a2331	Wrap search type enablement status in a search settings class - Cleaner, more idiomatic usage of a global variable - Simplifies mocking when testing client in pytest as setting wrapped in object rather than a simple type. So passed around by reference	2021-09-29 19:18:33 -07:00
Debanjum Singh Solanky	81ce0cacc3	Only allow supported search types to /search, /regenerate APIs - Use a SearchType to limit types that can be passed by user - FastAPI automatically validates type passed in query param - Available type options show up in Swagger UI, FastAPI docs - controller code looks neater instead of doing string comparisons for type - Test invalid, valid search types via pytest	2021-09-29 19:12:56 -07:00
Debanjum Singh Solanky	5db08c5293	Set query as heading of notes search results in Emacs Org buffer	2021-09-29 13:30:15 -07:00
Debanjum Singh Solanky	fdb60a8dcf	Set Query as Heading of Image Search Results Emacs Buffer	2021-09-16 12:30:06 -07:00
Debanjum Singh Solanky	169ddcc8c6	Make Using XMP Metadata to Enhance Image Search Optional, Configurable - Break the compute embeddings method into separate methods: compute_image_embeddings and compute_metadata_embeddings - If image_metadata_embeddings isn't defined, do not use it to enhance search results. Given image_metadata_embeddings wouldn't be defined if use_xmp_metadata is False, we can avoid unnecessary addition of args to query method	2021-09-16 12:01:05 -07:00
Debanjum Singh Solanky	a4a23d7a72	Batch encode XMP metadata from images too for image_search	2021-09-16 11:11:36 -07:00
Debanjum Singh Solanky	3afe054312	Make image batch size to encode configurable via config.yml	2021-09-16 10:52:31 -07:00
Debanjum Singh Solanky	41c328dae0	Batch encode images to keep memory consumption manageable - Issue: Process would get killed while encoding images for consuming too much memory - Fix: - Encode images in batches and append to image_embeddings - No need to use copy or deep_copy anymore with batch processing. It would earlier throw too many files open error Other Changes: - Use tqdm to see progress even when using batch - See progress bar of encoding independent of verbosity (for now)	2021-09-16 10:15:54 -07:00
Debanjum Singh Solanky	d8abbc0552	Use XMP metadata in images to improve image search - Details - The CLIP model can represent images, text in the same vector space - Enhance CLIP's image understanding by augmenting the plain image with its text-based metadata. Specifically with any subject, description XMP tags on the image - Improve results by combining plain image similarity score with metadata similarity scores for the highest ranked images - Minor Fixes - Convert verbose to integer from bool in image_search. It's already passed as integer from the main program entrypoint - Process images with ".jpeg" extensions too	2021-09-16 08:55:20 -07:00
Debanjum Singh Solanky	0e34c8f493	Allow semantic search on images from Emacs. Images are rendered inline in a temporary org-mode buffer	2021-09-10 01:14:34 -07:00
Debanjum Singh Solanky	7d5514ecaa	Allow user to override inferred search type with other valid options	2021-09-10 00:58:24 -07:00
Debanjum Singh Solanky	3bdeeb1e19	Autoload main semantic-search function	2021-09-09 22:10:37 -07:00
Debanjum Singh Solanky	f4bde75249	Decouple results shown to user and text the model is trained on - Previously: The text the model was trained on was being used to re-create a semblance of the original org-mode entry. - Now: - Store raw entry as another key:value in each entry json too. Only return actual raw org entries in results. But create embeddings like before - Also add link to entry in file:<filename>::<line_number> form in property drawer of returned results. This can be used to jump to actual entry in its original file	2021-08-29 06:06:54 -07:00
Debanjum Singh Solanky	7ee3007070	Get ID, QUERY, TYPE, CATEGORY properties from org property drawer when present	2021-08-29 06:06:28 -07:00
Debanjum Singh Solanky	0263d4d068	Enable semantic search for songs in org-music Org-Music: https://github.com/debanjum/org-music	2021-08-29 06:06:28 -07:00
Debanjum Singh Solanky	fd7888f3d4	Resolve relative file paths to config YAML file in cli.py	2021-08-29 03:03:37 -07:00
Debanjum Singh Solanky	fc531a1915	Resolve relative file paths to model embeddings in all search types	2021-08-28 22:26:12 -07:00
Debanjum Singh Solanky	4daeddbbda	Enable Semantic Search on Images	2021-08-22 21:42:37 -07:00
Debanjum Singh Solanky	fd217fe8b7	Enable Semantic Search for Beancount transactions	2021-08-22 21:36:06 -07:00
Debanjum Singh Solanky	97263b8209	Move CLI into a separate module. Move CLI tests into a separate file	2021-08-21 19:21:38 -07:00
Debanjum Singh Solanky	78a1f4ebb4	Use YAML file to allow user to configure application. Add tests - YAML Config - Can specify all params[1] earlier being passed via cmd args in config YAML - Can now also configure sentence-transformer models to use etc for search - [1] Config params - org files - compressed entries file config path - embeddings file config path - Include sample_config.yaml - Include sample .org file from this repo's readmes - CLI - Configuration Priority: Config via cmd > Config via YAML > Default Config - Test CLI, include test config.yml for the tests - Set default type to None unless set via query param to API. Run notes search if search_enabled, also if type is None (default). Prepares for running queries on all search types unless type specified in API query param - Update Readme	2021-08-21 19:07:39 -07:00
Debanjum Singh Solanky	bafc86d583	Add helpers to merge dictionaries and get keys deep inside a dictionary	2021-08-21 18:27:50 -07:00
Debanjum Singh Solanky	252266b62a	Pass type of item via regenerate API. Default type query param to None	2021-08-17 18:25:07 -07:00
Debanjum Singh Solanky	ff7207a6bd	Extract commandline arguments into separate testable method	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	a3a1100be9	Arrange modules in standardized ordering	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	569e30b1c8	Create a few basic tests	2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky	af9660f28e	Move application files under src directory. Update Readmes - Remove command for calling asymmetric search script directly. It doesn't work anymore when called directly due to internal package import issues	2021-08-17 04:11:03 -07:00
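
Commits 78a1f4ebb4 and bafc86d583 describe a configuration priority (Config via cmd > Config via YAML > Default Config) and helpers to merge dictionaries and read keys deep inside a dictionary. The sketch below shows one way such a priority could be resolved. It is not the repository's code: the names merge_dicts, get_from_dict and resolve_config, the sample keys, and the choice to treat None as "not set" are all assumptions made for illustration.

```python
from copy import deepcopy


def merge_dicts(priority: dict, default: dict) -> dict:
    """Return a new dict where values in `priority` override `default`.

    Nested dicts are merged recursively. A value of None in `priority`
    is treated as "not set" (an assumption of this sketch).
    """
    merged = deepcopy(default)
    for key, value in priority.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_dicts(value, merged[key])
        elif value is not None:
            merged[key] = deepcopy(value)
    return merged


def get_from_dict(source: dict, *keys, default=None):
    """Walk nested keys and return `default` if any key is missing."""
    current = source
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def resolve_config(cli_args: dict, yaml_config: dict, defaults: dict) -> dict:
    """Apply the priority: command line > YAML file > defaults."""
    return merge_dicts(cli_args, merge_dicts(yaml_config, defaults))


# Hypothetical sample values, not taken from the repository.
defaults = {"search": {"top_k": 5, "model": "default-model"}, "port": 8000}
yaml_config = {"search": {"model": "yaml-model"}, "port": 9000}
cli_args = {"port": None, "search": {"top_k": 3}}

resolved = resolve_config(cli_args, yaml_config, defaults)
```

Here the command line only sets search.top_k, so the port and model fall through to the YAML values, and anything set in neither would come from the defaults.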


102 commits