sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-29 10:23:02 +01:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	7677465f23	Fix passing of device to setup method in /reload, /regenerate API - Use local variable to pass device to asymmetric.setup method via /reload, /regenerate API - Set default argument to torch.device('cpu') instead of 'cpu' to be more formal	2022-06-30 01:32:56 +04:00
Debanjum Singh Solanky	eda4b65ddb	Improve Query Speed. Normalize Embeddings, Moving them to Cuda GPU - Move embeddings to CUDA GPU for compute, when available - Normalize embeddings and Use Dot Product instead of Cosine	2022-06-30 00:59:57 +04:00
Debanjum Singh Solanky	b89fc2f4ac	Add /reload API to reload model embeddings and entries from file - The reload API adds the ability to separate out the loading of embeddings from file without having to restart app or (re-)generate embeddings - Before this the only way to load model from file was by restarting app - The other way to reload the model embeddings by regenerating them was too expensive for larger datasets - This unlocks at least 1 use-case, where - we regenerate model via an app instance running on a separate server and - just reload the generated embeddings on the client device - This allows us to offload the expensive embedding generation compute to a background server while letting - This avoids having to (re-)restart application on client device or be forced to generate embeddings on the client device itself - But it requires the model relevant files to be synced to the client device. This can be done with any file syncing application like Syncthing - We can then call /regenerate on server and /reload on client on a regular schedule to keep our data up to date on semantic search	2022-06-29 23:47:17 +04:00
Debanjum Singh Solanky	cfbd5c4ecc	Update global model on regenerate via API	2022-06-17 00:49:06 +03:00
Debanjum Singh Solanky	c78bf84eef	Introduce search api endpoint that auto infers search type intent - Introduce prompt for GPT to automatically extract user's search intent - Expose new search api endpoint to use that to set SearchType being passed to search API - Currently meant as an experimental API to gauge usefulness, extendability. Evaluating for phone or voice use-case	2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky	510faa1904	Save Image Search Model to Disk	2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky	934ec233b0	Add Search Config for Symmetric Model. Save Model to Disk	2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky	b63026d97c	Save Asymmetric Search Model to Disk - Improve application load time - Remove dependence on internet to startup application and perform semantic search	2022-01-14 17:36:27 -05:00
Debanjum Singh Solanky	5a686b7be9	Add logs for chat bot in verbose mode	2022-01-12 10:35:52 -05:00
Debanjum Singh Solanky	6dc2a99d35	Merge branch 'master' of github.com:debanjum/semantic-search into add-summarize-capability-to-chat-bot - Fix openai_api_key being set in ConfigProcessorConfig - Merge addition of config UI and config instantiation updates	2021-12-20 13:30:42 +05:30
Debanjum Singh Solanky	65da7daf1f	Load, Save Conversation Session Summaries to Log. s/chat_log/chat_session Conversation logs structure now has session info too instead of just chat info Session info will allow loading past conversation summaries as context for AI in new conversations { "session": [ { "summary": <chat_session_summary>, "session-start": <session_start_index_in_chat_log>, "session-end": <session_end_index_in_chat_log> }], "chat": [ { "intent": <intent-object> "trigger-emotion": <emotion-triggered-by-message> "by": <AI\|Human> "message": <chat_message> "created": <message_created_date> }] }	2021-12-15 10:17:07 +05:30
Saba	97a6dfaa1e	Use default value False for verbose parameter, and small changes Pass config as parameter to initialize_search, change name of API methods to handle config CRUD operations, and initialize config to FullConfig	2021-12-11 14:13:14 -05:00
Saba	d65190c3ee	Update unit tests, files with removing model suffix to config types	2021-12-09 08:50:38 -05:00
Debanjum Singh Solanky	0ac1e5f372	Summarize chat logs and notes returned by semantic search via /chat API	2021-12-08 02:34:07 +05:30
Saba	10e4065e05	Consolidate the search config models and pass verbose as a top level flag	2021-12-04 11:43:48 -05:00
Saba	7fcc8d2cef	Add null check for processor config	2021-12-04 10:11:00 -05:00
Saba	7ca4fc3453	Resolve merge conflicts with updated processor conversation data model	2021-11-28 16:22:52 -05:00
Saba	5d50487d83	Linting New line at end of config.html Remove debug print statement	2021-11-28 13:32:56 -05:00
Saba	6f466c8d99	Use global config and add a regenerate button to the config ui' && git push	2021-11-28 13:28:22 -05:00
Saba	34d1e4199c	Use alias generator when deserializing the config file	2021-11-28 13:05:48 -05:00
Saba	19b81e82f0	Write back to the raw config.yml file on update	2021-11-28 12:34:40 -05:00
Saba	8837b02de6	dump updated config to a yaml file	2021-11-28 12:26:07 -05:00
Saba	5b80b87379	Streamline None checking in initialize_search	2021-11-28 12:05:04 -05:00
Saba	bf8ae31e6a	Streamline None checking in initialize_search	2021-11-28 11:59:45 -05:00
Saba	da52433d89	Update to re-use the raw config base models in config.py as well	2021-11-28 11:57:33 -05:00
Saba	66183cc298	Working API request body parsing to /post config!	2021-11-28 11:12:26 -05:00
Debanjum Singh Solanky	1785047ea6	Improve understand primer and load understand response as dict	2021-11-28 13:04:16 +05:30
Saba	64645c3ac1	Begin type checking/input validation effort	2021-11-27 21:47:56 -05:00
Saba	9a0264b7fc	Add a dummy POST config endpoint, integrate with editable UI	2021-11-27 20:36:03 -05:00
Saba	f3b03ea5b7	Make raw data reactive to changes	2021-11-27 19:17:15 -05:00
Debanjum Singh Solanky	67c3cd7372	Wire up GPT understand method to /chat API. Log conversation metadata too	2021-11-28 00:04:39 +05:30
Saba	3db06eee3f	Basic example of serving config as JSON and retrieving on button click	2021-11-27 10:49:33 -05:00
Saba	3d4471e107	Merge branch 'master' of github.com:debanjum/semantic-search into saba/configui	2021-11-27 08:52:48 -05:00
Debanjum Singh Solanky	ccfb97e1a7	Wire up minimal conversation processor. Expose it over /chat API endpoint Ensure conversation history persists across application restart	2021-11-27 18:12:01 +05:30
Saba	baee52648d	Set up basic ui page with no functionality	2021-11-26 14:51:11 -05:00
Debanjum Singh Solanky	c47a8cdf16	Allow configuring host, port or unix socket of server via CLI	2021-10-02 16:16:33 -07:00
Debanjum Singh Solanky	d2905c4be6	Move tests out to project root. Use absolute import in project tests/ directory in project root is more standard. Just had to use absolute path for internal module imports to get it to work	2021-09-30 04:12:14 -07:00
Debanjum Singh Solanky	d5597442f4	Modularize Code. Wrap Search, Model Config in Classes. Add Tests Details - Rename method query_* to query in search_types for standardization - Wrapping Config code in classes simplified mocking test config - Reduce args being passed to a function by passing it as single argument wrapped in a class - Minimize setup in main.py:__main__. Put most of it into functions These functions can be mocked if required in tests later too Setup Flow: CLI_Args\|Config_YAML -> (Text\|Image)SearchConfig -> (Text\|Image)SearchModel	2021-09-30 02:04:04 -07:00
Debanjum Singh Solanky	f4dd9cd117	Use type specific model for other search types too. Expose them via SearchModels - Wrap Image, Music, Ledger search into the type of SearchModel they use Similar to what was done for notes model by wrapping its config into an AsymmetricSearchModel. - Use the uber wrapper class to expose all type specific search models	2021-09-29 21:09:42 -07:00
Debanjum Singh Solanky	e22e0b41e3	Wrap asymmetric search model into SearchModels. Test notes search end-to-end - Wrap asymmetric search model parameters into AsymmetricSearchModel class - Create wrapper for all search type models. Put notes search model into it - Test notes search end-to-end from client API layer to results. Use model built on test data	2021-09-29 20:47:35 -07:00
Debanjum Singh Solanky	cde11a2331	Wrap search type enablement status in a search settings class - Cleaner, more idiomatic usage of a global variable - Simplifies mocking when testing client in pytest as setting wrapped in object rather than a simple type. So passed around by reference	2021-09-29 19:18:33 -07:00
Debanjum Singh Solanky	81ce0cacc3	Only allow supported search types to /search, /regenerate APIs - Use a SearchType to limit types that can be passed by user - FastAPI automatically validates type passed in query param - Available type options show up in Swagger UI, FastAPI docs - controller code looks neater instead of doing string comparisons for type - Test invalid, valid search types via pytest	2021-09-29 19:12:56 -07:00
Debanjum Singh Solanky	169ddcc8c6	Make Using XMP Metadata to Enhance Image Search Optional, Configurable - Break the compute embeddings method into separate methods: compute_image_embeddings and compute_metadata_embeddings - If image_metadata_embeddings isn't defined, do not use it to enhance search results. Given image_metadata_embeddings wouldn't be defined if use_xmp_metadata is False, we can avoid unnecessary addition of args to query method	2021-09-16 12:01:05 -07:00
Debanjum Singh Solanky	3afe054312	Make image batch size to encode configurable via config.yml	2021-09-16 10:52:31 -07:00
Debanjum Singh Solanky	d8abbc0552	Use XMP metadata in images to improve image search - Details - The CLIP model can represent images, text in the same vector space - Enhance CLIP's image understanding by augmenting the plain image with its text-based metadata. Specifically with any subject, description XMP tags on the image - Improve results by combining plain image similarity score with metadata similarity scores for the highest ranked images - Minor Fixes - Convert verbose to integer from bool in image_search. It's already passed as integer from the main program entrypoint - Process images with ".jpeg" extensions too	2021-09-16 08:55:20 -07:00
Debanjum Singh Solanky	0263d4d068	Enable semantic search for songs in org-music Org-Music: https://github.com/debanjum/org-music	2021-08-29 06:06:28 -07:00
Debanjum Singh Solanky	4daeddbbda	Enable Semantic Search on Images	2021-08-22 21:42:37 -07:00
Debanjum Singh Solanky	fd217fe8b7	Enable Semantic Search for Beancount transactions	2021-08-22 21:36:06 -07:00
Debanjum Singh Solanky	97263b8209	Move CLI into a separate module. Move CLI tests into a separate file	2021-08-21 19:21:38 -07:00
Debanjum Singh Solanky	78a1f4ebb4	Use YAML file to allow user to configure application. Add tests - YAML Config - Can specify all params[1] earlier being passed via cmd args in config YAML - Can now also configure sentence-transformer models to use etc for search - [1] Config params - org files - compressed entries file config path - embeddings file config path - Include sample_config.yaml - Include sample .org file from this repos readmes - CLI - Configuration Priority: Config via cmd > Config via YAML > Default Config - Test CLI, include test config.yml for the tests - Set default type to None unless set via query param to API Run notes search if search_enabled, also if type is None (default) Prepares for running queries on all search types unless type specified in API query param - Update Readme	2021-08-21 19:07:39 -07:00
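
The configuration priority described in commit 78a1f4ebb4 (config via cmd > config via YAML > default config) can be illustrated with a minimal sketch. This is not the repository's implementation: the function name `resolve_config`, the `DEFAULT_CONFIG` dictionary, and every key name and file name in it are hypothetical, chosen only to show the layering order.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults: these keys and values are illustrative assumptions,
# not the actual configuration schema used by the project.
DEFAULT_CONFIG: Dict[str, Any] = {
    "org_files": None,
    "compressed_jsonl": "notes.jsonl.gz",
    "embeddings": "embeddings.pt",
    "verbose": 0,
}


def resolve_config(
    cli_args: Dict[str, Any],
    yaml_config: Dict[str, Any],
    defaults: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge settings so that CLI values beat YAML values, which beat defaults.

    A value of None in a higher-priority source means "not set there",
    so the lower-priority value is kept.
    """
    resolved = dict(DEFAULT_CONFIG if defaults is None else defaults)
    # Apply sources from lowest to highest priority; later sources overwrite earlier ones.
    for source in (yaml_config, cli_args):
        for key, value in source.items():
            if value is not None:
                resolved[key] = value
    return resolved


if __name__ == "__main__":
    yaml_config = {"embeddings": "from_yaml.pt", "verbose": 1}
    cli_args = {"verbose": 2, "embeddings": None}  # embeddings not given on the command line
    print(resolve_config(cli_args, yaml_config))
```

In this sketch `verbose` comes from the command line, `embeddings` from the YAML file, and `compressed_jsonl` falls back to the default, which matches the priority order stated in the commit message.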


54 commits