sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-27 17:35:07 +01:00

Author	SHA1	Message	Date
Debanjum	d153d420fc	Improve Latency of Explicit Filter ### Goal - Improve explicit filter latency to work better with incremental search ### Reasons for High Explicit Filter Latency - Deleting entries to be excluded from existing list of entries, embeddings - Explicit filtering on partial words during incremental search - Creating word set for all entries on the fly during query - Deep copying of entries, embeddings before applying filter ### Improvement Details - Major - `191a656` Use word to entry map, list comprehension to speed up explicit filter - Use list comprehension and `torch.index_select` methods - to speed up selection of entries, embedding tensors satisfying filter - avoid deep copy and direct manipulation of entries, embeddings - Use word to entry map and set operations to mark entries satisfying inclusion, exclusion filters - `c7de57b` Pre-compute entry word sets to improve explicit filter query performance - `3308e68` Cache explicitly filtered entries, embeddings by required, blocked words - `cdcee89` Wrap explicit filter words in quotes to trigger filter - E.g. `+"word_to_include"` instead of `+word_to_include` - Signals explicit filter term is completed - Prevents latency due to incremental search with explicit filtering on partial terms - Minor - `28d3dc1` Deep copy entries, embeddings in filters. Defer till actual filtering - `8d9f507` Load entries_by_word_set from file only once on first load of explicit filter - `546fad5` Use regex to check for and extract include, exclude filter words from query - `b7d259b` Test Explicit Include, Exclude Filters ### Results - Improve exclude word filter latency from 20s+ to 0.02s on 120K line notes corpus	2022-09-04 13:55:17 +00:00
Debanjum Singh Solanky	6087862521	Use LRU helper class for explicit filter cache	2022-09-04 16:42:28 +03:00
Debanjum Singh Solanky	8f3326c8d4	Create LRU helper class for caching	2022-09-04 16:31:46 +03:00
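Commits 8f3326c and 6087862 add an LRU helper class and use it for the explicit filter cache. Commit 3308e68 keys cached results by required and blocked words. The sketch below shows one way to build such a cache. It assumes an `OrderedDict`-based design, and the names `LRUCache`, `capacity` and `filter_cache_key` are hypothetical, not Khoj's actual class.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Mark key as most recently used
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict the least recently used entry once over capacity
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


def filter_cache_key(required_words, blocked_words):
    """Hypothetical cache key: order-insensitive sets of required, blocked words."""
    return (frozenset(required_words), frozenset(blocked_words))
```

Frozensets make `+"a" +"b"` and `+"b" +"a"` hit the same cache slot. A real filter cache would store the filtered entries and embeddings as the value.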
Debanjum Singh Solanky	191a656ed7	Use word to entry map, list comprehension to speed up explicit filter - Code Changes - Use list comprehension and `torch.index_select` methods - to speed selection of entries, embedding tensors satisfying filter - avoid deep copy of entries, embeddings - avoid updating existing lists (of entries, embeddings) - Use word to entry map and set operations to mark entries satisfying inclusion, exclusion filters - Results - Speed up explicit filtering by two orders of magnitude - Improve consistency of speed up across inclusion and exclusion filtering	2022-09-04 15:22:35 +03:00
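Commits c7de57b and 191a656 describe three steps:

- Pre-compute the words of each entry once.
- Use a word to entry map with set operations to mark entries that pass the include and exclude filters.
- Select the surviving entries with a list comprehension instead of deleting from, or deep copying, the original lists.

The sketch below illustrates these steps with plain lists standing in for embedding tensors. Several details are assumptions: tokenizing by lowercased `\w+` runs, requiring an entry to contain every required word, and the helper names.

```python
import re
from collections import defaultdict


def build_word_to_entries(entries):
    """Map each word to the set of entry indices containing it (computed once)."""
    word_to_entries = defaultdict(set)
    for index, entry in enumerate(entries):
        # Assumption: words are lowercased runs of word characters
        for word in set(re.findall(r"\w+", entry.lower())):
            word_to_entries[word].add(index)
    return word_to_entries


def apply_explicit_filter(entries, embeddings, word_to_entries, required, blocked):
    """Return entries, embeddings containing all required and no blocked words."""
    # Start from all entries, then intersect with each required word's entries
    included = set(range(len(entries)))
    for word in required:
        included &= word_to_entries.get(word.lower(), set())
    # Union of entries containing any blocked word
    excluded = set()
    for word in blocked:
        excluded |= word_to_entries.get(word.lower(), set())
    keep = sorted(included - excluded)
    # Build new lists rather than mutating or deep copying the originals
    return [entries[i] for i in keep], [embeddings[i] for i in keep]
```

With a real embeddings tensor, the last step would use `torch.index_select(embeddings, 0, index)` with a LongTensor of the kept indices, as the commit mentions.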
Debanjum Singh Solanky	28d3dc1434	Deep copy entries, embeddings in filters. Defer till actual filtering - Only the filter knows when entries, embeddings are to be manipulated. So move the responsibility to deep copy before manipulating entries, embeddings to the filters - Create deep copy in filters. Avoids creating deep copy of entries, embeddings when filter results are being loaded from cache etc	2022-09-04 02:38:57 +03:00
Debanjum Singh Solanky	3308e68edf	Cache explicitly filtered entries, embeddings by required, blocked words	2022-09-04 02:38:57 +03:00
Debanjum Singh Solanky	cdcee89ae5	Wrap words in quotes to trigger explicit filter from query - Do not run the more expensive explicit filter until the word to be filtered is completed by the user. This requires an end sequence marker to identify the end of an explicit word filter to trigger filtering - Space isn't a good enough delimiter as the explicit filter could be at the end of the query, in which case there is no trailing space	2022-09-04 02:38:57 +03:00
Debanjum Singh Solanky	8d9f507df3	Load entries_by_word_set from file only once on first load of explicit filter	2022-09-04 00:37:37 +03:00
Debanjum Singh Solanky	858d86075b	Use regexes to check if any explicit filters in query. Test can_filter	2022-09-03 23:47:28 +03:00
Debanjum Singh Solanky	546fad570d	Use regex to extract include, exclude filter words from query	2022-09-03 23:41:43 +03:00
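Commits cdcee89, 858d860 and 546fad5 use quoted terms such as `+"word"` and `-"word"`, and regexes to detect and extract them. The closing quote marks a term as complete, so partially typed terms do not trigger filtering. The sketch below shows that idea. The exact patterns, and the restriction to `\w+` words, are assumptions rather than Khoj's regexes.

```python
import re

# Assumption: +"word" includes, -"word" excludes; the closing quote marks
# the term as complete, so partial input like +"kho is ignored.
INCLUDE_PATTERN = re.compile(r'\+"(\w+)"')
EXCLUDE_PATTERN = re.compile(r'-"(\w+)"')


def can_filter(query: str) -> bool:
    """Check whether the query contains at least one completed filter term."""
    return bool(INCLUDE_PATTERN.search(query) or EXCLUDE_PATTERN.search(query))


def extract_filter_words(query: str):
    """Return (required_words, blocked_words, query with filter terms removed)."""
    required = set(INCLUDE_PATTERN.findall(query))
    blocked = set(EXCLUDE_PATTERN.findall(query))
    remaining = EXCLUDE_PATTERN.sub("", INCLUDE_PATTERN.sub("", query))
    # Collapse whitespace left behind by the removed terms
    return required, blocked, " ".join(remaining.split())
```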
Debanjum Singh Solanky	b7d259b1ec	Test Explicit Include, Exclude Filters	2022-09-03 23:41:43 +03:00
Debanjum Singh Solanky	ffb8e3988e	Use Python Logging Framework to Time Performance of Explicit Filter	2022-09-03 22:24:10 +03:00
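Commits ffb8e39 and 2eae32d time filter and image search performance through Python's logging framework. The sketch below is a minimal timing context manager that logs elapsed time at DEBUG level. The name `timer` and the message format are hypothetical.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def timer(message: str, log: logging.Logger = logger):
    """Log how long the wrapped block took, at DEBUG level."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # Lazy %-style formatting, evaluated only if DEBUG is enabled
        log.debug("%s: %.3f seconds", message, elapsed)
```

Usage: `with timer("Explicit filter"): ...` around the code being measured.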
Debanjum Singh Solanky	30c3eb372a	Update Tests to Configure Filters and Setup Text Search	2022-09-03 22:24:10 +03:00
Debanjum Singh Solanky	c7de57b8ea	Pre-compute entry word sets to improve explicit filter query performance	2022-09-03 16:16:31 +03:00
Debanjum Singh Solanky	094bd18e57	Use python standard logging framework for app logs - Stop passing verbose flag around app methods - Minor remap of verbosity levels to match python logging framework levels - verbose = 0 maps to logging.WARN - verbose = 1 maps to logging.INFO - verbose >=2 maps to logging.DEBUG - Minor clean-up of app: unused modules, conversation file opening	2022-09-03 14:43:32 +03:00
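Commit 094bd18 maps the verbosity flag onto Python logging levels. The sketch below expresses that mapping as a small function; the function name is hypothetical. It uses `logging.WARNING`, which has the same value as the `logging.WARN` alias named in the commit.

```python
import logging


def verbosity_to_log_level(verbose: int) -> int:
    """Map the verbosity count to a Python logging level."""
    if verbose <= 0:
        return logging.WARNING
    if verbose == 1:
        return logging.INFO
    return logging.DEBUG
```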
Debanjum Singh Solanky	d0531c3064	Update URL QueryParam when Type set in Dropdown on Web Interface - This also pushes the updated URL state to history - Allows jumping back to the web interface after clicking on an image and having the type set to image search - Previously type would get reset to the default search type on jumping back	2022-08-28 12:22:22 +03:00
Debanjum Singh Solanky	2eae32d743	Time, Log Image Search Performance	2022-08-28 00:28:46 +03:00
Debanjum Singh Solanky	c3ca99841b	Scale down images to generate image embeddings faster, with less memory - CLIP doesn't need full size images for generating embeddings with decent search results. The sentence transformers docs use images scaled to 640px width - Benefits - Normalize image sizes - Increase image embeddings generation speed - Decrease memory usage while generating embeddings from images	2022-08-24 14:09:02 +03:00
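Commit c3ca998 scales images down to a 640px width before generating CLIP embeddings. The sketch below computes the scaled size while preserving aspect ratio. Leaving narrower images untouched is an assumption, and the name `scaled_size` is hypothetical.

```python
def scaled_size(width: int, height: int, target_width: int = 640) -> tuple[int, int]:
    """Return (width, height) scaled down to target_width, keeping aspect ratio."""
    # Assumption: images already at or below the target width are not upscaled
    if width <= target_width:
        return width, height
    scale = target_width / width
    return target_width, max(1, round(height * scale))
```

With Pillow, the result could be passed to `image.resize(scaled_size(*image.size))`, since `Image.size` is a `(width, height)` tuple.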
Debanjum Singh Solanky	ea4fdd9134	Fix logic to ignore notes with no body. Add tests to prevent regression - Notes with empty newlines in body were not being ignored - Add regression tests to avoid above regression in org_to_jsonl conversion	2022-08-21 19:41:40 +03:00
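Commit ea4fdd9 fixes notes whose body holds only empty newlines, which were not being ignored. The sketch below shows a whitespace-aware emptiness check. The dict-based note shape with a `body` key and both helper names are hypothetical, not the structure `org_to_jsonl` actually uses.

```python
def has_body(body) -> bool:
    """Return True only if the note body has non-whitespace content."""
    # A body made only of newlines or spaces, e.g. "\n\n", counts as empty
    return body is not None and body.strip() != ""


def entries_with_body(notes):
    """Keep only notes (hypothetical dicts with a "body" key) that have a body."""
    return [note for note in notes if has_body(note.get("body"))]
```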
Debanjum Singh Solanky	5e107eedc0	Rename test_asymmetric_search to now more appropriate test_text_search	2022-08-21 18:36:14 +03:00
Debanjum	144986ebfd	Fix, Improve Desktop GUI Splash Screen and Main Window - `5e6625a` Fix file browser to not add empty line when no file/dir selected - `8098b8c` Bring main window to Top when open from System Tray - `1c122a8` Place window near top so buttons are not hidden by OS bottom bar - `dfe2546` Set Khoj Icon on Main Desktop Window - `1b1f8f9` Move Splash screen text below icon. Set the text color to black - `450f644` Fix path to remove shared libraries when packaging the Windows app	2022-08-20 23:19:01 +00:00
Debanjum	d6f624dc75	Use MPS, CUDA to GPU Accelerate Query Performance - Load Models and Embeddings onto GPU if available - Use MPS for GPU acceleration when available - Note: Support for [MPS](https://developer.apple.com/metal/) in Pytorch is currently in v1.13.0 nightly builds. See [Announcement](https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/) - Users will have to wait for PyTorch MPS support to land in stable builds - Until then code can be tuned and tested for GPU acceleration on newer Macs - Re-enable Tests for Image Search	2022-08-20 23:16:44 +00:00
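Commits d6f624d, acc9091 and 7de9c58 load models and embeddings onto CUDA or MPS when available. Commit 82d2891 sets the device once in app state at startup. The sketch below shows a device choice under two assumptions: CUDA is preferred over MPS, and the helper names are hypothetical. The pure function is separated out so it can be tested without PyTorch installed.

```python
def choose_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Pick an ML compute device name, preferring CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Return a torch.device for the best available backend (requires PyTorch)."""
    import torch  # imported lazily so choose_device_name works without torch

    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = mps_backend is not None and mps_backend.is_available()
    return torch.device(choose_device_name(torch.cuda.is_available(), mps_available))
```

As commit 7de9c58 notes, `SentenceTransformer` accepts a `device` argument at initialization, so the detected device can be passed there.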
Debanjum Singh Solanky	5e6625ac68	Fix file browser to not add empty line when no file/dir selected - When no file selected in file browser an empty line/entry gets added to input entries list - Bug got introduced due to insufficient update on change to add instead of insert - Update is_none_or_empty helper method to also check for empty string	2022-08-21 02:03:28 +03:00
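Commit 5e6625a stops an empty file-browser selection from adding a blank line, partly by making `is_none_or_empty` also catch empty strings. Commit 7c44171 appends selections instead of overwriting them. The sketch below combines both ideas. Its `is_none_or_empty` body is a guess rather than Khoj's code, and `append_selection` is hypothetical.

```python
def is_none_or_empty(item) -> bool:
    """Return True if item is None, an empty string, or an empty collection."""
    if item is None:
        return True
    if isinstance(item, str):
        return item == ""
    return hasattr(item, "__len__") and len(item) == 0


def append_selection(current: list, selected: list) -> list:
    """Append paths picked in a file browser, skipping empty picks."""
    # An empty selection (e.g. "" when the dialog is cancelled) adds nothing
    return current + [path for path in selected if not is_none_or_empty(path)]
```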
Debanjum Singh Solanky	8098b8c3a8	Bring Configure Window to Top when Opened from System Tray - Previously the window could get hidden behind other app windows when user clicked configure from the system tray	2022-08-20 23:38:43 +03:00
Debanjum Singh Solanky	1c122a8a91	Place window near top so buttons are not hidden by OS bottom bar	2022-08-20 22:38:06 +03:00
Debanjum Singh Solanky	dfe2546c04	Set Khoj Icon on Main Desktop Window	2022-08-20 20:36:15 +03:00
Debanjum Singh Solanky	1b1f8f9272	Move Splash screen text below icon. Set the text color to black - It is currently on top of the splash screen icon - Ballpark pixels to move such that text is positioned below icon - Test later to verify if text is positioned fine now	2022-08-20 20:32:31 +03:00
Debanjum Singh Solanky	450f6441e2	Fix path to remove shared libraries when packaging the Windows app	2022-08-20 20:29:33 +03:00
Debanjum Singh Solanky	e6abe76875	Upgrade torch, torchvision package versions	2022-08-20 14:45:43 +03:00
Debanjum Singh Solanky	972523e8a9	Re-enable tests for image search. Verify if recent fixes resolve test flakiness	2022-08-20 14:44:53 +03:00
Debanjum Singh Solanky	82d2891765	Do not pass ML compute `device` around as argument to search funcs - It is a non-user configurable, app state that is set on app start - Reduce passing unneeded arguments around. Just set device where required by looking for ML compute device in global state	2022-08-20 14:44:53 +03:00
Debanjum Singh Solanky	acc9091260	Use MPS on Apple Mac M1 to GPU accelerate Encode, Query Performance - Note: Support for MPS in Pytorch is currently in v1.13.0 nightly builds - Users will have to wait for PyTorch MPS support to land in stable builds - Until then the code can be tweaked and tested to make use of the GPU acceleration on newer Macs	2022-08-20 14:44:06 +03:00
Debanjum Singh Solanky	7de9c58a1c	Load models, corpus embeddings onto GPU device for text search, if available - Pass device to load models onto from app state. - SentenceTransformer models accept device to load models onto during initialization - Pass device to load corpus embeddings onto from app state	2022-08-20 14:04:18 +03:00
Debanjum Singh Solanky	7fe3e844d2	Fix setup of Reproducible Build environment in publish workflow - Note: Reproducible builds have not been validated. This is just preliminary work to get there. Further testing and fixes may be required	2022-08-19 21:00:12 +03:00
Debanjum Singh Solanky	dc8dcc94a6	Bump Khoj.el package version to 0.1.6	2022-08-19 20:48:42 +03:00
Debanjum	a7b4d58865	Fix Image Search and Improve Desktop App ### Fix Image Search - Do not use XMP metadata by default for image search - It seems to be buggy currently. The returned results do not make sense with XMP metadata enabled ### Fix Image Search using Desktop App - Fix configuring Image Search via Desktop GUI - Set `input-directories`, instead of unused `input-files` for `content-type.image` in `khoj.yml` - Fix running Image Search via Desktop apps. - Previously the transformers package wasn't getting packaged into the app by pyinstaller - This is required by image search to run. So the desktop apps would fail to start when image search was enabled - Resolves #68 - Append selected files, directories via "Add" button in Desktop GUI - This allows selecting multiple files, directories using Desktop GUI - Previously multiple image directories had to be entered manually ### Improve Desktop App - Show Splash Screen on Desktop App Initialization - The app takes a while to load during first run - A splash screen signals that the app is loading and is not unresponsive - Note: _PyInstaller only supports splash screens on Windows, Linux. Not on Macs._ - Add Khoj icon to the Windows, Linux app. Windows expects a `.ico` icon type - Only exclude `libtorch_{cuda, cpu, python}` on Linux machines - Seems those libraries are used on Mac (and maybe Windows). - Linux is where removing these reduces the app size the most anyway - Fix PyInstaller Warnings on App Start - The warnings show up as annoying error popups on Windows	2022-08-19 17:37:09 +00:00
Debanjum Singh Solanky	b9a54c03ee	Add transformers package into Khoj app to run image search	2022-08-19 19:17:54 +03:00
Debanjum Singh Solanky	ffbf15eff8	Add helper function to identify when app is running as pyinstaller app. Useful when we want the app to behave differently in the pyinstaller app scenario with frozen python, and in development scenarios	2022-08-19 19:17:54 +03:00
Debanjum Singh Solanky	6c5c1c33c1	Turn off Tokenizers Parallelism. Khoj doesn't support it right now - Forking and multiprocessing are problematic in frozen python scenarios. This will cause issues when running App packaged by pyinstaller	2022-08-19 19:17:54 +03:00
Debanjum Singh Solanky	d4072974d7	Use of XMP metadata in Khoj Image Search is broken. Disable by default - CLIP Image score and XMP metadata score are not combining well. When combined they give nonsensical results. Enable only once we figure out how best to combine the two. - Show scores with higher precision for image search - Image search scores seem to mostly be between 0.2 - 0.3 for some reason - Higher precision scores make it easier to understand the quality of returned results as perceived by the model itself	2022-08-19 19:17:28 +03:00
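Commit ffbf15e adds a helper to detect when Khoj runs as a PyInstaller app. PyInstaller's documentation describes checking `sys.frozen` together with `sys._MEIPASS`, which its bootloader sets. The sketch below uses that check. The function name is hypothetical and may differ from Khoj's helper.

```python
import sys


def is_pyinstaller_app() -> bool:
    """Return True when running inside a PyInstaller-built (frozen) app."""
    # PyInstaller's bootloader sets sys.frozen and sys._MEIPASS
    return bool(getattr(sys, "frozen", False)) and hasattr(sys, "_MEIPASS")
```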
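Commit 6c5c1c3 turns off tokenizers parallelism. The Hugging Face `tokenizers` library reads the `TOKENIZERS_PARALLELISM` environment variable. The sketch below sets it, assuming this happens at app startup before any tokenizer is used. Where exactly Khoj sets it is not stated in the log.

```python
import os

# Disable tokenizers' internal parallelism, which can misbehave after fork
# in frozen (PyInstaller) builds; set before any tokenizer is loaded.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```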
Debanjum Singh Solanky	7c4417126c	Append files, directories selected by user to config in Desktop GUI - Allows adding multiple image directories via GUI - Allow adding multiple files in different directories via GUI - Previously users couldn't add multiple directories via GUI. They'd have to manually append to the input field for multiple files, directories - Clearing/overwriting is much easier. The user can just select text to delete in the input area	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	00ddcfdac8	Use .ico icon when packaging for Windows (and Linux) using PyInstaller	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	812838da54	Only exclude libtorch_cuda, libtorch_cpu under torch/lib on Linux - On Mac excluding the .dylib version of these files throws errors. Not sure why it didn't throw during testing. Maybe the libs were cached? - Tested on Linux again. It still seems to be passing with the above libs excluded. So going to keep those excluded for now, unless further testing reveals those libs are really required for the app	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	6ddcbe2e75	Remove files that triggered warnings during app start	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	60dacf3f2c	Show splash screen on app start. Only supported on Windows, Linux	2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky	0079c13bf7	Set input-directories in config for image search type on Desktop GUI - Issue Fix configuring image search from Desktop GUI. It was broken before. The Desktop GUI was updating input-files field under content-type > image. This field is not used for image search. So image search couldn't be configured from the Desktop GUI - Fix - Set input-directories when field of search type image is set from GUI - Otherwise set input-files field in config	2022-08-18 18:29:55 +03:00
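Commit 0079c13 makes the Desktop GUI write selected paths to `input-directories` for the image content type and to `input-files` otherwise. The sketch below models that rule on a plain dict mirroring the `content-type` section of `khoj.yml`. The dict shape beyond those key names and the function name are hypothetical.

```python
def set_input_paths(config: dict, search_type: str, paths: list) -> dict:
    """Store GUI-selected paths under the config field the search type reads."""
    # Image search reads input-directories; other content types read input-files
    field = "input-directories" if search_type == "image" else "input-files"
    content_config = config.setdefault("content-type", {}).setdefault(search_type, {})
    content_config[field] = paths
    return config
```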
Debanjum Singh Solanky	082fe937b9	Reduce Windows App Size by Removing Unused Libraries under Torch/Lib. Tighten the duplicate library removal code in Khoj.spec	2022-08-18 11:28:28 +03:00
Debanjum	b78ee317ae	Reduce Debian, Mac App Size. Remove unused libraries under Torch/Lib ## Changes - On Debian - `libtorch_cuda.so` (1Gb) and `libtorch_cpu.so` (700Mb) are large shared libs - They are available at package root and under `torch/lib` directory in the package - We remove the unused, duplicate libraries from under `torch/lib` as only the top level libraries are used - On Mac - Remove `libtorch_{cpu,python}.dylib` under `torch/lib` directory from the Mac app. ## App Size Reduction - Debian amd64 app size by 42% from 1.6Gb to 920Mb - Mac arm64 app by 15% from 190Mb to 160Mb - Mac amd64 app by 33% from 340Mb to 230Mb ## Reference - [Release Workflow Run](https://github.com/debanjum/khoj/actions/runs/2878104171) after changes - [Release Workflow Run](https://github.com/debanjum/khoj/actions/runs/2869906116) before changes	2022-08-17 20:48:56 +00:00
Debanjum Singh Solanky	d25ddb93f7	Fix missing closing bracket from SOURCE_DATE_EPOCH def in release.yml	2022-08-17 23:17:27 +03:00
Debanjum Singh Solanky	9ee02b0804	Add --noconfirm in call to pyinstaller from GitHub release workflow. Added just for safety, workflow works fine without it too	2022-08-17 23:04:26 +03:00

645 commits