Commit graph

4066 commits

Author SHA1 Message Date
Debanjum Singh Solanky
972523e8a9 Re-enable tests for image search
Verify if recent fixes resolve test flakiness
2022-08-20 14:44:53 +03:00
Debanjum Singh Solanky
82d2891765 Do not pass ML compute `device' around as argument to search funcs
- It is non-user-configurable app state that is set on app start
- Reduce passing unneeded arguments around. Just set device where
  required by looking for ML compute device in global state
2022-08-20 14:44:53 +03:00
Debanjum Singh Solanky
acc9091260 Use MPS on Apple Mac M1 to GPU accelerate Encode, Query Performance
- Note: Support for MPS in Pytorch is currently in v1.13.0 nightly builds
  - Users will have to wait for PyTorch MPS support to land in stable builds
- Until then the code can be tweaked and tested to make use of the GPU
  acceleration on newer Macs
2022-08-20 14:44:06 +03:00
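A minimal sketch of picking the ML compute device once at app start, as the GPU-related commits here describe. It assumes PyTorch's `torch.cuda.is_available()` and `torch.backends.mps.is_available()` checks. The helper names `pick_device_name` and `detect_device_name` are hypothetical, not taken from the Khoj source.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_name() -> str:
    """Probe PyTorch for an accelerator. Returns "cpu" if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no MPS backend, so look it up defensively
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device_name(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    # The result could be stored in app state and read wherever a device is needed
    print(detect_device_name())
```

Computing the name once and reading it from shared state avoids threading a `device` argument through every search function.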
Debanjum Singh Solanky
7de9c58a1c Load models, corpus embeddings onto GPU device for text search, if available
- Pass device to load models onto from app state.
  - SentenceTransformer models accept device to load models onto during initialization
- Pass device to load corpus embeddings onto from app state
2022-08-20 14:04:18 +03:00
Debanjum Singh Solanky
7fe3e844d2 Fix setup of Reproducible Build environment in publish workflow
- Note: Reproducible builds have not been validated.
  This is just preliminary work to get there.
  Further testing and fixes may be required
2022-08-19 21:00:12 +03:00
Debanjum Singh Solanky
dc8dcc94a6 Bump Khoj.el package version to 0.1.6 2022-08-19 20:48:42 +03:00
Debanjum
a7b4d58865
Fix Image Search and Improve Desktop App
### Fix Image Search
  - Do not use XMP metadata by default for image search
    - It seems to be buggy currently. The returned results do not make sense with XMP metadata enabled

### Fix Image Search using Desktop App
  - Fix configuring Image Search via Desktop GUI
    - Set `input-directories`, instead of unused `input-files` for `content-type.image` in `khoj.yml`
  - Fix running Image Search via Desktop apps
    - Previously the transformers package wasn't getting packaged into the app by pyinstaller
    - Image search requires it to run, so the desktop apps would fail to start when image search was enabled
    - Resolves #68
  - Append selected files, directories via "Add" button in Desktop GUI
    - This allows selecting multiple files, directories using Desktop GUI
    - Previously selecting multiple image directories had to be entered manually

### Improve Desktop App
  - Show Splash Screen to Desktop on App Initialization
    - The app takes a while to load during first run
    - A splash screen signals that the app is loading, not unresponsive
    - Note: _Pyinstaller only supports splash screens on Windows, Linux. Not on Macs._
  - Add Khoj icon to the Windows, Linux app. Windows expects a `.ico` icon type
  - Only exclude `libtorch_{cuda, cpu, python}` on Linux machine
    - Seems those libraries are being used on Mac (and maybe Windows)
    - Linux is where removing these reduces the app size the most anyway
  - Fix PyInstaller Warnings on App Start
    - The warnings show up as annoying error popups on Windows
2022-08-19 17:37:09 +00:00
Debanjum Singh Solanky
b9a54c03ee Add transformers package into Khoj app to run image search 2022-08-19 19:17:54 +03:00
Debanjum Singh Solanky
ffbf15eff8 Add helper function to identify when app running as pyinstaller app
Useful when the app should behave differently when running as a
pyinstaller app with frozen python than in development scenarios
2022-08-19 19:17:54 +03:00
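A minimal sketch of such a helper, using the runtime check documented by PyInstaller (`sys.frozen` plus `sys._MEIPASS`). The function name `is_pyinstaller_app` is an assumption, not necessarily the name used in Khoj.

```python
import sys


def is_pyinstaller_app() -> bool:
    """Return True when running from a PyInstaller bundle (frozen python)."""
    # PyInstaller's bootloader sets sys.frozen and sys._MEIPASS at runtime
    return bool(getattr(sys, "frozen", False)) and hasattr(sys, "_MEIPASS")


if __name__ == "__main__":
    mode = "packaged app" if is_pyinstaller_app() else "development"
    print(f"Running in {mode} mode")
```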
Debanjum Singh Solanky
6c5c1c33c1 Turn off Tokenizers Parallelism. Khoj doesn't support it right now
- Forking and multiprocessing are problematic in frozen python
  scenarios. This will cause issues when running App packaged by
  pyinstaller
2022-08-19 19:17:54 +03:00
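One way to turn this off, sketched under the assumption that it is done through the `TOKENIZERS_PARALLELISM` environment variable read by the Hugging Face tokenizers library. The function name is hypothetical, and the variable has to be set before any tokenizer is used.

```python
import os


def disable_tokenizers_parallelism() -> None:
    """Ask Hugging Face tokenizers not to parallelize encoding."""
    # Must run before a tokenizer is first used in the process
    os.environ["TOKENIZERS_PARALLELISM"] = "false"


if __name__ == "__main__":
    disable_tokenizers_parallelism()
    print(os.environ["TOKENIZERS_PARALLELISM"])
```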
Debanjum Singh Solanky
d4072974d7 Use of XMP metadata in Khoj Image Search is broken. Disable by default
- CLIP Image score and XMP metadata score are not combining well.
  When combined they give nonsensical results. Enable only once we
  figure out how best to combine the two.

- Show scores with higher precision for image search
  - Image search scores seem to mostly be between 0.2 - 0.3 for some reason
  - Higher precision scores make it easier to understand the quality
    of returned results perceived by the model itself
2022-08-19 19:17:28 +03:00
Debanjum Singh Solanky
7c4417126c Append files, directories selected by user to config in Desktop GUI
- Allows adding multiple image directories via GUI
- Allow adding multiple files in different directories via GUI
- Previously users couldn't add multiple directories via GUI.
  They'd have to manually append multiple files, directories to the input field
- Clearing or overwriting remains easy.
  The user can just select the text to delete in the input area
2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky
00ddcfdac8 Use .ico icon when packaging for Windows (and Linux) using PyInstaller 2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky
812838da54 Only exclude libtorch_cuda, libtorch_cpu under torch/lib on Linux
- On Mac excluding the .dylib version of these files throws errors
  Not sure why it didn't throw during testing. Maybe the libs were cached?

- Tested on Linux again. It still seems to be passing with the above
  libs excluded. So going to keep those excluded for now. Unless
  further testing reveals those libs are really required for app
2022-08-19 19:16:10 +03:00
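A rough sketch of the exclusion logic described above, written as a plain function rather than as the actual `Khoj.spec` code. It assumes the bundled binaries are `(dest_name, src_path, typecode)` tuples, as in a PyInstaller `Analysis.binaries` list. The function name and the sample entries are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

TocEntry = Tuple[str, str, str]  # (dest_name, src_path, typecode)
UNUSED_LIB_PREFIXES = ("libtorch_cuda", "libtorch_cpu")


def drop_duplicate_torch_libs(binaries: List[TocEntry], platform: str) -> List[TocEntry]:
    """Drop large torch libs duplicated under torch/lib, on Linux only."""
    if not platform.startswith("linux"):
        # On Mac (and maybe Windows) these libraries appear to be required
        return list(binaries)
    kept = []
    for entry in binaries:
        dest = PurePosixPath(entry[0])
        under_torch_lib = dest.parent == PurePosixPath("torch/lib")
        if under_torch_lib and dest.name.startswith(UNUSED_LIB_PREFIXES):
            continue  # the copy at the package root is kept
        kept.append(entry)
    return kept


if __name__ == "__main__":
    sample = [
        ("libtorch_cpu.so", "/src/libtorch_cpu.so", "BINARY"),
        ("torch/lib/libtorch_cpu.so", "/src/torch/lib/libtorch_cpu.so", "BINARY"),
    ]
    print(drop_duplicate_torch_libs(sample, "linux"))
```

In a spec file the result would typically be assigned back to the analysis object's binaries, with `sys.platform` passed as the platform.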
Debanjum Singh Solanky
6ddcbe2e75 Remove files that triggered warnings during app start 2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky
60dacf3f2c Show splash screen on app start. Only supported on Windows, Linux 2022-08-19 19:16:10 +03:00
Debanjum Singh Solanky
0079c13bf7 Set input-directories in config for image search type on Desktop GUI
- Issue
  Fix configuring image search from Desktop GUI. It was broken before.
  The Desktop GUI was updating input-files field under content-type > image.
  This field is not used for image search. So image search couldn't be
  configured from the Desktop GUI

- Fix
  - Set input-directories when field of search type image is set from GUI
  - Otherwise set input-files field in config
2022-08-18 18:29:55 +03:00
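The fix can be sketched as a small function over a config dictionary shaped like `khoj.yml`. The function name `set_content_input` and the nested-dict layout are assumptions for illustration; only the `content-type`, `input-directories` and `input-files` keys come from the commit message.

```python
from typing import Any, Dict, List


def set_content_input(config: Dict[str, Any], search_type: str, paths: List[str]) -> Dict[str, Any]:
    """Store GUI-selected paths under the field that the search type reads."""
    # Image search reads directories. Other search types read files
    field = "input-directories" if search_type == "image" else "input-files"
    section = config.setdefault("content-type", {}).setdefault(search_type, {})
    section[field] = list(paths)
    return config


if __name__ == "__main__":
    print(set_content_input({}, "image", ["~/Pictures"]))
```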
Debanjum Singh Solanky
082fe937b9 Reduce Windows App Size by Removing Unused Libraries under Torch/Lib
Tighten the duplicate library removal code in Khoj.spec
2022-08-18 11:28:28 +03:00
Debanjum
b78ee317ae
Reduce Debian, Mac App Size. Remove unused libraries under Torch/Lib
## Changes
- On **Debian**
  - `libtorch_cuda.so` (1Gb) and `libtorch_cpu.so` (700Mb) are large shared libs
  - They are available at package root and under `torch/lib` directory in the package
  - We remove the unused, duplicate libraries from under `torch/lib` as only the top level libraries are used
- On **Mac**
  - Remove `libtorch_{cpu,python}.dylib` under `torch/lib` directory from the Mac app.

## App Size Reduction
  - **Debian amd64** app size by **42%** from **1.6Gb to 920Mb**  
  - **Mac arm64** app by **15%** from **190Mb to 160Mb**
  - **Mac amd64** app by **33%** from **340Mb to 230Mb**

## Reference
- [Release Workflow Run](https://github.com/debanjum/khoj/actions/runs/2878104171) after changes
- [Release Workflow Run](https://github.com/debanjum/khoj/actions/runs/2869906116) before changes
2022-08-17 20:48:56 +00:00
Debanjum Singh Solanky
d25ddb93f7 Fix missing closing bracket from SOURCE_DATE_EPOCH def in release.yml 2022-08-17 23:17:27 +03:00
Debanjum Singh Solanky
9ee02b0804 Add --noconfirm in call to pyinstaller from Github release workflow
Added just for safety, workflow works fine without it too
2022-08-17 23:04:26 +03:00
Debanjum Singh Solanky
7cf345a138 Exclude unused mac libs under torch/lib. Reduce Mac app size by 30Mb
libtorch_cuda only seems to be imported on Linux, which is why the
Mac, Windows apps are 700Mb smaller than the Debian app.

Guessing this is because libtorch_cuda only works on Linux machines?

Anyway, removing libtorch_{cpu,python}.dylib under torch/lib from the
Mac app reduces its size from 190Mb to 160Mb. 15% reduction isn't too bad
2022-08-17 22:59:01 +03:00
Debanjum Singh Solanky
0273be0232 Exclude unused libs under torch/lib. Reduce Debian package size by 700Mb
- libtorch_cuda.so (1Gb) and libtorch_cpu.so (700Mb) are large shared
  libs that are available at package root and under torch/lib.
- Only the top level libs are used, so the unused duplicates under
  torch/lib are removed from the package
- This reduces the single file package size from 1.6Gb to 920Mb
2022-08-17 22:32:58 +03:00
Debanjum Singh Solanky
5a20283202 Set Pyinstaller, Pip environment to create reproducible builds of Khoj
- Dependency Version Pinning
  - First level dependency versions have been pinned.
  - Transitive dependencies have not been specified yet

- Testing
  - The Pyinstaller build has been only minimally tested for reproducibility
  - The Khoj package generated for PyPi has not been tested for reproducibility

- References
  - https://reproducible-builds.org/docs/source-date-epoch/
  - https://pyinstaller.org/en/stable/advanced-topics.html#creating-a-reproducible-build
2022-08-17 20:09:35 +03:00
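A hedged sketch of the build environment this commit sets up. It pins `SOURCE_DATE_EPOCH`, as defined by reproducible-builds.org, and `PYTHONHASHSEED`, which the PyInstaller docs advise fixing to a known integer. The function name, the default seed and the example timestamp are illustrative assumptions.

```python
import os
from typing import Dict, Mapping, Optional


def reproducible_build_env(
    source_date_epoch: int,
    base_env: Optional[Mapping[str, str]] = None,
    hash_seed: int = 1,
) -> Dict[str, str]:
    """Return a copy of the environment with build-time variability pinned."""
    env = dict(os.environ if base_env is None else base_env)
    env["SOURCE_DATE_EPOCH"] = str(source_date_epoch)  # fixed build timestamp
    env["PYTHONHASHSEED"] = str(hash_seed)  # fixed hash randomization seed
    return env


if __name__ == "__main__":
    env = reproducible_build_env(1660755575)
    # The env could then be passed to the build step, for example:
    # subprocess.run(["pyinstaller", "--noconfirm", "Khoj.spec"], env=env, check=True)
    print(env["SOURCE_DATE_EPOCH"], env["PYTHONHASHSEED"])
```

In line with the notes above, this only pins the environment. Transitive dependency versions would still need pinning for a fully reproducible build.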
Debanjum Singh Solanky
f821b614ab Fix path to Khoj executable in Khoj.desktop for Debian package 2022-08-17 19:52:35 +03:00
Debanjum Singh Solanky
43049f761b Fix query performance numbers in Readme 2022-08-17 18:32:55 +03:00
Debanjum Singh Solanky
a37724f338 Fix Debian package permission. Set version on manual workflow trigger 2022-08-16 20:30:46 +03:00
Debanjum Singh Solanky
4d83a1d13b Bump Khoj pip version to 0.1.6 to publish pre-release builds 2022-08-16 17:37:41 +03:00
Debanjum Singh Solanky
3d0f979475 Make the .deb package use version in Debian package versioning format 2022-08-16 17:35:14 +03:00
Debanjum Singh Solanky
3000b91297 Fix and Update Readme. Delete old Demo from Repository 2022-08-16 16:57:56 +03:00
Debanjum Singh Solanky
c4fd661909 Move the experimental /chat API to under /beta/chat 2022-08-16 16:36:15 +03:00
Debanjum
a482c2a8b0
Update Demo with Install, Configure and Search shown
Demo contains:
- Setup using pip
- Configure using Desktop GUI
- Search via Web and Emacs interfaces
2022-08-15 16:15:43 -07:00
Debanjum
7fc8672666
Improve Desktop GUI and Documentation
- Improve Documentation
  - 7866add Add Interface Screenshots to Docs
  - 8ad3482 Update Readme Instructions to use Desktop GUI to configure App
- Fix Markdown Search Bug on Backend
  - b891347 Fix condition in router to trigger markdown search
- Improve Desktop GUI
  - 67ab40b Regenerate embeddings every time user clicks configure in Desktop GUI
  - 7f479b0 Improve Displaying Error to User on Khoj window in Desktop GUI
  - 873bb9d Do not force the Khoj window to always be on top. It's needlessly annoying
  - 9bc4fd5 Set Web Interface URL from loaded state in Desktop GUIs. Not hard-coded
2022-08-15 23:01:37 +00:00
Debanjum Singh Solanky
35ce511709 Add Interface Screenshots to Readme 2022-08-16 01:59:29 +03:00
Debanjum Singh Solanky
342c72b156 Update Readme Instructions to use Desktop GUI to configure App
- Configure app using the configure screen during first run.
  No config file args required to be passed via CLI by users

- Update instructions to copy khoj_sample.yml to default app configure
  file location and edit that. Instead of editing the khoj_sample.yml
  directly in source
2022-08-16 01:59:29 +03:00
Debanjum Singh Solanky
b8913476ba Fix if condition in router to trigger markdown search 2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky
9bc4fd539e Set Web Interface URL from loaded state in Desktop GUIs. Not hard-coded 2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky
7f479b0104 Improve Displaying Error to User on Khoj window in Desktop GUI
- Show a helpful error message in the GUI to the user, instead of
  crashing if loading config fails, e.g. if the file wasn't found
- Collate GUI errors into an ErrorType enum class
- Remove previous error messages before showing the new one
2022-08-16 00:37:16 +03:00
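A small sketch of the pattern, assuming errors are collected in an `ErrorType` enum and returned to the GUI instead of being raised. The enum members, their messages and the `read_config` helper are hypothetical and not copied from Khoj.

```python
from enum import Enum
from pathlib import Path
from typing import Optional, Tuple


class ErrorType(Enum):
    """User-facing errors the GUI can display (illustrative members)."""
    CONFIG_NOT_FOUND = "Config file not found"
    CONFIG_UNREADABLE = "Config file could not be read"


def read_config(path: Path) -> Tuple[Optional[str], Optional[ErrorType]]:
    """Return (config text, None) on success or (None, error) on failure."""
    try:
        return path.read_text(), None
    except FileNotFoundError:
        return None, ErrorType.CONFIG_NOT_FOUND
    except OSError:
        return None, ErrorType.CONFIG_UNREADABLE


if __name__ == "__main__":
    text, error = read_config(Path("khoj.yml"))
    if error is not None:
        # A GUI would clear any previous message, then show this one
        print(error.value)
```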
Debanjum Singh Solanky
873bb9dd97 Do not force the Khoj window to always be on top. It's needlessly annoying 2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky
67ab40bb01 Regenerate embeddings every time user clicks configure in Desktop GUI
Previously if the embeddings were already there only the khoj.yml
config file would get updated. The embeddings would remain old.

1. This results in a stale app state where the config doesn't
   match the embeddings

2. Currently the user cannot update their config from the config
   screen. They'd have to use a combination of config screen and web
   interface>regenerate button to trigger it or delete their ~/.khoj dir

This commit should resolve the above issues
2022-08-16 00:37:16 +03:00
Debanjum Singh Solanky
2647e6bab4 Display re-ranked results triggered via keybinding in khoj.el
- Prevent immediate overwrite of re-ranked results by
  incremental-search without rerank triggered via post-command-hook.

- The hook triggers right after the re-ranked results are rendered, so
  the user never ends up seeing them
2022-08-15 18:41:12 +03:00
Debanjum Singh Solanky
7421ef2724 Fix path of app artifacts to attach to release via release pipeline 2022-08-15 17:41:08 +03:00
Debanjum Singh Solanky
c21ab4316c Fix publish of pip package to PyPi on git tag 2022-08-15 06:55:00 +03:00
Debanjum Singh Solanky
237e207304 Create separate app artifacts for each operating system 2022-08-15 06:39:45 +03:00
Debanjum Singh Solanky
a91d2df300 Simplify Emacs interface to only rerank results on explicit command 2022-08-15 06:20:13 +03:00
Debanjum Singh Solanky
e846829a2e Reset Khoj.el version to align with Khoj package version 2022-08-15 06:20:13 +03:00
Debanjum Singh Solanky
4308f51d2c Fix workflow_dispatch trigger name used in github release workflow 2022-08-15 06:09:54 +03:00
Debanjum
a26f7f4716
Merge pull request #66 from debanjum/use-venv-to-reduce-debian-app-size
Improve Release Workflow, Automatically Publish Artifacts to Release
2022-08-15 03:04:18 +00:00
Debanjum Singh Solanky
14710da962 Add version, arch to app name. Publish artifacts to release
- Allow manual trigger of workflow
2022-08-15 06:01:47 +03:00
Debanjum Singh Solanky
2142cba627 Simplify upload artifacts to single action 2022-08-15 05:11:37 +03:00