- Magika on Desktop app was too bloated (100Mb to 250Mb) and broke
install for some reason. Not sure why it was causing the app install
to fail but do not have time to currently investigate
- Just use file extensions whitelist it's good enough for now. Let
server handle the deeper identification of file type
- `file-type' doesn't handle mis-labelled files or files without
extensions well
- Only show supported file types in file selector dialog on Desktop app
Use Magika to get list of text file extensions. Combine with other
supported extensions to get complete list of supported file extensions.
Use it to limit selectable files in the File Open dialog.
Note: Folder selector will index text files with no extensions as well
- `fs.readdir' func in node version 18.18.2 has buggy `recursive' option
See nodejs/node#48640, effect-ts/effect#1801 for details
- We were recursing down a folder in two ways on the Desktop app.
Remove `recursive: True' option to the `fs.readdirSync' method call
to recurse down via app code only
- Allow syncing more file types from desktop app to index on server
- Use `file-type' package to identify valid text file types on Desktop app
- Split plaintext entries into smaller logical units than a whole file
Since the text splitting upgrades in #645, compiled chunks have more
logical splits like paragraph, sentence.
Show those (potentially) smaller snippets to the user as references
- Tangential Fix:
Initialize unbound currentTime variable for error log timestamp
* Add chat sessions to the desktop application
* Increase width of the main chat body to 90vw
* Update the version of electron
* Render the default message if chat history fails to load
* Merge conversation migrations and fix slug setting
* Update the welcome message, use the hostURL, and update background color for chat actions
* Only update the window's web contents if the page is config
* Use separate functions for adding files and folders to configuration for indexing
* Add a loading bar while data is syncing
* Bump the minor version for the application
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Use state.host and state.port for configuring the URL for the indexer
* Fix parsing of PDF files
* Read markdown files from streamed data and update unit tests
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
* Init: refactor indexer/batch endpoint to support a generic file ingestion format
* Add features to better support indexing from files sent by the desktop client
* Initial commit with Electron application
- Adds electron app
* Add import for pymupdf, remove import for pypdf
* Allow user to configure khoj host URL
* Remove search type configuration from index.html
* Use v1 path for current indexer routes