Commit graph

4139 commits

Author SHA1 Message Date
Debanjum Singh Solanky
8e0bae894d Extract run with process lock logic into func. Use for content reindexing 2024-04-17 12:31:19 +05:30
Debanjum Singh Solanky
e9f608174b Fix access to Khoj admin panel from non HTTPS custom domains
To access the Khoj admin panel from a non HTTPS custom domain the
`KHOJ_NO_SSL' and `KHOJ_DOMAIN' env vars need to be explictly set.

See the updated setup docs for details.

Resolves #662
2024-04-17 03:20:05 +05:30
sabaimran
46210695b6 pin version of huggingface hub explicitly to ensure relevant constants are present. Closes #708 2024-04-17 01:09:36 +05:30
sabaimran
b0059654c9 Do not create an import error if the resend module is not available 2024-04-17 01:00:22 +05:30
sabaimran
f04ead7c37 Remove seting up log line for configuring image search 2024-04-17 00:45:39 +05:30
sabaimran
0208688801 Increase factor for n_ctx reduciton to 2e6 2024-04-17 00:41:36 +05:30
Debanjum Singh Solanky
1f2ffce85b Copy chat message with it's markdown formatting in Web, Desktop apps 2024-04-16 22:10:34 +05:30
sabaimran
91c8b137f1
Add a database lock for jobs that shouldn't be run by multiple workers (#706)
* Add a database lock for jobs that shouldn't be run by multiple workers

* Import relevant functions from utils.helpers
2024-04-16 21:29:27 +05:30
sabaimran
adb2e8cc5f Check if n is populated before making a comparison 2024-04-16 02:05:58 +05:30
Debanjum Singh Solanky
6707ccc463 Check before updating "chat" key in meta_log in chat history API endpoint 2024-04-15 21:06:47 +05:30
Debanjum Singh Solanky
4e7812fe55 Use Django management cmd to update inline images in DB to/from WebP/PNG
This provides Khoj server admins more control on migrating their S3
images to WebP format from PNG
2024-04-15 20:19:49 +05:30
Debanjum Singh Solanky
7fab8d6586 Only use chat messages count in history API endpoint when set by client 2024-04-15 19:12:57 +05:30
Debanjum
6b3ef61dd2
Improve Chat Page Load Perf, Offline Chat Perf and Miscellaneous Fixes (#703)
### Store Generated Images as WebP 
- 78bac4ae Add migration script to convert PNG to WebP references in database
- c6e84436 Update clients to support rendering webp images inline
- d21f22ff Store Khoj generated images as webp instead of png for faster loading

### Lazy Fetch Chat Messages to Improve Time, Data to First Render
This is especially helpful for long conversations with lots of images
- 128829c4 Render latest msgs on chat session load. Fetch, render rest as they near viewport
- 9e558577 Support getting latest N chat messages via chat history API

### Intelligently set Context Window of Offline Chat to Improve Performance
- 4977b551 Use offline chat prompt config to set context window of loaded chat model

### Fixes
- 148923c1 Fix to raise error on hitting rate limit during Github indexing
- b8bc6bee Always remove loading animation on Desktop app if can't login to server
- 38250705 Fix `get_user_photo` to only return photo, not user name from DB

### Miscellaneous Improvements
- 689202e0 Update recommended CMAKE flag to enable using CUDA on linux in Docs
- b820daf3 Makes logs less noisy
2024-04-15 18:34:29 +05:30
Debanjum Singh Solanky
a352940dfd Use Django management command to update images URL in DB to WebP
This provides Khoj server admins more control on migrating their S3
images to WebP format from PNG
2024-04-15 17:53:41 +05:30
Debanjum Singh Solanky
7d8e8eb0cf Use Enum to type text-to-image intent of Khoj chat response 2024-04-15 17:53:40 +05:30
Debanjum Singh Solanky
128829c477 Show latest msgs on chat session load. Fetch rest as they near viewport
- Reduces time to first render when loading long chat sessions
- Limits size of first page load, when loading long chat sessions

These performance improvements are maximally felt for large chat
sessions with lots of images generated by Khoj

Updated web and desktop app to support these changes for now
2024-04-15 16:10:56 +05:30
Debanjum Singh Solanky
9e5585776c Support getting latest N chat messages via chat history API
Get latest N if N > 0, else return all messages except latest N from
the conversation
2024-04-15 15:32:32 +05:30
Debanjum Singh Solanky
e5ff85f6fb Start fetching khoj css before icons to reduce time with no styling
This should reduce frequency of page load jitter when icons are loaded
before style is applied
2024-04-15 15:32:32 +05:30
Debanjum Singh Solanky
d5de59d411 Do not assume results key present in notion content when indexing 2024-04-15 08:02:20 +05:30
Debanjum Singh Solanky
4977b55106 Use offline chat prompt config to set context window of loaded chat model
Previously you couldn't configure the n_ctx of the loaded offline chat
model. This made it hard to use good offline chat model (which these
days also have larger context) on machines with lower VRAM
2024-04-14 02:35:36 +05:30
Debanjum Singh Solanky
689202e00e Update recommended CMAKE flag to enable using CUDA on linux in Docs 2024-04-14 02:35:27 +05:30
Debanjum Singh Solanky
148923c13a Fix to raise error on hitting rate limit during Github indexing 2024-04-13 22:09:13 +05:30
sabaimran
f24d71c71c
Improve the agents UX (#702)
- Make the chat buttons look more clickable
- Show agent name in new conversation message
- Add an icon to the CTA to send agent a message
2024-04-13 20:11:37 +05:30
Debanjum Singh Solanky
78bac4ae05 Add migration script to convert PNG to WebP references in database 2024-04-13 19:06:28 +05:30
Debanjum Singh Solanky
c6e8443631 Update clients to support rendering webp images inline
This is for self-hosted scenarios where AWS S3 uploads is not enabled
2024-04-13 13:11:18 +05:30
Debanjum Singh Solanky
d21f22ffa1 Store Khoj generated images as webp instead of png for faster loading 2024-04-13 13:03:32 +05:30
Debanjum Singh Solanky
b820daf38f Makes logs less noisy
- Show telemetry enabled/disabled state on init, not every 2 minutes
- Convert no docs synced logs to debug level instead of warning
  Having synced docs isn't as important to use Khoj now, unlike before
2024-04-13 11:22:58 +05:30
Debanjum Singh Solanky
b8bc6bee83 Always remove loading animation on Desktop app if can't login to server 2024-04-13 11:02:44 +05:30
Debanjum Singh Solanky
382507051f Fix get_user_photo to only return photo, not user name from DB 2024-04-13 11:02:30 +05:30
sabaimran
f06ec485cb Fix redirect url process for login flow, existing user 2024-04-12 17:10:05 +05:30
sabaimran
87b9a93fa1 Update assertion line to match new logic 2024-04-12 13:09:19 +05:30
sabaimran
b86e68a29d Make it easier to view agents in the admin page 2024-04-12 13:02:22 +05:30
sabaimran
e58bd0e485 Remove mbox file from list of files expected to be included 2024-04-12 12:55:22 +05:30
sabaimran
6634d603a8 Add links for contributors to use in the readme 2024-04-12 12:49:12 +05:30
sabaimran
1377a44a1a Suppress debug logs from uvicorn.error to avoid clutter from websockets
- If application is not in DEBUG_MODE
2024-04-12 12:12:16 +05:30
Debanjum Singh Solanky
89b8ec3546 Release Khoj version 1.10.2 2024-04-12 11:53:32 +05:30
Debanjum Singh Solanky
50b4788a91 Remove chat loading animation in login required state on Desktop app 2024-04-12 11:50:54 +05:30
Debanjum Singh Solanky
b3f4794d91 Remove the unnecessary async/await func chains on Desktop app 2024-04-12 11:49:25 +05:30
Debanjum Singh Solanky
1e30a072d4 Just use file ext to identify indexable files to fix Desktop app install
- Magika on Desktop app was too bloated (100Mb to 250Mb) and broke
  install for some reason. Not sure why it was causing the app install
  to fail but do not have time to currently investigate

- Just use file extensions whitelist it's good enough for now. Let
  server handle the deeper identification of file type
2024-04-12 11:16:07 +05:30
Debanjum Singh Solanky
5c7797dbca Only check content type if file extension cannot identify text file 2024-04-12 03:40:42 +05:30
Debanjum Singh Solanky
7d2ef728e6 Fix identifying pdf files on server
Introduced bug in previous commit that would stop indexing PDF files
as trying to check content_group instead of mime_type is application/pdf
2024-04-12 03:07:46 +05:30
Debanjum Singh Solanky
07f8fb5c5b Release Khoj version 1.10.1 2024-04-12 02:18:07 +05:30
Debanjum Singh Solanky
a7d9102c33 Make identifying text, code files with Magika more robust on server
Use identified content group rather than mime_type to find text files.
2024-04-12 02:12:26 +05:30
Debanjum Singh Solanky
60337086f9 Release Khoj version 1.10.0 2024-04-12 01:01:02 +05:30
Debanjum Singh Solanky
34c3f70203 Index only files with valid text extension in folders synced by Desktop app
This maintains consistent set of indexable files from Desktop app,
whether indexing via file or folder filters
2024-04-12 00:59:54 +05:30
Debanjum
9a48f72041
Index more text file types from Desktop, Github (#692)
### Index more text file types 
- Index all text, code files in Github repos. Not just md, org files
- Send more text file types from Desktop app and improve indexing them
- Identify file type by content & allow server to index all text files

### Deprecate Github Indexing Features
- Stop indexing commits, issues and issue comments in a Github repo
- Skip indexing Github repo on hitting Github API rate limit

### Fixes and Improvements
- **Fix indexing files in sub-folders from Desktop app**
- Standardize structure of text to entries to match other entry processors
2024-04-12 00:08:29 +05:30
Debanjum Singh Solanky
0819b83d0b Fix constructing status update strings for intermediate chat steps 2024-04-11 20:31:32 +05:30
Debanjum Singh Solanky
d15b9bc272 Tell doc search actor to not generate online queries for doc search
This can pick up irrelevant details from notes
2024-04-11 19:49:41 +05:30
Debanjum Singh Solanky
15a78b19ad Improve Inferred Document Search Query Extraction from GPT
Using stop_words = "\n" was preventing JSON responses with newlines in
them
2024-04-11 19:24:04 +05:30
Debanjum Singh Solanky
653681967e Show inferred document search queries in intermediate chat step on Web app 2024-04-11 19:24:04 +05:30