Commit graph

1984 commits

Author SHA1 Message Date
Debanjum Singh Solanky
3ce06a938c Render scheduled task response as html to improve readability in email 2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
c17dbbeb92 Render next run time in user timezone in config, chat UIs
- Pass timezone string from ipapi to khoj via clients
  - Pass this data from web, desktop and obsidian clients to server
- Use user tz to render next run time of scheduled task in user tz
2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
6736551ba3 Improve scheduled task text rendered in UI 2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
0e01362469 Merge DB migrations from master with those from scheduled task feature 2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
a5ed4f2af2 Send email to share results of scheduled task 2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
69775b6d6e Add /task command. Use it to disable scheduling tasks from tasks
This takes the load of the task scheduling chat actor / prompt from
having to artifically differentiate query to create scheduled task
from a scheduled task run.
2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
22289a0002 Improve task scheduling by using json mode and agent scratchpad
- The task scheduling actor was having trouble calculating the
  timezone. Giving the actor a scratchpad to improve correctness by
  thinking step by step
- Add more examples to reduce chances of the inferred query looping to
  create another reminder instead of running the query and sharing
  results with user
- Improve task scheduling chat actor test with more tests and
  by ensuring unexpected words not present in response
2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
7f5981594c Only notify when scheduled task results satisfy user's requirements
There's a difference between running a scheduled task and notifying
the user about the results of running the scheduled task.

Decide to notify the user only when the results of running the
scheduled task satisfy the user's requirements.

Use sync version of send_message_to_model_wrapper for scheduled tasks
2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
7e084ef1e0 Improve job id. Fix refreshing list of jobs on delete from config page 2024-05-01 08:28:59 +05:30
Debanjum Singh Solanky
a1e5195c8b Save separate user message time from Khoj response time in chat logs
Previously user message time was being stored the same as Khoj
response time in conversation logs.
2024-05-01 08:28:59 +05:30
Debanjum Singh Solanky
5133b6e73b Minor improvements to styling the config page 2024-05-01 08:28:59 +05:30
Debanjum Singh Solanky
648f1a5c71 Suffix chat response element vars with "El" in chat.html of web, desktop apps 2024-05-01 08:28:59 +05:30
Debanjum Singh Solanky
98d0ffecf1 Add section in settings page to view, delete your scheduled tasks 2024-05-01 08:28:59 +05:30
Debanjum Singh Solanky
423d61796d Add API endpoints to get and delete user scheduled tasks 2024-05-01 08:28:59 +05:30
Debanjum Singh Solanky
af0972c539 Make scheduled jobs persistent and work in multiple worker setups
- Store scheduled job state in Postgres so job schedules persist
  across app restarts
- Use Process Locks to only allow single worker to process a given job
  type. This prevents duplicating job runs across all workers
2024-05-01 08:28:59 +05:30
Debanjum Singh Solanky
fcf878e1f3 Add new operation Scheduled Job to Operation enum of ProcessLock 2024-05-01 08:28:59 +05:30
Debanjum Singh Solanky
c11742f443 Add chat actor to schedule run query for user at specified times
- Detect when user intends to schedule a task, aka reminder
  Add new output mode: reminder. Add example of selecting the reminder
  output mode
- Extract schedule time (as cron timestring) and inferred query to run
  from user message
- Use APScheduler to call chat with inferred query at scheduled time
- Handle reminder scheduling from both websocket and http chat requests

- Support constructing scheduled task using chat history as context
  Pass chat history to scheduled query generator for improved context
  for scheduled task generation
2024-05-01 08:28:59 +05:30
Debanjum Singh Solanky
9e068fad4f Handle null ref, when refresh conversation from db in websocket chat 2024-04-30 14:19:07 +05:30
sabaimran
37879a7850 Release Khoj version 1.11.2 2024-04-30 13:31:06 +05:30
sabaimran
93b41170d1 Refresh the conversation log from the db before addressing the next query 2024-04-30 13:27:51 +05:30
Debanjum Singh Solanky
f1545d2b2f Add, fix help link, improve title style in web ui config pages
- Align title text with icon better in all config cards
- Fix help link to github setup docs
- Fix help link to notion setup docs
2024-04-30 05:50:08 +05:30
Debanjum Singh Solanky
e6da0f9a8c Fix response type of delete client tokens API endpoint
Previously the make delete API response failed, after deleting token.
Required a page refresh to see that the API token was actually gone.

This was happening because the response type of the delete token API
endpoint isn't a string, so it failed FastAPI response validation
checks.
2024-04-30 02:46:52 +05:30
sabaimran
0f4c3518d3 Allow session cookies to be stored with a lax policy for some localhost scenarios 2024-04-29 15:48:45 +05:30
sabaimran
5beedc9734 Use Secure proxy ssl header only if no https 2024-04-29 15:33:21 +05:30
sabaimran
12258f02d7 Release Khoj version 1.11.1 2024-04-27 18:42:24 +05:30
sabaimran
2047b0c973
Support customization of the OpenAI base url in admin settings (#725)
- Allow self-hosted users to customize their open ai base url. This allows you to easily use a proxy service and extend support for other models.
- This also includes a migration that associates any existing openai chat model configuration with an openai processor configuration
- Make changing model a paid/subscriber feature
- Removes usage of langchain's OpenAI wrapper for better control over parsing input/output
2024-04-27 18:24:35 +05:30
sabaimran
49834e3b00 Add a hero image for the og:image meta tag 2024-04-27 17:07:21 +05:30
sabaimran
138f12f957 Fix indentation and revert first run message link styling to all links 2024-04-27 09:56:58 +05:30
Debanjum Singh Solanky
4395ed8065 Improve extract_questions func. Set message role to user, not assistant
Previous behavior of passing message with role = "assistant was
reducing instruction following quality of the model
2024-04-26 11:55:22 +05:30
Debanjum Singh Solanky
346499f12c Fix, improve args being passed to chat_completion args
- Allow passing completion args through completion_with_backoff
- Pass model_kwargs in a separate arg to simplify this
- Pass model in `model_name' kwarg from the send_message_to_model func
  `model_name' kwarg is used by langchain, not `model' kwarg
2024-04-26 11:55:22 +05:30
sabaimran
d8f2eac6e0 Release Khoj version 1.11.0 2024-04-25 17:24:59 +05:30
Debanjum Singh Solanky
1842017393 Skip trying to index deleted files, folders from Desktop app
Previously app would crash on startup if desktop app was told to
index a file that had been deleted afterwards
2024-04-25 15:23:05 +05:30
Debanjum
17a06f152c
Support Llama 3 and Improve Offline Chat Actors (#724)
- Add support for Llama 3 in Khoj offline mode
- Make chat actors generate valid json with more local models
- Fix offline chat actor tests
2024-04-25 14:00:56 +05:30
Debanjum
220e5516ab
Make Search Models More Configurable. Upgrade Default Cross-Encoder (#722)
- Upgrade default cross-encoder to mixedbread ai's mxbai-rerank-xsmall
- Support more embedding models by making query, docs encoding configurable
2024-04-25 13:55:49 +05:30
Debanjum Singh Solanky
cf08eaf786 Add comments explaining each field in the search model config in DB 2024-04-25 13:54:13 +05:30
Debanjum
4ee5ac7c20
Fix Chat UI and Indexing on Desktop App (#723)
- Make valid file extension checking case insensitive on Desktop app
- Skip indexing non-existent folders on Desktop app
- Pass auth headers to fix lazy load of chat messages on Desktop app
- Set chat-message height to height of content in web, desktop
2024-04-24 18:49:03 +05:30
Debanjum Singh Solanky
799efb5974 Create DB migration to add new fields and change default cross-encoder 2024-04-24 09:50:34 +05:30
Debanjum Singh Solanky
ec41482324 Upgrade default cross-encoder to mixedbread ai's mxbai-rerank-xsmall
Previous cross-encoder model was a few years old, newer models should
have improved in quality. Model size increases by 50% compared to
previous for better performance, at least on benchmarks
2024-04-24 09:50:09 +05:30
Debanjum Singh Solanky
7eaf9367fe Support more embedding models by making query, docs encoding configurable
Most newer, better embeddings models add a query, docs prefix when
encoding. Previously Khoj admins couldn't configure these, so it
wasn't possible to use these newer models.

This change allows configuring the kwargs passed to the query, docs
encoders by updating the search config in the database.
2024-04-24 09:49:17 +05:30
Debanjum Singh Solanky
4f7237b158 Make chat actors generate valid json with more local models
Improve tool, online search, webpage links, docs search chat actor
prompts. Ensure works with hermes-2-pro and llama-3.

Be more specific about generating JSON and not saying anything else.
2024-04-24 09:40:00 +05:30
Debanjum Singh Solanky
a2e4e4bede Add support for Llama 3 in Khoj offline mode
- Improve extract question prompts to explicitly request JSON list
- Use llama-3 chat format if HF repo_id mentions llama-3. The
  llama-cpp-python logic for detecting when to use llama-3 chat format
  isn't robust enough currently
2024-04-24 09:40:00 +05:30
Debanjum Singh Solanky
8e77b3dc82 Fix infer_max_tokens func when configured_max_tokens is set to None 2024-04-24 09:36:29 +05:30
Debanjum Singh Solanky
8196ab62f9 Make valid file extension checking case insensitive on Desktop app 2024-04-24 09:35:20 +05:30
Debanjum Singh Solanky
5def14e3bb Skip indexing non-existent folders on Desktop app 2024-04-24 09:35:20 +05:30
Debanjum Singh Solanky
cd05f262a6 Pass auth headers to fix lazy load of chat messages on Desktop app 2024-04-24 09:35:20 +05:30
Debanjum Singh Solanky
4d5d3e6433 Set chat-message height to height of content in web, desktop
In some cases, especially with image generation requests, this was
causing the chat messages to overlap in the chat UI
2024-04-24 09:35:20 +05:30
sabaimran
60658a8037 Get rid of enable flag for the offline chat processor config
- Default, assume that offline chat is enabled if there is an offline chat model option configured
2024-04-23 23:08:29 +05:30
sabaimran
ac474fce38 Ensure that the tokenizer and max prompt size are used the wrapper method 2024-04-23 21:22:23 +05:30
Olatoyan George
ad59180fb8
Added indication in the desktop UI for back-end connectivity (#711)
* Changed the styling of the link that takes a user to the settings page into a button
* added an indicator that shows if a user is connected to the server or not
* made a class name more descriptive and also made the text in first run message more intuitive
* changed the command to install dependencies in the README.md
* changed the class name of the first run message text to be more descriptive
* added icons in the desktop UI that shows if a file is synced successfully or not
* made the link class name in the homepage more descriptive
* fixed the hover issue on status box in the chat header pane
* fixed hovering issue on status box on macOS
2024-04-23 16:43:48 +05:30
Debanjum
419b044ac5
Use set, inferred max token limits wherever chat models are used (#713)
- User configured max tokens limits weren't being passed to
  `send_message_to_model_wrapper'
- One of the load offline model code paths wasn't reachable. Remove it
  to simplify code
- When max prompt size isn't set infer max tokens based on free VRAM
  on machine
- Use min of app configured max tokens, vram based max tokens and
  model context window
2024-04-23 16:42:35 +05:30
AjaySDwivedi1
abf6f963ea
Replaced reinitialize and save all button to a sync button in config.… (#701)
Replaced reinitialize and save all button to a sync button in config
2024-04-23 16:42:11 +05:30
Debanjum Singh Solanky
c39c4e4ec4 Improve prompt for online search query generation chat actor
- Allow searching github, pypi for information about Khoj
- Enable creating multiple search queries by rewording prompt
2024-04-22 01:32:11 +05:30
Debanjum Singh Solanky
175169c156 Use set, inferred max token limits wherever chat models are used
- User configured max tokens limits weren't being passed to
  `send_message_to_model_wrapper'
- One of the load offline model code paths wasn't reachable. Remove it
  to simplify code
- When max prompt size isn't set infer max tokens based on free VRAM
  on machine
- Use min of app configured max tokens, vram based max tokens and
  model context window
2024-04-20 11:23:28 +05:30
Debanjum Singh Solanky
002cd14a65 Only let agent use online search tool if connected to it 2024-04-20 11:19:48 +05:30
Debanjum Singh Solanky
75c9ebbc54 Only show uvicorn debug logs at higher verbosity levels
Don't automatically show the uvicorn logs when in_debug_mode, only
show on at least verbosity = 2, i.e when start khoj with -vv flag
2024-04-20 11:18:01 +05:30
sabaimran
d11354f9c8 Remove additional references to image content config 2024-04-17 13:00:50 +05:30
sabaimran
105dbf49e4 Fix max_duration_in_seconds for the update_embeddings job 2024-04-17 13:00:18 +05:30
Debanjum Singh Solanky
8e0bae894d Extract run with process lock logic into func. Use for content reindexing 2024-04-17 12:31:19 +05:30
Debanjum Singh Solanky
e9f608174b Fix access to Khoj admin panel from non HTTPS custom domains
To access the Khoj admin panel from a non HTTPS custom domain the
`KHOJ_NO_SSL' and `KHOJ_DOMAIN' env vars need to be explictly set.

See the updated setup docs for details.

Resolves #662
2024-04-17 03:20:05 +05:30
sabaimran
b0059654c9 Do not create an import error if the resend module is not available 2024-04-17 01:00:22 +05:30
sabaimran
f04ead7c37 Remove seting up log line for configuring image search 2024-04-17 00:45:39 +05:30
sabaimran
0208688801 Increase factor for n_ctx reduciton to 2e6 2024-04-17 00:41:36 +05:30
Debanjum Singh Solanky
1f2ffce85b Copy chat message with it's markdown formatting in Web, Desktop apps 2024-04-16 22:10:34 +05:30
sabaimran
91c8b137f1
Add a database lock for jobs that shouldn't be run by multiple workers (#706)
* Add a database lock for jobs that shouldn't be run by multiple workers

* Import relevant functions from utils.helpers
2024-04-16 21:29:27 +05:30
sabaimran
adb2e8cc5f Check if n is populated before making a comparison 2024-04-16 02:05:58 +05:30
Debanjum Singh Solanky
6707ccc463 Check before updating "chat" key in meta_log in chat history API endpoint 2024-04-15 21:06:47 +05:30
Debanjum Singh Solanky
4e7812fe55 Use Django management cmd to update inline images in DB to/from WebP/PNG
This provides Khoj server admins more control on migrating their S3
images to WebP format from PNG
2024-04-15 20:19:49 +05:30
Debanjum Singh Solanky
7fab8d6586 Only use chat messages count in history API endpoint when set by client 2024-04-15 19:12:57 +05:30
Debanjum
6b3ef61dd2
Improve Chat Page Load Perf, Offline Chat Perf and Miscellaneous Fixes (#703)
### Store Generated Images as WebP 
- 78bac4ae Add migration script to convert PNG to WebP references in database
- c6e84436 Update clients to support rendering webp images inline
- d21f22ff Store Khoj generated images as webp instead of png for faster loading

### Lazy Fetch Chat Messages to Improve Time, Data to First Render
This is especially helpful for long conversations with lots of images
- 128829c4 Render latest msgs on chat session load. Fetch, render rest as they near viewport
- 9e558577 Support getting latest N chat messages via chat history API

### Intelligently set Context Window of Offline Chat to Improve Performance
- 4977b551 Use offline chat prompt config to set context window of loaded chat model

### Fixes
- 148923c1 Fix to raise error on hitting rate limit during Github indexing
- b8bc6bee Always remove loading animation on Desktop app if can't login to server
- 38250705 Fix `get_user_photo` to only return photo, not user name from DB

### Miscellaneous Improvements
- 689202e0 Update recommended CMAKE flag to enable using CUDA on linux in Docs
- b820daf3 Makes logs less noisy
2024-04-15 18:34:29 +05:30
Debanjum Singh Solanky
a352940dfd Use Django management command to update images URL in DB to WebP
This provides Khoj server admins more control on migrating their S3
images to WebP format from PNG
2024-04-15 17:53:41 +05:30
Debanjum Singh Solanky
7d8e8eb0cf Use Enum to type text-to-image intent of Khoj chat response 2024-04-15 17:53:40 +05:30
Debanjum Singh Solanky
128829c477 Show latest msgs on chat session load. Fetch rest as they near viewport
- Reduces time to first render when loading long chat sessions
- Limits size of first page load, when loading long chat sessions

These performance improvements are maximally felt for large chat
sessions with lots of images generated by Khoj

Updated web and desktop app to support these changes for now
2024-04-15 16:10:56 +05:30
Debanjum Singh Solanky
9e5585776c Support getting latest N chat messages via chat history API
Get latest N if N > 0, else return all messages except latest N from
the conversation
2024-04-15 15:32:32 +05:30
Debanjum Singh Solanky
e5ff85f6fb Start fetching khoj css before icons to reduce time with no styling
This should reduce frequency of page load jitter when icons are loaded
before style is applied
2024-04-15 15:32:32 +05:30
Debanjum Singh Solanky
d5de59d411 Do not assume results key present in notion content when indexing 2024-04-15 08:02:20 +05:30
Debanjum Singh Solanky
4977b55106 Use offline chat prompt config to set context window of loaded chat model
Previously you couldn't configure the n_ctx of the loaded offline chat
model. This made it hard to use good offline chat model (which these
days also have larger context) on machines with lower VRAM
2024-04-14 02:35:36 +05:30
Debanjum Singh Solanky
148923c13a Fix to raise error on hitting rate limit during Github indexing 2024-04-13 22:09:13 +05:30
sabaimran
f24d71c71c
Improve the agents UX (#702)
- Make the chat buttons look more clickable
- Show agent name in new conversation message
- Add an icon to the CTA to send agent a message
2024-04-13 20:11:37 +05:30
Debanjum Singh Solanky
78bac4ae05 Add migration script to convert PNG to WebP references in database 2024-04-13 19:06:28 +05:30
Debanjum Singh Solanky
c6e8443631 Update clients to support rendering webp images inline
This is for self-hosted scenarios where AWS S3 uploads is not enabled
2024-04-13 13:11:18 +05:30
Debanjum Singh Solanky
d21f22ffa1 Store Khoj generated images as webp instead of png for faster loading 2024-04-13 13:03:32 +05:30
Debanjum Singh Solanky
b820daf38f Makes logs less noisy
- Show telemetry enabled/disabled state on init, not every 2 minutes
- Convert no docs synced logs to debug level instead of warning
  Having synced docs isn't as important to use Khoj now, unlike before
2024-04-13 11:22:58 +05:30
Debanjum Singh Solanky
b8bc6bee83 Always remove loading animation on Desktop app if can't login to server 2024-04-13 11:02:44 +05:30
Debanjum Singh Solanky
382507051f Fix get_user_photo to only return photo, not user name from DB 2024-04-13 11:02:30 +05:30
sabaimran
f06ec485cb Fix redirect url process for login flow, existing user 2024-04-12 17:10:05 +05:30
sabaimran
b86e68a29d Make it easier to view agents in the admin page 2024-04-12 13:02:22 +05:30
sabaimran
1377a44a1a Suppress debug logs from uvicorn.error to avoid clutter from websockets
- If application is not in DEBUG_MODE
2024-04-12 12:12:16 +05:30
Debanjum Singh Solanky
89b8ec3546 Release Khoj version 1.10.2 2024-04-12 11:53:32 +05:30
Debanjum Singh Solanky
50b4788a91 Remove chat loading animation in login required state on Desktop app 2024-04-12 11:50:54 +05:30
Debanjum Singh Solanky
b3f4794d91 Remove the unnecessary async/await func chains on Desktop app 2024-04-12 11:49:25 +05:30
Debanjum Singh Solanky
1e30a072d4 Just use file ext to identify indexable files to fix Desktop app install
- Magika on Desktop app was too bloated (100Mb to 250Mb) and broke
  install for some reason. Not sure why it was causing the app install
  to fail but do not have time to currently investigate

- Just use file extensions whitelist it's good enough for now. Let
  server handle the deeper identification of file type
2024-04-12 11:16:07 +05:30
Debanjum Singh Solanky
5c7797dbca Only check content type if file extension cannot identify text file 2024-04-12 03:40:42 +05:30
Debanjum Singh Solanky
7d2ef728e6 Fix identifying pdf files on server
Introduced bug in previous commit that would stop indexing PDF files
as trying to check content_group instead of mime_type is application/pdf
2024-04-12 03:07:46 +05:30
Debanjum Singh Solanky
07f8fb5c5b Release Khoj version 1.10.1 2024-04-12 02:18:07 +05:30
Debanjum Singh Solanky
a7d9102c33 Make identifying text, code files with Magika more robust on server
Use identified content group rather than mime_type to find text files.
2024-04-12 02:12:26 +05:30
Debanjum Singh Solanky
60337086f9 Release Khoj version 1.10.0 2024-04-12 01:01:02 +05:30
Debanjum Singh Solanky
34c3f70203 Index only files with valid text extension in folders synced by Desktop app
This maintains consistent set of indexable files from Desktop app,
whether indexing via file or folder filters
2024-04-12 00:59:54 +05:30
Debanjum
9a48f72041
Index more text file types from Desktop, Github (#692)
### Index more text file types 
- Index all text, code files in Github repos. Not just md, org files
- Send more text file types from Desktop app and improve indexing them
- Identify file type by content & allow server to index all text files

### Deprecate Github Indexing Features
- Stop indexing commits, issues and issue comments in a Github repo
- Skip indexing Github repo on hitting Github API rate limit

### Fixes and Improvements
- **Fix indexing files in sub-folders from Desktop app**
- Standardize structure of text to entries to match other entry processors
2024-04-12 00:08:29 +05:30
Debanjum Singh Solanky
0819b83d0b Fix constructing status update strings for intermediate chat steps 2024-04-11 20:31:32 +05:30
Debanjum Singh Solanky
d15b9bc272 Tell doc search actor to not generate online queries for doc search
This can pick up irrelevant details from notes
2024-04-11 19:49:41 +05:30