Commit graph

3866 commits

Author SHA1 Message Date
sabaimran
3badb27744 Remove stored uploaded files after they're processed. 2024-11-08 23:28:02 -08:00
sabaimran
78630603f4 Delete the fact checker application 2024-11-08 17:27:42 -08:00
sabaimran
807687a0ac Automatically generate titles for conversations from history 2024-11-08 16:02:34 -08:00
sabaimran
7159b0b735 Enforce limits on file size when converting to text 2024-11-08 15:27:28 -08:00
sabaimran
4695174149 Add support for file preview in the chat input area (before message sent) 2024-11-08 15:12:48 -08:00
sabaimran
ad46b0e718 Label pages when extracting text from pdf, docs content. Fix scroll area in doc preview. 2024-11-08 14:53:20 -08:00
sabaimran
ee062d1c48 Fix parsing for PDFs via content indexing API 2024-11-07 18:17:29 -08:00
sabaimran
623a97a9ee Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter 2024-11-07 17:18:23 -08:00
sabaimran
33498d876b Simplify the share chat page. Don't need it to maintain its own conversation history
- When chatting on a shared page, fork and redirect to a new conversation page
2024-11-07 17:14:11 -08:00
sabaimran
4b8be55958 Convert UUID to string when forking a conversation 2024-11-07 17:13:04 -08:00
sabaimran
9bbe27fe36 Set default value of attached files to empty list 2024-11-07 17:12:45 -08:00
sabaimran
3a51996f64 Process attached files in the chat history and add them to the chat message 2024-11-07 16:06:58 -08:00
sabaimran
a89160e2f7 Add support for converting an attached doc and chatting with it
- Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend
- We couldn't directly use an UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the more generic interface down the road.
2024-11-07 16:06:37 -08:00
sabaimran
e521853895 Remove unnecessary console.log statements 2024-11-07 16:03:31 -08:00
sabaimran
92c3b9c502 Add function to get an icon from a file type 2024-11-07 16:02:53 -08:00
sabaimran
140c67f6b5 Remove focus ring from the text area component 2024-11-07 16:02:02 -08:00
sabaimran
b8ed98530f Accept attached files in the chat API
- weave through all subsequent subcalls to models, where relevant, and save to conversation log
2024-11-07 16:01:48 -08:00
sabaimran
ecc81e06a7 Add separate methods for docx and pdf files to just convert files to raw text, before further processing 2024-11-07 16:01:08 -08:00
sabaimran
394035136d Add an api that gets a document, and converts it to just text 2024-11-07 16:00:10 -08:00
sabaimran
3b1e8462cd Include attached files in calls to extract questions 2024-11-07 15:59:15 -08:00
sabaimran
de73cbc610 Add support for relaying attached files through backend calls to models 2024-11-07 15:58:52 -08:00
Debanjum
4cad96ded6
Add Script to Evaluate Khoj on Google's FRAMES benchmark (#955)
- Why
We need better, automated evals to measure performance shifts of Khoj
across prompt, model and capability changes.

Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of AI agents. It's a good starter benchmark to evaluate Khoj.

- Details
This PR adds an eval script to evaluate Khoj responses on the FRAMES
benchmark prompts against the ground truth provided by it.

Script allows configuring sample size, batch size, sampling queries from the
eval dataset.

Gemini is used as an LLM Judge to auto grade Khoj responses vs ground truth 
data from the benchmark.
2024-11-06 17:52:01 -08:00
Debanjum
8679294bed Remove need to set server chat settings from use openai proxies docs
This was previously required, but now it's only useful for more
advanced settings, not typical for self-hosting users.

With recent updates, the user's selected chat model is used for both
Khoj's train of thought and response. This makes it easy to
switch your preferred chat model directly from the user settings
page and not have to update this in the admin panel as well.

Reflect these code changes in the docs, by removing the unnecessary
step for self-hosted users to create a server chat setting when using
an OpenAI proxy service like Ollama, LiteLLM etc.
2024-11-05 17:10:53 -08:00
Debanjum
05a93fcbed v-align attach, send buttons with chat input text area on web app
Otherwise, those buttons look off-center when images are attached to
the chat input area
2024-11-05 17:10:53 -08:00
sabaimran
a0480d5f6c use fill weight for the toggle right (enabled state) for research mode 2024-11-04 22:01:09 -08:00
sabaimran
dc26da0a12 Add uploaded files in the conversation file filter for a new convo 2024-11-04 22:00:47 -08:00
Debanjum
b51ee644aa Fix escaping filename when normalizing in org node parser 2024-11-04 20:24:57 -08:00
Debanjum
5724d16a6f Fix passing images to anthropic chat models to extract questions 2024-11-04 20:24:57 -08:00
sabaimran
cf0bcec0e7 Revert SKIP_TESTS flag in offline chat director tests 2024-11-04 19:06:54 -08:00
sabaimran
1f372bf2b1 Update file summarization unit tests now that multiple files are allowed 2024-11-04 17:45:54 -08:00
sabaimran
7543360210 Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter 2024-11-04 16:55:48 -08:00
sabaimran
b6145df3be Handle file retrieval when agent is None 2024-11-04 16:55:22 -08:00
sabaimran
3dc9139cee Add additional handling for when file_object comes back empty 2024-11-04 16:53:07 -08:00
sabaimran
a27b8d3e54 Remove summarize condition for only 1 file filter 2024-11-04 16:51:37 -08:00
sabaimran
362bdebd02 Add methods for reading full files by name and including context
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.

Pipe the relevant attached_files context through all methods calling into models.

We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.
2024-11-04 16:37:13 -08:00
sabaimran
e3ca52b7cb Use .get() to get text accompanying image url, instead of subindexing 2024-11-04 16:09:16 -08:00
sabaimran
1e89baca7b Deprecate the UserSearchModelConfig and remove all references
- The server has moved to a model of standardization for the embeddings generation workflow. Remove references to the support for differentiated models.
- The migration script for a new model needs to be updated to accommodate full regeneration.
2024-11-04 12:24:41 -08:00
Debanjum
1ccbf72752 Use logger instead of print to track eval 2024-11-04 00:40:26 -08:00
sabaimran
99c1d2831a Release Khoj version 1.28.3 2024-11-02 12:23:11 -07:00
sabaimran
075b4ecf15 Call subscription_to_state with sync_to_async wrapper when getting user subscription state
- This is needed in case the renewal_date is not set and we need to reset it for the user
2024-11-02 12:22:35 -07:00
sabaimran
ec44cbe1e7 Release Khoj version 1.28.2 2024-11-02 07:53:51 -07:00
Debanjum
791eb205f6 Run prompt batches in parallel for faster eval runs 2024-11-02 04:58:03 -07:00
Debanjum
96904e0769 Add script to evaluate khoj on Google's FRAMES benchmark
Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of an agent.

The script uses Gemini as an LLM Judge to evaluate Khoj responses to
the FRAMES benchmark prompts against the ground truth provided by it.
2024-11-02 04:57:42 -07:00
Debanjum
31b5fde163 Only enable prompt tracer if git python is installed 2024-11-02 02:07:02 -07:00
sabaimran
5b18dc96e0 Release Khoj version 1.28.1 2024-11-01 22:51:51 -07:00
sabaimran
8d1b1bc78e Move the git python dependency into top level dependencies 2024-11-01 22:51:00 -07:00
Debanjum
e85dd59295 Release Khoj version 1.28.0 2024-11-01 19:06:59 -07:00
Debanjum
1f79a10541 Fix link to code execution feature in docs 2024-11-01 18:22:21 -07:00
Debanjum
cff8e02b60
Research Mode [Part 2]: Improve Prompts, Edit Chat Messages. Set LLM Seed for Reproducibility (#954)
- Improve chat actors and their prompts for research mode.
- Add documentation to enable the code tool when self-hosting Khoj
- Edit Chat Messages
  - Store Turn Id in each chat message. 
  - Expose API to delete chat message.
  - Expose delete chat message button to delete a chat message from the web app
- Set LLM Generation Seed for Reproducible Debugging and Testing
  - Setting seed for LLM generation is supported by Llama.cpp and OpenAI models. 
    This can (somewhat) restrain LLM output
  - Getting fixed responses for fixed inputs helps test, debug longer reasoning chains like those used in advanced reasoning
2024-11-01 18:16:42 -07:00
Debanjum
14e453039d Add prompt tracing, agent personality to infer webpage urls chat actor 2024-11-01 18:12:50 -07:00
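
Note on e3ca52b7cb ("Use .get() to get text accompanying image url, instead of subindexing"): the sketch below illustrates the general pattern that message describes, under stated assumptions. The helper name text_for_image_part and the "text" and "image_url" keys are hypothetical and are not taken from the Khoj source. The idea is that subindexing a dict raises KeyError when the key is absent, while .get() returns a default.

```python
from typing import Any


def text_for_image_part(part: dict[str, Any]) -> str:
    """Return the text accompanying an image part, or "" if there is none.

    Hypothetical helper: the "text" key and the shape of `part` are
    assumptions made for illustration only.
    """
    # part["text"] would raise KeyError when no text was sent with the image;
    # .get() falls back to the supplied default instead.
    return part.get("text", "")


if __name__ == "__main__":
    with_text = {"image_url": "https://example.com/a.png", "text": "a diagram"}
    without_text = {"image_url": "https://example.com/b.png"}
    print(repr(text_for_image_part(with_text)))
    print(repr(text_for_image_part(without_text)))
```

The default of an empty string is a design choice for this sketch; a caller that needs to distinguish "no text" from "empty text" could pass None as the default instead.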