- Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend
- We couldn't directly use a UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the moe generic interface down the road.
- Why
We need better, automated evals to measure performance shifts of Khoj
across prompt, model and capability changes.
Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of AI agents. It's a good starter benchmark to evaluate Khoj.
- Details
This PR adds an eval script to evaluate Khoj responses on the the FRAMES
benchmark prompts against the ground truth provided by it.
Script allows configuring sample size, batch size, sampling queries from the
eval dataset.
Gemini is used as an LLM Judge to auto grade Khoj responses vs ground truth
data from the benchmark.
This was previously required, but now it's only usefuly for more
advanced settings, not typical for self-hosting users.
With recent updates, the user's selected chat model is used for both
Khoj's train of thought and response. This makes it easy to
switch your preferred chat model directly from the user settings
page and not have to update this in the admin panel as well.
Reflect these code changse in the docs, by removing the unnecessary
step for self-hosted users to create a server chat setting when using
an OpenAI proxy service like Ollama, LiteLLM etc.
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.
- The server has moved to a model of standardization for the embeddings generation workflow. Remove references to the support for differentiated models.
- The migration script fo ra new model needs to be updated to accommodate full regeneration.