Support including file attachments in the chat message
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
This breaks certain prior behaviors. We will no longer automatically be processing/generating embeddings on the backend and adding documents to the "brain". You'll have to go to settings and go through the upload documents flow there in order to add docs to the brain (i.e., have search include them during question / response).
This will ensure only unique online references are shown in all
clients.
The duplication issue was exacerbated in research mode as even with
different online search queries, you can get previously seen results.
This change does a global deduplication across all online results seen
across research iterations before returning them in client reponse.
- Deduplicate online, doc search queries across research iterations.
This avoids running previously run online, doc searches again and
dedupes online, doc context seen by model to generate response.
- Deduplicate online search queries generated by chat model for each
user query.
- Do not pass online, docs, code context separately when generate
response in research mode. These are already collected in the meta
research passed with the user query
- Improve formatting of context passed to generate research response
- Use xml tags to delimit context. Pass per iteration queries in each
iteration result
- Put user query before meta research results in user message passed
for generating response
This deduplications will improve speed, cost & quality of research mode
Previously the whole research mode response would fail if the pick
next tool call to chat model failed. Now instead of it completely
failing, the researcher actor is told to try again in next iteration.
This allows for a more graceful degradation in answering a research
question even if a (few?) calls to the chat model fail.
Jina search API returns content of all webpages in search results.
Previously code wouldn't remove content beyond max_webpages_to_read
limit set. Now, webpage content in organic results aree explicitly
removed beyond the requested max_webpage_to_read limit.
This should align behavior of online results from Jina with other
online search providers. And restrict llm context to a reasonable size
when using Jina for online search.
This fixes chat with old chat sessions. Fixes issue with old Whatsapp
users can't chat with Khoj because chat history doc context was
stored as a list earlier
Command rate limit wouldn't be shown to user as server wouldn't be
able to handle HTTP exception in the middle of streaming.
Catch exception and render it as LLM response message instead for
visibility into command rate limiting to user on client
Log rate limmit messages for all rate limit events on server as info
messages
Convert exception messages into first person responses by Khoj to
prevent breaking the third wall and provide more details on wht
happened and possible ways to resolve them.
Previously the batch start index wasn't being passed so all batches
started in parallel were showing the same processing example index
This change doesn't impact the evaluation itself, just the index shown
of the example currently being evaluated