Support including file attachments in the chat message
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
This breaks certain prior behaviors. We will no longer automatically be processing/generating embeddings on the backend and adding documents to the "brain". You'll have to go to settings and go through the upload documents flow there in order to add docs to the brain (i.e., have search include them during question / response).
This will ensure only unique online references are shown in all
clients.
The duplication issue was exacerbated in research mode as even with
different online search queries, you can get previously seen results.
This change does a global deduplication across all online results seen
across research iterations before returning them in client reponse.
- Deduplicate online, doc search queries across research iterations.
This avoids running previously run online, doc searches again and
dedupes online, doc context seen by model to generate response.
- Deduplicate online search queries generated by chat model for each
user query.
- Do not pass online, docs, code context separately when generate
response in research mode. These are already collected in the meta
research passed with the user query
- Improve formatting of context passed to generate research response
- Use xml tags to delimit context. Pass per iteration queries in each
iteration result
- Put user query before meta research results in user message passed
for generating response
This deduplications will improve speed, cost & quality of research mode
Previously the whole research mode response would fail if the pick
next tool call to chat model failed. Now instead of it completely
failing, the researcher actor is told to try again in next iteration.
This allows for a more graceful degradation in answering a research
question even if a (few?) calls to the chat model fail.
Jina search API returns content of all webpages in search results.
Previously code wouldn't remove content beyond max_webpages_to_read
limit set. Now, webpage content in organic results aree explicitly
removed beyond the requested max_webpage_to_read limit.
This should align behavior of online results from Jina with other
online search providers. And restrict llm context to a reasonable size
when using Jina for online search.
This fixes chat with old chat sessions. Fixes issue with old Whatsapp
users can't chat with Khoj because chat history doc context was
stored as a list earlier
Command rate limit wouldn't be shown to user as server wouldn't be
able to handle HTTP exception in the middle of streaming.
Catch exception and render it as LLM response message instead for
visibility into command rate limiting to user on client
Log rate limmit messages for all rate limit events on server as info
messages
Convert exception messages into first person responses by Khoj to
prevent breaking the third wall and provide more details on wht
happened and possible ways to resolve them.
- Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend
- We couldn't directly use a UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the moe generic interface down the road.
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.
- The server has moved to a model of standardization for the embeddings generation workflow. Remove references to the support for differentiated models.
- The migration script fo ra new model needs to be updated to accommodate full regeneration.
- Dedent code for readability
- Use better name for in research mode check
- Continue to remove inferred summarize command when multiple files in
file filter even when not in research mode
- Continue to show select information source train of thought.
It was removed by mistake earlier
## Overview
Use git to capture prompt traces of khoj's train of thought. View, analyze and debug them using your favorite git client (e.g vscode, magit).
- Each commit captures an interaction with an LLM
The commit writes the query, response and system message each to a separate file in the repo.
The commit message captures the chat model, Khoj version and other metadata
- Each conversation turn can have multiple interactions with an LLM (e.g Khoj's train of thought)
- Each new conversation turn forks from and merges back into its conversation branch
- Each new conversation branches from the user branch
- Each new user branches from root commit on the main branch
## Usage
1. Set `KHOJ_DEBUG=true` or start khoj in very verbose mode with `khoj -vv` to turn on prompt tracing
2. Chat with Khoj as usual
3. Open the promptrace git repo to view the generated prompt traces using your favorite git porcelain.
The Khoj prompt trace git repo is created at `/tmp/khoj_promptrace` by default. You can configure the prompt trace directory by setting the `PROMPTRACE_DIR`environment variable.
## Implementation
- Add utility functions to capture prompt traces using git (via `gitpython`)
- Make each model provider in Khoj commit their LLM interactions with promptrace
- Weave chat metadata from chat API through all chat actors and commit it to the prompt trace
- Match the online query generator prompt to match the formatting of
extract questions
- Separate iteration results by newline
- Improve webpage and online tool descriptions
- Allow server to start if loading embedding model fails with an error.
This allows fixing the embedding model config via admin panel.
Previously server failed to start if embedding model was configured
incorrectly. This prevented fixing the model config via admin panel.
- Convert boolean string in config json to actual booleans when passed
via admin panel as json before passing to model, query configs
- Only create default model if no search model configured by admin.
Return first created search model if its been configured by admin.
Models were getting a bit confused about who is search for who's
information. Using third person to explicitly call out on who's behalf
these searches are running seems to perform better across
models (gemini's, gpt etc.), even if the role of the message is user.
Use placeholder for newline in json object values until json parsed
and values extracted. This is useful when research mode models outputs
multi-line codeblocks in queries etc.
Anthropic API doesn't have ability to enforce response with valid json
object, unlike all the other model types.
While the model will usually adhere to json output instructions.
This step is meant to more strongly encourage it to just output json
object when response_type of json_object is requested.
Separate conversation history with user from the conversation history
between the tool AIs and the researcher AI.
Tools AIs don't need top level conversation history, that context is
meant for the researcher AI.
The invoked tool AIs need previous attempts at using the tool in this
research runs iteration history to better tune their next run.
Or at least that is the hypothesis to break the models looping.
Models weren't generating a diverse enough set of questions. They'd do
minor variations on the original query. What is required is asking
queries from a bunch of different lenses to retrieve the requisite
information.
This prompt updates shows the AIs the breadth of questions to by
example and instruction. Seem like performance improved based on vibes
- Improve mobile friendliness with new research mode toggle, since chat input area is now taking up more space
- Remove clunky title from the suggestion card
- Fix fk lookup error for agent.creator
Overview
---
- Put context into separate user message before sending to chat model.
This should improve model response quality and truncation logic in code
- Pass online context from chat history to chat model for response.
This should improve response speed when previous online context can be reused
- Improve format of notes, online context passed to chat models in prompt.
This should improve model response quality
Details
---
The document, online search context are now passed as separate user
messages to chat model, instead of being added to the final user message.
This will improve
- Models ability to differentiate data from user query.
That should improve response quality and reduce prompt injection
probability
- Make truncation logic simpler and more robust
When context window hit, can simply pop messages to auto truncate
context in order of context, user, assistant message for each
conversation turn in history until reach current user query
The complex, brittle logic to extract user query from context in
last user message isn't required.
Align context passed to offline chat model with other chat models
- Pass context in separate message for better separation between user
query and the shared context
- Pass filename in context
- Add online results for webpage conversation command
Context role was added to allow change message truncation order based
on context role as well.
Revert it for now since currently this is not currently being done.