This will ensure only unique online references are shown in all
clients.
The duplication issue was exacerbated in research mode as even with
different online search queries, you can get previously seen results.
This change does a global deduplication across all online results seen
across research iterations before returning them in client reponse.
- Deduplicate online, doc search queries across research iterations.
This avoids running previously run online, doc searches again and
dedupes online, doc context seen by model to generate response.
- Deduplicate online search queries generated by chat model for each
user query.
- Do not pass online, docs, code context separately when generate
response in research mode. These are already collected in the meta
research passed with the user query
- Improve formatting of context passed to generate research response
- Use xml tags to delimit context. Pass per iteration queries in each
iteration result
- Put user query before meta research results in user message passed
for generating response
This deduplications will improve speed, cost & quality of research mode
Previously the whole research mode response would fail if the pick
next tool call to chat model failed. Now instead of it completely
failing, the researcher actor is told to try again in next iteration.
This allows for a more graceful degradation in answering a research
question even if a (few?) calls to the chat model fail.
Jina search API returns content of all webpages in search results.
Previously code wouldn't remove content beyond max_webpages_to_read
limit set. Now, webpage content in organic results aree explicitly
removed beyond the requested max_webpage_to_read limit.
This should align behavior of online results from Jina with other
online search providers. And restrict llm context to a reasonable size
when using Jina for online search.
This fixes chat with old chat sessions. Fixes issue with old Whatsapp
users can't chat with Khoj because chat history doc context was
stored as a list earlier
Command rate limit wouldn't be shown to user as server wouldn't be
able to handle HTTP exception in the middle of streaming.
Catch exception and render it as LLM response message instead for
visibility into command rate limiting to user on client
Log rate limmit messages for all rate limit events on server as info
messages
Convert exception messages into first person responses by Khoj to
prevent breaking the third wall and provide more details on wht
happened and possible ways to resolve them.
- Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend
- We couldn't directly use a UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the moe generic interface down the road.
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.
- The server has moved to a model of standardization for the embeddings generation workflow. Remove references to the support for differentiated models.
- The migration script fo ra new model needs to be updated to accommodate full regeneration.
- Dedent code for readability
- Use better name for in research mode check
- Continue to remove inferred summarize command when multiple files in
file filter even when not in research mode
- Continue to show select information source train of thought.
It was removed by mistake earlier
## Overview
Use git to capture prompt traces of khoj's train of thought. View, analyze and debug them using your favorite git client (e.g vscode, magit).
- Each commit captures an interaction with an LLM
The commit writes the query, response and system message each to a separate file in the repo.
The commit message captures the chat model, Khoj version and other metadata
- Each conversation turn can have multiple interactions with an LLM (e.g Khoj's train of thought)
- Each new conversation turn forks from and merges back into its conversation branch
- Each new conversation branches from the user branch
- Each new user branches from root commit on the main branch
## Usage
1. Set `KHOJ_DEBUG=true` or start khoj in very verbose mode with `khoj -vv` to turn on prompt tracing
2. Chat with Khoj as usual
3. Open the promptrace git repo to view the generated prompt traces using your favorite git porcelain.
The Khoj prompt trace git repo is created at `/tmp/khoj_promptrace` by default. You can configure the prompt trace directory by setting the `PROMPTRACE_DIR`environment variable.
## Implementation
- Add utility functions to capture prompt traces using git (via `gitpython`)
- Make each model provider in Khoj commit their LLM interactions with promptrace
- Weave chat metadata from chat API through all chat actors and commit it to the prompt trace
- Match the online query generator prompt to match the formatting of
extract questions
- Separate iteration results by newline
- Improve webpage and online tool descriptions
- Allow server to start if loading embedding model fails with an error.
This allows fixing the embedding model config via admin panel.
Previously server failed to start if embedding model was configured
incorrectly. This prevented fixing the model config via admin panel.
- Convert boolean string in config json to actual booleans when passed
via admin panel as json before passing to model, query configs
- Only create default model if no search model configured by admin.
Return first created search model if its been configured by admin.
Models were getting a bit confused about who is search for who's
information. Using third person to explicitly call out on who's behalf
these searches are running seems to perform better across
models (gemini's, gpt etc.), even if the role of the message is user.
Use placeholder for newline in json object values until json parsed
and values extracted. This is useful when research mode models outputs
multi-line codeblocks in queries etc.
Anthropic API doesn't have ability to enforce response with valid json
object, unlike all the other model types.
While the model will usually adhere to json output instructions.
This step is meant to more strongly encourage it to just output json
object when response_type of json_object is requested.
Separate conversation history with user from the conversation history
between the tool AIs and the researcher AI.
Tools AIs don't need top level conversation history, that context is
meant for the researcher AI.
The invoked tool AIs need previous attempts at using the tool in this
research runs iteration history to better tune their next run.
Or at least that is the hypothesis to break the models looping.
Models weren't generating a diverse enough set of questions. They'd do
minor variations on the original query. What is required is asking
queries from a bunch of different lenses to retrieve the requisite
information.
This prompt updates shows the AIs the breadth of questions to by
example and instruction. Seem like performance improved based on vibes
- Improve mobile friendliness with new research mode toggle, since chat input area is now taking up more space
- Remove clunky title from the suggestion card
- Fix fk lookup error for agent.creator
Overview
---
- Put context into separate user message before sending to chat model.
This should improve model response quality and truncation logic in code
- Pass online context from chat history to chat model for response.
This should improve response speed when previous online context can be reused
- Improve format of notes, online context passed to chat models in prompt.
This should improve model response quality
Details
---
The document, online search context are now passed as separate user
messages to chat model, instead of being added to the final user message.
This will improve
- Models ability to differentiate data from user query.
That should improve response quality and reduce prompt injection
probability
- Make truncation logic simpler and more robust
When context window hit, can simply pop messages to auto truncate
context in order of context, user, assistant message for each
conversation turn in history until reach current user query
The complex, brittle logic to extract user query from context in
last user message isn't required.
Align context passed to offline chat model with other chat models
- Pass context in separate message for better separation between user
query and the shared context
- Pass filename in context
- Add online results for webpage conversation command
Context role was added to allow change message truncation order based
on context role as well.
Revert it for now since currently this is not currently being done.
Previously model would rarely read webpages after webpage search. Need
the model to webpages more regularly for deeper research and to stop
getting stuck in repetitive online search loops
Previous passing of online results as json dump in prompts was less
readable for humans, and I'm guessing less readable for
models (trained on human data) as well?
- Start from this branches src/khoj/routers/api_chat.py
Add tracer to all old and new chat actors that don't have it set
when they are called.
- Update the new chat actors like apick next tool etc to use tracer too
- Conflicts:
Combine both sides of the conflict in all 3 files below
- src/khoj/processor/conversation/utils.py
- src/khoj/routers/helpers.py
- src/khoj/utils/helpers.py
- Message train of thought forks and merges from its conversation branch
- Conversation branches from user branch
- User branches from root commit on the main branch
- Weave chat tracer metadata from api endpoint through all chat actors
and commit it to the prompt trace
### Major
- Give Vision to Anthropic models in Khoj
### Minor
- Reuse logic to format messages for chat with anthropic models
- Make the get image from url function more versatile and reusable
- Encourage output mode chat actor to output only json and nothing else
- Use a single standard search model across the server. There's diminishing benefits for having multiple user-customizable search models.
- We may want to add server-level customization for specific tasks
- Store the search model used to generate a given entry on the `Entry` object
- Remove user-facing APIs and view
- Add a management command for migrating the default search model on the server
In a future PR (after running the migration), we'll also remove the `UserSearchModelConfig`
Latest claude model wanted to say more than just give the json output.
The updated prompt encourages the model to ouput just json. This is
similar to what is already being done for other prompts
It was previously added under the google utils. Now it can be used by
other conversation processors as well.
The updated function
- can get both base64 encoded and PIL formatted images from url
- will return the media type of the image as well in response
* Create explicit flow to enable the free trial
The current design is confusing. It obfuscates the fact that the user is on a free trial. This design will make the opt-in explicit and more intuitive.
* Use the Subscription Type enum instead of hardcoded strings everywhere
* Use length of free trial in the frontend code as well
Had temporarily updated the default selected agent to last used.
Revert for now as
1. The previous logic was buggy. It didn't select the default agent
even when the last used agent was the default agent. Which would
require more work.
2. It maybe too early anyway to set the default agent to last used.
Adding div elements to message to render degraded text copied to
clipboard for messages with user uploaded images.
This change fixes that by separating message to render from message
for clipboard. It ensures differently formatted forms of the user
images are added to the two to allow proper rendering while still
having decently formatted text copied to clipboard
Add newline instead of sending message when hit Enter key on mobile
displays. As on phones shift key doesn't exist and send button is easily
clickable.
Limit hitting Enter key to send message to computers = larger display
= expected to have full fledged keyboards.
- Remove border from agent detail hover card on home page
- Do not wrap long agent names in agent pills on home page
- Handle scenario where chatInputRef is null
Add support for generating dynamic diagrams in flow with Excalidraw (https://github.com/excalidraw/excalidraw). This happens in three steps:
1. Default information collection & intent determination step.
2. Improving the overall guidance of the prompt for generating a JSON, Excalidraw-compatible declaration.
3. Generation of the diagram to output to the final UI.
Add support in the web UI.
Previously only notes context from chat history was included.
This change includes online context from chat history for model to use
for response generation.
This can reduce need for online lookups by reusing previous online
context for faster responses. But will increase overall response time
when not reusing past online context, as faster context buildup per
conversation.
Unsure if inclusion of context is preferrable. If not, both notes and
online context should be removed.
The document, online search context are now passed as separate user
messages to chat model, instead of being added to the final user message.
This will improve
- Models ability to differentiate data from user query.
That should improve response quality and reduce prompt injection
probability
- Make truncation logic simpler and more robust
When context window hit, can simply pop messages to auto truncate
context in order of context, user, assistant message for each
conversation turn in history until reach current user query
The complex, brittle logic to extract user query from context in
last user message isn't required.
Marking the context message with assistant role doesn't translate well
across chat models. E.g
- Gemini can't handle consecutive messages by role = model well
- Claude will merge consecutive messages by same role. In current
message ordering the context message will result get merged into the
previous assistant response. And if move context message after user
query. The truncation logic will have to hop and skip while doing
deletions
- GPT seems to handle consecutive roles of any type fine
Using context role = user generalizes better across chat models for
now and aligns with previous behavior.
Improve separation of note snippets and show its origin file in notes
prompt to have more readable, contextualized text shared with model.
Previously the references dict was being directly passed as a string.
The documents don't look well formatted and are less intelligible.
- Passing file path along with notes snippets will help contextualize
the notes better.
- Better formatting should help with making notes more readable by the
chat model.