- Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend
- We couldn't directly use a UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the moe generic interface down the road.
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.
- The server has moved to a model of standardization for the embeddings generation workflow. Remove references to the support for differentiated models.
- The migration script fo ra new model needs to be updated to accommodate full regeneration.
- Dedent code for readability
- Use better name for in research mode check
- Continue to remove inferred summarize command when multiple files in
file filter even when not in research mode
- Continue to show select information source train of thought.
It was removed by mistake earlier
## Overview
Use git to capture prompt traces of khoj's train of thought. View, analyze and debug them using your favorite git client (e.g vscode, magit).
- Each commit captures an interaction with an LLM
The commit writes the query, response and system message each to a separate file in the repo.
The commit message captures the chat model, Khoj version and other metadata
- Each conversation turn can have multiple interactions with an LLM (e.g Khoj's train of thought)
- Each new conversation turn forks from and merges back into its conversation branch
- Each new conversation branches from the user branch
- Each new user branches from root commit on the main branch
## Usage
1. Set `KHOJ_DEBUG=true` or start khoj in very verbose mode with `khoj -vv` to turn on prompt tracing
2. Chat with Khoj as usual
3. Open the promptrace git repo to view the generated prompt traces using your favorite git porcelain.
The Khoj prompt trace git repo is created at `/tmp/khoj_promptrace` by default. You can configure the prompt trace directory by setting the `PROMPTRACE_DIR`environment variable.
## Implementation
- Add utility functions to capture prompt traces using git (via `gitpython`)
- Make each model provider in Khoj commit their LLM interactions with promptrace
- Weave chat metadata from chat API through all chat actors and commit it to the prompt trace
- Match the online query generator prompt to match the formatting of
extract questions
- Separate iteration results by newline
- Improve webpage and online tool descriptions
- Allow server to start if loading embedding model fails with an error.
This allows fixing the embedding model config via admin panel.
Previously server failed to start if embedding model was configured
incorrectly. This prevented fixing the model config via admin panel.
- Convert boolean string in config json to actual booleans when passed
via admin panel as json before passing to model, query configs
- Only create default model if no search model configured by admin.
Return first created search model if its been configured by admin.
Models were getting a bit confused about who is search for who's
information. Using third person to explicitly call out on who's behalf
these searches are running seems to perform better across
models (gemini's, gpt etc.), even if the role of the message is user.
Use placeholder for newline in json object values until json parsed
and values extracted. This is useful when research mode models outputs
multi-line codeblocks in queries etc.
Anthropic API doesn't have ability to enforce response with valid json
object, unlike all the other model types.
While the model will usually adhere to json output instructions.
This step is meant to more strongly encourage it to just output json
object when response_type of json_object is requested.
Separate conversation history with user from the conversation history
between the tool AIs and the researcher AI.
Tools AIs don't need top level conversation history, that context is
meant for the researcher AI.
The invoked tool AIs need previous attempts at using the tool in this
research runs iteration history to better tune their next run.
Or at least that is the hypothesis to break the models looping.
Models weren't generating a diverse enough set of questions. They'd do
minor variations on the original query. What is required is asking
queries from a bunch of different lenses to retrieve the requisite
information.
This prompt updates shows the AIs the breadth of questions to by
example and instruction. Seem like performance improved based on vibes
- Improve mobile friendliness with new research mode toggle, since chat input area is now taking up more space
- Remove clunky title from the suggestion card
- Fix fk lookup error for agent.creator
Overview
---
- Put context into separate user message before sending to chat model.
This should improve model response quality and truncation logic in code
- Pass online context from chat history to chat model for response.
This should improve response speed when previous online context can be reused
- Improve format of notes, online context passed to chat models in prompt.
This should improve model response quality
Details
---
The document, online search context are now passed as separate user
messages to chat model, instead of being added to the final user message.
This will improve
- Models ability to differentiate data from user query.
That should improve response quality and reduce prompt injection
probability
- Make truncation logic simpler and more robust
When context window hit, can simply pop messages to auto truncate
context in order of context, user, assistant message for each
conversation turn in history until reach current user query
The complex, brittle logic to extract user query from context in
last user message isn't required.
Align context passed to offline chat model with other chat models
- Pass context in separate message for better separation between user
query and the shared context
- Pass filename in context
- Add online results for webpage conversation command
Context role was added to allow change message truncation order based
on context role as well.
Revert it for now since currently this is not currently being done.
Previously model would rarely read webpages after webpage search. Need
the model to webpages more regularly for deeper research and to stop
getting stuck in repetitive online search loops
Previous passing of online results as json dump in prompts was less
readable for humans, and I'm guessing less readable for
models (trained on human data) as well?
- Start from this branches src/khoj/routers/api_chat.py
Add tracer to all old and new chat actors that don't have it set
when they are called.
- Update the new chat actors like apick next tool etc to use tracer too
- Conflicts:
Combine both sides of the conflict in all 3 files below
- src/khoj/processor/conversation/utils.py
- src/khoj/routers/helpers.py
- src/khoj/utils/helpers.py
- Message train of thought forks and merges from its conversation branch
- Conversation branches from user branch
- User branches from root commit on the main branch
- Weave chat tracer metadata from api endpoint through all chat actors
and commit it to the prompt trace
### Major
- Give Vision to Anthropic models in Khoj
### Minor
- Reuse logic to format messages for chat with anthropic models
- Make the get image from url function more versatile and reusable
- Encourage output mode chat actor to output only json and nothing else
- Use a single standard search model across the server. There's diminishing benefits for having multiple user-customizable search models.
- We may want to add server-level customization for specific tasks
- Store the search model used to generate a given entry on the `Entry` object
- Remove user-facing APIs and view
- Add a management command for migrating the default search model on the server
In a future PR (after running the migration), we'll also remove the `UserSearchModelConfig`
Latest claude model wanted to say more than just give the json output.
The updated prompt encourages the model to ouput just json. This is
similar to what is already being done for other prompts
It was previously added under the google utils. Now it can be used by
other conversation processors as well.
The updated function
- can get both base64 encoded and PIL formatted images from url
- will return the media type of the image as well in response
* Create explicit flow to enable the free trial
The current design is confusing. It obfuscates the fact that the user is on a free trial. This design will make the opt-in explicit and more intuitive.
* Use the Subscription Type enum instead of hardcoded strings everywhere
* Use length of free trial in the frontend code as well
Had temporarily updated the default selected agent to last used.
Revert for now as
1. The previous logic was buggy. It didn't select the default agent
even when the last used agent was the default agent. Which would
require more work.
2. It maybe too early anyway to set the default agent to last used.
Adding div elements to message to render degraded text copied to
clipboard for messages with user uploaded images.
This change fixes that by separating message to render from message
for clipboard. It ensures differently formatted forms of the user
images are added to the two to allow proper rendering while still
having decently formatted text copied to clipboard
Add newline instead of sending message when hit Enter key on mobile
displays. As on phones shift key doesn't exist and send button is easily
clickable.
Limit hitting Enter key to send message to computers = larger display
= expected to have full fledged keyboards.
- Remove border from agent detail hover card on home page
- Do not wrap long agent names in agent pills on home page
- Handle scenario where chatInputRef is null
Add support for generating dynamic diagrams in flow with Excalidraw (https://github.com/excalidraw/excalidraw). This happens in three steps:
1. Default information collection & intent determination step.
2. Improving the overall guidance of the prompt for generating a JSON, Excalidraw-compatible declaration.
3. Generation of the diagram to output to the final UI.
Add support in the web UI.
Previously only notes context from chat history was included.
This change includes online context from chat history for model to use
for response generation.
This can reduce need for online lookups by reusing previous online
context for faster responses. But will increase overall response time
when not reusing past online context, as faster context buildup per
conversation.
Unsure if inclusion of context is preferrable. If not, both notes and
online context should be removed.
The document, online search context are now passed as separate user
messages to chat model, instead of being added to the final user message.
This will improve
- Models ability to differentiate data from user query.
That should improve response quality and reduce prompt injection
probability
- Make truncation logic simpler and more robust
When context window hit, can simply pop messages to auto truncate
context in order of context, user, assistant message for each
conversation turn in history until reach current user query
The complex, brittle logic to extract user query from context in
last user message isn't required.
Marking the context message with assistant role doesn't translate well
across chat models. E.g
- Gemini can't handle consecutive messages by role = model well
- Claude will merge consecutive messages by same role. In current
message ordering the context message will result get merged into the
previous assistant response. And if move context message after user
query. The truncation logic will have to hop and skip while doing
deletions
- GPT seems to handle consecutive roles of any type fine
Using context role = user generalizes better across chat models for
now and aligns with previous behavior.
Improve separation of note snippets and show its origin file in notes
prompt to have more readable, contextualized text shared with model.
Previously the references dict was being directly passed as a string.
The documents don't look well formatted and are less intelligible.
- Passing file path along with notes snippets will help contextualize
the notes better.
- Better formatting should help with making notes more readable by the
chat model.
- Double click on agent to open edit agent card
- Focus on chat input pane when agent selected/clicked
for quick, smooth agent switch and message flow
- Hover on agent to see agent detail card on non-mobile displays
- Use debounce to only show when hover on card for a bit
- Default to None for the input_tools and output_modes so that they can be managed in the admin panel
- Hold off on showing off all Public Agents until we have a better experience for user profiles etc.
Have get agents API return agents ordered intelligently
- Put the default agent first
- Sort used agents by most recently chatted with agent for ease of access
- Randomly shuffle the remaining unused agents for discoverability
This change wraps the agent pane in a scroll area with all agents shown.
It allows selecting an agent to chat with directly from the home
screen without breaking flow and having to jump to the agents page.
The previous flow was not convenient to quickly and consistently start
chat with one of your standard agents.
This was because a random subet of agents were shown on the home page.
To start chat with an agent not shown on home screen load you had to
open the agents page and initiate the conversation from there.
Exposes a transient switch with available agents as selectable options
in the Khoj chat sub-menu.
Currently shows agent slugs instead of agent names as options. This
isn't the cleanest but gets the job done for now.
Only new conversations with a different agent can be started. Existing
conversations will continue with the original agent it was created with.
The ability to switch the conversation's agent doesn't exist on the
server yet.
One limitation of this methodology is that localStorage has a limit in how much data it can take. Should add more graceful error handling here as well.
Currently experiencing difficulty instruction following when an image is shared. It's more likely to try and output an image. Update to make a clearer distinction.
- Put the attached images display div inside the same parent div as
the text area
- Keep the attachment, microphone/send message buttons aligned with
the text area. So the attached images just show up at the top of the
text area but everything else stays at the same horizontal height as
before.
- This improves the UX by
- Ensuring that the attached images do not obscure the agents pane
above the chat input area
- The attached images visually look like they are inside the actual
input area, rather than floating above it. So the visual aligns
with the semantics
Previously the web app only expected a single image to be shared by
the user as part of their query.
This change allows sharing multiple images from the web app.
Closes#921
Previously Khoj could respond to a single shared image at a time.
This changes updates the chat API to accept multiple images shared by
the user and send it to the appropriate chat actors including the
openai response generation chat actor for getting an image aware
response
Recent changes made Khoj try respond even when document lookup fails.
This change missed handling downstream effects of a failed document
lookup, as the defiltered_query was null and so the text response
didn't have the user query to respond to.
This code initializes defiltered_query to original user query to
handle that.
Also response_type wasn't being passed via
send_message_to_model_wrapper_sync unlike in the async scenario
- Simplifies changing order in which web scrapers are invoked to read
web page by just changing their priority number on the admin panel.
Previously you'd have to delete/, re-add the scrapers to change
their priority.
- Add help text for each scraper field to ease admin setup experience
- Friendlier env var to use Firecrawl's LLM to extract content
- Remove use of separate friendly name for scraper types.
Reuse actual name and just make actual name better
The other webpage scrapers will not work for internal webpages. Try
access those urls directly if they are visible to the Khoj server over
the network.
Only enable this by default for self-hosted, single user setups.
Otherwise ability to scan internal network would be a liability!
For use-cases where it makes sense, the Khoj server admin can
explicitly add the direct webpage scraper via the admin panel
- Set up scrapers via API keys, explicitly adding them via admin panel
or enabling only a single scraper to use via server chat settings.
- Use validation to ensure only valid scrapers added via admin panel
Example API key is present for scrapers that require it etc.
- Modularize the read webpage functions to take api key, url as args
Removes dependence on constants loaded in online_search. Functions
are now mostly self contained
- Improve ability to read webpages by using the speed, success rate of
different scrapers. Optimal configuration needs to be discovered
This should reduce webpage read and response generation time.
Previously, we'd run separate webpage read and extract relevant
content pipes for each distinct (query, url) pair.
Now we aggregate all queries for each url to extract information from
and run the webpage read and extract relevant content pipes once for
each distinct url.
Even though the webpage content extraction pipes were previously being
in parallel. They increased response time by
1. adding more context for the response generation chat actor to
respond from
2. and by being more susceptible to page read and extract latencies of
the parallel jobs
The aggregated retrieval of context for all queries for a given
webpage could result in some hit to context quality. But it should
improve and reduce variability in response time, quality and costs.
Set the FIRECRAWL_TO_EXTRACT environment variable to true to have
Firecrawl scrape and extract content from webpage using their LLM
This could be faster, not sure about quality as LLM used is obfuscated
Firecrawl is open-source, self-hostable with a default hosted service
provided, similar to Jina.ai. So it can be
1. Self-hosted as part of a private Khoj cloud deployment
2. Used directly by getting an API key from the Firecrawl.dev service
This is as an alternative to Olostep and Jina.ai for reading webpages.
Khoj shouldn't refuse to respond to user if web lookups fail.
It should transparently mention that online search etc. failed.
But try respond as best as it can without those references
This change ensures a response to the users query is attempted even
when web info retrieval fails.
The huggingface endpoint can be flaky. Khoj shouldn't refuse to
respond to user if document search fails.
It should transparently mention that document lookup failed.
But try respond as best as it can without the document references
This changes provides graceful failover when inference endpoint
requests fail either when encoding query or reranking retrieved docs
- Remove unused subscribed variable from the chat API
- Unexpectedly dropped client app logging when migrated API chat to do
advanced streaming in july
- Only set addedFiles to selectedFiles when selectedFiles is an array
- Only set seleectedFiles, addedFiles to API response json when
response succeeded. Previously we set it to response json
on errors as well. This made the variables into json objects instead
of arrays on API call failure
- Check if selectedFiles, addedFiles are arrays before running
operations on them. Previously the addedFiles.includes was where the
code would fail
finish_reason (google.ai.generativelanguage_v1beta.types.Candidate.FinishReason):
Optional. Output only. The reason why the
model stopped generating tokens.
If empty, the model has not stopped generating
the tokens.
- Advanced chat model should also fallback to user chat model if set
- Get conversation config should falback to user chat model if set
These assume no server chat model settings is configured
Khoj shouldn't refuse to respond to user if web lookups fail.
It should transparently mention that online search etc. failed.
But try respond as best as it can without those references
This change ensures a response to the users query is attempted even
when web info retrieval fails.
The huggingface endpoint can be flaky. Khoj shouldn't refuse to
respond to user if document search fails.
It should transparently mention that document lookup failed.
But try respond as best as it can without the document references
This changes provides graceful failover when inference endpoint
requests fail either when encoding query or reranking retrieved docs
- Advanced chat model should also fallback to user chat model if set
- Get conversation config should falback to user chat model if set
These assume no server chat model settings is configured
- Advanced chat model should also fallback to user chat model if set
- Get conversation config should falback to user chat model if set
These assume no server chat model settings is configured
# Overview
- Default to use user chat models for train of thought when no server chat settings created by admins
- Default to not create server chat settings on first run
# Details
This change simplifies switching chat models for self-hosted setups
by just changing the chat model on the user settings page.
It falls back to use the user chat model for train of thought
if server chat settings have not been created on the admin panel.
Server chat settings, when set, controls the chat model used
for Khoj's train of thought and the default user chat model.
Previously a self-hosted user had to update
1. the server chat settings in the admin panel and
2. their own user chat model in the user settings panel
to completely switch to a different chat model
for both train of thought & response generation respectively
You can still set server chat settings via the admin panel
to use a different chat model for train of thought vs response generation.
But this is only useful for advanced, multi-user setups.
Update regex to also include any links to code generated images that
aren't explicitly meant to be displayed inline. This allows folks to
download the image (unlike the fake link that doesn't work created by
model)
Previously Khoj would start answering the previous query. This maybe
because the prompt uses User for prompt in chat history but was using
Q for current user prompt.
Make webpages to read automatically on search_online configurable via
a argument.
Set it to default to 1, so other callers of the function
are unaffected.
But iterative chat director can still decide which, if
any, webpages to read based on the online search it performs
This change allows the iterative director to dive deeper into its
research as the data extracted contains relevant links from the webpage
Previous summarization prompt didn't extract relevant links from the
webpage which limited further explorations from webpages
Move construct_chat_history and ChatEvent enum into conversation.utils
and move send_message_to_model_wrapper to conversation.helper to
modularize code. And start thinning out the bloated routers.helper
- conversation.util components are shared functions that conversation
child packages can use.
- conversation.helper components can't be imported by conversation
packages but it can use these child packages
This division allows better modularity while avoiding circular
import dependencies
Create python code executing chat actor
- The chat actor generate python code within sandbox constraints
- Run the generated python code in the cohere terrarium, pyodide
based sandbox accessible at sandbox url
- Create a more dynamic reasoning agent that can evaluate information and understand what it doesn't know, making moves to get that information
- Lots of hacks and code that needs to be reversed later on before submission
Update chat actors to use user's chat model for train of thought. This
requires passing the user info as argument to all the chat actors.
Whether the user is subscribed or not can be inferred from the user
info being passed, so it doesn't need to be passed as a separate
argument to chat actor functions
Let send_message_to_model function infer chat model instead of passing
it as an argument from some chat actors. Better if this logic can be
done in a single place.
Server chat settings can be set for advanced self-hosted or multi-user
cloud setups. They are not necessary anymore as we fallback to use the
users chat model for train of thought now
Fallback to use user chat model for train of thought if server chat
settings not defined.
This simplifies switching chat models for single-user, self-hosted
setups by just changing the chat model on the user settings page.
Server chat settings, when set, controls the default user chat model
and the chat model that is used for Khoj's train of thought.
Previously a self-hosted user had to update both the server chat
settings in the admin panel and their own user chat model in the user
settings panel to explicitly switch to a different chat model (i.e to
switch to a new model for both train of thought & response generation)
You can still set server chat settings to use a different chat
model for train of thought and response generation. But this is only
necessary for advanced self-hosted or cloud hosted setups of Khoj.
Previously you had to refresh the page to see the updated data on
reopening the agents edit card after a save operation.
Now you see the latest saved agent data on reopening the agents edit
card. This should avoid confusion on whether the data was saved
correctly
If a public or protected agent is made private. Other users who were
having conversation with that agent will have to carry on their
conversation using default agent instead
Loading the embeddings model, even locally seems to be taking much
longer. Use timer to track visibility into embedding, cross-encoder
model load times
We should start disambiguating the the max input from output size. Max
prompt size should only be used for the max input context to an LLM.
If required max_output_tokens should be set as a separate new field