- Set output mode to single string. Specify output schema in prompt
- Both these should encourage the model to select only one output mode
instead of encouraging it in the prompt too many times
- Output schema should also improve schema following in general
- Standardize variable, func name of io selector for readability
- Fix chat actors to test the io selector chat actor
- Make chat actor return sources, output separately for better
disambiguation, at least during tests, for now
- JSON extract from LLMs is pretty decent now, so get the input tools and output modes all in one go (see the sketch after this list). It'll help the model think through the full cycle of what it wants to do to handle the request holistically.
- Make slight improvements to tool selection indicators
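A minimal sketch of this combined extraction, assuming a hypothetical JSON response shape; the exact field names and schema used by Khoj may differ:

```python
import json

# Illustrative response from the chat model when asked to pick input tools
# and the output mode in a single JSON object (field names are hypothetical).
raw_response = '{"source": ["notes", "online"], "output": "text"}'

selection = json.loads(raw_response)
input_tools = selection.get("source", ["notes"])
output_mode = selection.get("output", "text")

print(input_tools, output_mode)
```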
Previously, we'd replace the generated message with an error message
when message generation stopped via stop button on chat page of web app.
So the partially generated message (which could be useful) would get lost.
This change just stops generation, while keeping the generated
response so any useful information from the partially generated
message can be retrieved.
- Allows managing chat models in the OpenAI proxy service like Ollama.
- Removes need to manually add, remove chat models from Khoj Admin Panel
for these OpenAI compatible API services when enabled.
- Khoj still maintains the chat model configs within Khoj, so they can
be configured via the Khoj admin panel as usual.
Previously Jina search didn't need an API key. Now that it does, reuse
the API key set in the Jina web scraper config, otherwise fall back to
the JINA_API_KEY environment variable, if present.
Resolves #978
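A small sketch of the intended key lookup order, assuming a hypothetical `webscraper_config` object; the actual config access in Khoj may differ:

```python
import os

def get_jina_api_key(webscraper_config=None):
    # Prefer the key set in the Jina web scraper admin config, if any.
    if webscraper_config and getattr(webscraper_config, "api_key", None):
        return webscraper_config.api_key
    # Otherwise fall back to the JINA_API_KEY environment variable.
    return os.getenv("JINA_API_KEY")
```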
- Integrate with Ollama or other openai compatible APIs by simply
setting `OPENAI_API_BASE` environment variable in docker-compose etc.
- Update docs on integrating with Ollama, openai proxies on first run
- Auto populate all chat models supported by openai compatible APIs (see the sketch after this list)
- Auto set vision enabled for all commercial models
- Minor
- Add huggingface cache to khoj_models volume. This is where chat
models and (now) sentence transformer models are stored by default
- Reduce verbosity of yarn install of web app. Otherwise we hit the docker
log size limit & it stops showing remaining logs after the web app install
- Suggest `ollama pull <model_name>` to start it in background
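A sketch of auto-discovering chat models from an OpenAI-compatible server such as Ollama via the standard `openai` Python client; how Khoj then registers these models is omitted:

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at any OpenAI-compatible server,
# e.g. OPENAI_API_BASE=http://localhost:11434/v1 for Ollama.
client = OpenAI(
    base_url=os.getenv("OPENAI_API_BASE"),
    api_key=os.getenv("OPENAI_API_KEY", "placeholder"),
)

# List the models the server exposes to auto-populate chat model configs.
available_models = [model.id for model in client.models.list()]
print(available_models)
```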
- Update to latest initialize with new claude 3.5 sonnet and haiku models
- Update to set vision enabled for google and anthropic models by
default. Previously this wasn't supported, but it has been for a month
or two now
- Explicitly adding a slash command is a higher priority intent than
research mode being enabled in the background. Respect that for a
more intuitive UX flow.
- Explicit slash commands do not currently work in research mode.
You have to turn research mode off to use other slash commands. This
is strange and unnecessary, given the intent priority is clear.
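A tiny sketch of the intended priority, with illustrative names; an explicitly typed slash command wins over the background research toggle:

```python
def resolve_command(slash_command: str | None, research_mode_enabled: bool) -> str:
    # An explicit slash command is a stronger signal of user intent than
    # the research mode toggle left on in the background.
    if slash_command:
        return slash_command
    return "/research" if research_mode_enabled else "/default"
```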
Previously errors would get eaten up, so the model wouldn't see
anything. And the model wouldn't be allowed to re-run the same
query-tool combination in the next iteration.
This update should give it insight into why it didn't get a result. So
it can make an informed (hopefully better) decision on what to do next.
And re-run the previous query if appropriate.
Previously when a call to the online search API etc. failed, it'd error
out of the response to the query in research mode. Khoj should skip tool
use that iteration but continue trying to respond.
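A sketch of the intended behavior with a hypothetical iteration record; the failure message is stored as the tool result so the model can reason about it:

```python
async def run_tool(tool_func, query, iteration_context):
    # Hypothetical helper: run one research tool and record the outcome so
    # the model sees why a query produced no result instead of nothing at all.
    try:
        iteration_context["result"] = await tool_func(query)
    except Exception as error:
        # Surface the failure; the model can then adjust its approach or
        # retry the same query-tool combination in the next iteration.
        iteration_context["result"] = f"Error from {tool_func.__name__}: {error}"
```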
Previously chatml messages were just strings, now they can be list of
strings or list of dicts as well.
- Use json serialization to manage their variations and truncate them
before printing for context, as sketched below.
- Put logic in single function for use across chat models
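A sketch of such a shared truncation helper; the function name and length limit are illustrative:

```python
import json

def truncate_for_log(content, max_length=500):
    # Chat message content may be a plain string, a list of strings, or a
    # list of dicts. JSON serialization gives one uniform form to truncate.
    text = content if isinstance(content, str) else json.dumps(content)
    return text if len(text) <= max_length else text[:max_length] + "..."
```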
Previously chatml messages were just strings.
Since gemini and anthropic models always have messages as lists of
strings, truncate those strings instead of the list of message content
Removing binary data and truncating large data in output files
generated by code runs should improve speed and cost of research mode
runs with large or binary output files.
Previously binary data in code results was passed around in iteration
context during research mode. This made the context inefficient because
models reason poorly and inefficiently over b64 encoded image (and other
binary) data, and would hit context limits, leading to unnecessary
truncation of other useful context.
Also remove image data when logging output of code execution
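A sketch of stripping binary payloads and truncating oversized text before code results enter the research context; the extensions and size cap below are illustrative:

```python
BINARY_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp", ".pdf")

def clean_code_output_files(output_files, max_chars=10_000):
    # Replace binary payloads with a placeholder and truncate large text files
    # so iteration context and logs stay within model limits.
    cleaned = []
    for output_file in output_files:
        name, content = output_file["filename"], output_file["content"]
        if name.lower().endswith(BINARY_EXTENSIONS):
            content = f"[binary file {name} omitted from context]"
        elif len(content) > max_chars:
            content = content[:max_chars] + "...[truncated]"
        cleaned.append({"filename": name, "content": content})
    return cleaned
```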
- Allow passing user files as input into code sandbox for analysis
- Update prompt to give more example of complex, multi-line code
- Simplify logic for model. Run one program at a time,
instead of allowing model to run multiple programs in parallel
- Show code generated charts and docs in Reference pane of web app and make them downloadable
- Add a border below heading
- Show code snippet in pre block
- Overflow-x when reference side panel open to allow seeing whole text
via x-scroll
- Align header, body position of reference cards with each other
- Only show filename in doc reference cards at message bottom.
Show full file path in hover and reference side panel
- Improve rendering code reference with better icons, smaller text and
different line clamps for better visibility
- Show code output files as sub card of code card in reference section
- Allow downloading files generated by code instead of rendering them in
chat message directly
- Show executed code before online references in reference panel
- Fix to render code generated chart with images, excalidraw diagrams
- Fix to save code context to chat history in image, diagram output modes
- Fix bug in image markdown being wrapped twice in markdown syntax
- Render newline in code references shown on chat page of web app
Previously newlines weren't getting rendered. This made the code
executed by Khoj hard to read in references. This change fixes that.
`dangerouslySetInnerHTML` usage is justified as rendered code
snippet is being sanitized by DOMPurify before rendering.
- Run one program at a time, instead of allowing the model to pass
multiple programs to run in parallel, to simplify logic for the model
- Update prompt to give more example of complex, multi-line code
- Allow passing user files as input into code sandbox for analysis
- Log code execution timer at info level to evaluate execution latencies
in production
- Type the generated code for easier processing by caller functions (see
the sketch after this list)
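A sketch of what typing the generated code result might look like; the field names are hypothetical, not Khoj's exact types:

```python
from dataclasses import dataclass, field

@dataclass
class GeneratedCode:
    # Hypothetical container for the single program generated per turn, plus
    # the user files to mount into the code sandbox for analysis.
    code: str
    input_files: list[str] = field(default_factory=list)
```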
Support including file attachments in the chat message
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
This breaks certain prior behaviors. We will no longer automatically be processing/generating embeddings on the backend and adding documents to the "brain". You'll have to go to settings and go through the upload documents flow there in order to add docs to the brain (i.e., have search include them during question / response).
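A sketch of passing the attached file text as its own user message ahead of the query, assuming simple chatml-style dicts; the real message construction in Khoj is more involved:

```python
def build_messages(query: str, attached_file_text: str | None = None) -> list[dict]:
    # Keep full file text in a separate user message so it doesn't blur into
    # the actual question being asked.
    messages = []
    if attached_file_text:
        messages.append({"role": "user", "content": attached_file_text})
    messages.append({"role": "user", "content": query})
    return messages
```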
This will ensure only unique online references are shown in all
clients.
The duplication issue was exacerbated in research mode as even with
different online search queries, you can get previously seen results.
This change does a global deduplication across all online results seen
across research iterations before returning them in the client response.
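A sketch of that global deduplication, keyed on the result link; field names are illustrative:

```python
def deduplicate_online_results(results: list[dict]) -> list[dict]:
    # Keep only the first occurrence of each link seen across all research
    # iterations so clients receive unique online references.
    seen_links, unique_results = set(), []
    for result in results:
        if result["link"] not in seen_links:
            seen_links.add(result["link"])
            unique_results.append(result)
    return unique_results
```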
- Deduplicate online, doc search queries across research iterations.
This avoids running previously run online, doc searches again and
dedupes online, doc context seen by model to generate response.
- Deduplicate online search queries generated by chat model for each
user query.
- Do not pass online, docs, code context separately when generating
response in research mode. These are already collected in the meta
research passed with the user query
- Improve formatting of context passed to generate research response
- Use xml tags to delimit context. Pass per iteration queries in each
iteration result
- Put user query before meta research results in user message passed
for generating response
These deduplications will improve the speed, cost & quality of research mode
Previously the whole research mode response would fail if the
pick-next-tool call to the chat model failed. Now instead of it
completely failing, the researcher actor is told to try again in the
next iteration.
This allows for a more graceful degradation in answering a research
question even if a few calls to the chat model fail.
Jina search API returns content of all webpages in search results.
Previously the code wouldn't remove content beyond the set
max_webpages_to_read limit. Now, webpage content in organic results is
explicitly removed beyond the requested max_webpages_to_read limit.
This should align behavior of online results from Jina with other
online search providers. And restrict llm context to a reasonable size
when using Jina for online search.
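A sketch of trimming the extra webpage content Jina returns; names are illustrative:

```python
def trim_jina_results(organic_results: list[dict], max_webpages_to_read: int = 1) -> list[dict]:
    # Jina returns full page content for every search result. Keep content
    # only for the first N results to match other online search providers.
    for index, result in enumerate(organic_results):
        if index >= max_webpages_to_read:
            result.pop("content", None)
    return organic_results
```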
This fixes chat with old chat sessions. Fixes issue where old Whatsapp
users couldn't chat with Khoj because chat history doc context was
stored as a list earlier
Command rate limit wouldn't be shown to the user as the server couldn't
raise an HTTP exception in the middle of streaming.
Catch the exception and render it as an LLM response message instead,
for visibility into command rate limiting for the user on the client.
Log rate limit messages for all rate limit events on the server as info
messages
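A sketch of surfacing the rate limit as part of the streamed response, assuming a hypothetical rate limiter and chunk generator; Khoj's actual streaming plumbing differs:

```python
import logging

logger = logging.getLogger(__name__)

async def stream_chat_response(response_chunks, rate_limiter):
    # Once streaming has started the server can't raise an HTTP exception,
    # so render the rate limit notice as a normal response message instead.
    try:
        rate_limiter.check()
    except Exception as error:
        logger.info(f"Rate limit hit: {error}")
        yield "I'm handling too many requests right now. Please try again in a bit."
        return
    async for chunk in response_chunks:
        yield chunk
```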
Convert exception messages into first person responses by Khoj to
prevent breaking the fourth wall and provide more details on what
happened and possible ways to resolve them.