## Objective
Improve build speed and size of khoj docker images
## Changes
### Improve docker image build speeds
- Decouple web app and server build steps
- Build the web app and server in parallel
- Cache docker layers for reuse across dockerize github workflow runs
- Split Docker build layers for improved cacheability (e.g separate `yarn install` and `yarn build` steps)
### Reduce size of khoj docker images
- Use an up-to-date `.dockerignore` to exclude unnecessary directories
- Do not installing cuda python packages for cpu builds
### Improve web app builds
- Use consistent mechanism to get fonts for web app
- Make tailwind extensions production instead of dev dependencies
- Make next.js create production builds for the web app (via `NODE_ENV=production` env var)
- Track, return cost and usage metrics in chat api response
Track input, output token usage and cost of interactions with
openai, anthropic and google chat models for each call to the khoj chat api
- Collect, display and store costs & accuracy of eval run currently in progress
This provides more insight into eval runs during execution
instead of having to wait until the eval run completes.
- Previously when settings list became long the dropdown height would
overflow screen height. Now it's max height is clamped and y-scroll
- Previously the dropdown content would take width of content. This
would mean the menu could sometimes be less wide than the button. It
felt strange. Now dropdown content is at least width of parent button
- Track input, output token usage and cost for interactions
via chat api with openai, anthropic and google chat models
- Get usage metadata from OpenAI using stream_options
- Handle openai proxies that do not support passing usage in response
- Add new usage, end response events returned by chat api.
- This can be optionally consumed by clients at a later point
- Update streaming clients to mark message as completed after new
end response event, not after end llm response event
- Ensure usage data from final response generation step is included
- Pass usage data after llm response complete. This allows gathering
token usage and cost for the final response generation step across
streaming and non-streaming modes
- JSON extract from LLMs is pretty decent now, so get the input tools and output modes all in one go. It'll help the model think through the full cycle of what it wants to do to handle the request holistically.
- Make slight improvements to tool selection indicators
Previously, we'd replace the generated message with an error message
when message generation stopped via stop button on chat page of web app.
So the partially generated message (which could be useful) gets lost.
This change just stops generation, while keeping the generated
response so any useful information from the partially generated
message can be retrieved.
- Explictly adding a slash command is a higher priority intent than
research mode being enabled in the background. Respect that for a
more intuitive UX flow.
- Explicit slash commands do not currently work in research mode.
You've to turn research mode off to use other slash commands. This
is strange, unnecessary given intent priority is clear.
- JSON extract from LLMs is pretty decent now, so get the input tools and output modes all in one go. It'll help the model think through the full cycle of what it wants to do to handle the request holistically.
- Make slight improvements to tool selection indicators
- Allow passing user files as input into code sandbox for analysis
- Update prompt to give more example of complex, multi-line code
- Simplify logic for model. Run one program at a time,
instead of allowing model to run multiple programs in parallel
- Show Code generated charts and docs in Reference pane of web app and make them downloaded
- Add a border below heading
- Show code snippet in pre block
- Overflow-x when reference side panel open to allow seeing whole text
via x-scroll
- Align header, body position of reference cards with each other
- Only show filename in doc reference cards at message bottom.
Show full file path in hover and reference side panel
- Improve rendering code reference with better icons, smaller text and
different line clamps for better visibility
- Show code output files as sub card of code card in reference section
- Allow downloading files generated by code instead of rendering it in
chat message directly
- Show executed code before online references in reference panel
- Fix to render code generated chart with images, excalidraw diagrams
- Fix to save code context to chat history in image, diagram output modes
- Fix bug in image markdown being wrapped twice in markdown syntax
- Render newline in code references shown on chat page of web app
Previously newlines weren't getting rendered. This made the code
executed by Khoj hard to read in references. This changes fixes that.
`dangerouslySetInnerHTML' usage is justified as rendered code
snippet is being sanitized by DOMPurify before rendering.