- Building the arm64 image on an Ubuntu arm64 runner reduces the `yarn build`
step time by 75%, from 12 mins to 3 mins.
- This is because no QEMU emulation for arm64 on x86 is required now
- Parallelizing the x64 and arm64 platform builds halves build time on top of that
- Revert to using the standard ubuntu-latest runner, as the large x64 runner
doesn't give much more speed improvement
This results in an effective additional 50%-66% reduction in build time
on top of #987.
So a full dockerize workflow run now takes *10 mins* vs. the previous 35+ mins.
This is a total of *72% improvement* in max dockerize run time.
Runs get additional speed improvements when the docker layer cache is hit.
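The percentages quoted above can be checked with a short calculation. This is only an illustrative sketch: `percent_reduction` is a hypothetical helper, and 35 mins is used as the lower bound of the "35+ mins" figure.

```python
def percent_reduction(before_mins: float, after_mins: float) -> float:
    """Return the percentage reduction going from before_mins to after_mins."""
    return (before_mins - after_mins) / before_mins * 100


# yarn build step: 12 mins down to 3 mins
print(f"yarn build step: {percent_reduction(12, 3):.0f}% faster")

# Full dockerize workflow: 35+ mins down to 10 mins.
# 35 mins is an assumed lower bound for the previous run time.
print(f"full workflow: at least {percent_reduction(35, 10):.1f}% faster")
```

With exactly 35 mins as the starting point the reduction is a little over 71%; a previous run time slightly above 35 mins (e.g. 36 mins) gives the quoted 72%.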
## Objective
Improve build speed and size of khoj docker images
## Changes
### Improve docker image build speeds
- Decouple web app and server build steps
- Build the web app and server in parallel
- Cache docker layers for reuse across dockerize GitHub workflow runs
- Split Docker build layers for improved cacheability (e.g. separate `yarn install` and `yarn build` steps)
### Reduce size of khoj docker images
- Use an up-to-date `.dockerignore` to exclude unnecessary directories
- Do not install cuda python packages for cpu builds
### Improve web app builds
- Use consistent mechanism to get fonts for web app
- Make tailwind extensions production instead of dev dependencies
- Make next.js create production builds for the web app (via `NODE_ENV=production` env var)
The current fix should improve Khoj responses when charts are in the response
context. It truncates code context before sharing it with response chat actors.
Previously Khoj would respond that it was not able to create a chart
but then have a generated chart in its response in default mode.
Code context truncation was added to the research chat actor for
decision making, but it wasn't added to the conversation response
generation chat actors.
When khoj generated charts with code for its response, the images in
the context would exceed context window limits.
So the truncation logic would drop all past context, including chat
history and context gathered for the current response.
This would result in the chat response generator 'forgetting' everything for the
current response when code-generated images or charts were in the response context.
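The idea of truncating code context before it reaches a chat actor can be sketched as below. This is a minimal illustration, not Khoj's actual implementation: the function name `shorten_code_context`, the shape of the context dict (`output_files`, `filename`, `b64_data`) and the `max_chars` limit are all assumptions made for the example.

```python
from copy import deepcopy

# Hypothetical set of file extensions treated as images
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")


def shorten_code_context(code_context: dict, max_chars: int = 100) -> dict:
    """Return a copy of code_context with large image payloads shortened.

    Assumed shape: {query: {"code": str, "output_files": [{"filename", "b64_data"}]}}
    """
    truncated = deepcopy(code_context)  # leave the caller's context untouched
    for result in truncated.values():
        for output_file in result.get("output_files", []):
            is_image = output_file["filename"].lower().endswith(IMAGE_EXTENSIONS)
            data = output_file.get("b64_data", "")
            if is_image and len(data) > max_chars:
                # Keep a short prefix so the model still knows a chart exists
                output_file["b64_data"] = data[:max_chars] + "...[truncated]"
    return truncated


context = {
    "plot sales": {
        "code": "plt.plot(sales)",
        "output_files": [{"filename": "chart.png", "b64_data": "A" * 5000}],
    }
}
short_context = shorten_code_context(context)
print(len(short_context["plot sales"]["output_files"][0]["b64_data"]))
```

Shortening only the image payloads keeps the chat history and the rest of the gathered context small enough that it need not be dropped to fit the context window.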
It needs to be used across routers and processors. Keeping it in the
run_code tool makes it hard to use in other chat provider contexts
due to circular dependency issues created by the
send_message_to_model_wrapper func.
Previous changes to depend on just the PROMPTRACE_DIR env var instead
of the KHOJ_DEBUG or verbosity flag were incomplete.
This fix adds all the changes required to only depend on the
PROMPTRACE_DIR env var to enable/disable prompt tracing in Khoj.
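A check that depends only on the PROMPTRACE_DIR env var might look like the sketch below. The helper name `is_promptrace_enabled` is an assumption for illustration and may not match the function used in Khoj.

```python
import os


def is_promptrace_enabled() -> bool:
    """Prompt tracing is on only when PROMPTRACE_DIR is set to a non-empty value.

    KHOJ_DEBUG and verbosity settings are deliberately not consulted.
    """
    return bool(os.getenv("PROMPTRACE_DIR"))


print("prompt tracing enabled:", is_promptrace_enabled())
```

Reading the env var at call time, rather than once at import, lets the setting be picked up however the process environment was configured before the check runs.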
Pass your domain cert files via the --sslcert, --sslkey cli args.
For example, to start khoj at https://example.com, you'd run the command:
`KHOJ_DOMAIN=example.com khoj --sslcert example.com.crt --sslkey example.com.key --host example.com`
This sets up ssl certs directly with khoj, without requiring a
reverse proxy like nginx to serve khoj behind an https endpoint, for
simple setups. More complex setups should, of course, still use a
reverse proxy for efficient request processing.
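How the cli args could be mapped onto server options is sketched below. Only the `--sslcert`, `--sslkey` and `--host` flags come from the text above; the helper names and the default host are hypothetical. The `ssl_certfile` and `ssl_keyfile` keys follow the keyword names accepted by `uvicorn.run`, under the assumption that a uvicorn-style server is being started.

```python
import argparse


def parse_ssl_args(argv: list[str]) -> argparse.Namespace:
    """Parse the subset of cli args relevant to serving over https."""
    parser = argparse.ArgumentParser(prog="khoj")
    parser.add_argument("--host", default="127.0.0.1")  # assumed default
    parser.add_argument("--sslcert", default=None, help="Path to the domain cert file")
    parser.add_argument("--sslkey", default=None, help="Path to the domain key file")
    return parser.parse_args(argv)


def server_options(args: argparse.Namespace) -> dict:
    """Build keyword options for the server; enable TLS only if both files are given."""
    options = {"host": args.host}
    if args.sslcert and args.sslkey:
        options["ssl_certfile"] = args.sslcert
        options["ssl_keyfile"] = args.sslkey
    return options


args = parse_ssl_args(
    ["--sslcert", "example.com.crt", "--sslkey", "example.com.key", "--host", "example.com"]
)
print(server_options(args))
```

Requiring both the cert and the key before enabling TLS avoids starting a half-configured https server when only one of the two flags is passed.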
- Track, return cost and usage metrics in chat api response
Track input and output token usage and the cost of interactions with
OpenAI, Anthropic and Google chat models for each call to the khoj chat api
- Collect, display and store costs & accuracy of eval run currently in progress
This provides more insight into eval runs during execution
instead of having to wait until the eval run completes.
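Tracking token usage and cost per chat api call, as described in the first item above, can be sketched as follows. The model name, the prices and the `Usage` / `track_usage` names are hypothetical placeholders, not actual provider pricing or Khoj's data structures.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens, for illustration only
PRICES = {"example-model": {"input": 2.0, "output": 8.0}}


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0
    cost: float = 0.0


def track_usage(usage: Usage, model: str, input_tokens: int, output_tokens: int) -> Usage:
    """Accumulate token counts and cost for one model call into usage."""
    price = PRICES.get(model, {"input": 0.0, "output": 0.0})  # unknown models cost 0
    usage.input_tokens += input_tokens
    usage.output_tokens += output_tokens
    usage.cost += (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return usage


usage = Usage()
track_usage(usage, "example-model", input_tokens=1_200, output_tokens=300)
track_usage(usage, "example-model", input_tokens=800, output_tokens=200)
print(usage)
```

Accumulating into a single `Usage` object per request makes it straightforward to return the totals alongside the chat api response.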
Collect, display and store running costs & accuracy of eval run.
This provides more insight into eval runs during execution instead of
having to wait until the eval run completes.
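A running tally of cost and accuracy during an eval run could be kept with a small accumulator like the one below. The class name, fields and summary format are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RunningEvalMetrics:
    evaluated: int = 0
    correct: int = 0
    cost: float = 0.0

    def update(self, is_correct: bool, cost: float) -> None:
        """Record the outcome and cost of one evaluated example."""
        self.evaluated += 1
        self.correct += int(is_correct)
        self.cost += cost

    @property
    def accuracy(self) -> float:
        """Fraction of evaluated examples answered correctly so far."""
        return self.correct / self.evaluated if self.evaluated else 0.0

    def summary(self) -> str:
        return f"accuracy: {self.accuracy:.2%} | cost: ${self.cost:.3f} | evaluated: {self.evaluated}"


metrics = RunningEvalMetrics()
for is_correct, cost in [(True, 0.01), (False, 0.02), (True, 0.01)]:
    metrics.update(is_correct, cost)
    print(metrics.summary())  # displayed after every example, not just at the end
```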