## Overview
- Gemma 2 is a new open model family by Google. They've released a 9B, 29B param model. A 2B model is also expected.
- It performs really well on the Chatbot arena and shows good performance when testing within Khoj as well.
- Llama.cpp support for Gemma 2 architecture seems to have stabilized
- If Gemma 2 performs well in further testing, it can be made the default offline chat model for Khoj
- Once the 2B param model is released, the model size to download can be automatically chosen based on (V)RAM available
## Major
- Support Gemma 2 for Offline Chat
- Improve and fix chat model prompts for better, consistent context
## Minor
- Fix and improve offline chat actor, director tests
- Improve offline chat truncation to consider chat message delimiter tokens
- Add day of week to system prompt of openai, anthropic, offline chat models
- Pass more context to offline chat system prompt to
- ask follow-up questions
- know where to find information about khoj (itself)
- Fix output mode selection prompt. Log error if model does not select
valid option from list of valid output modes provided
- Use consistent names for question, answers passed to
extract_questions_offline prompt
- Log which model extracts question, what the offline chat model sees
as context. Similar to debug log shown for openai models
- Pass system message as the first user chat message as Gemma 2
doesn't support system messages
- Use gemma-2 chat format
- Pass chat model name to generic, extract questions chat actors
Used to figure out chat template to use for model
For generic chat actor argument was anyway available but not being
passed, which is confusing
- Deprecate khoj-assistant pypi package. Use more accurate and
succinct pypi project name, khoj
- Update references to sye khoj pypi package in docs and code instead
of the legacy khoj-assistant pypi package
- Update pypi workflow to publish to both khoj, khoj-assistant for now
- Update stale python 3.9 support mentioned in our pyproject. Can't
support python 3.9 as depend on latest django which support >=3.10
- Major
- Ask for prompt in prose
- Remove seed from SD3 image generation to improve diversity of output
for a given prompt
Otherwise for conversations with similar sounding
prompts, the images would be almost exactly the same. This maybe
another indicator of SD3's inability to capture detailed
instructions
- Consistently use "prompt" wording instead of "query" in improved
image generation prompts.
Previously a mix of those terms were being used, which could confuse
the chat model
- Minor
- Add day of week to prompt
- Remove 2-5 sentence limit on instructions to SD3. It seems to be
able to follow longer instructions just with less fidelity than
DALLE. And the 2-5 sentence instruction limit wasn't being adhered to
- Improve ability to edit, improve the image based on follow-up
instructions by the user
- Align prompts for DALLE and SD3. Only difference is to wrap text to
be rendered in quotes for SD3. This improves it's ability to render
requested text. DALLE cannot render text as well or consistently
- Because we're using a FastAPI api framework with a Django ORM, we're running into some interesting conditions around connection pooling and clean-up. We're ending up with a large pile-up of open, stale connections to the DB recurringly when the server has been running for a while. To mitigate this problem, given starlette and django run in different python threads, add a middleware that will go and call the connection clean up method in each of the threads.
- Issue
The Khoj docker build would fail with `ImportError: libGL.so.1: cannot open shared object file: No such file or directory`. This was required by the Khoj RapidOCR python package dependency.
- Fix
A minimal set of system packages have been added to resolve this issue.
- Transcribe on holding Ctrl+s keyboard shortcut
- Transcribe on holding the transcribe button pressed via mouse too
- Make the transcribe button robust to inadvertent touches by using timeout
- Do not transcribe, trigger auto-send on silences. Silence detection
is super rudimentary, just blocks standard emanations by whisper
when no speech
### Fix
- Fix degrade in speed when indexing large files
- Resolve org-mode indexing bug by splitting current section only once by heading
- Improve summarization by fixing formatting of text in indexed files
### Improve
- Improve scaling user, admin flows to delete all entries for a user
- Split once by heading (=first_non_empty) to extract current section body
Otherwise child headings with same prefix as current heading will
cause the section split to go into infinite loop
- Also add check to prevent getting into recursive loop while trying
to split entry into sub sections
Adding files to the DB for summarization was slow, buggy in two ways:
- We were updating same text of modified files in DB = no of chunks
per file times
- The `" ".join(file_content)' code was breaking each character in the
file content by a space. This formats the original file content
incorrectly before storing in the DB
Because this code ran in the main file indexing path, it was slowing down
file indexing. Knowledge bases with larger files were impacted more strongly
- Delete entries by batch to improve efficiency of query at scale
- Share code to delete all user entries between it's async, sync methods
- Add indicator to show when files being deleted on web config page
- Add CSP to web config pages. Load phone no. validation js, css from S3
- Construct config page elements on Web via DOM scripting
- Disable CSP in Khoj Obsidian as it interferes with Obsidian functionality
- Other miscellaneous voice message level improvements (rate limit, listening animation)
The Khoj CSP interferes with other Obsidian features and plugins as
CSP is applied page wide.
For now chat message sanitization via Dompurify should suffice.
Enable CSP when can scope it to only the Khoj Obsidian plugin.
Maybe better to fallback to non-summarize behavior if summarize intent
is just inferred but we can't actually summarize because the single
file added to conversation isn't satisfied
- Simplify quick jump between Khoj side pane and main editor view using keyboard shortcuts
- Enable voice chat in Obsidian to make interactions with Khoj more seamless