- Couldn't log in to the admin panel when KHOJ_DEBUG was not True. Fix this error so self-hosted users are unblocked from accessing the admin settings
- Don't force users to set their KHOJ_DJANGO_SECRET_KEY
- Use new style references for Khoj chat modal in Obsidian
- Khoj Chat responses in Obsidian had regressed to not show references
for new questions after modal has been opened. Now even those are
rendered, and use new references style
- Render chat response as markdown while it's being streamed
- Add transcription button with mic icon
- Collect audio recording on pressing mic
- Process and send audio recording to server for transcription
- Extract the functionality to flash status in chat input for reuse
- Allow server admin to configure offline speech to text model during
initialization
- Use offline speech to text model to transcribe audio from clients
- Set offline whisper as default speech to text model as it requires no API key setup
- Extract flashing status message in chat input placeholder into
reusable function
- Use emoji prefixes for status messages
- Improve alt text of transcribe button to indicate what the button does
- Conflicts:
- src/interface/desktop/chat.html
Combine and use common class names for speak component
- src/khoj/database/adapters/__init__.py
Combine imports
- src/khoj/interface/web/chat.html
Combine and use common class names for speak component
- src/khoj/routers/api.py
Combine imports
- Add a dependency on the indexer API endpoint that rounds up the amount of data indexed and uses that to determine whether the next set of data should be processed
- Delete any files that are being removed for administering the calculation
- Show current amount of data indexed in the config page
- Ignore errors in deleting network requests to khoj server
- Also delete open network connection to khoj server on auto reindex
Otherwise when server is unreachable a bunch of failed network
connections accrue in the processes list
- Make auto-update of content index user configurable from khoj.el
- Handle server unavailable error on auto-index schedule job in khoj.el
Resolves #567
- Append chat message to chat logs as TextNodes in web, desktop clients
- Simplify Code to Identify Files from Github, Notion on Web, Desktop Client
- Use file source to find entries from github, notion on web, desktop client
- Pass file source to clients via text search API response
- Make Django Logs Follow Khoj Log Format, Verbosity
- Handle image search setup related warning
- Format Django initializing outputs using Khoj logger format
- Use `KHOJ_HOST` env var to set allowed/trusted domains to host Khoj
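A minimal sketch of how a `KHOJ_HOST` value could feed Django's `ALLOWED_HOSTS` and `CSRF_TRUSTED_ORIGINS` settings. The helper name `hosts_from_env` and the local defaults are assumptions for illustration; the actual Khoj settings module may differ.

```python
import os


def hosts_from_env(env=None):
    """Return (allowed_hosts, csrf_trusted_origins) derived from KHOJ_HOST.

    Hypothetical helper, not the actual Khoj implementation.
    """
    env = os.environ if env is None else env
    # Assumed defaults so a local setup keeps working without KHOJ_HOST
    allowed_hosts = ["localhost", "127.0.0.1"]
    khoj_host = env.get("KHOJ_HOST", "").strip()
    if khoj_host and khoj_host not in allowed_hosts:
        allowed_hosts.append(khoj_host)
    # Django (4.0+) expects CSRF trusted origins to include the scheme
    csrf_trusted_origins = [
        f"{scheme}://{host}"
        for host in allowed_hosts
        for scheme in ("http", "https")
    ]
    return allowed_hosts, csrf_trusted_origins


# In a Django settings module these would be assigned at import time
ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS = hosts_from_env()
```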
Ideally should rename model_directory to config_directory or some such
but the current image search code will need to be migrated soon. So
changing the variable name and creating a migration script for old
khoj.yml files using model-directory variable isn't worth it
Remove the explicit setting of the number of threads to use by pytorch. Use
its default instead.
- Collect STDOUT from the `migrate`, `collectstatic` commands and
output using the Khoj logger format and verbosity settings
- Only show Django `collectstatic` command output in verbose mode
- Fix showing the Initializing Khoj log line by moving it after logger
level set
Previously it was only searching for PDF and Markdown files. This was
meant to show only content from current vault as results.
But it has not scaled well as other clients also allow syncing PDF and
markdown files now. So remove this content type filter for now.
A proper solution would limit by using file/dir filters on server or
client side.
- Our pypi package currently does not work because the django app and associated database are not included. To remedy this, move the app into the src/khoj folder. This has the added benefit of improved organization of the codebase, as all server related code is now in a single folder
- Update associated file paths and system references
### Overview
The parent hierarchy of org-mode entries can store important context.
This change updates OrgNode to track parent headings for each org entry and adds the parent outline for each entry to the index
### Details
- Test search uses ancestor headings as context for improved results
- Add ancestor headings of each org-mode entry to their compiled form
- Track ancestor headings for each org-mode entry in org-node parser
Resolves #85
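The ancestor tracking described above can be sketched with a simple heading stack. This is an illustrative parser, not the actual OrgNode code; the function names and the ` / ` separator in the compiled form are assumptions.

```python
import re

# An org-mode heading is one or more stars, whitespace, then the title
HEADING = re.compile(r"^(\*+)\s+(.*)$")


def entries_with_ancestors(org_text):
    """Return a list of (title, ancestor_titles) for each heading."""
    stack = []  # (level, title) of the headings currently open
    entries = []
    for line in org_text.splitlines():
        match = HEADING.match(line)
        if not match:
            continue  # body text is ignored in this sketch
        level, title = len(match.group(1)), match.group(2).strip()
        # Close headings at the same or deeper level than this one
        while stack and stack[-1][0] >= level:
            stack.pop()
        entries.append((title, [t for _, t in stack]))
        stack.append((level, title))
    return entries


def compile_entry(title, ancestors):
    """Prefix an entry with its parent outline (assumed format)."""
    return " / ".join(ancestors + [title])
```

For example, an entry `C` nested under `A` and `B` would be compiled with its outline `A / B / C`, so a query that mentions only the parent topics can still match it.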
- Update docs to show how to use Khoj Cloud
- Move self-hosting Khoj to separate section
- Add page to setup Desktop app
- Set default URL to Khoj Cloud URL in Obsidian, Emacs clients
- Require the latest FastAPI version or newer. Earlier versions didn't
support wrapping common query params in a class
- Use per fixture app instead of a global FastAPI app in conftest
- Upgrade minimum required Django version
- Fix no notes chat director test with updated no notes message
No notes message was updated in commit 118f1143
- Use the knowledgeGraph, answerBox, peopleAlsoAsk and organic responses of serper.dev to provide online context for queries made with the /online command
- Add it as an additional tool for doing Google searches
- Render the results appropriately in the chat web window
- Pass appropriate reference data down to the LLM
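As a rough illustration of the online context step, the sketch below keeps only the four response sections named above from a search result payload. The function name and the sample payload shape are hypothetical; only the four key names come from the change description.

```python
# The response sections named in the change description
ONLINE_CONTEXT_FIELDS = ("knowledgeGraph", "answerBox", "peopleAlsoAsk", "organic")


def extract_online_context(response):
    """Keep only the non-empty sections used as online context.

    Hypothetical helper: `response` is assumed to be the parsed JSON
    dictionary returned by the search API.
    """
    return {
        field: response[field]
        for field in ONLINE_CONTEXT_FIELDS
        if response.get(field)
    }
```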
- Adds support for multiple users to be connected to the same Khoj instance using their Google login credentials
- Moves storage solution from in-memory json data to a Postgres db. This stores all relevant information, including accounts, embeddings, chat history, server side chat configuration
- Adds the concept of a Khoj server admin for configuring instance-wide settings regarding search model, and chat configuration
- Miscellaneous updates and fixes to the UX, including chat references, colors, and an updated config page
- Adds billing to allow users to subscribe to the cloud service easily
- Adds a separate GitHub action for building the dockerized production (tag `prod`) and dev (tag `dev`) images, separate from the image used for local building. The production image uses `gunicorn` with multiple workers to run the server.
- Updates all clients (Obsidian, Emacs, Desktop) to follow the client/server architecture. The server no longer reads from the file system at all; it only accepts data via the indexer API. In line with that, removes the functionality to configure org, markdown, plaintext, or other file-specific settings in the server. Only leaves GitHub and Notion for server-side configuration.
- Changes license to GNU AGPLv3
Resolves #467, Resolves #488, Resolves #303, Resolves #345, Resolves #195, Resolves #280, Resolves #461, Closes #259, Resolves #351, Resolves #301, Resolves #296
- Link to Django admin panel for user to create Chat Models on their
Khoj server
- This should only get hit when user is not using Khoj cloud, as Khoj
cloud would already have Chat models configured
- Sigmoid normalization isn't required for reranking, but normalizing
both encoder and cross encoder scores to distance metrics makes them
easier to reason about
- Softmax wasn't required as we don't need probabilities. Sigmoid is good
enough to get a distance metric
- Expose ability to modify search model via Django admin interface
- Previously the bi_encoder and cross_encoder models to use were set
in code
- Now they're user configurable, with a default config generated
automatically
- During the migration, the confidence score stopped being used. It
was being passed down from API to some point and went unused
- Remove score thresholding for images as the image search confidence
score is different from the text search model distance score
- Default score threshold of 0.15 is experimentally determined by
manually looking at search results vs distance for a few queries
- Use distance instead of confidence as metric for search result quality
Previously we'd moved text search to a distance metric from a
confidence score.
Now convert even cross encoder, image search scores to distance metric
for consistent results sorting
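A minimal sketch of the score handling described above, assuming the distance is taken as one minus the sigmoid of the raw cross encoder score and that the 0.15 threshold is applied as a maximum distance. Both are assumptions for illustration; the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def score_to_distance(raw_score):
    """Map a raw relevance score to a distance in [0, 1].

    Assumption: higher raw score means more relevant, so a higher
    score maps to a smaller distance.
    """
    return 1.0 - sigmoid(raw_score)


def filter_hits(hits, max_distance=0.15):
    """Drop hits beyond the threshold and sort the rest, closest first.

    `hits` is a list of (text, raw_score) pairs.
    """
    scored = [(text, score_to_distance(score)) for text, score in hits]
    kept = [(text, dist) for text, dist in scored if dist <= max_distance]
    return sorted(kept, key=lambda hit: hit[1])
```

Because every score ends up on the same 0 to 1 distance scale, a single threshold and a single ascending sort can be shared across result types.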
Remove the Results Count button from the web app. It's hanging weirdly
with not much context to its purpose.
Reintroduce it in the Search card when created under the Features section
Reduce user confusion by joining the config update with the index update
for each content type.
So only a single click is required to configure any content type instead
of two clicks on two separate pages
- Notes prompt doesn't need to be so tuned to question answering. User
could just want to talk about life. The notes need to be used to
respond to those, not necessarily only retrieve answers from notes
- System and notes prompts were forcing asking follow-up questions a
little too much. Reduce strength of follow-up question asking
The Chat models sometimes output reference notes directly in the chat
body in unformatted form, specifically as Notes:\n['. Prevent that.
Reference notes are shown in clean, formatted form anyway
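One way to prevent this is to cut the response at the marker after generation. The sketch below shows that approach under the assumption that post-processing is acceptable; the change itself may instead rely on a stop sequence passed to the chat model, and the function name is hypothetical.

```python
def strip_inline_references(response, marker="Notes:\n['"):
    """Drop everything from the unformatted notes marker onward."""
    return response.split(marker, 1)[0].rstrip()
```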
- Make mutable syncing variable not a const
- Show next sync time to make users aware that data sync is automated
- Keep a single Save button to reduce confusion. It does what Save All
previously did. Users who intend to sync manually should use it
- Use app.khoj.dev as the default Khoj URL to ease setup