Style profile pircture button on nav menu
- Use primary colored ring around subscribed user profile on nav menu
- Use gray colored ring around non-subscribed user profile on nav menu
- Use upper case initial as profile pic for user with no profile pic
- Click anywhere on nav menu item to trigger action
Previously the actual clickable area was smaller than the width of
the nav menu item
- Move the nav menu into the chat history side panel component, so that they both show up on one line
- Update all pages to use it with the new formatting
- in mobile, present the sidebar button, home button, and profile button evenly centered in the middle
- Pass userConfig from Home as prop to chatBodyData component with
loading state
- Pass loading state of userConfig to allow components to handle
rendering dependent elements once it is loaded
Use updated format for HTTP streamed responses from the Khoj server in the new chat UX
Remove references to the websocket connected field, as websocket use has been deprecated
Otherwise the Khoj's chat response is filling up in between the
streamed message and already rendered references section at the bottom
of the message
Define OnlineContext type to simplify typing online context param
across other interfaces and functions
- There were some state mismatches in configuring a whatsapp number. This commit fixes those issues and uses an external library for phone number validation
- Note that the SSR for next doesn't support rendering on the client-side, so it'll only update it one big chunk
- Fix unique key error in the chatmessage history for incoming messages
- Remove websocket value usage in the chat history side panel
- Remove other websocket code from the chat page
- Why
Profile section and billing section looked too empty (1 card each).
Combining them makes the setting page look more complete. Shows
subscription options early on
- Details
- Made Futurist text orange
- Made Unsubscribe a down button instead of cloud slash
- Updated toast title to subscription
- Improve Futurist trial title and description
- Remove now unnecessary button to Save in Card with dropdown
- Use toast to show success, failure (not working)
- Rename language to search, Move it to features section. Add icon to
the card
- Update references in new and old web client settings
- Arrange new client settings props and add header comments similar to
- config response for code readability
- Add a lot more suggestions cards, improve mobile rendering of suggestion cards, improve alignment of chat input, shift message when starts recording voice, remove dead code
* Converted navigation menu into a dropdown menu
* Moved collapsed side panel menu icons into top row
* Auto refresh when conversation is deleted to update side panel and route back to main page if deletion is on current conversation
* Highlight the current conversation in the side panel
* Dynamic homepage messages with current day and time of day.
* `colorutils` upgraded to have more expansive tailwind color options and dynamic class name generation.
* Converted create agent button alert into shadcn `ToolTip`
* Colored lines and icons for agents in chat window
* Cleaned up border styling in dark mode
* fixed three dot menu in side panel to be more easier to click
* Add the KhojLogo import in the nav menu and use a default user profile icon when not authenticated
* Get rid of custom --box-shadow CSS variable
* Pass the agent metadat through the chat body data in order to style the send button
* Add login to the unauthenticated login view, redirecto to home if conversation history not loaded
* Set a max height for the input text area
* Simplify tailwind class names
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
## Major: Breaking Changes
- Move API endpoints under /configure/<type>/model to /api/model/<type>
- Move API endpoints under /api/configure/content/ to /api/content/
- Accept file deletion requests by clients during sync
- Split /api/v1/index/update into /api/content PUT, PATCH API endpoints
## Minor: Create New API Endpoint
- Create API endpoints to get user content configurations
Related: #852
Changes for new agents page
- Modernized agent cards
- Responsive design to support mobile users
- Button for users to create their own agents (coming soon)
- Optimized to use tailwind and icon utils
- Side panel added for quick access to conversations
Previous logic was more brittle to break with simple unbalanced
'{' or '}' string present in the event data. This method of trying to
identify valid json obj was fairly brittle. It only allowed json
objects or processed event as raw strings.
Now we buffer chunk until we see our unicode magic delimiter and only
then process it.
This is much less likely to break based on event data and the
delimiter is more tunable if we want to reduce rendering breakage
likelihood further
- Add support for text to speech, speech to text. Add loading and responsive indicators to reflect state.
- When streaming for speech to text, show incremental transcription in the message input field
- When streaming text to speech, and a pause button in the chat message to allow user to stop playback
* V1 of the new automations page
Implemented:
- Shareable
- Editable
- Suggested Cards
- Create new cards
- added side panel new conversation button
- Implement mobile-friendly view for homepage
- Fix issue of new conversations being created when selected agent is changed
- Improve center of the homepage experience
- Fix showing agent during first chat experience
- dark mode gradient updates
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
- Deduplicate code to collect chat telemetry by relying on
end_llm_response event
- Log time to first token and total chat response time for latency
analysis of Khoj as an agent. Not just the latency of the LLM
- Remove duplicate timer in the image generation path
Do not need response generator to stuff compiled references in chat
stream using "### compiled references:" separator.
References are now sent to clients as structured json while streaming
## Overview
- Gemma 2 is a new open model family by Google. They've released a 9B, 29B param model. A 2B model is also expected.
- It performs really well on the Chatbot arena and shows good performance when testing within Khoj as well.
- Llama.cpp support for Gemma 2 architecture seems to have stabilized
- If Gemma 2 performs well in further testing, it can be made the default offline chat model for Khoj
- Once the 2B param model is released, the model size to download can be automatically chosen based on (V)RAM available
## Major
- Support Gemma 2 for Offline Chat
- Improve and fix chat model prompts for better, consistent context
## Minor
- Fix and improve offline chat actor, director tests
- Improve offline chat truncation to consider chat message delimiter tokens
Previously loading animation would be at top of message. Moving it to
bottom is more intuitve and easier to track.
Remove white-space: pre from list elements. It was adding too much y
axis padding to chat messages (and train of thought)
- Details
Only return notes refs, online refs, inferred queries and generated
response in non-streaming mode. Do not return train of throught and
other status messages
Incorporate missing logic from old chat API router into new one.
- Motivation
So we can halve chat API code by getting rid of the duplicate logic
for the websocket router
The deduplicated code:
- Avoids inadvertant logic drift between the 2 routers
- Improves dev velocity
- Overview
Use simpler HTTP Streaming Response to send status messages, alongside
response and references from server to clients via API.
Update web client to use the streamed response to show train of thought,
stream response and render references.
- Motivation
This should allow other Khoj clients to pass auth headers and recieve
Khoj's train of thought messages from server over simple HTTP
streaming API.
It'll also eventually deduplicate chat logic across /websocket and
/chat API endpoints and help maintainability and dev velocity
- Details
- Pass references as a separate streaming message type for simpler
parsing. Remove passing "### compiled references" altogether once
the original /api/chat API is deprecated/merged with the new one
and clients have been updated to consume the references using this
new mechanism
- Save message to conversation even if client disconnects. This is
done by not breaking out of the async iterator that is sending the
llm response. As the save conversation is called at the end of the
iteration
- Handle parsing chunked json responses as a valid json on client.
This requires additional logic on client side but makes the client
more robust to server chunking json response such that each chunk
isn't itself necessarily a valid json.
- Convert functions in SSE API path into async generators using yields
- Validate image generation, online, notes lookup and general paths of
chat request are handled fine by the web client and server API
* Update the automations UI to be a more suitable color distribution based on new designs
* Use accented colors for the metadata, update dark mode colors
* Update form to use icons as well and render more pretty inline form labels
Pull out /api/configure/phone API endpoints into /api/phone for
more concise and sufficiently explanatory API path
Refactor Flow
1. Rename /api/configure/phone -> /api/phone
Now the API to configure all the AI models is under /api/models.
This provides better organization and API hierarchy. The /configure
url segment was redundant.
- Rename POST /api/phone to PATCH /api/phone
- Rename GET /api/configure to GET /api/settings
Refactor Flow
1. Move out POST /user/name to main api.py
2. Rename /api/configure/<type>/model -> /api/model/<type>
3. Rename @api_configure to @api_mode
4. Rename file api_config.py to api_model.py
Pull out /api/configure/content API endpoints into /api/content to
allow for more logical organization of API path hierarchy
This should make the url more succinct and API request intent more
understandable by using existing HTTP method semantics along with the
path.
The /configure URL path segment was either
- redundant (e.g POST /configure/notion) or
- incorrect (e.g GET /configure/files)
Some example of naming improvements:
- GET /configure/types -> GET /content/types
- GET /configure/files -> GET /content/files
- DELETE /configure/files -> DELETE /content/files
This should also align, merge better the the content indexing API
triggered via PUT, PATCH /content
Refactor Flow
1. Rename /api/configure/types -> /api/content/types
2. Rename /api/configure -> /api
3. Move /api/content to api_content from under api_config
- Remove unused full_corpus boolean. The full_corpus=False code path
wasn't being used (accept for in a test)
- The full_corpus=True code path used was ignoring file deletion
requests sent by clients during sync. Unclear why this was done
- Added unit test to prevent regression and show file deletion by
clients during sync not ignored now
- This utilizes PUT, PATCH HTTP method semantics to remove need for
the "regenerate" query param and "/update" url suffix
- This should make the url more succinct and API request intent more
understandable by using existing HTTP method semantics
- Add day of week to system prompt of openai, anthropic, offline chat models
- Pass more context to offline chat system prompt to
- ask follow-up questions
- know where to find information about khoj (itself)
- Fix output mode selection prompt. Log error if model does not select
valid option from list of valid output modes provided
- Use consistent names for question, answers passed to
extract_questions_offline prompt
- Log which model extracts question, what the offline chat model sees
as context. Similar to debug log shown for openai models
- Pass system message as the first user chat message as Gemma 2
doesn't support system messages
- Use gemma-2 chat format
- Pass chat model name to generic, extract questions chat actors
Used to figure out chat template to use for model
For generic chat actor argument was anyway available but not being
passed, which is confusing
- Update references to the settings page to use new url across docs
and code
- Rename desktop and web settings page to settigns.html instead of
config[ure].html
- Use name, id for every [search|chat|voice|pain]_model_option
- Rename current_model_state field to more intuitive enabled_content_source
- Update references to the update fields in config.html
- Deprecate khoj-assistant pypi package. Use more accurate and
succinct pypi project name, khoj
- Update references to sye khoj pypi package in docs and code instead
of the legacy khoj-assistant pypi package
- Update pypi workflow to publish to both khoj, khoj-assistant for now
- Update stale python 3.9 support mentioned in our pyproject. Can't
support python 3.9 as depend on latest django which support >=3.10
- Put logic to get config data, detailed or basic into router helpers module
- Use the get config func across the config pages on web clients
- Put configure content and get_notion_auth_url funcs in router helper
module to avoid circular import
Migrates the Automations page to React, mostly keeping the overall design consistent with organization. Use component library, with some changes in color. Add easier management with straightforward form and editing experience.
Use system preference for determining dark mode if not explicitly set.
- Updated references panel
- Use subtle coloring for chat cards
- Chat streaming with train of thought
- Side panel with limited sessions, expandable
- Manage conversation file filters easily from the side panel
- Updated nav menu, easily go to agents/automations/profile
- Upload data from the chat UI (on click attachment icon)
- Slash command pop-up menu, scrollable and selectable
- Dark mode-enabled
- Mostly mobile friendly
- Pass Loading message, class name via props to both inline and normal
loading spinners
- Pass loading conversation message to loading spinner when chat
history is being fetched
- Create profile card componennt. Use it for agent profile card
- Pass agent persona from khoj server via API
- Put link to agent profile page in the hover card to make it 2 clicks
away. Othewise inadvertent clicks on agent in chat view lead away to
agent page
- Use tailwind line-clamp extension to clamp card to first two lines
- Reuse class name when get slash command icons
- Previous chat input styling didn't have the cursor centered in the
chat input text area. But it did allow seeing multi line chat inputs
for context
- Major
- Ask for prompt in prose
- Remove seed from SD3 image generation to improve diversity of output
for a given prompt
Otherwise for conversations with similar sounding
prompts, the images would be almost exactly the same. This maybe
another indicator of SD3's inability to capture detailed
instructions
- Consistently use "prompt" wording instead of "query" in improved
image generation prompts.
Previously a mix of those terms were being used, which could confuse
the chat model
- Minor
- Add day of week to prompt
- Remove 2-5 sentence limit on instructions to SD3. It seems to be
able to follow longer instructions just with less fidelity than
DALLE. And the 2-5 sentence instruction limit wasn't being adhered to
- Improve ability to edit, improve the image based on follow-up
instructions by the user
- Align prompts for DALLE and SD3. Only difference is to wrap text to
be rendered in quotes for SD3. This improves it's ability to render
requested text. DALLE cannot render text as well or consistently
- Because we're using a FastAPI api framework with a Django ORM, we're running into some interesting conditions around connection pooling and clean-up. We're ending up with a large pile-up of open, stale connections to the DB recurringly when the server has been running for a while. To mitigate this problem, given starlette and django run in different python threads, add a middleware that will go and call the connection clean up method in each of the threads.
- Transcribe on holding Ctrl+s keyboard shortcut
- Transcribe on holding the transcribe button pressed via mouse too
- Make the transcribe button robust to inadvertent touches by using timeout
- Do not transcribe, trigger auto-send on silences. Silence detection
is super rudimentary, just blocks standard emanations by whisper
when no speech
### Fix
- Fix degrade in speed when indexing large files
- Resolve org-mode indexing bug by splitting current section only once by heading
- Improve summarization by fixing formatting of text in indexed files
### Improve
- Improve scaling user, admin flows to delete all entries for a user
- Split once by heading (=first_non_empty) to extract current section body
Otherwise child headings with same prefix as current heading will
cause the section split to go into infinite loop
- Also add check to prevent getting into recursive loop while trying
to split entry into sub sections
Adding files to the DB for summarization was slow, buggy in two ways:
- We were updating same text of modified files in DB = no of chunks
per file times
- The `" ".join(file_content)' code was breaking each character in the
file content by a space. This formats the original file content
incorrectly before storing in the DB
Because this code ran in the main file indexing path, it was slowing down
file indexing. Knowledge bases with larger files were impacted more strongly
- Delete entries by batch to improve efficiency of query at scale
- Share code to delete all user entries between it's async, sync methods
- Add indicator to show when files being deleted on web config page
The Khoj CSP interferes with other Obsidian features and plugins as
CSP is applied page wide.
For now chat message sanitization via Dompurify should suffice.
Enable CSP when can scope it to only the Khoj Obsidian plugin.
Maybe better to fallback to non-summarize behavior if summarize intent
is just inferred but we can't actually summarize because the single
file added to conversation isn't satisfied
- Simplify quick jump between Khoj side pane and main editor view using keyboard shortcuts
- Enable voice chat in Obsidian to make interactions with Khoj more seamless
- Overview
Khoj wil be able to do online search out of the box, even for self-hosted users
- Default to Jina search, reader API when no Serper.dev, Olostep API keys
- Run online searches in parallel to process multiple queries faster
- Details
- Jina provides a [reader API](https://github.com/jina-ai/reader) for online search and web page reading
It requires no API key. This provides a good default to enable
online search for self-hosted readers requiring no additional setup.
- Jina search API also returns webpage contents with the results, so
just use those directly when Jina Search API used instead of
trying to read webpages separately. The extract relevant content from
webpage step using a chat model is still used from the
`read_webpage_and_extract_content' func in this case.
- Parse search results from Jina search API into same format as
Serper.dev for accurate rendering of online references by clients
- Run online searches in parallel with AsyncIO to process multiple queries faster
- Support Stable Diffusion 3 via API
Server Admin needs to setup model similar to DALLE-3 via Django Admin Panel
- Use shorter prompt generator to prompt SD3 to create better images
- Allow users to set paint model to use from web client config page
Jina AI provides a search and webpage reader API that doesn't require
an API key. This provides a good default to enable online search for
self-hosted readers requiring no additional setup.
Jina search API also returns webpage contents with the results, so
just use those directly when Jina Search API used instead of
trying to read webpages separately. The extract relvant content from
webpage step using a chat model is still used from the
`read_webpage_and_extract_content' func in this case.
Parse search results from Jina search API into same format as
Serper.dev for accurate rendering of online references by clients
- Added support for uploading .jpeg, .jpg, and .png files to Khoj from Web, Desktop app
- Updating indexer to generate raw text and entries using RapidOCR
- Details
* added support for indexing images via ocr
* fixed pyproject.toml
* Update src/khoj/processor/content/images/image_to_entries.py
Co-authored-by: Debanjum <debanjum@gmail.com>
* Update src/khoj/processor/content/images/image_to_entries.py
Co-authored-by: Debanjum <debanjum@gmail.com>
* removed redudant try except blocks
* updated desktop js file to support image formats
* added tests for jpg and png
* Fix processing for image to entries files
* Update unit tests with working image indexer
* Change png test from version verificaition to open-cv verification
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
Co-authored-by: sabaimran <narmiabas@gmail.com>
This should improve fluidity of keyboard interactions with Khoj on
Obsidian.
Open Khoj chat view via keybinding or command pallete and ask
question using only the keyboard, with no mouse clicks required
- Automatically carry out voice chats with Khoj from within Obsidian
When send voice message, Khoj will auto respond with voice as well
- Listen to past Khoj messages as speech
- Add circular loading spinner to use while message is being converted
to speech
* Add a leader election mechanism to circumvent runtime issues for multiple schedulers
- Reduce the load on the DB and risk of issues on the service side by limiting the execution environment to one elected leader at a given time. This one is responsible for managing all of the execution of the jobs, though all workers are capable of adding and removing jobs
* Set a max duration for the schedule leader position (12 hrs), add some error if automation not added successfully
* rough sketch of desktop shortcuts. many bugs to fix still
* working MVP of desktop shortcut khoj
* UI fixes
* UI improvements for editable shortcut message
* major rendering fix to prevent clipboard text from getting lost
* UI improvements and bug fixes
* UI upgrades: custom top bar, edit sent message and color matching
* removed debug javascript file
* font reverted to Noto Sans
* cleaning up the code and removing diffs
* UX fixes
* cleaning up unused methods from html
* front end for button to send user back to main window to continue conversation
* UX fix for window and continue conversation support added
* migrated common js functions into chatutils.js
* Fix window closing issue in macos by
1. Use a helper function to determine if the window is open by seeing if there's a browser window with shortcut.html loaded
2. Use the event listener on the window to handle teardown
* removed extra comment and renamed continue convo button
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
- Add an experimental feature used for fact-checking falsifiable statements with customizable models. See attached screenshot for example. Once you input a statement that needs to be fact-checked, Khoj goes on a research spree to verify or refute it.
- Integrate frontend libraries for [Tailwind](https://tailwindcss.com/) and [ShadCN](https://ui.shadcn.com/) for easier UI development. Update corresponding styling for some existing UI components.
- Add component for model selection
- Add backend support for sharing arbitrary packets of data that will be consumed by specific front-end views in shareable scenarios
Initialize our migration to use Next.js for front-end views via Agents. This includes setup for getting authenticated users, reading in available agents, setting up a pop-up modal when you're clicking on an agent, and allowing users to start new conversations with agents.
Best attempt at an in-place migration, though there are some noticeable differences.
Also adds view for chat that are not being used, but in experimental phase.
Previously the text_to_image helper would only trigger the image
generation flow if OpenAI client was setup. This is not required
anymore as offline chat model + sd3 API works. So remove that check
Khoj will find and display notes similar to the current entry in the side pane when
1. find similar is open in side pane and
2. cursor has moved to a new entry
### Major
- Find similar notes to current note at cursor automatically in background
- Only show headings of search result and increase default results count
### Minor
- Pass absolute path of file to index from khoj.el emacs client
- Update help message to only show the smaller set of new keybindings
- Fix edge cases in loading some chat sessions
To improve the developer experience for front-end development, we're migrating to Next.js. In order to do this migration page-by-page, we're using static site generation via Next.js. This also helps us avoid making cross site requests from front-end to back-end for the time being, while giving a ramp to separating out server and client if needed for scale down the road.
Dev instructions for using the next.js setup are in the added README.
This adds scaffolding for including the built files in the python package as well as the docker images. Docker setup has been tested locally. In order to verify the build is working as expected, we can navigate to the {khoj_host}:42110/experimental and verify that the experiment page comes up.
This setup works with serving static files included in the src/interface/web folder from the Django app. The key bit for understanding the setup is in the yarn export command in package.json.
* Enable speech to text responses in khoj chat
- Current issue: reads out all the markdown formatting, plus waits for the whole result to be streamed before playing it
* Extract content from markdown-formatted text
* Add a loader for while you're waiting for Khoj's response
* Add user configuration option for chat model options, allow server side configuration for option list
* Join up APIs, views, admin pages to allow configuring custom voice models
When create new conversation session, automatically request query. As
that is expected next action after creating new session
Pass session-id to khoj-chat to allow reuse from
create-new-conversation func
When delete conversation session, do not call load chat session.
Unnecessary action.
Use thread-last to improve code flow in new, delete conversation funcs
* Add magic link email sign-in option
* Adding backend routes and model changes to keep state of email verification code and status
* Test and fix end to end email verification flow
* Add documentation for how to use the magic link sign-in when self-hosting Khoj
* Add magic link sign in to public conversation page
Previously the cursor would move to the Khoj side pane on opening it.
This would break user's flow, especially when find similar triggers
automatically
New behavior maintains smoother update of auto find similar without
disrupting user browsing
Previously it would show complete result body this would make the
result width variable and hard to track all the returned results
Showing just heading makes it easier to track
- Call find similar on current element if point has moved to new
element
- Delete the first result from find-similar search results as that'll
be the current note (which is trivially most similar to itself)
- Determine find-similar based text formating at the rendering layer
rather than at the top level find-similar func
* Uses entire file text and summarizer model to generate document summary.
* Uses the contents of the user's query to create a tailored summary.
* Integrates with File Filters #788 for a better UX.
To add multiple allowed Khoj domains pass them as a comma separated
list of domains via the KHOJ_DOMAIN environment variable
Resolve comment in issue #662
Given img src enforcement via CSP required loosening. Soft enforce it
via a regex replace of img HTML elements if the src isn't from the
whitelisted set of source prefixes.
Currently allowed source prefixes are
- app: for local images
- data: for inline generated images
- https://generated.khoj.dev: for cloud generated images
- Create and use a function to convert markdown to sanitized html
- Remove unused Latex delimiter handling as Katex isn't used in
Khoj chat on Obsidian
- Open Khoj in Emacs Side pane
Open Khoj chat, search in right pane to allow for ambient engagement
- Improve Khoj Chat
- Show online references used for chat
- Make chat API call async to not block user interactions
- Fix loading chat history, references in khoj.el chat buffer
- Improve Khoj Search, Find Similar functions
- Make calls to Khoj search API async to not block user interactions
- Support Conversation Sessions
- Create transient menu to open, create, delete conversation sessions from the Khoj Emacs client
- C-x o to switch to search org content conflicts with switch buffer shortkey
This is more apparent in the async search scenario as it prevents
perform other actions while async search is in progress
- Also switching content type wouldn't scale to all the content types
Khoj will support without causing more conflicting keybinding
Khoj side pane occupies a vertically split bottom right side pane.
If the bottom right window is not a vertical split, create a new
vertical split pane for khoj, otherwise reuse the existing window
- Fix getting file filters for not found conversations
- Allow iamge rendering in automation emails
- Fix nearest 15th minute calculation in automations creation
* added support for uploading multiple files at a time.
* optimized multiple file upload to use a batch upload
* allowing files to upload even if there is one unsupported file
See the currently active window in context while doing chat, search
or find similar operations in a side pane.
This is similar to how we've moved Khoj on Obsidian into the side pane
as well
# Major
- Disambiguate Text output mode to disambiguate from Default data source lookup
- Fix showing headings in intermediate step in generating chat response
- Remove "Path" prefix from org ancestor heading in compiled entry
# Minor
- Fix OpenAI chat actor, director unit tests
* Add language-specific syntax highlighting via highlight.js
- Add highlight.js to our assets CDN for fast load and compliance with the CSP
- See other stylesheets options here: https://cdnjs.com/libraries/highlight.js
* Bonus: set min-height to prevent increasing length of the sessions pane
* Fix references rendering and add highlight.js in public conversation
* Fix multilingual font rendering; fallback to an Arabic language font which contains more Asian characters. Close#756
* Tune font-sizes and styling to accomodate new fonts with old sizing
- Move connection-status styling out from inline html into css block
- Remove start typing chat-input height jitter
- align new-conversation button, text
- use relative font sizes instead of absolute font sizes in most places
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
* UI update for file filtered conversations
* Interactive file menu #UI to add/remove files on each conversation as references.
* Backend changes implemented to load selected file filters from a conversation into the querying process.
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
Previously if default output was selected by Khoj, we'd end up doing
an documents search as well, even when Khoj selected internet or
general data source to lookup.
This update disambiguates the default information mode from the text
output mode. To avoid doing documents search when not deemed necessary
by Khoj
Prevent XSS attacks by enforcing Content-Security-Policy (CSP) in apps.
Do not allow loading images, other assets from untrusted domains.
- Only allow loading assets from trusted domains
like 'self', khoj.dev, ipapi for geolocation, google (fonts, img)
- images from khoj domain, google (for profile pic)
- assets from khoj domain
- Do not allow iframe src
- Allow unsafe-inline script and styles for now as markdown-it escapes html
in user, khoj chat
- Add hostURL to CSP of the Desktop, Obsidian apps
Given web client is served by khoj server, it doesn't need to
explicitly allow for khoj.dev domain. So if user self-hosting, it'll
automatically allow the domain in the CSP (via 'self')
Whereas the Obsidian, Desktop clients allow configure the server URL.
Note *switching server URL breaks CSP until app is reloaded*
* The command menu (triggered by "/") now has a clickable list of possible commands, that automatically fill into the chat when pressed.
* The `/help` command now searches `khoj.dev` pages to provide useful assistance to the user.
---------
Co-authored-by: raghavt3 <raghavt3@illinois.edu>
Co-authored-by: sabaimran <65192171+sabaimran@users.noreply.github.com>
### Details
- **Chat with Khoj from right pane on Obsidian**
- Modal was too ephemeral, couldn't have it open for reference, quick jump to Khoj chat
- **Stream intermediate steps taken by Khoj** for generating response to the chat pane
Gives more transparency into Khoj 'thinking' process, e.g internet, notes searches performed, documents read etc.
The feedback allows us to tune our messages to elicit better responses by Khoj
- Add ability to **copy message to clipboard, paste chat messages directly into current file**
- Jump to **Search**, **Find Similar** functions from navigation bar on the Khoj Obsidian side pane
- Improve spacing, use consistent colors in chat message references and buttons
Resolves#789, #754
* Updating the API / UI to support sharing of automations
* Allow people to see the automations even when not logged in, and add an overlay effect
* Handle unauthenticated users taking actions
* Support showing pre-filled automation details on the config automations page
* Redirect user to login if they try to add an automation while unauthenticated
- Dedupe the code to add action buttons to chat messages
- Update the renderIncrementalMessage function to also add the action
buttons to newly generated chat messages by Khoj
* Disable automation recurrence at minute level frequency
* Set a max lifetime for django's connections to the db
* Disable any automation that has a non-numeric first digit (i.e., recuring on the minute level)
* Re-enable automations
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
Previously clicking inline links would open the URL directly in the
Desktop app. This was strange and it didn't provide any way to go back
to Khoj desktop app UI from the opened link
- Don't propagate max_tokens to the openai chat completion method. the max for the newer models is fixed at 4096 max output. The token limit is just used for input
* Add support for chatting with Anthropic's suite of models
- Had to use a custom class because there was enough nuance with how the anthropic SDK works that it would be better to simply separate out the logic. The extract questions flow needed modification of the system prompt in order to work as intended with the haiku model
- Pass file path of reference along with the compiled reference in
list of references returned by chat API converts
- Update the structure of references from list of strings to list of
dictionary (containing 'compiled' and 'file' keys)
- Pull out the compiled reference from the new references data struct
wherever it was is being used
Simplify, reuse, standardize code to render messages with references
in the obsidian, web and desktop clients. Specifically:
- Reuse function to create reference section, dedupe code
- Create reusable function to generate image markdown
- Simplify logic to render message with references
- Setup websocket using Khoj web app as reference.
- Moved the geolocating code to chat view out from the general pane
view
- Use loading spinner from web instead of the thinking emoji
It'll replace any highlighted text with the chat message or if not
text is highlighted, it'll insert the chat message at the last cursor
position in the active file
- Jump to chat, show similar actions from nav menu of Khoj side pane
- Add chat, search icons from web, desktop app
- Use lucide icon for find similar (for now)
- Match proportions of find similar icon to khoj other icons via css, js
- Use KhojPaneView abstract class to allow reuse of common functionality like
- Creating the nav bar header in side pane views
- Loading geo-location data for chat context
This should make creating new views easier
* Update suggested automations
* add a schedule picker when creating an automation
* Create a new conversation in flow of the automation scheduling in order to send a preview and deliver more consistent results
* Start adding in scaffolding to manually trigger a test job for an automation
* Add support for manually triggering automations for testing
* Schedule automation asynchronously
* Update styling of the preview button
* Improve admin lookup experience and prevent jobs from being scheduled to run every minute of everyday
* Ignore mypy issues on job info short description
### Description and Rationale for Changes
This feature includes thumbs up and thumbs down buttons on Khoj's chat responses that provide automated feedback. When a thumbs up/down button is clicked, the code sends an email to team@khoj.dev with the following:
* user query
* khoj's response
* whether the sentiment of the user was good or bad.
This is critical in improving Khoj's nondeterministic LLM model for a better user experience.
### List of Changes
* new endpoint in `api_chat.py` (/feedback) that can be used to trigger mail sending).
* thumbs up and thumbs down buttons implemented in `chat.html`
* new function in `routers/email.py` to handle feedback email sending via resend
* `feedback.html` template for a formatted email with the feedback.
---------
Co-authored-by: mythicalcow <mythicalcow@linux.myguest.virtualbox.org>
Co-authored-by: sabaimran <narmiabas@gmail.com>
* Improve the automations UX
- Add suggested jobs to elimiinate some of the cold start problem
- Make each of the tasks cards that are clickable/editable
* Hide suggested automations that have already been added
* Add a footer and reapply styling when a save action is taken on a card
- Allows having it open on the side as you traverse your Obsidian notes
- Allow faster time to response, having responses visible for context
- Enables ambient interactions
* Make conversations optionally shareable
- Shared conversations are viewable by anyone, without a login wall
- Can share a conversation from the three dot menu
- Add a new model for Public Conversation
- The rationale for a separate model is that public and private conversations have different assumptions. Separating them reduces some of the code specificity on our server-side code and allows us for easier interpretation and stricter security. Separating the data model makes it harder to accidentally view something that was meant to be private
- Add a new, read-only view for public conversations
- Add parsing logic for LaTeX-format math equations in the web chat
- Add placeholder delimiters when converting the markdown to HTML in order to avoid removing the escaped characters
- Add the `<!DOCTYPE html>` specification to the page
- Pass user and calling_url to the scheduled chat too when modifying
params of automation
- Update to use user timezone even when update job via API
- Move timezone string to timezone object calculation into the
schedule automation method
- Previously it was a section in the settings page. Move it to
independent, top-level page to improve visibility of feature
- Calculate crontime from natural language on web client before
sending it to to server for saving new/updated schedule to disk.
- Avoids round-trip of call to chat model
- Convert POST /api/automation API endpoint into a direct request for
automation with query_to_run, subject and schedule provided via the
automation page. This allows more granular control to create automation
- Make the POST automations endpoint more robust; runs validation
checks, normalizes parameters
- Make the config page content use the same top level 3-column layout
as the khoj-header-wrapper
This ensures the content is aligned with heading pane width
- Let cards and other settings sections scale to the width of their
grid element. This utilizes more of the screen space and does it
consistently across the different settings pages
- Create new POST API endpoint to create automations
- Use it in the settings page on the web interface to create
new automations
This simplified managing automations from the setting page by allowing
both delete and create from the same page
- Render crontime string in natural language in message & settings UI
- Show more fields in tasks web config UI
- Add link to the tasks settings page in task scheduled chat response
- Improve task variables names
Rename executing_query to query_to_run. scheduling_query to
scheduling_request
- Make timezone aware scheduling programmatic, instead of asking the
chat model to do the conversion. This removes the need for
scratchpad and may let smaller models handle the task as well
- Make chat model infer subject for email. This should make the
notification email more readable
- Improve email by using subject in email subject, task heading. Move
query to email final paragraph, which is where task metadata should
go
- Using inferred_query directly was brittle (like previous job id)
- And process lock id had a limited size, so wouldn't work for larger
inferred query strings
- Pass timezone string from ipapi to khoj via clients
- Pass this data from web, desktop and obsidian clients to server
- Use user tz to render next run time of scheduled task in user tz
This takes the load of the task scheduling chat actor / prompt from
having to artifically differentiate query to create scheduled task
from a scheduled task run.
- The task scheduling actor was having trouble calculating the
timezone. Giving the actor a scratchpad to improve correctness by
thinking step by step
- Add more examples to reduce chances of the inferred query looping to
create another reminder instead of running the query and sharing
results with user
- Improve task scheduling chat actor test with more tests and
by ensuring unexpected words not present in response
There's a difference between running a scheduled task and notifying
the user about the results of running the scheduled task.
Decide to notify the user only when the results of running the
scheduled task satisfy the user's requirements.
Use sync version of send_message_to_model_wrapper for scheduled tasks
- Store scheduled job state in Postgres so job schedules persist
across app restarts
- Use Process Locks to only allow single worker to process a given job
type. This prevents duplicating job runs across all workers
- Detect when user intends to schedule a task, aka reminder
Add new output mode: reminder. Add example of selecting the reminder
output mode
- Extract schedule time (as cron timestring) and inferred query to run
from user message
- Use APScheduler to call chat with inferred query at scheduled time
- Handle reminder scheduling from both websocket and http chat requests
- Support constructing scheduled task using chat history as context
Pass chat history to scheduled query generator for improved context
for scheduled task generation
Previously the make delete API response failed, after deleting token.
Required a page refresh to see that the API token was actually gone.
This was happening because the response type of the delete token API
endpoint isn't a string, so it failed FastAPI response validation
checks.
- Allow self-hosted users to customize their open ai base url. This allows you to easily use a proxy service and extend support for other models.
- This also includes a migration that associates any existing openai chat model configuration with an openai processor configuration
- Make changing model a paid/subscriber feature
- Removes usage of langchain's OpenAI wrapper for better control over parsing input/output
- Allow passing completion args through completion_with_backoff
- Pass model_kwargs in a separate arg to simplify this
- Pass model in `model_name' kwarg from the send_message_to_model func
`model_name' kwarg is used by langchain, not `model' kwarg
- Make valid file extension checking case insensitive on Desktop app
- Skip indexing non-existent folders on Desktop app
- Pass auth headers to fix lazy load of chat messages on Desktop app
- Set chat-message height to height of content in web, desktop
Previous cross-encoder model was a few years old, newer models should
have improved in quality. Model size increases by 50% compared to
previous for better performance, at least on benchmarks
Most newer, better embeddings models add a query, docs prefix when
encoding. Previously Khoj admins couldn't configure these, so it
wasn't possible to use these newer models.
This change allows configuring the kwargs passed to the query, docs
encoders by updating the search config in the database.
Improve tool, online search, webpage links, docs search chat actor
prompts. Ensure works with hermes-2-pro and llama-3.
Be more specific about generating JSON and not saying anything else.
- Improve extract question prompts to explicitly request JSON list
- Use llama-3 chat format if HF repo_id mentions llama-3. The
llama-cpp-python logic for detecting when to use llama-3 chat format
isn't robust enough currently
* Changed the styling of the link that takes a user to the settings page into a button
* added an indicator that shows if a user is connected to the server or not
* made a class name more descriptive and also made the text in first run message more intuitive
* changed the command to install dependencies in the README.md
* changed the class name of the first run message text to be more descriptive
* added icons in the desktop UI that shows if a file is synced successfully or not
* made the link class name in the homepage more descriptive
* fixed the hover issue on status box in the chat header pane
* fixed hovering issue on status box on macOS
- User configured max tokens limits weren't being passed to
`send_message_to_model_wrapper'
- One of the load offline model code paths wasn't reachable. Remove it
to simplify code
- When max prompt size isn't set infer max tokens based on free VRAM
on machine
- Use min of app configured max tokens, vram based max tokens and
model context window
- User configured max tokens limits weren't being passed to
`send_message_to_model_wrapper'
- One of the load offline model code paths wasn't reachable. Remove it
to simplify code
- When max prompt size isn't set infer max tokens based on free VRAM
on machine
- Use min of app configured max tokens, vram based max tokens and
model context window
To access the Khoj admin panel from a non HTTPS custom domain the
`KHOJ_NO_SSL' and `KHOJ_DOMAIN' env vars need to be explictly set.
See the updated setup docs for details.
Resolves#662
### Store Generated Images as WebP
- 78bac4ae Add migration script to convert PNG to WebP references in database
- c6e84436 Update clients to support rendering webp images inline
- d21f22ff Store Khoj generated images as webp instead of png for faster loading
### Lazy Fetch Chat Messages to Improve Time, Data to First Render
This is especially helpful for long conversations with lots of images
- 128829c4 Render latest msgs on chat session load. Fetch, render rest as they near viewport
- 9e558577 Support getting latest N chat messages via chat history API
### Intelligently set Context Window of Offline Chat to Improve Performance
- 4977b551 Use offline chat prompt config to set context window of loaded chat model
### Fixes
- 148923c1 Fix to raise error on hitting rate limit during Github indexing
- b8bc6bee Always remove loading animation on Desktop app if can't login to server
- 38250705 Fix `get_user_photo` to only return photo, not user name from DB
### Miscellaneous Improvements
- 689202e0 Update recommended CMAKE flag to enable using CUDA on linux in Docs
- b820daf3 Makes logs less noisy
- Reduces time to first render when loading long chat sessions
- Limits size of first page load, when loading long chat sessions
These performance improvements are maximally felt for large chat
sessions with lots of images generated by Khoj
Updated web and desktop app to support these changes for now
Previously you couldn't configure the n_ctx of the loaded offline chat
model. This made it hard to use good offline chat model (which these
days also have larger context) on machines with lower VRAM
- Show telemetry enabled/disabled state on init, not every 2 minutes
- Convert no docs synced logs to debug level instead of warning
Having synced docs isn't as important to use Khoj now, unlike before
- Magika on Desktop app was too bloated (100Mb to 250Mb) and broke
install for some reason. Not sure why it was causing the app install
to fail but do not have time to currently investigate
- Just use file extensions whitelist it's good enough for now. Let
server handle the deeper identification of file type
### Index more text file types
- Index all text, code files in Github repos. Not just md, org files
- Send more text file types from Desktop app and improve indexing them
- Identify file type by content & allow server to index all text files
### Deprecate Github Indexing Features
- Stop indexing commits, issues and issue comments in a Github repo
- Skip indexing Github repo on hitting Github API rate limit
### Fixes and Improvements
- **Fix indexing files in sub-folders from Desktop app**
- Standardize structure of text to entries to match other entry processors
- Show internet search, webpage read, image query, image generation steps
- Standardize, improve rendering of the intermediate steps on the web app
Benefits:
1. Improved transparency, allow users to see what Khoj is doing behind
the scenes and modify their query patterns to improve response quality
2. Reduced websocket connection keep alive timeouts for long running steps
- `file-type' doesn't handle mis-labelled files or files without
extensions well
- Only show supported file types in file selector dialog on Desktop app
Use Magika to get list of text file extensions. Combine with other
supported extensions to get complete list of supported file extensions.
Use it to limit selectable files in the File Open dialog.
Note: Folder selector will index text files with no extensions as well
* Don't trigger any re-indexing on server initailization
* Integrate Resend to send welcome emails when a new user signs up
- Only send if this is the first time they've signed in
- Configure welcome email with basic styling, as more complex designs don't work and style tag did not work
### Enable copying chat messages. Improve copy button behavior and styling
- Add button to copy chat messages on Desktop, Web apps
- Improve copy button's icon, hover color & click animation in Desktop, Web apps
### Improve Navigation, Chat Session Panes on Desktop, Web apps
- Dynamically generate navigation menu based on user info from server
- Create API endpoint to get authenticated user information
- Collapse navigation tabs into icons on mobile. Add spacing to them
- Add Chat navigation tab back to top pane on Web app
- Use proper icons for Search, Chat and Agents tab on navigation pane
### Miscellaneous Improvements
- Make current chat expand to full width when session panel collapsed on Desktop App
- Add chat session loading spinner to Desktop App (same as Web app)
### Fixes
- Show title bar in Khoj desktop app on Windows to simplify close, minimize etc.
- Only render first run setup message once if error or server not running
- Fix showing Search navigation tab from Agent pages on web client
The username and location in system prompt should disambiguate user
context from user's actual message for the chat model.
It doesn't need to be told to not mention the context or acknowledge
the context instructions in it's response, as it understands that this
information is just context and not part of the user's actual message.
- Move new conversation button to right of "Conversation" title
- Reduce size of chat message loading ellipsis animation
- Add loading animation for chat session
The `has_documents' flag wasn't being passed. So the search tab
always showing up as empty instead of being dynamically enabled if
documents had been indexed.
- `fs.readdir' func in node version 18.18.2 has buggy `recursive' option
See nodejs/node#48640, effect-ts/effect#1801 for details
- We were recursing down a folder in two ways on the Desktop app.
Remove `recursive: True' option to the `fs.readdirSync' method call
to recurse down via app code only
Add process_single_plaintext_file func etc with similar signatures as
org_to_entries and markdown_to_entries processors
The standardization makes modifications, abstractions easier to create
Sleep until rate limit passed is too expensive, as it keeps a
app worker occupied.
Ideally we should schedule job to contine after rate limit wait time
has passed. But this can only be added once we support jobs scheduling.
Normal indexing quickly Github hits rate limits. Purpose of exposing
Github indexer is for indexing content like notes, code and other
knowledge base in a repo.
The current indexer doesn't scale to index metadata given Github's
rate limits, so remove it instead of giving a degraded experience of
partially indexed repos
- Allow syncing more file types from desktop app to index on server
- Use `file-type' package to identify valid text file types on Desktop app
- Split plaintext entries into smaller logical units than a whole file
Since the text splitting upgrades in #645, compiled chunks have more
logical splits like paragraph, sentence.
Show those (potentially) smaller snippets to the user as references
- Tangential Fix:
Initialize unbound currentTime variable for error log timestamp
- Use Magika's AI for a tiny, portable and better file type
identification system
- Existing file type identification tools like `file' and `magic'
require system level packages, that may not be installed by default
on all operating systems (e.g `file' command on Windows)
## Major
- Parse markdown, org parent entries as single entry if fit within max tokens
- Parse a file as single entry if it fits with max token limits
- Add parent heading ancestry to extracted markdown entries for context
- Chunk text in preference order of para, sentence, word, character
## Minor
- Create wrapper function to get entries from org, md, pdf & text files
- Remove unused Entry to Jsonl converter from text to entry class, tests
- Dedupe code by using single func to process an org file into entries
Resolves#620
* Add support for using OAuth2.0 in the Notion integration
* Add notion to the admin page
* Remove unnecessary content_index and image search/setup references
* Trigger background job to start indexing Notion after user configures it
* Add a log line when a new Notion integration is setup
* Fix references to the configure_content methods
`re.MULTILINE' should be passed to the `flags' argument, not the
`max_splits' argument of the `re.split' func
This was messing up the indexing by only allowing a maximum of
re.MULTILINE splits. Fixing this improves the search quality to
previous state
More content indexed per entry would result in an overall scores
lowering effect. Increase default search distance threshold to counter that
- Details
- Fix expected results post indexing updates
- Fix search with max distance post indexing updates
- Minor
- Remove openai chat actor test for after: operator as it's not expected anymore
- Major
- Do not split org file, entry if it fits within the max token limits
- Recurse down org file entries, one heading level at a time until
reach leaf node or the current parent tree fits context window
- Update `process_single_org_file' func logic to do this recursion
- Convert extracted org nodes with children into entries
- Previously org node to entry code just had to handle leaf entries
- Now it recieve list of org node trees
- Only add ancestor path to root org-node of each tree
- Indent each entry trees headings by +1 level from base level (=2)
- Minor
- Stop timing org-node parsing vs org-node to entry conversion
Just time the wrapping function for org-mode entry extraction
This standardizes what is being timed across at md, org etc.
- Move try/catch to `extract_org_nodes' from `parse_single_org_file'
func to standardize this also across md, org
These changes improve context available to the search model.
Specifically this should improve entry context from short knowledge trees,
that is knowledge bases with sparse, short heading/entry trees
Previously we'd always split markdown files by headings, even if a
parent entry was small enough to fit entirely within the max token
limits of the search model. This used to reduce the context available
to the search model to select appropriate entries for a query,
especially from short entry trees
Revert back to using regex to parse through markdown file instead of
using MarkdownHeaderTextSplitter. It was easier to implement the
logical split using regexes rather than bend MarkdowHeaderTextSplitter
to implement it.
- DFS traverse the markdown knowledge tree, prefix ancestry to each entry
These changes improve entry context available to the search model
Specifically this should improve entry context from short knowledge trees,
that is knowledge bases with small files
Previously we split all markdown files by their headings,
even if the file was small enough to fit entirely within the max token
limits of the search model. This used to reduce the context available
to select the appropriate entries for a given query for the search model,
especially from short knowledge trees
- Previous simplistic chunking strategy of splitting text by space
didn't capture notes with newlines, no spaces. For e.g in #620
- New strategy will try chunk the text at more natural points like
paragraph, sentence, word first. If none of those work it'll split
at character to fit within max token limit
- Drop long words while preserving original delimiters
Resolves#620
This was earlier used when the index was plaintext jsonl file. Now
that documents are indexed in a DB this func is not required.
Simplify org,md,pdf,plaintext to entries tests by removing the entry
to jsonl conversion step
- Convert extract_org_entries function to actually extract org entries
Previously it was extracting intermediary org-node objects instead
Now it extracts the org-node objects from files and converts them
into entries
- Create separate, new function to extract_org_nodes from files
- Similarly create wrapper funcs for md, pdf, plaintext to entries
- Update org, md, pdf, plaintext to entries tests to use the new
simplified wrapper function to extract org entries
- Move green server connected dot to the bottom. Show status when
disconnected from server
- Move "New conversation" button to right of the "Conversation" title
- Center alignment of the new conversation and connection status buttons
- Overview
- Extract more structured date variants (e.g with dot(.) & slash(/) separators, 2-digit year)
- Extract some natural, partial dates as well from entries
- Capability
Add ability to extract the following additional date forms:
- Natural Dates: 21st April 2000, February 29 2024
- Partial Natural Dates: March 24, Mar 2024
- Structured Dates: 20/12/24, 20.12.2024, 2024/12/20
Note: Previously only YYYY-MM-DD ISO-8601 structured date form was extracted for date filters
- Performance
Using regexes is MUCH faster than using the `dateparser' python library
It's a little crude but gives acceptable performance for large datasets
- Much faster than using dateparser
- It took 2x-4x for improved regex to extracts 1-15% more dates
- Whereas It took 33x to 100x for dateparser to extract 65% - 400% more dates
- Improve date extractor tests to test deduping dates, natural,
structured date extraction from content
- Extract some natural, partial dates and more structured dates
Using regex is much faster than using dateparser. It's a little
crude but should pay off in performance.
Supports dates of form:
- (Day-of-Month) Month|AbbreviatedMonth Year|2DigitYear
- Month|AbbreviatedMonth (Day-of-Month) Year|2DigitYear
Previously we just extracted dates in YYYY-MM-DD format from content
for date filterings during search.
Use dateparser to extract dates across locales and natural language
This should improve notes returned as context when chat searches
knowledge base with date filters
Fallback to regex for date parsing from content if dateparser fails
- Limit natural date extractor capabilities to improve performance
- Assume language is english
Language detection otherwise takes a REALLY long time
- Do not extract unix timestamps, timezone
- This isn't required, as just using date and approximating dates as UTC
- When setting up the default agent, configure every conversation that doesn't have an agent to use the Khoj agent
- Fix reverse migration for the locale removal migration
Previously we were skipping the extract questions step for offline
chat as default offline chat model wasn't good enough to output proper
json given the time it took to extract questions.
The new default offline chat models gives json much more regularly and
with date filters, so the extract questions step becomes useful given
the impact on latency
- Benefits of moving to llama-cpp-python from gpt4all:
- Support for all GGUF format chat models
- Support for AMD, Nvidia, Mac, Vulcan GPU machines (instead of just Vulcan, Mac)
- Supports models with more capabilities like tools, schema
enforcement, speculative ddecoding, image gen etc.
- Upgrade default chat model, prompt size, tokenizer for new supported
chat models
- Load offline chat model when present on disk without requiring internet
- Load model onto GPU if not disabled and device has GPU
- Load model onto CPU if loading model onto GPU fails
- Create helper function to check and load model from disk, when model
glob is present on disk.
`Llama.from_pretrained' needs internet to get repo info from
HuggingFace. This isn't required, if the model is already downloaded
Didn't find any existing HF or llama.cpp method that looked for model
glob on disk without internet
* Initial pass at backend changes to support agents
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications
* Customize default behaviors for conversations without agents or with default agents
* Add a new web client route for viewing all agents
* Use agent_id for getting correct agent
* Add web UI views for agents
- Add a page to view all agents
- Add slugs to manage agents
- Add a view to view single agent
- Display active agent when in chat window
- Fix post-login redirect issue
* Fix agent view
* Spruce up the 404 page and improve the overall layout for agents pages
* Create chat actor for directly reading webpages based on user message
- Add prompt for the read webpages chat actor to extract, infer
webpage links
- Make chat actor infer or extract webpage to read directly from user
message
- Rename previous read_webpage function to more narrow
read_webpage_at_url function
* Rename agents_page -> agent_page
* Fix unit test for adding the filename to the compiled markdown entry
* Fix layout of agent, agents pages
* Merge migrations
* Let the name, slug of the default agent be Khoj, khoj
* Fix chat-related unit tests
* Add webpage chat command for read web pages requested by user
Update auto chat command inference prompt to show example of when to
use webpage chat command (i.e when url is directly provided in link)
* Support webpage command in chat API
- Fallback to use webpage when SERPER not setup and online command was
attempted
- Do not stop responding if can't retrieve online results. Try to
respond without the online context
* Test select webpage as data source and extract web urls chat actors
* Tweak prompts to extract information from webpages, online results
- Show more of the truncated messages for debugging context
- Update Khoj personality prompt to encourage it to remember it's capabilities
* Rename extract_content online results field to webpages
* Parallelize simple webpage read and extractor
Similar to what is being done with search_online with olostep
* Pass multiple webpages with their urls in online results context
Previously even if MAX_WEBPAGES_TO_READ was > 1, only 1 extracted
content would ever be passed.
URL of the extracted webpage content wasn't passed to clients in
online results context. This limited them from being rendered
* Render webpage read in chat response references on Web, Desktop apps
* Time chat actor responses & chat api request start for perf analysis
* Increase the keep alive timeout in the main application for testing
* Do not pipe access/error logs to separate files. Flow to stdout/stderr
* [Temp] Reduce to 1 gunicorn worker
* Change prod docker image to use jammy, rather than nvidia base image
* Use Khoj icon when Khoj web is installed on iOS as a PWA
* Make slug required for agents
* Simplify calling logic and prevent agent access for unauthenticated users
* Standardize to use personality over tuning in agent nomenclature
* Make filtering logic more stringent for accessible agents and remove unused method:
* Format chat message query
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
### Overview
Khoj can now read website directly without needing to go through the search step first
### Details
- Parallelize simple webpage read and extractor
- Rename extract_content online results field to web pages
- Tweak prompts to extract information from webpages, online results
- Test select webpage as data source and extract web urls chat actors
- Render webpage read in chat response references on Web, Desktop apps
- Pass multiple webpages with their urls in online results context
- Support webpage command in chat API
- Add webpage chat command for read web pages requested by user
- Create chat actor for directly reading webpages based on user message
Previously even if MAX_WEBPAGES_TO_READ was > 1, only 1 extracted
content would ever be passed.
URL of the extracted webpage content wasn't passed to clients in
online results context. This limited them from being rendered
- Fallback to use webpage when SERPER not setup and online command was
attempted
- Do not stop responding if can't retrieve online results. Try to
respond without the online context
* Initial pass at backend changes to support agents
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications
* Customize default behaviors for conversations without agents or with default agents
* Use agent_id for getting correct agent
* Merge migrations
* Simplify some variable definitions, add additional security checks for agents
* Rename agent.tuning -> agent.personality
- Use the conversation id of the retrieved conversation rather than the
potentially unset conversation id passed via API
- await creating new chat when no chat id provided and no existing
conversations exist
- Move some common methods into separate functions to make the UI components more efficient
- The normal HTTP-based chat connection will still work and serves as a fallback if the websocket is unavailable
- Convert to a model of calling the search API directly with a function call (rather than using the API method)
- Gracefully handle websocket connection disconnects
- Ensure that the rest of the response is still saved, as it is currently, if the user disconects from the client
- Setup unchangeable context at the beginning of the session when the connection is established (like location, username, etc)
The recently added after: operator to online search actor was too
restrictive, gave worse results than when just use natural language
dates in search query
Previously was assuming the system prompt is being always passed as
the first message. So expected there to be at least 2 messages in logs.
This broke chat actors querying with single long non system message.
A more robust way to extract system prompt is via the message role
instead
- Ask for Confirmation before deleting chat session in Desktop, Web app
- Save chat session rename on hitting enter in title edit input box
- No need to flash previous conversation cleared status message
- Move chat session delete button after rename button in Desktop app
- Add prompt for the read webpages chat actor to extract, infer
webpage links
- Make chat actor infer or extract webpage to read directly from user
message
- Rename previous read_webpage function to more narrow
read_webpage_at_url function
### Major
- Enforce json mode response from OpenAI chat actors prev using string lists
- Use `gpt-4-turbo-preview' as default chat model, extract questions actor
- Make Khoj read khoj website to respond with accurate, up-to-date information about itself
- Dedupe query in notes prompt. Improve OAI chat actor, director tests
### Minor
- Test data source, output mode selector, web search query chat actors
- Improve notes search actor to always create a non-empty list of queries
- Construct available data sources, output modes as a bullet list in prompts
- Use consistent agent name across static and dynamic examples in prompts
- Add actor's name to extract questions prompt to improve context for guidance
Previously only the notes references would get rendered post response
streaming when when both online and notes references were used to
respond to the user's message
- Allow passing response format type to OpenAI API via chat actors
- Convert in-context examples to use json objects instead of str lists
- Update actors outputting str list to request output to be json_object
- OpenAI's json mode enforces the model to output valid json object
- Remove stale tests
- Improve tests to pass across gpt-3.5 and gpt-4-turbo
- The haiku creation director was failing because of duplicate query in
instantiated prompt
- Remove the option for Notes search query generation actor to return
no queries. Whether search should be performed is decided before,
this step doesn't need to decide that
- But do not throw warning if the response is a list with no elements
- Add examples where user queries requesting information about Khoj
results in the "online" data source being selected
- Add an example for "general" to select chat command prompt
Previously the examples constructed from chat history used "Khoj" as
the agent's name but all 3 prompts using the func used static examples
with "AI:" as the pertinent agent's name
- Add example to read khoj.dev website for up-to-date info to setup,
use khoj, discover khoj features etc.
- Online search should use site: and after: google search operators
- Show example of adding the after: date filter to google search
- Give local event lookup example using user's current location in
query
- Remove unused select search content type prompt
- Add a page to view all agents
- Add slugs to manage agents
- Add a view to view single agent
- Display active agent when in chat window
- Fix post-login redirect issue
### Major
- Read web pages in parallel to improve chat response time
- Read web pages directly when Olostep proxy not setup
- Include search results & web page content in online context for chat response
### Minor
- Simplify, modularize and add type hints to online search functions
Previously if a web page was read for a sub-query, only the extracted
web page content was provided as context for the given sub-query. But
the google results themselves have relevant snippets. So include them
- Simplify content arg to `extract_relevant_info' function. Validate,
clean the content arg inside the `extract_relevant_info' function
- Extract `search_with_google' function outside the parent function
- Call the parent function a more appropriate `search_online' instead
of `search_with_google'
- Simplify the `search_with_google' function using list comprehension.
Drop empty search result fields from chat model context for response
to reduce cost and response latency
- No need to show stacktrace when unable to read webpage, basic error
is enough
- Add type hints to online search functions to catch issues with mypy
- Time reading webpage, extract info from webpage steps for perf
analysis
- Deduplicate webpages to read gathered across separate google
searches
- Use aiohttp to make API requests non-blocking, pair with asyncio to
parallelize all the online search webpage read and extract calls
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications
- Trigger
SentenceTransformer Cross Encoder models now run fast on GPU enabled machines, including Mac ARM devices since UKPLab/sentence-transformers#2463
- Details
- Use cross-encoder to rerank search results by default on GPU machines and when using an inference server
- Only call search API when pause in typing search query on web, desktop apps
Wait for 300ms since stop typing before calling search API.
This smooths out UI jitter when rendering search results, especially
now that we're reranking for every search query on GPU enabled devices
Emacs already has 300ms debounce time. More convoluted to add
debounce time to Obsidian search modal, so not updating that yet
Latest sentence-transformer package uses GPU for cross-encoder. This
makes it fast enough to enable reranking on machines with GPU.
Enabling search reranking by default allows (at least) users with GPUs
to side-step learning the UI affordance to rerank results
(i.e hitting Cmd/Ctrl-Enter or ENTER).
- Fix
`get_conversation_by_user' shouldn't return new conversation if
conversation with requested id not found.
It should only return new conversation if no specific conversation
is requested and no conversations found for user at all
- Repro
- Delete a new chat, this calls loadChat via window.onload which
calls server /chat/history API endpoint with conversationId set to
that of just deleted conversation sporadically
The call to GET chat/history API with conversationId set occurs
when window.onload triggers before the conversationId is deleted
by the delete button after the DELETE /chat/history API call (via race)
- In such a scenario, get_conversation_by_user called by
chat/history API with conversationId of deleted conversation
returns a new conversation
- Miscellaneous
- Chat history load should be logged as call to that chat_history api,
not the "chat" api
- Show status updates of clearing conversation history in chat input
- Simplify web, desktop client code by removing unnecessary new variables
* Upload generated images to s3, if AWS credentials and bucket is available.
- In clients, render the images via the URL if it's returned with a text-to-image2 intent type
* Make the loading screen more intuitve, less jerky and update the programmatic copy button
* Update the loading icon when waiting for a chat response
* Add additional styling changes for showing UI changes when dragging file to the main screen
* Add a loading spinner when file upload is in progress, and don't index github/notion when indexing files
* Add an explicit icon for file uploading in the chat button menu
* Add appropriate dragover styling when picking a file from the file picker/browser
* Add a loading screen when retrieving chat history. Fix width of the chat window. Put attachment icon to the left of chat input
* Make major improvements to the image generation flow
- Include user context from online references and personal notes for generating images
- Dynamically select the modality that the LLM should respond with
- Retun the inferred context in the query response for the dekstop, web chat views to read
* Add unit tests for retrieving response modes via LLM
* Move output mode unit tests to the actor suite, rather than director
* Only show the references button if there is at least one available
* Rename aget_relevant_modes to aget_relevant_output_modes
* Use a shared method for generating reference sections, simplify some of the prompting logic
* Make out of space errors in the desktop client more obvious
- Open external links using the default link handler registered on OS
for the link type, e.g http:// -> firefox, mailto: thunderbird etc
- Confirm before opening non-http URL using an external app