- So when searching across content types (with content-type = "all")
org-mode results get rendered differently than markdown, PDF etc. results
- Set div class for each result separately instead of a single uber div
for styling. This allows styling div of each result based on the
content-type of that result
- No need to create placeholder "all" content type on web interface as
server is passing an all content type by itself
- Add cards to configure each of the Github repositories
- Fix a bug in the API which caused all other settings to be wiped when updating one of the content types
- Provide an error message to the user if they have a misconfiguration in their chat settings
- Add support for indexing org files as well as markdown files from the Github repository and update corresponding search view
- Support indexing a list of repositories
- Show success/failure status message much closer to the save button
Previously status message was shown on top of the page, which wasn't
always in view and wasn't easily seen
- Improve the status message to more clearly show next steps on success
If no content-type selected in transient menu option, khoj.el queries
khoj server without content-type parameter (t) set.
This results in search across all enabled asymmetric search text
content types
- Add new filter abstract method to remove filter terms from query
- Use the filter method to remove filter terms, encode this defiltered
query and pass it to the query methods of each search types
TODO: Encoding query is still taking 100-200 ms unlike before. Need to
investigate why
- Update API to return content from all enabled content types when type
is not set to specific type in HTTP request param
- To do this efficiently run the search queries in parallel threads
- Default is 30. So number of paginated requests required to get all
items (commits, files) will reduce by 67%
- No need to increase page size for the get tree Github API request from
`get_markdown_files'
Get tree Github API doesn't support pagination and return 100K items
in response. This should be way more than enough for our current
use-cases
- Previously wasn't prefixing "token" to PAT token in Auth header
This resulted in the request being considered unauthenticated
- Unauthenticated requests to Github API are limited to 60 requests/hour
Authenticated requests to Github API are allowed 5000 requests/hour
- Add a central configuration management page to make management of config details easier
- Add relevant api endpoints both for client and server to update/request data as necessary
- Attempt to update the favicon
The Llama_Hub Github plugin is fairly limited.
The Github Rest API is well supported and can easily be extended to
index commit messages, issues, discussions, PRs etc.
- Make API endpoints on Khoj server accept `client` as request parameter
- Khoj API endpoints: /chat, /search, /update
- Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server
- Khoj clients: Emacs, Obsidian and Web
- Also log khoj server_version running to telemetry server
- This improves latency of @general chat by avoiding unnecessary
compute
- It also avoids passing references in API response when they haven't
been used to generate the chat response. So interfaces don't have to
add logic to not render them unnecessarily
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
- Ensure combined results are sorted by score across both types
- Jump to PDF file when select it PDF search result from modal
- Match argument names passed to khoj openai completion funcs with
arguments passed to langchain calls to OpenAI
- This simplifies the logic in the khoj openai completion funcs
- Fix bug where both LangChain and Khoj retry requests 6 times each.
So a total of 12 requests at >1minute intervals for each chat
response in case of OpenAI API being down
- Retrying too many times when the API is failing doesn't help
- The earlier 60 second request timeout was spacing out the interval
between retries way too much. This slowed down chat response times
quite a bit when API was being flaky
- With these updates you'll know if call to chat API failed in under a
minute
- Use ChatModel and ChatOpenAI to call OpenAI chat model instead of
using OpenAI package directly
- This is being done as part of migration to rely on LangChain for
creating agents and managing their state
- Khoj chat will now respond to general queries if:
1. no relevant reference notes available or
2. when explicitly induced by prefixing the chat message with "@general"
- Previously Khoj Chat would a lot of times refuse to respond to
general queries not answerable from reference notes or chat history
- Make chat quality tests more robust
- Add more equivalent chat response options refusing to answer
- Force haiku writing to not give any preable, just the haiku
- Simplifies switching between different OpenAI chat models. E.g GPT4
- It was previously hard-coded to use gpt-3.5-turbo. Now it just
defaults to using gpt-3.5-turbo, unless chat-model field under
conversation processor updated in khoj.yml
Otherwise if heading > max_tokens than the search models will just see
a heading (with repeated filename) for each compiled entry and not
actual content.
100 characters should be sufficient to include filename (not path) and
entry heading. If longer rather truncate to pass entry unique text to
model for search context
Previously filename was appended to the end of the compiled entry.
This didn't provide appropriate structured context
Test filename getting prepended as heading to compiled entry
All compiled snippets split by max tokens (apart from first) do not
get the heading as context.
This limits search context required to retrieve these continuation
entries
- cl-push expects a generatlized variable. Else throws (setf quote)
undefined warning
- This results in the config call failing on calling khoj entrypoint
- Remove waiting for server message as it hides the messages from the
server
- Fix the nil message that were being rendered, by checking before
showing messages from server
- Consistently prefix messages from khoj with khoj.el
Previously khoj.el was calling the server configure API even when
config was same as before.
This had broken the khoj search as you type experience from emacs
Also show more details to user about what in khoj is being configured
Resolves#185, #199
- Issue
IndexName created from Obsidian Absolute Vault path wasn't replacing
windows path, drive separators with underscore. It was only
replacing unix path separators
- Fix
Also replace windows drive and path separators with _ while creating
IndexName in Khoj Obsidian plugin
Makes it easier to tell pip associated with which python is being
used. Easier to debug when users have different versions of python
installed (e.g 3.10 and 3.11)
- Explicity split entry string by space during split by max_tokens
- Prevent formatting of compiled entry from being lost
- The formatting itself contains useful information
No point in dropping the formatting unnecessarily,
even if (say) the currrent search models don't account for it (yet)
Append originating filename to compiled string of each entry for
better search quality by providing more context to model
Update markdown_to_jsonl tests to ensure filename being added
Resolves#142
This follows expected behavior for obsidain search modals
E.g Ominsearch and default Obsidian search.
The note creation code is borrowed from Omnisearch.
Resolves#133
- Give space in the input field. Too narrow previously
- References should be indexed from 1 instead of 0
- Use Obsidian font size variables to scale fonts in chat appropriately