Removing unused content types reduces the amount of Khoj code to manage
- 0f993b3 Drop support for Ledger as a separate content type
Khoj will soon get a generic text indexing content type via "Index plain text files" (#237).
That, along with a file filter, should suffice for searching through Ledger transactions
- c9db532 Remove unused org-music as an indexable content type from Khoj
Org-music was just a custom content type that worked with the org-music package.
It was mostly only useful to me.
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.
Having a specific content type for a niche use-case like Ledger isn't
useful. Removing unused content types reduces the Khoj code to manage.
Org-music was just a custom content type that worked with the org-music package.
It was mostly only useful to me.
Cleaning up that code reduces the number of content types for khoj to
manage.
- Add one-click disablement
- Remove fields that probably don't need to be edited (our implementation details)
- Add a green tick if a given field is configured
- In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network)
- Prevent re-indexing for Github data if this is a demo instance
- Fix up some issues with the CSS which made the settings page render small on mobile
- In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page
- Break out of rendering list if at end of org block in org.js
- This would previously hang rendering of results in the web interface
Should try to fix this upstream in org.js as well
- Previously Khoj could only support Python up to 3.10 due to PyTorch.
But lots of folks had python 3.11 installed by default on their machines.
This required installing python 3.10 and dealing with virtual envs.
With Torch >= 2.0.1 now able to support python 3.11, at least one
class of installation troubles for Khoj should drop. See
https://github.com/pytorch/pytorch/issues/86566 for reference
- Preliminary testing indicates using the new torch 2.x may reduce
search time by 25% (from 80ms to 60ms on Mac M1)
- Update docs to drop the mention that Python <= 3.10 is required
- Update Github test workflow to run khoj tests with python 3.11 too
- Use a requests session to reduce the overhead of setting up a new connection to the Github URL on each request
- Use the streaming feature of the REST API to reduce some of the memory footprint
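A minimal sketch of the two points above, assuming a hypothetical `get_file_contents` helper rather than khoj's actual Github processor:

```python
import requests

# Reuse one TCP connection for all Github API calls instead of reconnecting per request
session = requests.Session()

def get_file_contents(file_url: str) -> bytes:
    "Download a file via the Github REST API, streaming the body to limit memory use."
    response = session.get(file_url, stream=True)
    response.raise_for_status()
    content = b""
    # Accumulate the response in fixed-size chunks rather than loading it all at once
    for chunk in response.iter_content(chunk_size=8192):
        content += chunk
    return content
```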
- Set image_search.query to async to use it with multi-threading
This is the same as text_search.query being set to an async method
- Exit search early if no search_model is defined in state.model
- So when searching across content types (with content-type = "all")
org-mode results get rendered differently than markdown, PDF etc. results
- Set div class for each result separately instead of a single uber div
for styling. This allows styling div of each result based on the
content-type of that result
- No need to create placeholder "all" content type on web interface as
server is passing an all content type by itself
- Add cards to configure each of the Github repositories
- Fix a bug in the API which caused all other settings to be wiped when updating one of the content types
- Provide an error message to the user if they have a misconfiguration in their chat settings
- Add support for indexing org files as well as markdown files from the Github repository and update corresponding search view
- Support indexing a list of repositories
- Show success/failure status message much closer to the save button
Previously the status message was shown at the top of the page, which wasn't
always in view and wasn't easily seen
- Improve the status message to more clearly show next steps on success
If no content-type selected in transient menu option, khoj.el queries
khoj server without content-type parameter (t) set.
This results in search across all enabled asymmetric search text
content types
- Add new filter abstract method to remove filter terms from query
- Use the filter method to remove filter terms, encode this defiltered
query and pass it to the query methods of each search type
TODO: Encoding query is still taking 100-200 ms unlike before. Need to
investigate why
- Update API to return content from all enabled content types when type
is not set to specific type in HTTP request param
- To do this efficiently run the search queries in parallel threads
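A rough sketch of the defilter-then-parallel-search flow described above; the class and method names here (`BaseFilter.defilter`, `search_all`) are illustrative, not necessarily khoj's actual API:

```python
import re
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

class BaseFilter(ABC):
    @abstractmethod
    def defilter(self, query: str) -> str:
        "Strip this filter's terms from the query before the query is encoded."
        ...

class WordFilter(BaseFilter):
    'Illustrative filter for +"word" / -"word" terms.'
    word_regex = re.compile(r'\s?[+-]"[^"]+"')

    def defilter(self, query: str) -> str:
        return self.word_regex.sub("", query).strip()

def search_all(query: str, filters: list, search_funcs: list) -> list:
    "Defilter the query once, then run each content type's search in a parallel thread."
    for query_filter in filters:
        query = query_filter.defilter(query)
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(search, query) for search in search_funcs]
        return [hit for future in futures for hit in future.result()]
```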
- The default is 30, so the number of paginated requests required to get all
items (commits, files) will reduce by 67%
- No need to increase page size for the get tree Github API request from
`get_markdown_files'
The get tree Github API doesn't support pagination and returns up to 100K items
in a response. This should be way more than enough for our current
use-cases
- Previously "token" wasn't being prefixed to the PAT in the Authorization header
This resulted in the request being considered unauthenticated
- Unauthenticated requests to Github API are limited to 60 requests/hour
Authenticated requests to Github API are allowed 5000 requests/hour
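For reference, a hedged sketch of the authenticated, paginated call pattern described above (exact page size and endpoints in khoj's Github processor may differ; 100 is the Github API maximum per page):

```python
import requests

def get_commits(repo: str, pat_token: str) -> list:
    "List commits for a repo with authenticated, 100-per-page requests."
    commits, page = [], 1
    # The "token " prefix is required, else the request counts as unauthenticated
    headers = {"Authorization": f"token {pat_token}"}
    while True:
        response = requests.get(
            f"https://api.github.com/repos/{repo}/commits",
            headers=headers,
            params={"per_page": 100, "page": page},  # default per_page is 30
        )
        response.raise_for_status()
        page_of_commits = response.json()
        if not page_of_commits:
            break
        commits += page_of_commits
        page += 1
    return commits
```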
- Add a central configuration management page to make management of config details easier
- Add relevant api endpoints both for client and server to update/request data as necessary
- Attempt to update the favicon
The Llama_Hub Github plugin is fairly limited.
The Github REST API is well supported and can easily be extended to
index commit messages, issues, discussions, PRs etc.
- Make API endpoints on Khoj server accept `client` as request parameter
- Khoj API endpoints: /chat, /search, /update
- Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server
- Khoj clients: Emacs, Obsidian and Web
- Also log khoj server_version running to telemetry server
- This improves latency of @general chat by avoiding unnecessary
compute
- It also avoids passing references in API response when they haven't
been used to generate the chat response. So interfaces don't have to
add logic to not render them unnecessarily
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
- Ensure combined results are sorted by score across both types
- Jump to the PDF file when its PDF search result is selected from the modal
- Match argument names passed to khoj openai completion funcs with
arguments passed to langchain calls to OpenAI
- This simplifies the logic in the khoj openai completion funcs
- Fix bug where both LangChain and Khoj would retry requests 6 times each.
So a total of 12 requests at >1 minute intervals for each chat
response when the OpenAI API was down
- Retrying too many times when the API is failing doesn't help
- The earlier 60 second request timeout was spacing out the interval
between retries way too much. This slowed down chat response times
quite a bit when the API was flaky
- With these updates you'll know if a call to the chat API failed in under a
minute
- Use ChatModel and ChatOpenAI to call OpenAI chat model instead of
using OpenAI package directly
- This is being done as part of migration to rely on LangChain for
creating agents and managing their state
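A minimal sketch of calling the OpenAI chat model through LangChain's `ChatOpenAI`, as described above; the model name, temperature and retry settings are illustrative, not khoj's actual defaults:

```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

# Configure timeout and retries in one place so khoj and LangChain
# don't both retry the same failed request
chat = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.2,
    openai_api_key="sk-...",  # placeholder
    request_timeout=20,
    max_retries=1,
)

messages = [
    SystemMessage(content="You are Khoj, a friendly personal assistant."),
    HumanMessage(content="Write a haiku about spring. No preamble, just the haiku."),
]
print(chat(messages).content)
```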
- Khoj chat will now respond to general queries if:
1. no relevant reference notes available or
2. when explicitly induced by prefixing the chat message with "@general"
- Previously Khoj Chat would often refuse to respond to
general queries not answerable from reference notes or chat history
- Make chat quality tests more robust
- Add more equivalent chat response options refusing to answer
- Force haiku writing to not give any preamble, just the haiku
- Simplifies switching between different OpenAI chat models, e.g. GPT-4
- It was previously hard-coded to use gpt-3.5-turbo. Now it just
defaults to using gpt-3.5-turbo, unless the chat-model field under the
conversation processor is updated in khoj.yml
Otherwise, if the heading exceeds max_tokens, the search models will just see
a heading (with repeated filename) for each compiled entry and not the
actual content.
100 characters should be sufficient to include the filename (not path) and
entry heading. If longer, rather truncate it so the entry's unique text is
passed to the model for search context
Previously the filename was appended to the end of the compiled entry.
This didn't provide appropriately structured context
Test filename getting prepended as heading to compiled entry
All compiled snippets split by max tokens (apart from first) do not
get the heading as context.
This limits search context required to retrieve these continuation
entries
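A hedged sketch of the compile step described in the notes above; the exact heading format, truncation length and split logic in khoj's text processors may differ:

```python
def compile_entry(raw_entry: str, heading: str, filename: str, max_words: int = 256) -> list:
    "Prefix a truncated filename + heading line, then split the entry by max words."
    # Keep the prefix under ~100 characters so it doesn't crowd out the entry body
    prefix = f"{filename} {heading}"[:100]
    compiled = f"{prefix}\n{raw_entry}"
    words = compiled.split(" ")
    # Only the first snippet carries the heading; continuation snippets carry body text
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```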
- cl-push expects a generalized variable, else it throws a (setf quote)
undefined warning
- This results in the config call failing on calling khoj entrypoint
- Remove waiting for server message as it hides the messages from the
server
- Fix the nil messages that were being rendered by checking before
showing messages from the server
- Consistently prefix messages from khoj with khoj.el
Previously khoj.el was calling the server configure API even when the
config was the same as before.
This had broken the khoj search-as-you-type experience from Emacs.
Also show more details to the user about what in khoj is being configured.
Resolves #185, #199
- Issue
IndexName created from the Obsidian absolute vault path wasn't replacing
Windows path and drive separators with underscores. It was only
replacing Unix path separators
- Fix
Also replace windows drive and path separators with _ while creating
IndexName in Khoj Obsidian plugin
Makes it easier to tell which python the pip in use is associated
with. Easier to debug when users have different versions of python
installed (e.g. 3.10 and 3.11)
- Explicitly split entry string by space during split by max_tokens
- Prevent formatting of compiled entry from being lost
- The formatting itself contains useful information
No point in dropping the formatting unnecessarily,
even if (say) the current search models don't account for it (yet)
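To illustrate the whitespace point: splitting on any whitespace collapses the newlines that carry an entry's formatting, while splitting only on spaces keeps them attached to the surrounding words (a sketch, not khoj's exact code):

```python
entry = "* Heading\n- bullet one\n- bullet two"

# str.split() with no argument splits on all whitespace, so newlines are lost
lost = " ".join(entry.split())      # '* Heading - bullet one - bullet two'

# Splitting explicitly on spaces keeps the newlines attached to words
kept = " ".join(entry.split(" "))   # '* Heading\n- bullet one\n- bullet two'
```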
Append originating filename to compiled string of each entry for
better search quality by providing more context to model
Update markdown_to_jsonl tests to ensure the filename is being added
Resolves #142
This follows the expected behavior of Obsidian search modals,
e.g. Omnisearch and the default Obsidian search.
The note creation code is borrowed from Omnisearch.
Resolves #133
- Give space in the input field. Too narrow previously
- References should be indexed from 1 instead of 0
- Use Obsidian font size variables to scale fonts in chat appropriately
- Add message sender, date metadata as message footer
- Use css directly from Khoj Chat Web Interface.
- Modify it to work under an Obsidian modal
- So replace the html, body styling from the web interface with
styling for a new "khoj-chat" class attached to the contentEl of the modal
Converts paths to glob-style regexes that will index all org files
recursively under the specified list of paths
Should help setup for org-roam users from khoj.el
- khoj-auto-setup controls whether to automatically check for and
setup khoj server from within Emacs
- extract install, start, configure sequence into public, interactive
method. Allows calling khoj-setup during package load via init.el
- Fix: Do not attempt to configure or wait for server ready if
user has said no to auto-setup request
- Fix logic to mark server started vs ready
- Previously the started/running vs ready variables defs were getting
intertwined
- Server started indicates server bootup has been triggered
- Server ready indicates server API ready to accept requests
- If khoj server started outside emacs, khoj--server-ready should be set
to true by khoj--server-running method (instead of waiting for proc msg)
- If khoj server is unconfigured the /config/types endpoint wouldn't
return anything. Using config/data/default allows checking khoj server
running status without requiring it to be configured as well
If the config hasn't changed there'll be no update. If the config has
changed, indexing will get triggered asynchronously. But the user cannot
make queries till indexing is done
As it's easier to know when the server is ready to be configured
- Use process filter, sentinel to mark when khoj server is ready or not
- Display server messages for visibility into server boot-up process
- Wait until server ready to open khoj transient menu in Emacs
Until then khoj features wouldn't work anyway, so avoids confusion
- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
tenacity package. This is officially suggested and used by other
popular GPT based libraries
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
argument to generate_chatml_messages_with_context method
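A sketch of the helper shapes described above using the tenacity and tiktoken APIs; the exception list, backoff parameters and model name are illustrative:

```python
import openai
import tiktoken
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

@retry(
    retry=retry_if_exception_type((openai.error.APIError, openai.error.RateLimitError)),
    wait=wait_random_exponential(min=1, max=30),
    stop=stop_after_attempt(3),
    reraise=True,
)
def chat_completion(messages: list, model: str = "gpt-3.5-turbo") -> str:
    "Call the OpenAI chat API, retrying with exponential backoff on transient errors."
    response = openai.ChatCompletion.create(model=model, messages=messages)
    return response["choices"][0]["message"]["content"]

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    "Count tokens with tiktoken to decide how many conversation turns fit in the prompt."
    encoder = tiktoken.encoding_for_model(model)
    return len(encoder.encode(text))
```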
- Remove the need to split by magic string in emacs and chat interfaces
- Move the logic that compiles references into a context string for GPT down into the GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
- Add continuation suffix, ..., when reference definition truncated
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
- Improves color coding for readability
- Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
- Generalize the render-chat-response method to handle rendering
history or chat response from the chat API response
- Trigger rendering of khoj chat history if Khoj chat buffer not
created for this session yet
- Use org-insert-link method to improve link rendering robustness
The previous simple mechanism to create org-links would result in links
escaping out of formatting. Use a user-facing org-mode method to
remove/reduce the probability of this
- Replace newlines with space to render reference notes as links
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as a org-mode list in format:
- [sender-name]: *[message]*
- /[receive-date]/
- Add references as org links with context visible on hover,
but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
Use `-map-indexed' method from dash
- Reasons:
- GPT can extract date aware search queries with date filters
better than ChatGPT given the same prompt.
- Need quality more than cost savings for now.
- Need to figure out ways to improve the prompt for ChatGPT before using it
Update Search Actor prompt with answers, more precise primer and
two more examples for context
Mark the 3 chat quality tests using answer as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
- Keep inferred questions in logs
- Improve prompt to GPT to try use past questions as context
- Pass past user message and inferred questions as context to help GPT
extract complete questions
- This should improve search results quality
- Example Expected Inferred Questions from User Message using History:
1. "What is the name of Arun's daughter?"
=> "What is the name of Arun's daughter"
2. "Where does she study?" =>
=> "Where does Arun's daughter study?" OR
=> "Where does Arun's daughter, Reena study?"
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
Answer time range based questions
Limit search to specified timeframe in question using date filter
E.g "What national parks did I visit last year?" adds
dt>="2022-01-01" dt<"2023-01-01" to Khoj search
Note: Temperature set to 0. Message to search queries should be deterministic
Create Rubric to Test Chat Quality and Capabilities
### Issues
- Previously the improvements in quality of Khoj Chat on changes was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities
### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
- Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
- Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
- Measure quality at 2 separate layers
- **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
- **Chat Director**: This is the chat orchestration agent. It calls on the required chat actors, searches through the user provided knowledge base (i.e. notes, ledger, images) etc. to respond appropriately to the user's message. This is what the `/api/chat` API exposes.
- Mark desired but not currently available capabilities as expected to fail <br />
This still allows measuring the chat capability score/percentage, while only failing, on changes to chat, those capability tests that were passing before
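The expected-to-fail marking above maps naturally onto pytest's xfail marker; a sketch with a hypothetical fixture and test:

```python
import pytest

# Desired-but-missing capability: keep the test, mark it expected-to-fail,
# so the suite still measures a capability score without breaking on known gaps.
@pytest.mark.xfail(reason="Chat director cannot yet answer time-range questions")
def test_answer_time_range_question(chat_director):  # fixture name is hypothetical
    response = chat_director.chat("What national parks did I visit last year?")
    assert response  # expect a non-empty, correct answer once the capability lands
```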
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
answering
- Make GPT be more reliable in looking at past conversations for
forming response
# Improve Khoj Chat
## Main Changes
- Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost
- Improve Prompt to answer query using indexed notes
- Previously was asking GPT to summarize the notes
- Both the chat and answer API use this new prompt
- Support Multi-Turn conversations
- Pass previous messages and associated reference notes to ChatGPT for context
- Show note snippets referenced to generate response
- Allows fact-checking, getting details
- Simplify chat interface by using only single unified chat type for now
## Miscellaneous
- Replace summarize with answer API. Summarize via API not useful for now
- Only pass Khoj search results above a threshold confidence to GPT for context
- Allows Khoj to say don't know if it can't find answer to query from notes
- Allows relying on (only) conversation history to generate response in multi-turn conversation
- Move Chat API out of beta. Update Readme
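A hedged sketch of the multi-turn message construction against the ChatGPT chat-completions API described in this note; the prompt wording, variable names and model are illustrative:

```python
import openai

def converse(references: list, user_query: str, conversation_log: list) -> str:
    "Answer the user's query using note snippets and prior turns as context."
    system_prompt = "You are Khoj, a helpful personal assistant. Answer from the provided notes."
    context = "\n\n".join(references)  # compiled note snippets above the confidence threshold
    messages = [{"role": "system", "content": system_prompt}]
    for past_user_message, past_khoj_response in conversation_log:
        messages += [
            {"role": "user", "content": past_user_message},
            {"role": "assistant", "content": past_khoj_response},
        ]
    messages.append({"role": "user", "content": f"Notes:\n{context}\n\nQuestion: {user_query}"})
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    return response["choices"][0]["message"]["content"]
```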
GPT still mostly says I don't know when the answer is not in notes or chats.
But with this it's more inclined to answer general questions not in
chats or notes, while informing the user that the information is not from
existing chats or notes
- Chat uses the compiled form of search results, not the raw entries, to
provide context for chat. The compiled search result snippets
themselves are unique, and using multiple of them for context from
the same raw note is fine if they cross the score and rank thresholds.
This should improve the context provided for chat
- Also apply score_threshold, and no deduplication, to the answers API
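A small sketch of the threshold gating above; field names are assumed, and whether higher or lower scores are better depends on the search model in use (higher is assumed better here):

```python
def collect_context(hits: list, score_threshold: float = 0.6, max_hits: int = 2) -> list:
    "Pass only decent-quality hits, as compiled snippets, for chat context."
    decent_hits = [hit for hit in hits if hit["score"] >= score_threshold]
    return [hit["compiled"] for hit in decent_hits[:max_hits]]
```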
- Issue
The file path separators used by the khoj server and the Obsidian vault
were different on Windows
- Fix
Normalize the file path to use forward slash (/) to find the matching
note file in the Obsidian vault and jump to it
Resolves #177
Answer does not rely on past conversations, just the knowledge base.
It is meant for one off interactions, like search rather than a
continuing conversation like chat
For now it is only exposed via the API. Later it will be exposed in the
interfaces as well
Remove ability to select different chat types from the chat web
interface as there is only a single chat type
Stop appending answers to the conversation logs
- Only use decent quality search results, if any, as context
- Pass source results used by previous chat messages as context
- Loosen prompt to allow looking at previous chats and notes to answer
- Pass current date for context
- Make GPT provide reason when it can't answer the question. Gives
user context to tune their questions
- Set context by either including last 2 chat messages from active
session or past 2 conversation summaries from conversation logs
- Set personality in system message
- Place personality system message before last completed back & forth
This may stop ChatGPT forgetting its personality as conversation progresses given:
- The conditioning based on system role messages is light
- If system message is too far back in conversation history, the
model may forget its personality conditioning
- If the system message is at the end of the conversation, the model can think it's
the start of a new conversation
- Inserting the system message before the last completed back & forth should
prevent ChatGPT from assuming it's the start of a new conversation
while not losing personality conditioning from the system message
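A structural sketch of the message ordering described above; this is a simplification, not the actual signature of khoj's generate_chatml_messages_with_context:

```python
def generate_messages(user_query: str, personality: str, chat_history: list) -> list:
    "Order: older turns, personality system message, last completed back & forth, new query."
    older_turns, last_backnforth = chat_history[:-2], chat_history[-2:]
    messages = list(older_turns)
    # Keep the personality conditioning near the end of the conversation, but not at
    # the very end, so the model neither forgets it nor treats it as a fresh conversation
    messages.append({"role": "system", "content": personality})
    messages += last_backnforth
    messages.append({"role": "user", "content": user_query})
    return messages
```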
- Simplify the Khoj Chat API to, for now, just answer from the user's notes
instead of trying to infer other potential interaction types.
- This is the default expected behavior from the feature anyway
- Use the compiled text of the top 2 search results for context
- Benefits of using ChatGPT
- Better model
- 1/10th the price
- No hand rolled prompt required to make GPT provide more chatty,
assistant type responses
- Improve GPT prompt
- Make GPT answer users query based on provided notes instead
of summarizing the provided notes
- Make GPT be truthful using prompt and reduced temperature
- Use Official OpenAI Q&A prompt from cookbook as starting reference
- Replace summarize API with the improved answer API endpoint
- Default to answer type in chat web interface. The chat type is not
fit for default consumption yet
The previous behavior was resulting in a null reference error, as the key for
the core content/search type was not present in the current config.
Fall back to using the default config for the unconfigured core content type
instead
See #165 for details
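A sketch of the fallback pattern (the config layout and key names here are illustrative):

```python
def get_content_config(current_config: dict, default_config: dict, content_type: str) -> dict:
    "Use the user's config for a core content type if present, else the default config."
    configured = (current_config.get("content-type") or {}).get(content_type)
    return configured if configured else default_config["content-type"][content_type]
```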
- Use emojis to make info logs easier to read
- Inform when khoj is ready to use
- Provide information on what khoj is doing while starting up
- Inform when content/search types and processors are setup
- Inform when models are being loaded from the web as this step can
take time
- Convert all other info logs to be only shown in verbose mode
- Text before headings was not being indexed due to buggy orgnode
parsing logic
- Resolved indexing intro text from files with and without headings in
them
- Ensure intro text node has heading set to all title lines collected
from the file
Resolves #165
- Test the /config/types API for the no plugin configured, only plugin configured,
and no content configured scenarios
- Do not throw null reference exception while configuring search types
when no plugin configured
- Do not throw null reference exception on calling /config/types API
when no plugin configured
Resolves bug introduced by #173
Repro:
1. Open khoj server with `khoj` on first run
2. Install/enable Khoj Obsidian plugin (to configure khoj server)
3. Restart khoj server with `khoj`
Bug:
- Unconfigured processor and search_types are instantiated as None in
self.current_config
- While creating the desktop GUI, these null configs are attempted to
be accessed as valid dictionaries for creating their GUI panels
- This results in the null ref errors
Fix:
Use default config to create their GUI elements for unconfigured
search and processor types
Resolves #167
- Previously the API was returning all core content types even if they had not been
set up
- Add test to validate only configured content types are returned by
the api/config/types API endpoint
- Remove the need for interfaces to downcase content types returned by the API
before using the type in search and other API endpoints
- Fix to check for search_type.name in plugin keys instead of value
Configure app routes after configuring server.
Import API routers after search type is dynamically populated.
Allow API to recognize the dynamically populated plugin search types
as valid type query param.
Enable searching for plugin type content.
- Remove unneeded type ignore for mps with the latest mypy
- Stop excluding PyQT desktop GUI code from MyPy checks
- Do not warn about unused ignores. Some issue with mypy giving
different errors in different environments (venv, system and pre-commit)
- Use Rich to render uvicorn, fastAPI logs as well
The previous CustomFormatter only worked on khoj logs
- Improve rendering stacktrace on errors using Rich
- What
- The Emacs and Obsidian interfaces stay in their original
directories under src/
- src/khoj now only contains code meant for pypi packaging
- Benefits
- This avoids having to update khoj MELPA, Obsidian plugin config as
the Emacs, Obsidian code is under their original directories
- It separates the code in src/khoj meant for python packaging from
code for external interfaces like Emacs and Obsidian
- Why
The khoj pypi packages should be installed in the `khoj' directory.
Previously they were being installed into the `src' directory, which is a
generic top level directory name that is discouraged from being used
- Changes
- move src/* to src/khoj/*
- update `setup.py' to `find_packages' in `src' instead of project root
- rename imports to form `from khoj.*' in complete project
- update `constants.web_directory' path to use `khoj' directory
- rename root logger to `khoj' in `main.py'
- fix image_search tests to use the newly rename `khoj' logger
- update config, docs, workflows to reference new path `src/khoj'
- By default the obsidian plugin automatically configures the khoj
backend to index the current vault
- For more complex scenarios, users can manage their ~/.khoj/khoj.yml
manually by toggling the auto-configure setting off in the khoj
plugin settings
Resolves #156
- Background
1. Obsidian stores markdown notes as utf8[1]
2. By default, the python `open' command uses the OS locale encoding[2]
This was causing the `UnicodeDecodeError: <locale_encoding> codec can't decode byte' error
- Fix
- Read markdown files as utf8
The Obsidian plugin is the main use-case for markdown files in
khoj currently and that stores md files as utf8.
Do not assume utf8 for other content types like org-mode, beancount for now.
- Fail if error in reading file as utf8, instead of ignoring errors.
Would rather have user realize that their files are not going to
get indexed correctly.
[1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3
[2]: https://docs.python.org/3/library/functions.html#open
The Khoj plugin page from within Obsidian isn't recognized. It seems it
needs an uppercase README file only. So it doesn't show the Khoj
readme from within Obsidian itself.
- Update khoj.el test to reflect updated rendering logic
- Move the ledger render function before the image render function to group functions
with similar logic closer together
Split find file, jump to file code to make onChooseSuggestion more readable
- Use find, instead of using return in forEach to get first match
- Move the jump to file+heading code out from forEach
Do not reference global app object from child objects and funcs
directly.
It is only available for debugging purposes and access to it may be
dropped in the future.
Previously no query syntax helpers, like the "file:" prefix, were used
before checking if the query contains a file path.
This made image search queries brittle to misinterpretation and led to
pointless checking
Add test to verify search by image at file works as expected
- Support querying with text surrounding point in any text buffer
Previously could only find items similar to org entry at point
- Find similar items of specified content type indexed on khoj
Previously only looked for similar org entries indexed on khoj
Now uses the content-type configured in khoj transient menu to find
items of the specified content type
- Details
- Generalize the get-current-org-entry-text func to get text for any
outline section
- Strip leading whitespace from the query text as well
- Create method to get current paragraph text from non-outline mode
buffers
- Update transient, find-similar funcs to pass, use content-type
configured in khoj transient menu
- Generalize query title creation logic to remove markdown headings
prefix (#) apart from org heading prefix (*) as well
- Update last used khoj content-type and results from the
find-similar and update funcs for later reuse
- Jump to top of results buffer after results rendered
Enable searching for notes similar to the current note being viewed
## Main Changes
- 39a18e2 Extend search modal to search for similar notes
- Hide input field on init, Trigger search on opening modal when in similar notes mode
- Set input to contents of current markdown file and get notes similar to it
- Re-rank, by default, when searching for similar notes
- Filter out current note from similar note search results
- 0bed410 Only show `Find Similar Note' command in Editor
- Hide input field on init, Trigger search on opening modal in similar notes mode
- Set input to current markdown file and get similar notes to it
- Enable rerank when searching for similar notes
- Filter out current note from similar note search results
- Screenshot querying "Setup Editor" on test vault with Khoj Readmes
- New features showcase:
- information keybindings, rerank keybinding at bottom of modal
- fixed top level headings in search results
- search results snipped if greater than N words
- Previously top level headings would get stripped of the
space between the heading text and the prefix # symbols. That is,
`# Top Level Heading' would get converted to `#Top Level Heading'
- This would mess up their rendering as a heading in search results
- Add unit tests to text_to_jsonl processors to prevent regression
Provides a more consistent rendering of results in modal.
Makes it easier to see more results in modal.
To see complete entry, user can always just jump to entry from modal
### Overview
- Provide a chat interface to engage with and ask questions of your notes
- Simplify interacting with the beta `chat` and `summarize` APIs
### Use
- Open `<khoj-url>/chat`, by default at http://localhost:8000/chat?type=summarize
- Type your queries, see summarized response by Khoj from your notes
**Note**:
- **You will need to add an API key from OpenAI to your khoj.yml**
- **Your query and top note from search result will be sent to OpenAI for processing**
## Details
- 177756b Show chat history on loading chat page on web interface
- d8ee0f0 Save chat history to disk for persistence, seeing chat logs
- 5294693 Style chat messages as speech bubbles
- d170747 Add khoj web interface and chat styling to new chat page on khoj web
- de6c146 Implement functional, unstyled chat page for khoj web interface
- The previous mechanism to trigger saving on the shutdown event did not work
- Use a scheduler to persist chat sessions to disk at a 5 minute interval
- This improves the time granularity and gives a fixed interval for saving chat logs
- It may lose ~5 minutes of chat history until a mechanism to also
write on shutdown is found/resolved
- Create conversation directory if it doesn't exist before attempting write
- Reset chat_session after writing it to disk
- Wrap messages into speech bubbles
- Color messages by khoj blue, sender grey
- Add those standard protrusions to the speech bubbles for fun
- Align bubbles left or right based on sender
- messages by khoj are left aligned, messages by self are right aligned
- Put message metadata like sender and time under speech bubble
- use data-* attributes and the ::after CSS pseudo-element for this
- Update renderMessage func to accept time param, remove unused type_ param
Not all notes are in the past. Notes can be about stuff in the future.
Casting them to past tense gives the impression that they've already
happened / been done.
- Changes
- Use blue color for khoj heading font
- This fixes the title color issue
- Update background to lighter shade
- This fixes the body text color issue
- Update colors for todo, done, miscellaneous todo state, tag color
- This does not fix the color contrast issue but seems like an acceptable solution
- Using white text rather than black text on the blue background looks
better, even though black text on a blue background passes the
WCAG acceptable contrast score
- For details see blog post:
https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/
- Add a border to tags to give them a tag-pill look and differentiate them
from todo states
- Buttons and inputs
- Change background color of input fields like type dropdown,
update button and results count counter, to match background
color of page
- Add shadow on hover over button, dropdowns
Resolves #111
- Ensure message input box sticks to bottom of screen
- Ensure chat logs div is scrollable when logs become longer than screen
Do not make the whole page scroll, just the chat logs body div
Uses longest file path match to find markdown file in vault
corresponding to file of search result returned by Khoj
Allow jumping to search result from khoj plugin modal on Android too
The previous mechanism of manually triggering the getSuggestions,
renderSuggestions flow was corrupting traversal and opening of
reranked search results in KhojModal
Emulate the event that would anyway trigger the get & render of results in the
modal. This lets obsidian core handle the flow without digging too
deep into obsidian core's handling of the flow. Lowers the chance of
breakage
We need the index file paths to make sense on the khoj backend server
Having the path of the index on the backend relative to the current vault directory
on the frontend ignores the fact that the frontend may be on a different
machine than the khoj backend server
Using unique index name per vault allows switching vaults without
overwriting indices of other vaults created on khoj backend when khoj
obsidian plugin is loaded on opening a different vault
- Overview
Limits using Khoj to a single vault at a time. This is
automatically configured to the most recently opened vault.
Once directory filters are supported on the backend, the plugin will be
updated to index multiple vaults but search only the current vault from the
current vault's khoj obsidian plugin
- Code Details
- Remove setting to configure Vault directory from Khoj Obsidian plugin
- Automatically configure Khoj to index only current Vault.
- Overwrites any previous vaults that were intended to be indexed by
Khoj backend
- Force update of index after configuring vault
- Why
It's not helpful for now and can lead to more problems and confusion.
Once directory filters are supported on the backend, this can be revisited.
- Previously the backend was just throwing a backend error.
The frontend calling the /update API wasn't getting notified
- Now the frontend can react appropriately and make the issue
visible to the user
- Only show notification on plugin load and failure.
- In settings page, set current backend status at top of pane instead
of showing notification
Notice bubbles cluttered the UI while typing updates to settings
- Show a notification once the index is updated via the settings pane button click
There was no notification on index update, which usually takes time
on the backend
- Display warning at top of khoj obsidian plugin settings
- Make search command available only if connected to backend
- Show warning notice on clicking khoj search ribbon button
- Call saveData after configureKhojBackend to ensure the
connectedToBackend setting is saved after being (potentially) updated
in the configureKhojBackend function
- Previously the plugin would not load if it could not connect to the Khoj backend
- Silently failing to load with no reason provided is not helpful
- Load plugin to allow user to fix the Khoj URL in their plugin setting
- Show reason for khoj plugin not working. More helpful than failing silently
Use the timer context manager in all places where code was being timed
- Benefits
- Deduplicate timing code scattered across codebase.
- Provides single place to manage perf timing code
- Use consistent timing log patterns
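The timer context manager referred to above could look roughly like this; khoj's actual helper may differ in details:

```python
import logging
import time

class timer:
    "Context manager that logs the wall-clock time taken by the wrapped block."
    def __init__(self, message: str, logger: logging.Logger):
        self.message = message
        self.logger = logger

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exception_info):
        elapsed = time.perf_counter() - self.start
        self.logger.debug(f"{self.message}: {elapsed:.3f} seconds")

# Usage: one consistent timing log pattern wherever code needs to be timed
logger = logging.getLogger(__name__)
with timer("Rank and sort results", logger):
    time.sleep(0.01)  # stand-in for the timed work
```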
The query method had become too big.
Extract out filter, score, sort and deduplicate logic used by
text_search.query into separate methods.
This should improve the readability of the code.
- Changes
- Fix method signatures of BaseFilter subclasses.
Else typing information isn't translating to them
- Explicitly pass `entries: list[Entry]' as arg to `load' method
- Fix type of `raw_entries' arg to `apply' method
to list[Entry] from list[str]
- Rename `raw_entries' arg to `apply' method to `entries'
- Fix `raw_query' arg used in `apply' method of subclasses to `query'
- Set type of entries, corpus_embeddings in TextSearchModel
- Verification
Ran `mypy --config-file .mypy.ini src' to verify typing
- `torch.Tensor' is apparently a legacy tensor constructor
- Using that to create tensor on MPS devices throws error:
RuntimeError: legacy constructor expects device type: cpu but device type: mps was passed
- `torch.tensor' can handle creating tensors on Mac GPU (MPS) fine
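The distinction in code, runnable on a Mac with an MPS-enabled PyTorch build:

```python
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# torch.tensor (lowercase) infers dtype and accepts any device, including MPS
embeddings = torch.tensor([[0.1, 0.2], [0.3, 0.4]], device=device)

# torch.Tensor (uppercase) is the legacy constructor and raises the RuntimeError
# quoted above when asked to construct directly on an MPS device:
# legacy = torch.Tensor([[0.1, 0.2]], device=device)
```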
This is unlike the more general chat API that combines summarization
of top search result and conversing with the OpenAI model
This should give faster summary results, as no intent categorization
API call is required
- Use latest davinci model for tests
- Wrap prompt in triple quotes to improve legibility
- `understand' method returns dictionary instead of string. Fix its test
- Fix prompt for new model to pass `chat_with_history' test
- Default to using `text-davinci-003' if conversation model not
explicitly configured by user. Stop using the older `davinci' and
`davinci-instruct' models
- Use `model' instead of `engine' as parameter.
Usage of `engine' parameter in OpenAI API is deprecated
- Init processor before search to instantiate `openai_api_key'
from `khoj.yml'. The key is used to configure search with openai models
- To use OpenAI models for search in Khoj
- Set `encoder' to name of an OpenAI model. E.g text-embedding-ada-002
- Set `encoder-type' in `khoj.yml' to `src.utils.models.OpenAI'
- Set `model-directory' to `null', as online model cannot be stored on disk
Long words (>500 characters) provide less useful context to models.
Dropping very long words allows models to create better embeddings by
passing more of the useful context from the entry to the model
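A one-line sketch of that filter; the 500-character cutoff is the one mentioned above:

```python
def drop_long_words(entry_text: str, max_word_length: int = 500) -> str:
    "Drop very long 'words' (e.g. URLs, base64 blobs) that add little embedding context."
    return " ".join(word for word in entry_text.split(" ") if len(word) <= max_word_length)
```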
- Previously `model_type' was set in the setup of each `search_type'
- All encoders were of type `SentenceTransformer'
- All cross_encoders were of type `CrossEncoder'
- Now `encoder-type' can be configured via the new `encoder_type' field
in `TextSearchConfig' under `search-type` in `khoj.yml`.
- All the specified `encoder-type' class needs is an `encode' method
that takes entries and returns embedding vectors
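Under this scheme any class exposing an `encode` method can act as the encoder. A hedged sketch of an OpenAI-backed one using the pre-1.0 `openai` embeddings API; khoj's real `src.utils.models.OpenAI` likely differs:

```python
import openai
import torch

class OpenAI:
    "Minimal encoder: all it needs is an encode() mapping entries to embedding vectors."
    def __init__(self, model_name: str = "text-embedding-ada-002", api_key: str = None):
        self.model_name = model_name
        openai.api_key = api_key

    def encode(self, entries: list, convert_to_tensor: bool = True):
        response = openai.Embedding.create(input=entries, model=self.model_name)
        embeddings = [item["embedding"] for item in response["data"]]
        return torch.tensor(embeddings) if convert_to_tensor else embeddings
```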
- Ensure all tensors are on MPS device before doing operations across them
- Background
- GPU is used by default for Khoj on MacOS now
- Needed PyTorch > 1.13.0 on Macs to use GPU, which we do now
- MPS should speed up search and indexing on MacOS
Fix usage warning for unescaped single quote in `khoj.el' docstring.
Converts usage of '<text>' into `<text>' to use the correct quote forms in generated docs
⛔ Warning (comp): khoj.el:119:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:120:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:121:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:168:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
- Features
- Search using Khoj from within the Obsidian app
Allow Natural language search on your (markdown) notes in Obsidian Vault
- Show search results as rendered (instead of raw) Markdown
Improve legibility of the results
- Jump to selected note from search result in Khoj search modal
Simplify seeing result within its original note context
- Automatically configure khoj to index markdown files in current vault
Reduce khoj setup steps for plugin users by using reasonable defaults
- Code updates the markdown config in khoj.yml and triggers index update
- It can be configured by user in khoj plugin settings, if required
- Add Demo and detailed Readme for the Obsidian plugin
Ease setup and usage. Give context about capabilities
- Miscellaneous
- Trying to keep a mono repo until the Khoj project is mature enough,
to reduce the maintenance burden
This can ease configuring khoj from the different interfaces
- Don't need to know all the (default) config used by khoj.
- Just get default config by calling the above API endpoint.
- Then modify desired portions and call POST /api/config/data to
configure khoj.
- Start khoj server (in non-GUI mode) without needing config file
already instantiated.
- But throw warning to configure khoj to use it
- This allows plugins to configure the app via the /config/data APIs
- To be used by the Khoj obsidian plugin to configure markdown content
in khoj
- Poll scheduler every minute using threading.Timer
- Use a 60 second polling interval to avoid fork bombing
- Schedule next via the same poll scheduler
- Allow clean program interrupt by running scheduler in daemon mode
- There are 3 paths to updating/setting the index (stored in state.model)
- App start
- API
- Scheduler
- Put all updates to the index behind a lock, as there are multiple update
paths that could (potentially) run at the same time (via API or Scheduler)
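A rough sketch of the polling and locking described above; use of the `schedule` package and the job shown are assumptions, not necessarily khoj's exact setup:

```python
import threading

import schedule  # lightweight job scheduling library

index_lock = threading.Lock()

def update_index():
    "All paths that rebuild the index (app start, API, scheduler) must take this lock."
    with index_lock:
        pass  # stand-in for regenerating the search index in state.model

def poll_scheduler():
    "Run any due jobs, then re-arm a daemon timer; a 60s interval avoids fork bombing."
    schedule.run_pending()
    timer = threading.Timer(60.0, poll_scheduler)
    timer.daemon = True  # lets the program exit cleanly on interrupt
    timer.start()

schedule.every(1).hours.do(update_index)  # example job; the real schedule may differ
poll_scheduler()
```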
- Remove property drawer from test entry for max_words splitting test
- Property drawer is not required for the test
- Keep minimal test case to reduce chance for confusion
- Required because entries are now split by the max_word count supported
by the ML models
- This would now result in potentially duplicate hits and entries being
returned to the user
- Do deduplication after ranking to get the top ranked deduplicated
results
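A sketch of post-ranking deduplication; it assumes each hit keeps a reference to its originating raw entry:

```python
def deduplicate_results(ranked_hits: list, max_results: int) -> list:
    "Keep only the best-ranked hit per raw entry, since max-word splits can duplicate entries."
    seen_raw_entries, deduped = set(), []
    for hit in ranked_hits:  # assumed already sorted best-first
        if hit["raw_entry"] not in seen_raw_entries:
            seen_raw_entries.add(hit["raw_entry"])
            deduped.append(hit)
    return deduped[:max_results]
```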