* Use separate functions for adding files and folders to configuration for indexing
* Add a loading bar while data is syncing
* Bump the minor version for the application
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Use state.host and state.port for configuring the URL for the indexer
* Fix parsing of PDF files
* Read markdown files from streamed data and update unit tests
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
* Init: refactor indexer/batch endpoint to support a generic file ingestion format
* Add features to better support indexing from files sent by the desktop client
* Initial commit with Electron application
- Adds electron app
* Add import for pymupdf, remove import for pypdf
* Allow user to configure khoj host URL
* Remove search type configuration from index.html
* Use v1 path for current indexer routes
* Add support for configuring/using offline chat from within Obsidian
* Fix type checking for search type
* If Github is not configured, /update call should fail
* Fix regenerate tests same as the update ones
* Update help text for offline chat in obsidian
* Update relevant description for Khoj settings in Obsidian
* Simplify configuration logic and use smarter defaults
* Add support for gpt4all's falcon model as an additional conversation processor
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4 models and OpenAI
- Add unit tests benchmarking some of the Falcon performance
* Add exc_info to include stack trace in error logs for text processors
* Pull shared functions into utils.py to be used across gpt4 and gpt
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
* Add docs for more organized, accessible information detailing Khoj setup
* Delete duplicated files
* Add a coverpage without enabling it. Add logo and theme
* Remove obsidian README.md
* Add plausible script to index.html via docsify
* Update the /chat endpoint to conditionally support streaming
- If streams are enabled, return the threadgenerator as it does currently
- If stream is disabled, return a JSON response with the response/compiled references separated out
- Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian
- Rename chat/init/ to chat/history
* Update khoj.el to use the /history endpoint
- Update corresponding unit tests to use stream=true
* Remove & from call to /chat for obsidian
* Abstract functions out into a helpers.py file and clean up some of the error-catching
- What
- Stream chat responses from OpenAI API to Web, Obsidian clients
- Implement using a callback function which manages a queue where new tokens can be placed as they come on. As the thread is read from, tokens are removed.
- When the final token has been processed, add the `compiled_references` to the queue to be rendered by the `chat` client
- When the thread has been closed, save the accumulated conversation log in the user's history using a `partial func`
- Incrementally decode tokens on the front end and add them as they appear from the streamed response
- Why
This significantly reduces perceived latency and OpenAI API request timeouts for Chat
Closes https://github.com/khoj-ai/khoj/issues/257
- I needed to installed node-fetch to accomplish this, as the built-in request object from Obsidian doesn't seem to support streaming and the built-in fetch object is very sensitive to any and all cross origin requests
Removing unused content types will reduce khoj code to manage
- 0f993b3 Drop support for Ledger as a separate content type
Khoj will soon get a generic text indexing content type in Index plain text files #237.
This along with a file filter should suffice for searching through Ledger transactions
- c9db532 Remove unused org-music as an indexable content type from Khoj
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.
Having a specific content type for niche use-case like ledger isn't
useful. Removing unused content types will reduce khoj code to manage.
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.
Cleaning up that code will reduce number of content types for khoj to
manage.
- Previously Khoj could only support Python upto 3.10 due to pytorch.
But lots of folks had python 3.11 installed by default on their machines.
This required installing python 3.10 and dealing with virtual envs.
With Torch >= 2.0.1 now able to support python 3.11, at least one
class of installation troubles for Khoj should drop. See
https://github.com/pytorch/pytorch/issues/86566 for reference
- Preliminary testing indicates using the new torch 2.x may reduce
search time by 25% (from 80ms to 60ms on Mac M1)
- Update Docs to not require mentioning python <=3.10 required
- Update Github test workflow to run khoj tests with python 3.11 too
If no content-type selected in transient menu option, khoj.el queries
khoj server without content-type parameter (t) set.
This results in search across all enabled asymmetric search text
content types
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
- Ensure combined results are sorted by score across both types
- Jump to PDF file when select it PDF search result from modal
- cl-push expects a generatlized variable. Else throws (setf quote)
undefined warning
- This results in the config call failing on calling khoj entrypoint
- Remove waiting for server message as it hides the messages from the
server
- Fix the nil message that were being rendered, by checking before
showing messages from server
- Consistently prefix messages from khoj with khoj.el
Previously khoj.el was calling the server configure API even when
config was same as before.
This had broken the khoj search as you type experience from emacs
Also show more details to user about what in khoj is being configured
Resolves#185, #199
- Issue
IndexName created from Obsidian Absolute Vault path wasn't replacing
windows path, drive separators with underscore. It was only
replacing unix path separators
- Fix
Also replace windows drive and path separators with _ while creating
IndexName in Khoj Obsidian plugin
Makes it easier to tell pip associated with which python is being
used. Easier to debug when users have different versions of python
installed (e.g 3.10 and 3.11)
This follows expected behavior for obsidain search modals
E.g Ominsearch and default Obsidian search.
The note creation code is borrowed from Omnisearch.
Resolves#133
- Give space in the input field. Too narrow previously
- References should be indexed from 1 instead of 0
- Use Obsidian font size variables to scale fonts in chat appropriately
- Add message sender, date metadata as message footer
- Use css directly from Khoj Chat Web Interface.
- Modify it to work under a Obsidian modal
- So replace html, body styling from web interface to instead
styling new "khoj-chat" class attached to contentEl of modal
Converts paths to glob style regexes that will index all org files
recursively under the specified list of path
Should help setup for org-roam users from khoj.el
- khoj-auto-setup controls whether to automatically check for and
setup khoj server from within Emacs
- extract install, start, configure sequence into public, interactive
method. Allows calling khoj-setup during package load via init.el
- Fix: Do not attempt to configure or wait for server ready if
user has said no to auto-setup request
- Fix logic to mark server started vs ready
- Previously the started/running vs ready variables defs were getting
intertwined
- Server started indicates server bootup has been triggered
- Server ready indicates server API ready to accept requests
- If khoj server started outside emacs, khoj--server-ready should be set
to true by khoj--server-running method (instead of waiting for proc msg)
- If khoj server is unconfigured the /config/types endpoint wouldn't
return anything. Using config/data/default allows checking khoj server
running status without requiring it to be configured as well
If the config hasn't changed there'll be no update. If config has
changed indexing will get triggered asynchronously. But user cannot
make query till indexing done
As easier to know when server ready to configure
- Use process filter, sentinel to mark when khoj server is ready or not
- Display server messages for visibility into server boot-up process
- Wait until server ready to open khoj transient menu in Emacs
Until then khoj features wouldn't work anyway, so avoids confusion
- Remove the need to split by magic string in emacs and chat interfaces
- Move compiling references into string as context for GPT to GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
- Add continuation suffix, ..., when reference definition truncated
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
- Improves color coding for readability
- Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
- Generalize the render-chat-response method to handle rendering
history or chat response from chat API reponse
- Trigger rendering of khoj chat history if Khoj chat buffer not
created for this session yet
- Use org-insert-link method to improve link rendering robustness
Previous simple mechanism to crete org-links would result in links
escaping out of formating. Use a user-facing org-mode method to
remove/reduce probability of this
- Replace newlines with space to render reference notes as links
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as a org-mode list in format:
- [sender-name]: *[message]*
- /[receive-date]/
- Add references as org links with context visible on hover,
but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
Use `-map-indexed' method from dash
- Issue
The file path separator by khoj server and the Obsidian vault were
different on Windows
- Fix
Normalize file path to use forward slash(/) to find the matching
note file in the Obsidian vault for jump to it
Resolves#177
- Remove need for interfaces to downcase content types returned by API
before using the type in search and other API endpoint
- Fix to check for search_type.name in plugin keys instead of value
- What
- The Emacs and Obsidian interfaces stay in their original
directories under src/
- src/khoj now only contains code meant for pypi packaging
- Benefits
- This avoids having to update khoj MELPA, Obsidian plugin config as
the Emacs, Obsidian code is under their original directories
- It separates the code in src/khoj meant for python packaging from
code for external interfaces like Emacs and Obsidian
- Why
The khoj pypi packages should be installed in `khoj' directory.
Previously it was being installed into `src' directory, which is a
generic top level directory name that is discouraged from being used
- Changes
- move src/* to src/khoj/*
- update `setup.py' to `find_packages' in `src' instead of project root
- rename imports to form `from khoj.*' in complete project
- update `constants.web_directory' path to use `khoj' directory
- rename root logger to `khoj' in `main.py'
- fix image_search tests to use the newly rename `khoj' logger
- update config, docs, workflows to reference new path `src/khoj'
- By default the obsidian plugin automatically configures the khoj
backend to index the current vault
- For more complex scenarios, users can manage their ~/.khoj/khoj.yml
manually by toggling the auto-configure setting off in the khoj
plugin settings
Resolves#156
Khoj plugin page from within Obsidian isn't recognized. Seems like it
needs an uppercase readme file only. So it doesn't show the Khoj
readme from within Obsidian itself.
- Update khoj.el test to reflect updated rendering logic
- Move ledger render function before image rendered to group functions
with similar logic closer
Split find file, jump to file code to make onChooseSuggestion more readable
- Use find, instead of using return in forEach to get first match
- Move the jump to file+heading code out from forEach
Do not reference global app object from child objects and funcs
directly.
It is only available for debugging purposes and access to it maybe
dropped in the future.
- Support querying with text surrounding point in any text buffer
Previously could only find items similar to org entry at point
- Find similar items of specified content type indexed on khoj
Previously only looked for similar org entries indexed on khoj
Now uses the content-type configured in khoj transient menu to find
items of the specified content type
- Details
- Generalize the get-current-org-entry-text func to get text for any
outline section
- Replace leading whitespaces from query text as well
- Create method to get current paragraph text from non-outline mode
buffers
- Update transient, find-similar funcs to pass, use content-type
configured in khoj transient menu
- Generalize query title creation logic to remove markdown headings
prefix (#) apart from org heading prefix (*) as well
- Update last used khoj content-type and results from the
find-similar and update funcs for later reuse
- Jump to top of results buffer after results rendered
Enable searching for notes similar to the current note being viewed
## Main Changes
- 39a18e2 Extend search modal to search for similar notes
- Hide input field on init, Trigger search on opening modal when in similar notes mode
- Set input to contents of current markdown file and get notes similar to it
- Re-rank, by default, when searching for similar notes
- Filter out current note from similar note search results
- 0bed410 Only show `Find Similar Note' command in Editor
- Hide input field on init, Trigger search on opening modal in similar notes mode
- Set input to current markdown file and get similar notes to it
- Enable rerank when searching for similar notes
- Filter out current note from similar note search results
- Screenshot querying "Setup Editor" on test vault with Khoj Readmes
- New features showcase:
- information keybindings, rerank keybinding at bottom of modal
- fixed top level headings in search results
- search results snipped if greater than N words
Provides a more consistent rendering of results in modal.
Makes it easier to see more results in modal.
To see complete entry, user can always just jump to entry from modal
### Overview
- Provide a chat interface to engage with and inquire your notes
- Simplify interacting with the beta `chat` and `summarize` APIs
### Use
- Open `<khoj-url>/chat`, by default at http://localhost:8000/chat?type=summarize
- Type your queries, see summarized response by Khoj from your notes
**Note**:
- **You will need to add an API key from OpenAI to your khoj.yml**
- **Your query and top note from search result will be sent to OpenAI for processing**
## Details
- 177756b Show chat history on loading chat page on web interface
- d8ee0f0 Save chat history to disk for persistence, seeing chat logs
- 5294693 Style chat messages as speech bubbles
- d170747 Add khoj web interface and chat styling to new chat page on khoj web
- de6c146 Implement functional, unstyled chat page for khoj web interface
- Wrap messages into speech bubbles
- Color messages by khoj blue, sender grey
- Add those standard protrusions to the speech bubbles for fun
- Align bubbles left or right based on sender
- messages by khoj are left aligned, message by self are right aligned
- Put message metadata like sender and time under speech bubble
- use data-* attribute and ::after css pseudo-selector for this
- Update renderMessage func to accept time param, remove unused type_ param
- Changes
- Use blue color for khoj heading font
- This fixes the title color issue
- Update background to lighter shade
- This fixes the body text color issue
- Update colors for todo, done, miscellaneous todo state, tag color
- This does not fix the color contrast issue but seems like an acceptable solution
- Using white text rather than black text on blue background
better even though the black text on blue background passes the
WCAG acceptable contrast score
- For details see blog post:
https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/
- Add border to tags to give them tag pills look and differntiate
from todo states
- Buttons and inputs
- Change background color of input fields like type dropdown,
update button and results count counter, to match background
color of page
- Add shadow on hover over button, dropdowns
Resolves#111
- Ensure message input box sticks to bottom of screen
- Ensure chat logs div is scrollable when logs become longer than screen
Do not make the whole page scroll, just the chat logs body div
Uses longest file path match to find markdown file in vault
corresponding to file of search result returned by Khoj
Allow jumping to search result from khoj plugin modal on Android too
Previous mechanism of manually triggering getSuggestions,
renderSuggestions flow was corrupting traversing and opening
reranked search results in KhojModal
Emulate event that would anyway trigger the get & render of results in
modal. This lets obsidian core handle the flow without digging too
deep into obsidian cores handling of the flow. Lowers the chance of
breakage
We need the index file paths to make sense on the khoj backend server
Having path of index on backend relative to current vault directory
on frontend ignores the fact that the frontend maybe on a different
machine than the khoj backend server
Using unique index name per vault allows switching vaults without
overwriting indices of other vaults created on khoj backend when khoj
obsidian plugin is loaded on opening a different vault
- Overview
Limits using Khoj with a single vault at a time. This is
automatically configured to the most recently opened vault.
Once directory filters are supported on backend, the plugin will be
updated to index multiple vault but search only current vault from
current vaults khoj obsidian plugin
- Code Details
- Remove setting to configure Vault directory from Khoj Obsidian plugin
- Automatically configure Khoj to index only current Vault.
- Overwrites any previous vaults that were intended to be indexed by
Khoj backend
- Force update of index after configuring vault
- Why
It's not helpful for now and can lead to more problems, confusion.
Once directory filters
- Only show notification on plugin load and failure.
- In settings page, set current backend status at top of pane instead
of showing notification
Notices bubbles cluttered the UI while typing updates to settings
- Show notification once index updated via settings pane button click
There was no notification on index updated, which usually takes time
on the backend
- Display warning at top of khoj obsidian plugin settings
- Make search command available only if connected to backend
- Show warning notice on clicking khoj search ribbon button
- Call saveData after configureKhojBackend to ensure
connnectedToBackend setting saved after being (potentially) updated
in configureKhojBackend function
- Previously the plugin would not load if cannot connect to Khoj backend
- Silently failing to load with no reason provided is not helpful
- Load plugin to allow user to fix the Khoj URL in their plugin setting
- Show reason for khoj plugin not working. More helpful than failing silently
Fix usage warning for unescaped single quote in `khoj.el' docstring.
Converts usage of '<text>' into `<text>' to use the correct quote forms in generated docs
⛔ Warning (comp): khoj.el:119:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:120:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:121:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:168:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
- Features
- Search using Khoj from within the Obsidian app
Allow Natural language search on your (markdown) notes in Obsidian Vault
- Show search results as rendered (instead of raw) Markdown
Improve legibility of the results
- Jump to selected note from search result in Khoj search modal
Simplify seeing result within its original note context
- Automatically configure khoj to index markdown files in current vault
Reduce khoj setup steps for plugin users by using reasonable defaults
- Code updates the markdown config in khoj.yml and triggers index update
- It can be configured by user in khoj plugin settings, if required
- Add Demo and detailed Readme for the Obsidian plugin
Ease setup and usage. Give context about capabilities
- Miscellaneous
- Trying keep a mono repo until the Khoj project is mature enough
to reduce maintainance burden
- The instructions suggest installing khoj-assistant via pip install.
This installs the latest tagged/release version of khoj
- To match that version user should install khoj.el from MELPA stable
instead of MELPA
Update readme to ask user to install khoj.el from MELPA when a
pre-release version of the main khoj app is installed. Else install
khoj.el from MELPA Stable
- Reason
- All clients that currently consume the API are part of Khoj
- Any breaking API changes will be fixed in clients immediately
- So decoupling client from API is not required
- This removes the burden of maintaining muliple versions of the API
- Split router.py into v1.0, beta and frontend (no-prefix) api modules
under new router package. Version tag in main.py via prefix
- Update frontends to use the versioned api endpoints
- Update tests to work with versioned api endpoints
- Update docs to mentioned, reference only versioned api endpoints
In my installation, it appears that `url-request-method` is sometimes set
globally to POST. Need to explicitly set it to ensure that GET is always
used as intended.
- Simplify tracking khoj query history, saving/sharing links
- Do not execute search, when query only contains whitespaces
- Prevents error when try process results of empty query
- As `/reload` updates index incrementally, it's relatively quick
- This makes exposing `/reload` endpoint a better default to expose
via the web interface than `the /regenerate' endpoint
- Default config has `input_files' set to None
- This was being passed to `FileBrowser' on Initialization
- But `FileBrowser' expects `content_files' of list type, not None
- This resulted in an unexpected NoneType failure
- This also pushes the updated URL state to history
- Allows jumping back to the web interface after clicking on an image
and having the type set to image search
- Previously type would get reset to the default search type on
jumping back
- When no file selected in file browser an empty line/entry gets added
to input entries list
- Bug got introduced due to insufficient update on change to add
instead of insert
- Update is_none_or_empty helper method to also check for empty string
- Allows adding multiple image directories via GUI
- Allow adding multiple files in different directories via GUI
- Previously users couldn't add multiple directories via GUI
They'd have to manually append to input field if multiple files, directories
- To clear/overwrite is much easier.
The user can just select text to delete in input area
- Issue
Fix configuring image search from Desktop GUI. It was broken before.
The Desktop GUI was updating input-files field under content-type > image.
This field is not used for image search. So image search couldn't be
configured from the Desktop GUI
- Fix
- Set input-directories when field of search type image is set from GUI
- Otherwise set input-files field in config
- Show a helpful error message in the GUI to the user, instead of the
crashing if loading config fails, for e.g if file wasn't found
- Collate GUI errors into an ErrorType enum class
- Remove previous error messages before showing the new one
Previously if the embeddings were already there only the khoj.yml
config file would get updated. The embeddings would remain old.
1. This results in a stale app state where the config doesn't
match the embeddings
2. Currently the user cannot update their config from the config
screen. They'd have to use a combination of config screen and web
interface>regenerate button to trigger it or delete their ~/.khoj dir
This commit should resolve the above issues
- Prevent immediate overwrite of re-ranked results by
incremental-search without rerank triggered via post-command-hook.
- This triggers right after the reranking results are rendered, so
user never ends up seeing them
- Add docstrings, mention args in them. Make docstring crisper
- prefix funcs, variables with khoj--
- Require emacs >27.1 for json-parse-buffer
- Use lexical binding
- Add quickstart docs to elisp file itself
- Bump version of khoj.el
- What
- Convert the config screen into the main application window
with configuration as just one of the functionality it provides
- Rename config screen to main window to match new designation
- Why
- System Tray isn't available everywhere (e.g Linux)
- This requires moving functionality into a normal window for cross-compat
- Start evolving configure screen away from just being a configure screen
- Update Window Title to just say Khoj
- Allow Opening Web Interface to Search from Khoj configure screen
- Rename "Start" Button to more accurate "Configure"
- Disable Search button on first run and while configuring app
- Issue
- In the previous form, updates to self.current_config would update
default_config as python does a shallow copy
- So self.current_config is just referencing the values of default_config
- Hence updates to current_config updates the default_config values too
- This is not what we want
- Fix
- Deep copy the default_config values. Now updates to
self.current_config wouldn't affect the default_config
- Generating embeddings takes time
- If user enables a content type and clicks start.
The app starts to generate embeddings when loading the new settings
- Run this function in a separate thread to keep config screen responsive
- But disable start button to prevent re-entrant threads
- Also show a minimal visual indication that the app is saving state
- Convert Input Fields into PlainTextEdit
- Display Each Selected File on a Separate Line in Field
- Set Height of FileBrowser Input Field based on Number of Lines/Files
- This fixes the field expanding when configure screen is expanded
- Allows for reusability of the labelled text field
- Simplifies the logic to save settings for conversation processor
- Avoid having to pass the khoj_sample.yml data file into pip, native apps
- Packaging data files into python packages is annoying.
- There's `MANIFEST.in`, `data_files` and `package_data` in setup.py
- Bdist, wheel, generated source tarball use different set of these fields
and put the data files in different locations
- Rather just code the default config into a constant. Avoid
pointless file reads as well this way
- Include khoj_sample.yml in pip package to load default config from
- Create khoj config directory if it doesn't exist
- Load config from khoj_sample.yml if khoj.yml config doesn't exist
Conflicts:
- src/main.py
- router functions have moved to router
- move logic to handle null query perf timer variables into
router.py
- set main.py to current branch, not master
- If a user manually edits the input file lines, clicking start should
use that. Currently it just looks at the files selected last via file
browser
- We want to allow users to manually enter file paths in field. Which
is why the field hasn't been set to read-only
- Track current (saved/loaded) config separate from the new config (to
be written) when user clicks Start
- Fallback to using default config when no config for the specific
content type or processor is specified in khoj.yml
- Earlier were only loading default config on first run, not after
- Create Child CheckBox, LineEdit classes for Processor Widgets
- Create ProcessorType, similar to SearchType
- Track ProcessorType the widgets are associated with
- Simplify update, save, load of config based on type
- Update configuration to use by the backend, while app is running
- Trigger after user hits start button with their config.
The config gets written to khoj.yml file first, then the updated
config is loaded onto memory
- Results get priority screen real estate
- Allows quick speed key based traversal of results as cursor
on switching to buffer is at top level heading
- E.g C-x o n n o 2 jumps to entry in actual file of second result
- Unlike before when it is at the #+STARTUP org buffer customization
settings
- Only allow adding files with appropriate file extension for each search type
- e.g .org for org-mode search, directory for image search
- Extract file paths added to config and enablement state of each search type
- This extracted state will be used to populate the khoj.yml config file
- Simplifies the configure screen layout and allows it to be of constant width
- It was buggy, the configure screen would dynamically expand but not
restore back to original size on disabling search type after enable
- Follow convention, two hyphens indicate variable private to library
- Defcustom are user configurable variables. So they should have single -
- Use khoj-results-count variable directly in code
- Fix regression since moving to use `which-key-show-full-keymap~
- The above function reads user keypress, so eats up 1 keypress
before starting to enter query
- No way to pass no-paging config via the external function to the
internally used which-key--show-keymap function that does allow
setting no-paging to not read user keypress
- So use the internal function instead and set no-paging arg to t
- The keybindings to select search types was previously confusing as
it only highlighted the final symbol to press (the C-x was shown but
it wasn't made apparent that it had to be pressed before)
- Previously some keybindings unrelated to khoj were also being shown
in the which-key popup. Now only the khoj keybindings are visible
- More generally, this allows configuring the khoj search anytime
while in khoj minibuffer window
- Earlier could only configure search type at the start of the search
- What
- Default to last used search type, when no search type specified
- Allow user to change search type before they enter query (and
after they've called khoj), if they want
- Why
- Reduce time from intent to results by using reasonable defaults
- Make interactions smoother, more intuitive
- Do not want browsers to use the small, grainy favicons
- Firefox for Android does use the bigger icon, when it's the only one available
- Update svg to match the 144x144 ratio just for consistency
Currently only get into this state when debug breakpoints on backend
are keeping the connection open and user exits khoj search from Emacs
Results in a number of open connections that slow khoj down.
- Makes it easier to fold/unfold, traverse and read results
- This 2 level nesting is already being used on the web interface
- Previously we were using the original nesting depth of the entry.
This was aimed at providing more of the orginal context of the
results. But currently this additional information does not provide
as much, for the decreased legibility of the results
- Improve code layout by ensuring all web interface specific code
under the src/interface/web directory
- Rename config API to more specifi /config instead of /ui
- Rename config data GET, POST api to /config/data instead of /config
- Previously we were statically populating types dropdown field in the web interface with all available search types
- This change populates the type dropdown field with only search types that are enabled/configured
- It queries the `/config` backend API to see which of the available search types are configured
- Populate via `.then` after enabled search types in dropdown are
populated
- Call to `/config` API is async and will usually complete after the value of type field is set from url
- So value of type field would earlier be overridden when search types
dropdown is populated after the call to `/config` API completes
- Get /config API and check config for which available search types is
populated. This gives us the list of enabled search types
- Dynamically populate search type field with enabled search types only
- Setting query value to default option when query param wasn't
passed via URL was overriding placeholder text in query field
- We wanted placeholder text in field, not the query field to actually
be populated by placeholder text
- This clears field when user starts typing query into the query field,
instead of them having to manually delete the default text populated
Having org-mode result headings change size based on their depth in
the source document makes is a confusing UI experience.
Improve font-size, line-spacing and margins of results to make
delineation between entries, and differntiating between entry heading
and it's body easier to visually infer.
Do not white-space: pre-line. Improves rendering of Markdown results
## Support Incremental Search on Khoj Web Interface
- Use default, fast path to query /search API while user is typing
- Upgrade to cross-encoder re-ranked results once user hits enter on search box
## Improve Render of Org Results on Web Interface
- We were previously just wrapping results from /search API into a pre formatted div field. This was not easy to read
- Use [org.js](https://mooz.github.io/org-js/) to render results from Khoj `/search` API as proper HTML
- Improve org.js to render all task states, stylize task tags and make org-mode results look more like original content
Closes#42#41
In current state:
- Rerank results:
- If user idles while entering query OR
- exits normally
- Do not rerank results:
- If user exits abnormally, e.g via C-g from query
- Rename functions to more standard, descriptive names
- Keep known, required code for incremental search
- E.g Do not set buffer local flag in hooks on minibuffer setup
- Only query when user in khoj minibuffer
- Use active-minibuffer-window and track khoj minibuffer
- (minibuffer-prompt) is not useful for our use-case here
- (For now) Run re-rank only if user idle while querying
- Do not run rerank on teardown/completion
- The reranking lag (~2s) is annoying; hit enter,
wait to see results
- Also triggered when user exits abnormally,
so C-g also results in rerank which is even more annoying
- Emacs will still hang if re-ranking gets triggered on idle but
that's better than always getting triggered. And better than not
having mechanism to get results re-ranked via cross-encoder at all
- Update khoj-simple to work cross-encoder re-ranked results like before
- Increment major version as incremental search considered a breaking
change and a major update to search capability
- Reason:
Allow natural search on markdown based notes, documentation,
websites etc
- Details:
- Create markdown processor to extract Markdown entries (identified by
Heading) into standard jsonl format required by text_search
- Update API, Configs to support interfacing with new markdown type
- Update Emacs, Web clients to support interfacing with new markdown
type via API
- Update Readme to mentiond markdown is also supported
Closes#35
- Had already made some progress on this earlier by updating the image
search responses. But needed to update the text search responses to
use lowercase entry and score
- Update khoj.el to consume the updated json response keys for text
search
- Avoids having to click the query input box
- Just open page, type whatever and hit enter to do image search
- For other search types select appropriate type from dropdown
- Use shr to render image response from html in result buffer
Earlier was using org-mode. But rendering HTML with shr seems cleaner
- Use Headings to Add highlights
- Use Random to Force fetch of Image. Similar to what was done for Web interface
- Remove trailing elisp brackets from response
- Show query match scores by image model for each image in results
Adding a random, unused url param at the end of the img.src string
fixes the issue. As the browser thinks it's a new image and doesn't
use the image data that's already cached because of which it wasn't
even making the fetch call for the image
- Allow viewing image results returned by Semantic Search.
Until now there wasn't any interface within the app to view image
search results. For text results, we at least had the emacs interface
- This should help with debugging issues with image search too
For text the Swagger interface was good enough
- Add search query to top of buffer as Beancount comment
- Remove trailing ) from response
- Separate entries by empty line
- Load beancount-mode in semantic search on ledger buffer
- Previously:
The text the model was trained on was being used to
re-create a semblance of the original org-mode entry.
- Now:
- Store raw entry as another key:value in each entry json too
Only return actual raw org entries in results
But create embeddings like before
- Also add link to entry in file:<filename>::<line_number> form
in property drawer of returned results
This can be used to jump to actual entry in it's original file