- The reload API adds the ability to separate out the loading of
embeddings from file without having to restart app or (re-)generate embeddings
- Before this the only way to load model from file was by restarting app
- The other way to reload the model embeddings by regenerating them
was to expensive for larger datasets
- This unlocks at least 1 use-case, where
- we regenerate model via an app instance running on a separate server and
- just reload the generated embeddings on the client device
- This allows us to offload the expensive embedding generation
compute to a background server while letting
- This avoids having to (re-)restart application on client device or
be forced to generate embeddings on the client device itself
- But it requires the model relevant files to be synced to the client device
This can be done with any file syncing application like Syncthing
- We can then call /regenerate on server and /reload client on a
regular schedule to keep our data up to date on semantic search
- This is still clunky but it should be commitable
- General enough that it'll work even when a users notes are not in the home directory
- While solving for the special case where:
- Notes are being processed on a different machine and used on a different machine
- But the notes directory is in the same location relative to home on both the machines
- Use Set for Tags instead of dictionary with empty keys
- No Need to store First Tag separately
- Remove properties methods associated with storing first tag separately
- Simplify extraction of tags string in org_to_jsonl
- Split notes_string creation into multiple f-string in separate line
for code readability
- Now that excluding the times line from the raw body of node,
show it in repr so user can see it for reference
- But the model doesn't need to see it for it's embeddings to be
confused by
- Add links to property drawer
- This ensures results returned by semantic search contain these links
- This allows the user to jump to entry within original file for context
- The ID, file+heading based links are more robust to find relevant
entry in original file than the line no based link,
as edits being done by user to original files between embedding regenerations
Sentence Transformer MSMarco Model isn't date aware
So no use of adding scheduled, deadline dates to model embeddings for consideration
This reverts commit a2a08d1354.
- Introduce prompt for GPT to automatically extract user's search intent
- Expose new search api endpoint to use that to set SearchType being
passed to search API
- Currently meant as an experimental API to gauge usefulness,
extendability. Evaluating for phone or voice use-case
To prompt improve readability:
- Remove newline escape sequence and use actual newline directly
- This avoids one long line of text as prompt and
- Remove escaping of double quotes
- Add search query to top of buffer as Beancount comment
- Remove trailing ) from response
- Separate entries by empty line
- Load beancount-mode in semantic search on ledger buffer
- Fix loading entries from jsonl in extract_entries method
- Only extract Title from jsonl of each entry
This is the only thing written to the jsonl for symmetric ledger
- This fixes the trailing escape seq in loaded entries
- Remove the need for semantic-search.el response reader to do pointless complicated cleanup
- Make symmetric_ledger:extract_entries use beancount_to_jsonl:load_jsonl
Both methods were doing similar work
- Make load_jsonl handle loading entries from both gzip and uncompressed jsonl
- Keeps directory paths consistent between host and container volumes
- Consistency simplifies documentation and updates required to setup
sample_config.yml for local installation
- Put test data for each content type into separate directories
- Makes config.yml for docker and local host consistent
- Prepending tests to /data in sample_config.yml makes application
run on local host using test data
- Allows mounting separate volume for each content type in docker-compose
- Ignore gitignore to only add tests content, not generated models or embeddings