Commit graph

235 commits

Author SHA1 Message Date
Debanjum Singh Solanky
fb86be8cd9 Add ID, File+Heading based Links to Org-Mode Entries
- Add links to property drawer
- This ensures results returned by semantic search contain these links
- This allows the user to jump to entry within original file for context
- The ID, file+heading based links are more robust for finding the relevant
  entry in the original file than the line number based link,
  as the user may edit the original files between embedding regenerations
2022-06-17 03:11:11 +03:00
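The two link forms named in the commit above can be sketched in Python as follows. This is an illustration only: the helper names `id_link` and `file_heading_link` are hypothetical and not taken from the repository. The link syntax itself (`[[id:...]]` and `[[file:path::*Heading]]`) is standard Org-mode.

```python
def id_link(entry_id: str) -> str:
    """Build an Org-mode link that targets an entry by its ID property."""
    return f"[[id:{entry_id}]]"


def file_heading_link(path: str, heading: str) -> str:
    """Build an Org-mode link that targets a heading inside a file."""
    return f"[[file:{path}::*{heading}]]"
```

Unlike a line-number link, neither form depends on where the entry sits in the file, so both keep working after lines are added or removed above the entry.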
Debanjum Singh Solanky
de23fc2051 Revert Add Scheduled, Deadline date to Model Embeddings for Date Aware Search
Sentence Transformer MSMarco Model isn't date aware
So no use of adding scheduled, deadline dates to model embeddings for consideration

This reverts commit a2a08d1354.
2022-06-17 02:57:28 +03:00
Debanjum Singh Solanky
a2a08d1354 Add Scheduled, Deadline date to Model Embeddings for Date Aware Search 2022-06-17 02:55:27 +03:00
Debanjum Singh Solanky
cfbd5c4ecc Update global model on regenerate via API 2022-06-17 00:49:06 +03:00
Debanjum
35117af322
Show Demo of Semantic Search in Readme
Merge pull request #27 from debanjum/debanjum/add-demo
2022-05-14 01:32:18 -07:00
Debanjum Singh Solanky
2eab256af9 Delete markdown file. It helped upload the demo video to Github 2022-05-14 04:30:20 -04:00
Debanjum Singh Solanky
96c588b7bc Add demo of semantic search to repository 2022-05-14 04:29:25 -04:00
Debanjum
19f8f85333
Show Demo of Semantic Search in Readme
- Use Markdown file to help upload demo to Github
- Use generated link from upload into Readme org file
2022-05-14 01:29:13 -07:00
Debanjum Singh Solanky
031d6bddb4 Delete markdown file. It helped upload the demo video to Github 2022-05-14 04:25:17 -04:00
Debanjum Singh Solanky
c78bf84eef Introduce search api endpoint that auto infers search type intent
- Introduce prompt for GPT to automatically extract user's search intent
- Expose new search api endpoint to use that to set SearchType being
  passed to search API
- Currently meant as an experimental API to gauge usefulness,
  extendability. Evaluating for phone or voice use-case
2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
8ef7917014 Fix json format passed in prompt to GPT 2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
f57b7f65ea Wrap prompts for GPT in triple quotes to improve prompt readability
To improve prompt readability:
- Remove newline escape sequence and use actual newline directly
  - This avoids one long line of text as prompt
- Remove escaping of double quotes
2022-02-27 23:17:49 -05:00
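A minimal sketch of the change described in the commit above, comparing an escaped single-line prompt with its triple-quoted equivalent. The prompt text and variable names are hypothetical and only serve the comparison.

```python
# Hypothetical prompt text, used only to compare the two styles.
escaped_prompt = "Extract the search type.\nQ: \"Where did I eat?\"\nA:"

# Same prompt in triple quotes: real newlines, no escaped double quotes.
triple_quoted_prompt = """Extract the search type.
Q: "Where did I eat?"
A:"""

# Both literals produce the same string; only source readability differs.
same_prompt = escaped_prompt == triple_quoted_prompt
```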
Debanjum Singh Solanky
1eba7b1c6f Use empty_escape_sequence constant to strip response text from gpt 2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
37bfc956c9 Update Readme Local Development Section 2022-02-27 23:16:58 -05:00
Debanjum Singh Solanky
1c3a1420f8 Update asymmetric extract_entries method to handle uncompressed jsonl
This is similar to what was done for the symmetric extract_entries
method earlier
2022-02-27 19:03:31 -05:00
Debanjum Singh Solanky
3d8a07f252 Extract empty line escape sequences var into constants file for reuse 2022-02-27 19:01:49 -05:00
Debanjum Singh Solanky
624a3faf92 Update Readme. Improve Organization, Reduce Staleness 2022-02-26 19:04:49 -05:00
Debanjum Singh Solanky
bb5d0d8908 Improve Semantic Search Buffer Names in Emacs
- Allow multiple semantic searches buffers to exist simultaneously
  - Uniquify semantic search buffer names
- Add query and search-type to semantic search buffer name for easier
  disambiguation, search, and finding the appropriate buffer
2022-02-26 18:30:14 -05:00
Debanjum
6a84ca965a
Merge pull request #25 from debanjum/users/debanjum/improve-semantic-search-on-ledger
Improve Extraction and Rendering of Semantic Search on Ledger
2022-02-26 15:18:22 -08:00
Debanjum Singh Solanky
b68558651b Improve Extraction of Beancount Entries
- Only extract entries starting with YYYY-MM-DD from Beancount
- Strip Trailing Escape Sequences from Entries
2022-02-26 17:48:45 -05:00
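One way the date-based filtering from the commit above might look in Python. This is a sketch under assumptions, not the repository's code: the function name `extract_dated_entries` is hypothetical, and entries are assumed to be separated by blank lines.

```python
import re

# Assumption: a Beancount entry begins with a YYYY-MM-DD date.
DATE_PREFIX = re.compile(r"\d{4}-\d{2}-\d{2}\b")


def extract_dated_entries(text: str) -> list[str]:
    """Return blank-line separated blocks that start with a date, stripped."""
    blocks = re.split(r"\n\s*\n", text)
    # strip() removes trailing newlines and spaces from each entry.
    return [b.strip() for b in blocks if DATE_PREFIX.match(b.strip())]
```

Blocks such as `option` directives or standalone comments do not start with a date, so they are skipped.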
Debanjum Singh Solanky
b3ac2dd730 Improve Results Rendered on Emacs from Semantic Search on Ledger
- Add search query to top of buffer as Beancount comment
- Remove trailing ) from response
- Separate entries by empty line
- Load beancount-mode in semantic search on ledger buffer
2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
502c68d4f8 Remove trailing escape sequence in ledger search response entries
- Fix loading entries from jsonl in extract_entries method
  - Only extract Title from jsonl of each entry
    This is the only thing written to the jsonl for symmetric ledger
  - This fixes the trailing escape seq in loaded entries
  - Remove the need for semantic-search.el response reader to do pointless complicated cleanup

- Make symmetric_ledger:extract_entries use beancount_to_jsonl:load_jsonl
  Both methods were doing similar work

- Make load_jsonl handle loading entries from both gzip and uncompressed jsonl
2022-02-26 17:48:45 -05:00
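A loader that accepts both gzip-compressed and uncompressed jsonl, as the commit above describes, could be sketched like this. The function name `load_jsonl` comes from the commit message, but the body is an assumption: in particular, detecting compression by a `.gz` suffix is a guess at how the two cases are told apart.

```python
import gzip
import json
from pathlib import Path


def load_jsonl(path):
    """Load one JSON object per line from a .jsonl or .jsonl.gz file."""
    path = Path(path)
    # Assumption: compression is signalled by a ".gz" file suffix.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        # Skip blank lines so a trailing newline does not break parsing.
        return [json.loads(line) for line in f if line.strip()]
```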
Debanjum Singh Solanky
248aa632c0 Do not throw warning for beancount files with .beancount extension 2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
76cd63f4bd Fix count of processed jsonl entries shown to user by ledger processor
Count lines not chars
2022-02-26 17:46:06 -05:00
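The difference between the two counts in the commit above can be illustrated with a short sketch. The sample data and variable names are hypothetical.

```python
# Hypothetical jsonl payload with one entry per line.
jsonl_data = '{"Title": "a"}\n{"Title": "b"}\n'

char_count = len(jsonl_data)                # number of characters
entry_count = len(jsonl_data.splitlines())  # number of entries (lines)
```

Calling `len` on the string counts characters, so the count of processed entries has to come from the lines instead.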
Debanjum Singh Solanky
f08591c880 Set PORT arg when building docker image in the build workflow 2022-01-29 18:11:47 -05:00
Debanjum Singh Solanky
359f25b0a4 Rename publish workflow to build. Add badge to the workflow on Readme 2022-01-29 18:11:47 -05:00
Debanjum Singh Solanky
4add348e3c Remove context from path to Dockerfile in Github build, push action 2022-01-29 17:16:12 -05:00
Debanjum Singh Solanky
859258864c Update Readme badge post rename of build.yml to test.yml 2022-01-29 17:10:43 -05:00
Debanjum Singh Solanky
fa685dc37f Create Github workflow to build, publish docker container to registry
- Rename the build workflow to test workflow
2022-01-29 17:08:19 -05:00
Debanjum Singh Solanky
78b76d65a0 Minor fix to notes jsonl file extension in sample_config.yml 2022-01-29 04:13:36 -05:00
Debanjum Singh Solanky
7c773d29ef Update github workflow to use environment.yml under config/ directory 2022-01-29 03:43:34 -05:00
Debanjum Singh Solanky
c31abad0a6 Mount embeddings to /data/embeddings for directory naming consistency
- Keeps directory paths consistent between host and container volumes
- Consistency simplifies documentation and updates required to setup
  sample_config.yml for local installation
2022-01-29 03:24:02 -05:00
Debanjum Singh Solanky
b0067fc32e Store docker, conda, semantic-search configuration in a config directory
- Improves organization of config files required for application
- Declutters the application root directory from configs
2022-01-29 02:41:11 -05:00
Debanjum Singh Solanky
79c2224eaa Improve test data organization and update corresponding conftests
- Put test data for each content type into separate directories
- Makes config.yml for docker and local host consistent
  - Prepending tests to /data in sample_config.yml makes application
    run on local host using test data
  - Allows mounting separate volume for each content type in docker-compose
- Ignore gitignore to only add tests content, not generated models or embeddings
2022-01-29 02:03:17 -05:00
Debanjum Singh Solanky
3e889760c7 Merge sample_config, docker_sample_config yml into a single sample_config.yml
- Update readme to indicate how to update the new sample_config to run on test data
2022-01-29 01:32:12 -05:00
Debanjum Singh Solanky
2bc2780501 Mention the experimental /chat API interacts with OpenAI's API 2022-01-29 00:11:40 -05:00
Debanjum Singh Solanky
6ed667aed0 Add Troubleshooting Section, Minor Fixes to Readme 2022-01-29 00:11:40 -05:00
Debanjum
d943d2be80
Merge pull request #21 from debanjum/saba/dockerize
Add Docker support to semantic-search
2022-01-28 20:27:40 -08:00
Saba
1ba7fa66e5 Update README and default folders in docker_sample_config.yml
- Add instructions for using Docker to the README
- Use the ./tests/data folder in docker_sample_config.yml so it can work right away for users
2022-01-28 23:20:50 -05:00
Saba
52e701b3c2 Simplify Dockerfile by removing multibuild
- Install exiftool dependency directly in the miniconda image
2022-01-24 21:54:10 -05:00
Saba
33bc62dc19 Fix type of use_xmp_metadata to be bool, rather than str 2022-01-24 21:53:26 -05:00
Saba
9fb410fc25 Clean up docker_sample_config.yml
- Uncomment other search types
- Explain the file prefixes behavior and how it interfaces with the docker image
2022-01-24 14:11:38 -05:00
Saba
9802023c79 Clean up docker-compose
- Mount the local directory to /app
- Reformat the file paths to generically indicate what their purpose is
- Add comments to assist users who want to modify properties themselves
2022-01-24 14:10:18 -05:00
Saba
4ae8c15170 Clean the Dockerfile
- Use /app as the working directory
- Clarify comment to explain why the ENTRYPOINT is constructed as it is
- Move explanations for the argument to docker-compose, where it's set
- Copy required artifacts from the first build image into the subsequent one (exiftool)
2022-01-24 14:08:55 -05:00
Saba
66d08ab5df Rename web to server in docker-compose.yml 2022-01-24 00:14:01 -05:00
Saba
77fa8718d9 Working example with docker-compose
Still need quite a bit of clean-up, but this adds a working docker-compose + Dockerfile setup
2022-01-23 23:44:38 -05:00
Saba
875188dc6f Initialize working on #20 to add Docker support
- Add a Dockerfile which uses an Ubuntu image to install relevant dependencies (exif) and uses a Miniconda image for setting up/reusing the conda environment
- Add a dummy docker-compose file
2022-01-23 14:57:28 -05:00
sabaimran
974690939c
Merge pull request #19 from debanjum/rename-config-types-for-consistency
Rename RawConfig Types for Consistency
2022-01-14 21:14:08 -05:00
Debanjum Singh Solanky
179153dc5a Rename RawConfig Types for Consistency
- Naming convention - [ContentType][ConfigType]Config
  - Where [ConfigType] ~ Content, Search, Processor
  - Where [ContentType] ~ Text, Image, Asymmetric, Symmetric, Conversation

- Current Configs:
  - Content:
    - Org Notes
    - Org Music
    - Image
    - Ledger/Beancount

  - Search:
     - Asymmetric
     - Symmetric
     - Image

  - Processor:
    - Conversation
2022-01-14 20:54:38 -05:00
Debanjum
ed7c2901f5
Merge pull request #18 from debanjum/deb/save-models-to-disk-on-first-run
Save Search Models to Disk on First Run

## Why
  - Improve application startup time
  - Start up the application and perform semantic search even if the user is offline
  - Use search model config in YAML file for all search types (asymmetric, symmetric, image)

## Details
  - Load search models from disk when available
  - Use search model config specified in YAML file
  - Add search config for Symmetric Search used by Ledger/Beancount transaction search
2022-01-14 17:30:46 -08:00
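The load-from-disk-when-available behaviour described in the pull request above follows a common caching pattern, sketched below. Everything here is an assumption for illustration: `load_or_create` and `create_model` are hypothetical names, and `pickle` stands in for whatever save and load mechanism the search models actually use, which the log does not show.

```python
import pickle
from pathlib import Path


def load_or_create(model_path, create_model):
    """Return the model cached at model_path, creating and saving it on first run."""
    model_path = Path(model_path)
    if model_path.exists():
        # Later runs: read from disk, no network access needed.
        with model_path.open("rb") as f:
            return pickle.load(f)
    # First run: build the model (hypothetical callable, e.g. a download).
    model = create_model()
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

After the first run the creation step is skipped entirely, which is what allows faster startup and offline use.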