Commit graph

2325 commits

Author SHA1 Message Date
Debanjum Singh Solanky
8ef7917014 Fix json format passed in prompt to GPT 2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
f57b7f65ea Wrap prompts for GPT in triple quotes to improve prompt readability
To prompt improve readability:
- Remove newline escape sequence and use actual newline directly
  - This avoids one long line of text as prompt and
- Remove escaping of double quotes
2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
1eba7b1c6f Use empty_escape_sequence constant to strip response text from gpt 2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
37bfc956c9 Update Readme Local Development Section 2022-02-27 23:16:58 -05:00
Debanjum Singh Solanky
1c3a1420f8 Update asymmetric extract_entries method to handle uncompressed jsonl
This is similar to what was done for the symmetric extract_entries
method earlier
2022-02-27 19:03:31 -05:00
Debanjum Singh Solanky
3d8a07f252 Extract empty line escape sequences var into constants file for reuse 2022-02-27 19:01:49 -05:00
Debanjum Singh Solanky
624a3faf92 Update Readme. Improve Organization, Reduce Staleness 2022-02-26 19:04:49 -05:00
Debanjum Singh Solanky
bb5d0d8908 Improve Semantic Search Buffer Names in Emacs
- Allow multiple semantic searches buffers to exist simultaneously
  - Uniquify semantic search buffer namew
- Add query and search-type to semantic search buffer name for easier
  disambiguration, search and find appropriate
2022-02-26 18:30:14 -05:00
Debanjum
6a84ca965a
Merge pull request #25 from debanjum/users/debanjum/improve-semantic-search-on-ledger
Improve Extraction and Rendering of Semantic Search on Ledger
2022-02-26 15:18:22 -08:00
Debanjum Singh Solanky
b68558651b Improve Extraction of Beancount Entries
- Only extract entries starting with YYYY-MM-DD from Beancount
- Strip Trailing Escape Sequences from Entries
2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
b3ac2dd730 Improve Results Rendered on Emacs from Semantic Search on Ledger
- Add search query to top of buffer as Beancount comment
- Remove trailing ) from response
- Separate entries by empty line
- Load beancount-mode in semantic search on ledger buffer
2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
502c68d4f8 Remove trailling escape sequence in ledger search response entries
- Fix loading entries from jsonl in extract_entries method
  - Only extract Title from jsonl of each entry
    This is the only thing written to the jsonl for symmetric ledger
  - This fixes the trailing escape seq in loaded entries
  - Remove the need for semantic-search.el response reader to do pointless complicated cleanup

- Make symmetric_ledger:extract_entries use beancount_to_jsonl:load_jsonl
  Both methods were doing similar work

- Make load_jsonl handle loading entries from both gzip and uncompressed jsonl
2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
248aa632c0 Do not throw warning for beancount files with .beancount extension 2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
76cd63f4bd Fix count of processed jsonl entries shown to user by ledger processor
Count lines not chars
2022-02-26 17:46:06 -05:00
Debanjum Singh Solanky
f08591c880 Set PORT arg when building docker image in the build workflow 2022-01-29 18:11:47 -05:00
Debanjum Singh Solanky
359f25b0a4 Rename publish workflow to build. Add badge to the workflow on Readme 2022-01-29 18:11:47 -05:00
Debanjum Singh Solanky
4add348e3c Remove context from path to Dockerfile in Github build, push action 2022-01-29 17:16:12 -05:00
Debanjum Singh Solanky
859258864c Update Readme badge post rename of build.yml to test.yml 2022-01-29 17:10:43 -05:00
Debanjum Singh Solanky
fa685dc37f Create Github workflow to build, publish docker container to registry
- Rename the build workflow to test workflow
2022-01-29 17:08:19 -05:00
Debanjum Singh Solanky
78b76d65a0 Minor fix to notes jsonl file extension in sample_config.yml 2022-01-29 04:13:36 -05:00
Debanjum Singh Solanky
7c773d29ef Update github workflow to use environment.yml under config/ directory 2022-01-29 03:43:34 -05:00
Debanjum Singh Solanky
c31abad0a6 Mount embeddings to /data/embeddings for directory naming consistency
- Keeps directory paths consistent between host and container volumes
- Consistency simplifies documentation and updates required to setup
  sample_config.yml for local installation
2022-01-29 03:24:02 -05:00
Debanjum Singh Solanky
b0067fc32e Store docker, conda, semantic-search configuration in a config directory
- Improves organization of config files required for application
- Declutters the application root directory from configs
2022-01-29 02:41:11 -05:00
Debanjum Singh Solanky
79c2224eaa Improve test data organization and update correspoding conftests
- Put test data for each content type into separate directories
- Makes config.yml for docker and local host consistent
  - Prepending tests to /data in sample_config.yml makes application
    run on local host using test data
  - Allows mounting separate volume for each content type in docker-compose
- Ignore gitignore to only add tests content, not generated models or embeddings
2022-01-29 02:03:17 -05:00
Debanjum Singh Solanky
3e889760c7 Merge sample_config, docker_sample_config yml into a single sample_config.yml
- Update readme to indicate how to update the new sample_config to run on test data
2022-01-29 01:32:12 -05:00
Debanjum Singh Solanky
2bc2780501 Mention the experimental /chat API interacts with OpenAI's API 2022-01-29 00:11:40 -05:00
Debanjum Singh Solanky
6ed667aed0 Add Troubleshooting Section, Minor Fixes to Readme 2022-01-29 00:11:40 -05:00
Debanjum
d943d2be80
Merge pull request #21 from debanjum/saba/dockerize
Add Docker support to semantic-search
2022-01-28 20:27:40 -08:00
Saba
1ba7fa66e5 Update README and default folders in docker_sample_config.yml
- Add instruction to using Docker with README
- Use the ./tests/data folder in docker_sample_conifg.yml so it can work right away for users
2022-01-28 23:20:50 -05:00
Saba
52e701b3c2 Simplify Dockerfile by removing multibuild
- Install exiftool dependency directly in the miniconda image
2022-01-24 21:54:10 -05:00
Saba
33bc62dc19 Fix type of use_xmp_metadata to be bool, rather than str 2022-01-24 21:53:26 -05:00
Saba
9fb410fc25 Clean up docker_sample_config.yml
- Uncomment other search types
- Explain the file prefixes behavior and how it interfaces with the docker image
2022-01-24 14:11:38 -05:00
Saba
9802023c79 Clean up docker-compose
- Mount the local directory to /app
- Reformat the file paths to generically indicate what their purpose is
- Add comments to assist users who wasnt to modify properties themselves
2022-01-24 14:10:18 -05:00
Saba
4ae8c15170 Clean the Dockerfile
- Use /app as the working directory
- Clarify comment to explain why the ENTRYPOINT is constructed as it is
- Move explanations for the argument to docker-compose, where it's set
- Copy required artifacts from the first build image into the subsequent one (exiftool)
2022-01-24 14:08:55 -05:00
Saba
66d08ab5df Rename web to server in docker-compose.yml 2022-01-24 00:14:01 -05:00
Saba
77fa8718d9 Working example with docker-compose
Still need quite a bit of clean-up, but this adds a working docker-compose + Dockerfile setup
2022-01-23 23:44:38 -05:00
Saba
875188dc6f Initialize working on #20 to add Docker support
- Add a Dockerfile which uses an Ubuntu image to install relevant dependencies (exif) and uses a Miniconda image for setting up/reusing the conda environment
- Add a dummy docker-compose file
2022-01-23 14:57:28 -05:00
sabaimran
974690939c
Merge pull request #19 from debanjum/rename-config-types-for-consistency
Rename RawConfig Types for Consistency
2022-01-14 21:14:08 -05:00
Debanjum Singh Solanky
179153dc5a Rename RawConfig Types for Consistency
- Naming convention - [ContentType][ConfigType]Config
  - Where [ConfigType] ~ Content, Search, Processor
  - Where [ContentType] ~ Text, Image, Asymmetric, Symmetric, Conversation

- Current Configs:
  - Content:
    - Org Notes
    - Org Music
    - Image
    - Ledger/Beancount

  - Search:
     - Asymmetric
     - Symmetric
     - Image

  - Processor:
    - Conversation
2022-01-14 20:54:38 -05:00
Debanjum
ed7c2901f5
Merge pull request #18 from debanjum/deb/save-models-to-disk-on-first-run
Save Search Models to Disk on First Run

## Why
  - Improve application startup time
  - Startup application and perform semantic search even if user offline
  - Use search model config in YAML file for all search types (asymmetric, symmetric, image)

## Details
  - Load search models from disk when available
  - Use search model config specified in YAML file
  - Add search config for Symmetric Search used by Ledger/Beancount transaction search
2022-01-14 17:30:46 -08:00
Debanjum Singh Solanky
ed144f7984 Setup Search with Search_Config to Fix Tests
- Rename pytest fixture search_config to more appropriate
content_config
- Create search_config pytest fixture
- Use search_config where search being setup, used in tests
2022-01-14 20:13:14 -05:00
Debanjum Singh Solanky
c64e0c2965 Load model from HuggingFace if model_directory unset in config YAML
- Do not save/load the model to/from disk when model_directory unset
in config.yml
- Add symmetric search default config to cli.py
2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky
510faa1904 Save Image Search Model to Disk 2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky
934ec233b0 Add Search Config for Symmetric Model. Save Model to Disk 2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky
b63026d97c Save Asymmetric Search Model to Disk
- Improve application load time
- Remove dependence on internet to startup application and perform semantic search
2022-01-14 17:36:27 -05:00
Debanjum
e8146e8ebb
Merge pull request #17 from albd/patch-1
Fix url error in README
2022-01-13 18:09:10 -08:00
Albert Davies
2e2069f720
Fix url error in README
Misplaced quotes
2022-01-13 16:28:46 -08:00
Debanjum Singh Solanky
2e53fbc844 Fix the user intent extraction prompt for GPT. Clean up chatbot test 2022-01-12 10:36:01 -05:00
Debanjum Singh Solanky
ea28897cdd Remove deprecated conversation_history field from config 2022-01-12 10:35:52 -05:00
Debanjum Singh Solanky
5a686b7be9 Add logs for chat bot in verbose mode 2022-01-12 10:35:52 -05:00