Debanjum Singh Solanky
fc531a1915
Resolve relative file paths to model embeddings in all search types
2021-08-28 22:26:12 -07:00
Debanjum Singh Solanky
74faa34bee
Update sample config to add minimal config for ledger, image search
2021-08-22 21:54:49 -07:00
Debanjum Singh Solanky
8dec58b12a
Update Readme to state can now query beancount transactions, images
2021-08-22 21:50:27 -07:00
Debanjum Singh Solanky
4daeddbbda
Enable Semantic Search on Images
2021-08-22 21:42:37 -07:00
Debanjum Singh Solanky
fd217fe8b7
Enable Semantic Search for Beancount transactions
2021-08-22 21:36:06 -07:00
Debanjum Singh Solanky
97263b8209
Move CLI into a separate module. Move CLI tests into a separate file
2021-08-21 19:21:38 -07:00
Debanjum Singh Solanky
78a1f4ebb4
Use YAML file to allow user to configure application. Add tests
...
- YAML Config
- Can specify all params[1] previously passed via cmd args in the config YAML
- Can now also configure which sentence-transformer models to use for search
- [1] Config params
- org files
- compressed entries file config path
- embeddings file config path
- Include sample_config.yaml
- Include sample .org file from this repo's Readmes
- CLI
- Configuration Priority: Config via cmd > Config via YAML > Default Config
- Test CLI, include test config.yml for the tests
- Set default type to None unless set via query param to API
Run notes search if search_enabled, also if type is None (default)
Prepares for running queries on all search types unless type
specified in API query param
- Update Readme
2021-08-21 19:07:39 -07:00
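The configuration priority stated in the commit above (cmd > YAML > default) can be sketched as a layered lookup. This is a minimal illustration, not the project's actual code: `resolve_config` and the flat dictionary shapes are hypothetical, and an unset commandline value is assumed to be `None`.

```python
def resolve_config(cmd_args, yaml_config, defaults):
    """Hypothetical helper: cmd args override YAML, YAML overrides defaults."""
    resolved = dict(defaults)
    # Apply layers from lowest to highest priority; later layers win.
    for layer in (yaml_config, cmd_args):
        for key, value in layer.items():
            # Assumption: None means "not set at this layer".
            if value is not None:
                resolved[key] = value
    return resolved
```

For example, a model name set only in the YAML survives, while a verbosity level passed on the commandline overrides both the YAML and the default.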
Debanjum Singh Solanky
bafc86d583
Add helpers to merge dictionaries and get keys deep inside a dictionary
2021-08-21 18:27:50 -07:00
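A rough sketch of what such helpers might look like follows. The names `merge_dicts` and `get_from_dict`, their signatures, and the recursive merge behavior are assumptions for illustration; the commit message does not describe the actual implementation.

```python
def merge_dicts(priority, default):
    """Hypothetical: merge two dicts, preferring values from `priority`."""
    merged = dict(priority)
    for key, value in default.items():
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            # Assumption: nested dictionaries are merged recursively.
            merged[key] = merge_dicts(merged[key], value)
    return merged


def get_from_dict(dictionary, *keys, default=None):
    """Hypothetical: walk nested keys, returning `default` if any is missing."""
    current = dictionary
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

With these, a nested config value can be read without chained `KeyError` handling at each level.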
Debanjum Singh Solanky
eddbc67358
Document how to install latest version in Readme
2021-08-17 18:27:10 -07:00
Debanjum Singh Solanky
252266b62a
Pass type of item via regenerate API. Default type query param to None
2021-08-17 18:25:07 -07:00
Debanjum Singh Solanky
ff7207a6bd
Extract commandline arguments into separate testable method
2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky
a3a1100be9
Arrange modules in standardized ordering
2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky
569e30b1c8
Create a few basic tests
2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky
af9660f28e
Move application files under src directory. Update Readmes
...
- Remove the command for calling the asymmetric search script directly.
It no longer works when called directly due to internal package
import issues
2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky
c35c6fb0b3
Reuse asymmetric.setup & input validation from asymmetric & org_to_jsonl
...
Create asymmetric.setup method to
- initialize model
- generate compressed jsonl
- compute embeddings
Put input_files, input_file_filter validation in org_to_jsonl for
reuse in main.py, asymmetric.py
2021-08-17 00:45:40 -07:00
Debanjum Singh Solanky
02a84df37a
Update state vars after regeneration. Minimize time app is in inconsistent state
2021-08-16 23:47:33 -07:00
Debanjum Singh Solanky
0509854e14
Replace README.md with README.org. Can be used as notes for testing
2021-08-16 20:00:05 -07:00
Debanjum Singh Solanky
79aff85fcb
Update Readme. No separate SETUP step required. Simpler RUN step
...
- Setup now happens on first run of application
- Embeddings can now be regenerated without killing app by calling API
2021-08-16 19:24:04 -07:00
Debanjum Singh Solanky
95bf26a7f2
Set default value of verbosity commandline parameter to 0
2021-08-16 19:16:29 -07:00
Debanjum Singh Solanky
04a9a6d62f
Expose API endpoint to (re-)generate embeddings from latest notes
...
- Provides mechanism to update notes from within application
- Instead of having to pass the same arguments multiple times
Pass it once (or rely on defaults when possible) and let app keep
state and location of intermediary files
- Allows user to not have to deal with the internals of the application
- E.g. the user doesn't have to specify the jsonl.gz or embeddings file path
The app will still put those files in a default location
- The user doesn't have to run the generation from the commandline
as a separate step
2021-08-16 18:52:38 -07:00
Debanjum Singh Solanky
1c00c33e73
Improve debug output from org_to_jsonl.py script
2021-08-16 18:50:29 -07:00
Debanjum Singh Solanky
2a57156428
Fix org_to_jsonl. Use passed args not global variables in methods. Fix orgnode import
2021-08-16 17:37:44 -07:00
Debanjum Singh Solanky
66238004d8
Use verbosity level instead of bool across application
...
For consistent, more granular verbosity controls across app
Allows user to increase verbosity by passing -vvv flags to main.py
2021-08-16 17:15:41 -07:00
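A counted verbosity flag of the kind described in the commit above is commonly built with argparse's `count` action. The sketch below is illustrative only: `build_parser` and the `--verbose` long option are assumed names, not taken from main.py.

```python
import argparse


def build_parser():
    """Hypothetical parser showing a counted -v flag with a default of 0."""
    parser = argparse.ArgumentParser(description="Verbosity level sketch")
    # Each repetition of -v increments the value: -v is 1, -vvv is 3.
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="Increase verbosity; repeat for more detail")
    return parser
```

Code can then compare the integer level (for example `args.verbose > 1`) rather than checking a single boolean.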
Debanjum Singh Solanky
adbf157deb
Remove usage of the closure to search_notes as it's not required
2021-08-16 16:52:48 -07:00
Debanjum Singh Solanky
649e5d1327
Allow reuse of get_absolute_path, is_none_or_empty methods
...
- Move them to utils.helper.py for reuse
- Import those modules where required
- Delete duplicate methods defined in org_to_jsonl.py, asymmetric.py
2021-08-16 16:33:43 -07:00
Debanjum Singh Solanky
9703afb814
Rename search_types to search_type to standardize to singular naming
...
Using singular names for other directories in application already
- processor instead of processors
- interface instead of interfaces
2021-08-16 16:31:30 -07:00
Debanjum Singh Solanky
19d6678eb1
Allow importing org-to-jsonl as module for reuse
...
To allow importing org-to-jsonl as module
- Wrap code in __main__ into an org-to-jsonl method
- Rename processor/org-mode to processor/org_mode
- Add __init__.py to processor directory
2021-08-16 16:31:30 -07:00
Debanjum Singh Solanky
5f8221f77e
Remove unused verbose argument to collate_results method
2021-08-16 13:54:41 -07:00
Debanjum Singh Solanky
85bf15628d
Use better cmdline argument names. Drop unneeded no-compress argument
...
Can infer to compress or not via the output_file suffix
2021-08-16 13:49:39 -07:00
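Inferring compression from the output file suffix, as the commit above describes, might look like the following. This is a sketch under assumptions: `write_jsonl` is a hypothetical name, and treating a `.gz` suffix as the signal to compress is assumed from the jsonl.gz files mentioned elsewhere in this log.

```python
import gzip
import json
from pathlib import Path


def write_jsonl(entries, output_file):
    """Hypothetical: write entries as JSONL, gzip-compressed if path ends in .gz."""
    output_path = Path(output_file)
    payload = "".join(json.dumps(entry) + "\n" for entry in entries)
    if output_path.suffix == ".gz":
        # Suffix signals compression, so no separate no-compress flag is needed.
        with gzip.open(output_path, "wt", encoding="utf-8") as handle:
            handle.write(payload)
    else:
        output_path.write_text(payload, encoding="utf-8")
    return output_path
```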
Debanjum Singh Solanky
d9f60c00bf
Warn if any input files to org-to-jsonl are potentially non org-mode files
...
That is, if the file paths in the input set don't end with .org
2021-08-16 13:49:39 -07:00
Debanjum Singh Solanky
3aa0c30fee
Use absolute file path to open files in org-to-jsonl.py, asymmetric.py
...
Exit script if neither org_files, org_file_filter is present
2021-08-16 13:49:39 -07:00
Debanjum Singh Solanky
e773611558
Remove unused jsonl_file argument from convert_org_entries_to_jsonl
2021-08-16 13:49:35 -07:00
Debanjum Singh Solanky
8b29e272d3
Standardize interface, better default args for org-to-json.py script
...
- Remove non-standard, unnecessary argument for org-directory
Pass the path to each file directly via the org-files and org-files-filter arguments
- Allow shorthand -i, -o for input files, output files
- Default to compress, unless user explicitly specifies not to
2021-08-16 11:29:08 -07:00
Debanjum Singh Solanky
7547e90745
Minor doc updates after merging emacs package with main repository
2021-08-16 02:02:26 -07:00
Debanjum Singh Solanky
ec157ea0ff
Add Emacs interface to semantic-search directly to main repository
...
Too much overhead to maintain multiple repositories, especially when
the Emacs library for semantic-search is a single file.
Import Readme from the emacs-semantic-search repository too
2021-08-16 01:27:46 -07:00
Debanjum Singh Solanky
dcf7b2d04f
Remove requirements.txt for now as virtualenv setup doesn't work
...
Haven't gotten it to work on Mac or Ubuntu. Remove to avoid confusion
for now. Application depends on miniconda for now
2021-08-16 00:15:10 -07:00
Debanjum Singh Solanky
3b81fafa3e
Use updated path to MiniLM bi-encoder model on hugging-face
2021-08-15 23:57:22 -07:00
Debanjum Singh Solanky
4839153086
Acknowledge ML models used for search. Simplify path used in commands
2021-08-15 23:56:18 -07:00
Debanjum Singh Solanky
c58c1d96aa
Change default install directory to current, fix open file code
2021-08-15 23:01:55 -07:00
Debanjum Singh Solanky
ae15e429b5
Reduce indentation from 4 to 2 in Readme.md.
...
Prevent everything looking like code blocks due to 4 space indentations
2021-08-15 22:56:36 -07:00
Debanjum Singh Solanky
636b6195cc
Add Readme, License. Update .gitignore
2021-08-15 22:52:37 -07:00
Debanjum Singh Solanky
354c541b62
Add org processor to generate compressed jsonl from org-mode files
...
The corpus embeddings are generated from this compressed JSONL
using the specified transformer ML model
2021-08-15 22:52:31 -07:00
Debanjum Singh Solanky
b74cb9a104
Move install.py to new utils dir as it's for cmdline ease of use only
2021-08-15 19:10:30 -07:00
Debanjum Singh Solanky
ec92f3e146
Move different search types into search_types directory
2021-08-15 19:09:50 -07:00
Debanjum Singh Solanky
4d681c86ec
Update requirements.txt for users wanting to use pip install
2021-08-15 18:45:37 -07:00
Debanjum Singh Solanky
d75df54385
Create API interface for Semantic Search
...
Use FastAPI, Uvicorn to create app with API endpoint at /search
Example Query: http://localhost:8000/search?q="why sleep?"&t="notes"&n=5
2021-08-15 18:11:48 -07:00
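Since the example query in the commit above contains a space and a question mark, a client would normally URL-encode it. The sketch below builds such a URL with the standard library. The parameter names `q`, `t` and `n` and the /search path come from the commit message; `build_search_url` and its defaults are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(query, search_type=None, count=5,
                     base="http://localhost:8000/search"):
    """Hypothetical client helper: encode q, t, n into a search URL."""
    params = {"q": query, "n": count}
    # Assumption: the type parameter is omitted when no type is chosen.
    if search_type is not None:
        params["t"] = search_type
    return f"{base}?{urlencode(params)}"
```

Calling `build_search_url("why sleep?", "notes")` yields a URL whose query string is safe to paste into a browser or pass to an HTTP client.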
Debanjum Singh Solanky
e3088c8cf8
Create environment.yml to install prerequisites for app via conda
2021-08-15 17:48:38 -07:00
Debanjum Singh Solanky
660e6c3937
Add explicit filters to asymmetric search
...
User can filter results to ones which include, exclude specified words
To show entries which include or exclude specific words, the user should prepend
a '+' or '-' to the word. E.g. "+hello -bye"
2021-08-15 17:48:38 -07:00
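The '+' and '-' filter syntax from the commit above could be handled roughly as follows. This is a simplified sketch, not the project's implementation: `parse_query` and `apply_filters` are hypothetical names, entries are assumed to be plain strings, and matching is done on lowercased whitespace-separated words without stripping punctuation.

```python
def parse_query(query):
    """Hypothetical: split a query into search text, required and blocked words."""
    required, blocked, terms = [], [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())
        elif token.startswith("-") and len(token) > 1:
            blocked.append(token[1:].lower())
        else:
            terms.append(token)
    return " ".join(terms), required, blocked


def apply_filters(entries, required, blocked):
    """Hypothetical: keep entries with every required word and no blocked word."""
    kept = []
    for entry in entries:
        words = set(entry.lower().split())
        if all(w in words for w in required) and not any(w in words for w in blocked):
            kept.append(entry)
    return kept
```

The remaining search text would then be passed to the semantic search itself, with the filters narrowing the candidate entries.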
Debanjum Singh Solanky
91a2c598fe
Resolve paths to absolute paths once. Use pathlib glob directly
2021-08-09 00:39:33 -07:00
Debanjum Singh Solanky
ca0a22f4dd
Search for images similar to query image provided by the user
...
Example: user passes the path to an image in the query, e.g. ~/Pictures/photo.jpg
The script should return images in images_embedding most similar to
the query image
2021-08-09 00:21:02 -07:00