Commit graph

7 commits

Author SHA1 Message Date
Debanjum Singh Solanky
19d6678eb1 Allow importing org-to-jsonl as module for reuse
To allow importing org-to-jsonl as module
  - Wrap code in __main__ into an org-to-jsonl method
  - Rename processor/org-mode to processor/org_mode
  - Add __init__.py to processor directory
2021-08-16 16:31:30 -07:00
Debanjum Singh Solanky
85bf15628d Use better cmdline argument names. Drop unneeded no-compress argument
Whether to compress can be inferred from the output_file suffix
2021-08-16 13:49:39 -07:00
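A minimal sketch of how compression could be inferred from the output file suffix, as the commit above describes. The helper names `should_compress` and `write_jsonl` and the choice of `.gz` as the compressed suffix are assumptions for illustration, not taken from the repository.

```python
import gzip
import json
from pathlib import Path


def should_compress(output_file):
    # Hypothetical helper: treat a ".gz" suffix as a request for gzip output.
    return Path(output_file).suffix == ".gz"


def write_jsonl(entries, output_file):
    # Pick the opener from the suffix instead of a separate no-compress flag.
    opener = gzip.open if should_compress(output_file) else open
    with opener(output_file, "wt", encoding="utf-8") as f:
        for entry in entries:
            # One JSON object per line.
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

With this approach the caller only chooses a file name: `notes.jsonl.gz` yields gzip output and `notes.jsonl` yields plain text.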
Debanjum Singh Solanky
d9f60c00bf Warn if any input files to org-to-json are potentially non org-mode files
That is, if the file paths in the input set don't end with .org
2021-08-16 13:49:39 -07:00
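The suffix check described in the commit above could look roughly like the sketch below. The function names are hypothetical, and using the `warnings` module is an assumption; the actual script may report the problem differently.

```python
import warnings
from pathlib import Path


def non_org_files(input_files):
    # Return the input paths whose suffix is not ".org".
    return [f for f in input_files if Path(f).suffix != ".org"]


def warn_on_non_org_files(input_files):
    # Warn rather than fail: the files may still be parseable as org-mode.
    suspects = non_org_files(input_files)
    if suspects:
        names = ", ".join(str(f) for f in suspects)
        warnings.warn(f"Possibly not org-mode files: {names}")
    return suspects
```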
Debanjum Singh Solanky
3aa0c30fee Use absolute file path to open files in org-to-jsonl.py, asymmetric.py
Exit script if neither org_files nor org_file_filter is present
2021-08-16 13:49:39 -07:00
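As a hedged illustration of the two behaviours named in the commit above, the sketch below resolves inputs to absolute paths and exits when no input is given. The function `collect_org_files` is hypothetical, and treating `org_file_filter` as a glob pattern is an assumption.

```python
import glob
import sys
from pathlib import Path


def collect_org_files(org_files=None, org_file_filter=None):
    # Exit early when there is nothing to process.
    if not org_files and not org_file_filter:
        sys.exit("At least one of org_files or org_file_filter must be set")

    paths = [Path(f) for f in (org_files or [])]
    if org_file_filter:
        # Assumption: the filter is a glob pattern such as "~/notes/*.org".
        pattern = str(Path(org_file_filter).expanduser())
        paths += [Path(f) for f in glob.glob(pattern)]

    # Resolve to absolute paths so later open() calls do not depend on the
    # current working directory; the set removes duplicates.
    return sorted({p.expanduser().resolve() for p in paths})
```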
Debanjum Singh Solanky
e773611558 Remove unused jsonl_file argument from convert_org_entries_to_jsonl
2021-08-16 13:49:35 -07:00
Debanjum Singh Solanky
8b29e272d3 Standardize interface, better default args for org-to-json.py script
- Remove non-standard, unnecessary argument for org-directory
  Pass the path to each file in the org-files and org-files-filter arguments directly
- Allow shorthand -i, -o for input files, output files
- Default to compress, unless user explicitly specifies not to
2021-08-16 11:29:08 -07:00
Debanjum Singh Solanky
354c541b62 Add org processor to generate compressed jsonl from org-mode files
The corpus embeddings are generated from this compressed JSONL
using the specified transformer ML model
2021-08-15 22:52:31 -07:00