- Previously top level headings would have get stripped of the
space between heading text and the prefix # symbols. That is,
`# Top Level Heading' would get converted to `#Top Level Heading'
- This would mess up their rendering as a heading in search results
- Add unit tests to text_to_jsonl processors to prevent regression
- Start standardizing implementation of the `text_to_jsonl' processors
- `text_to_jsonl; scripts already had a shared structure
- This change starts to codify that implicit structure
- Benefits
- Ease adding more `text_to_jsonl; processors
- Allow merging shared functionality
- Help with type hinting
- Drawbacks
- Lower agility to change. But this was already an implicit issue as
the text_to_jsonl processors got more deeply wired into the app
- Image search already uses a sorted list of images to process
- Prevents index of entries to desync when entries, embeddings
generated by a separate server/app instance
- Update existings code, tests to process input-filters as list
instead of str
- Test `text_to_jsonl' get files methods to work with combination of
`input-files' and `input-filters'
Resolves#84