Commit graph

3 commits

Author SHA1 Message Date
Timothy Carambat
2a9066e83a
OCR PDFs as fallback during upload ()
* OCR PDFs as fallback in spawn thread

* wip

* build our own worker fanout and wrapper

* norm pkgs

* bump dev
2025-02-14 11:57:31 -08:00
Timothy Carambat
d1ca16f7f8
Add tokenizer improvments via Singleton class and estimation ()
* Add tokenizer improvments via Singleton class
linting

* dev build

* Estimation fallback when string exceeds a fixed byte size

* Add notice to tiktoken on backend
2025-01-30 17:55:03 -08:00
Sean Hatfield
79656718b2
[FEAT] Create custom pdfloader ()
* implement custom PDFLoader to remove LC dep

* remove unneeded comment

* remove pdfjs as dep and fix page splitting using pdf-parse

* linting + export rename for desktop compat

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
2024-07-11 12:26:11 -07:00
Renamed from collector/processSingleFile/convert/asPDF.js (Browse further)