PATH-worthy Scripts
Hey folks, this repo is a collection of scripts I use often enough to keep on my system PATH. I haven't documented all of them yet, though I may in time. For now, here are a few highlights.
bates
A simple Python-based utility for extracting Bates numbers from PDF documents and optionally renaming files based on those numbers. Particularly useful for organizing legal documents or any PDFs with sequential numbering.
Overview
This tool helps you:
- Extract Bates numbers from PDFs (both text-based and scanned documents)
- Rename files based on their Bates number range
- Preserve original filenames in macOS Finder comments
- Process entire folders of PDFs in one go
- Prepare files for use with my Bates Source Link DEVONthink script
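To make the extraction step concrete, here is a minimal sketch of how Bates numbers could be pulled out of page text once it has been read from a PDF. It is an illustration only, not the code in bates.py: the function names `find_bates_numbers` and `bates_range` are hypothetical, and the sketch assumes a Bates number is simply the prefix followed by a fixed count of digits.

```python
import re


def find_bates_numbers(text, prefix="FWS-", digits=6):
    """Return every Bates number found in `text`, in order of appearance.

    Hypothetical helper: assumes a Bates number is `prefix` followed by
    exactly `digits` digits.
    """
    # Escape the prefix so characters such as "-" or "." match literally.
    # The negative lookahead rejects matches that are followed by another
    # digit, so a longer digit run is not mistaken for a Bates number.
    pattern = re.compile(re.escape(prefix) + r"\d{" + str(digits) + r"}(?!\d)")
    return pattern.findall(text)


def bates_range(numbers):
    """Return (first, last) for a list of Bates numbers, or None if empty.

    Because every number shares one prefix and a fixed, zero-padded width,
    plain string comparison orders them the same way as numeric comparison.
    """
    if not numbers:
        return None
    return min(numbers), max(numbers)


if __name__ == "__main__":
    sample = "Page 1 FWS-000123 ... Page 2 FWS-000124"
    found = find_bates_numbers(sample)
    print(found, bates_range(found))
```

The range returned here is what a rename step would work from: the lowest and highest number found in a document.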
Installation
- Install Python dependencies:
pip3 install pdfplumber
If you plan to use OCR capabilities, also install:
pip3 install pytesseract pdf2image
- For OCR support, install system dependencies:
On macOS:
brew install tesseract poppler
On Ubuntu/Debian:
sudo apt-get install tesseract-ocr poppler-utils
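If you want to confirm the optional OCR pieces are in place before running with OCR enabled, a quick check along these lines may help. This is a standalone sketch, not part of bates.py. The function name `ocr_dependencies` is made up for illustration, and the check only tests whether the modules and executables can be found, not whether they work. `pdftoppm` is the poppler tool that pdf2image calls.

```python
import importlib.util
import shutil


def ocr_dependencies():
    """Map each optional OCR dependency to whether it appears to be installed."""
    return {
        # Python packages: find_spec returns None when a module is not importable.
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "pdf2image": importlib.util.find_spec("pdf2image") is not None,
        # System executables: shutil.which returns None when not on PATH.
        "tesseract": shutil.which("tesseract") is not None,
        "pdftoppm": shutil.which("pdftoppm") is not None,
    }


if __name__ == "__main__":
    for name, present in ocr_dependencies().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```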
Basic Usage
Test extraction without renaming files:
python3 bates.py /path/to/folder --prefix "FWS-" --digits 6 --dry-run
Rename files based on Bates numbers:
python3 bates.py /path/to/folder --prefix "FWS-" --digits 6 --name-prefix "FWS "
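The difference between the two commands above can be sketched roughly as follows. The naming scheme (name prefix, then the digit range, then ".pdf") is an assumption made for illustration, so the names bates.py actually produces may differ. `planned_name` and `rename` are hypothetical helpers, not functions from the script.

```python
from pathlib import Path


def planned_name(first, last, prefix="FWS-", name_prefix="FWS "):
    """Build a target filename from the first and last Bates number.

    Assumed scheme (not confirmed by bates.py): "<name_prefix><start>-<end>.pdf",
    or "<name_prefix><start>.pdf" for a single-number document.
    """
    start = first[len(prefix):]  # keep only the digits
    end = last[len(prefix):]
    span = start if start == end else f"{start}-{end}"
    return f"{name_prefix}{span}.pdf"


def rename(path, new_name, dry_run=True):
    """Rename `path` within its folder, or only report the plan on a dry run."""
    target = path.with_name(new_name)
    if dry_run:
        print(f"[dry run] {path.name} -> {target.name}")
    else:
        path.rename(target)
    return target


if __name__ == "__main__":
    # Dry run: nothing on disk is touched, so the path need not exist.
    rename(Path("/path/to/folder/scan.pdf"), planned_name("FWS-000123", "FWS-000130"))
```

Defaulting `dry_run` to True in a sketch like this means a rename only happens when it is asked for explicitly.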
Options
- --prefix: The Bates number prefix to search for (default: "FWS-")
- --digits: Number of digits after the prefix (default: 6)
- --ocr: Enable OCR for scanned documents
- --dry-run: Test extraction without renaming files
- --name-prefix: Prefix to use when renaming files
- --log: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
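For readers who want to see how options like these fit together, here is a sketch of an equivalent argparse setup. It mirrors the flags and the two documented defaults, but it is not the parser from bates.py. The positional argument name `folder`, the `INFO` default for `--log`, and the `None` default for `--name-prefix` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="bates.py",
        description="Extract Bates numbers from PDFs and optionally rename files.",
    )
    # Positional name "folder" is an assumption based on the usage examples.
    parser.add_argument("folder", help="folder containing the PDFs to process")
    parser.add_argument("--prefix", default="FWS-",
                        help="Bates number prefix to search for")
    parser.add_argument("--digits", type=int, default=6,
                        help="number of digits after the prefix")
    parser.add_argument("--ocr", action="store_true",
                        help="enable OCR for scanned documents")
    parser.add_argument("--dry-run", action="store_true",
                        help="test extraction without renaming files")
    # Default of None is an assumption; the README does not state one.
    parser.add_argument("--name-prefix", default=None,
                        help="prefix to use when renaming files")
    # Default of INFO is an assumption; the README does not state one.
    parser.add_argument("--log", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="logging level")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that argparse exposes `--dry-run` as `args.dry_run` and `--name-prefix` as `args.name_prefix`.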
Notes
- Always test with --dry-run first
- Original filenames are preserved in Finder comments (macOS only)
- OCR is disabled by default to keep things fast