PATH-worthy Scripts

Hey folks, this repo is just a collection of various scripts I use frequently enough to justify keeping them in my system PATH. I haven't written documentation for all of these scripts. I might in time. For now, here's just a few highlights.

bates

A simple Python-based utility for extracting Bates numbers from PDF documents and optionally renaming files based on those numbers. Particularly useful for organizing legal documents or any PDFs with sequential numbering.

Overview

This tool helps you:

Extract Bates numbers from PDFs (both text-based and scanned documents)
Rename files based on their Bates number range
Preserve original filenames in macOS Finder comments
Process entire folders of PDFs in one go
Prepare files for use with my Bates Source Link DEVONthink script

Installation

Install Python dependencies:

pip3 install pdfplumber

If you plan to use OCR capabilities, also install:

pip3 install pytesseract pdf2image

For OCR support, install system dependencies:

On macOS:

brew install tesseract poppler

On Ubuntu/Debian:

sudo apt-get install tesseract-ocr poppler-utils

Basic Usage

Test extraction without renaming files:

python3 bates.py /path/to/folder --prefix "FWS-" --digits 6 --dry-run

Rename files based on Bates numbers:

python3 bates.py /path/to/folder --prefix "FWS-" --digits 6 --name-prefix "FWS "

Options

--prefix: The Bates number prefix to search for (default: "FWS-")
--digits: Number of digits after the prefix (default: 6)
--ocr: Enable OCR for scanned documents
--dry-run: Test extraction without renaming files
--name-prefix: Prefix to use when renaming files
--log: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

Notes

Always test with --dry-run first
Original filenames are preserved in Finder comments (macOS only)
OCR is disabled by default to keep things fast

2 KiB Raw Blame History