PATH-worthy Scripts

Hey folks, this repo is a collection of scripts I use often enough to justify keeping them on my system PATH. I haven't documented all of them yet and might in time; for now, here are a few highlights.

bates

A simple Python-based utility for extracting Bates numbers from PDF documents and optionally renaming files based on those numbers. Particularly useful for organizing legal documents or any PDFs with sequential numbering.
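To make the prefix-plus-digits idea concrete, here's a minimal sketch of how a Bates number like FWS-000123 could be matched in extracted text. It's an illustration, not the code in bates.py: the helpers `bates_pattern` and `bates_range` are hypothetical, and treating the lowest and highest numbers found as the file's range is an assumption.

```python
import re
from typing import Optional, Tuple


def bates_pattern(prefix: str = "FWS-", digits: int = 6) -> re.Pattern:
    """Compile a regex for PREFIX followed by exactly `digits` digits."""
    # re.escape keeps characters like "-" or "." in the prefix literal;
    # (?!\d) rejects longer digit runs such as FWS-1234567.
    return re.compile(re.escape(prefix) + rf"(\d{{{digits}}})(?!\d)")


def bates_range(text: str, prefix: str = "FWS-", digits: int = 6) -> Optional[Tuple[int, int]]:
    """Return (lowest, highest) Bates number found in text, or None if nothing matches."""
    numbers = [int(m.group(1)) for m in bates_pattern(prefix, digits).finditer(text)]
    if not numbers:
        return None
    return min(numbers), max(numbers)


print(bates_range("Produced as FWS-000123 ... FWS-000125; not FWS-1234567"))
# (123, 125)
```

The negative lookahead is what keeps a seven-digit stamp from being misread as a six-digit one.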

Overview

This tool helps you:

  • Extract Bates numbers from PDFs, both text-based and scanned (scanned documents need the --ocr option)
  • Rename files based on their Bates number range
  • Preserve original filenames in macOS Finder comments
  • Process entire folders of PDFs in one go
  • Prepare files for use with my Bates Source Link DEVONthink script
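Here's a rough sketch of how a Bates range might become a filename. The `new_filename` helper and the exact `FWS 000123-000145.pdf` layout are hypothetical; this README doesn't pin down the format bates.py actually produces.

```python
def new_filename(first: int, last: int, digits: int = 6,
                 name_prefix: str = "FWS ", suffix: str = ".pdf") -> str:
    """Build a name like 'FWS 000123-000145.pdf' (hypothetical format)."""
    start = f"{first:0{digits}d}"  # zero-pad to the configured digit count
    if first == last:
        # Single-page document: no range needed
        return f"{name_prefix}{start}{suffix}"
    return f"{name_prefix}{start}-{last:0{digits}d}{suffix}"


print(new_filename(123, 145))  # FWS 000123-000145.pdf
print(new_filename(7, 7))      # FWS 000007.pdf
```

Zero-padding keeps renamed files sorting in Bates order in Finder and DEVONthink.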

Installation

  1. Install Python dependencies:
pip3 install pdfplumber

If you plan to use OCR capabilities, also install:

pip3 install pytesseract pdf2image
  2. For OCR support, install system dependencies:

On macOS:

brew install tesseract poppler

On Ubuntu/Debian:

sudo apt-get install tesseract-ocr poppler-utils
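Before running with --ocr, it can help to confirm everything above is actually installed. This is a small standalone check, not part of bates.py. It assumes Poppler is reachable through the `pdftoppm` command, which is the Poppler tool pdf2image calls to render pages.

```python
import importlib.util
import shutil
from typing import List

# Import name (same as the pip name for these three) -> needed only for OCR?
PYTHON_PACKAGES = {"pdfplumber": False, "pytesseract": True, "pdf2image": True}
# Command-line tools from the tesseract and poppler system packages
OCR_COMMANDS = ["tesseract", "pdftoppm"]


def missing_dependencies(want_ocr: bool = False) -> List[str]:
    """List anything from the install steps that can't be found."""
    missing = []
    for package, ocr_only in PYTHON_PACKAGES.items():
        if (want_ocr or not ocr_only) and importlib.util.find_spec(package) is None:
            missing.append(f"python package: {package}")
    if want_ocr:
        for command in OCR_COMMANDS:
            if shutil.which(command) is None:
                missing.append(f"command: {command}")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies(want_ocr=True)
    print("\n".join(problems) if problems else "All dependencies found.")
```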

Basic Usage

Test extraction without renaming files:

python3 bates.py /path/to/folder --prefix "FWS-" --digits 6 --dry-run
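A dry run still has to pull text out of every page. Here's a minimal sketch of that step, assuming bates.py reads the text layer with pdfplumber and, when --ocr is on, falls back to pdf2image plus pytesseract only for pages with no text. That per-page fallback and the 300 DPI setting are assumptions, and `page_texts` / `pages_needing_ocr` are hypothetical names.

```python
from typing import List


def pages_needing_ocr(texts: List[str]) -> List[int]:
    """Indices of pages whose text layer is empty or whitespace-only."""
    return [i for i, text in enumerate(texts) if not text.strip()]


def page_texts(pdf_path: str, use_ocr: bool = False, dpi: int = 300) -> List[str]:
    """Return one string per page, OCR-ing pages with no text layer if use_ocr is set."""
    import pdfplumber  # imported here so the OCR packages stay optional

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may give None (older pdfplumber) or "" for image-only pages
        texts = [page.extract_text() or "" for page in pdf.pages]

    if use_ocr:
        missing = pages_needing_ocr(texts)
        if missing:
            import pytesseract
            from pdf2image import convert_from_path

            for i in missing:
                # pdf2image page numbers are 1-based; render just this one page
                image = convert_from_path(pdf_path, dpi=dpi,
                                          first_page=i + 1, last_page=i + 1)[0]
                texts[i] = pytesseract.image_to_string(image)
    return texts
```

OCR-ing only the blank pages keeps mixed PDFs (typed cover letter, scanned exhibits) from paying the OCR cost on every page.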

Rename files based on Bates numbers:

python3 bates.py /path/to/folder --prefix "FWS-" --digits 6 --name-prefix "FWS "
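Since renaming is the step that can go wrong, here's a hedged sketch of a dry-run-aware rename. The `plan_renames` and `apply_renames` helpers are hypothetical; the idea is to compute every target name first, skip files with no Bates number or a name collision, and only touch the disk when --dry-run is absent.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def plan_renames(new_names: Dict[Path, Optional[str]]) -> List[Tuple[Path, Path]]:
    """Turn {pdf: proposed name or None} into (source, target) pairs."""
    plan: List[Tuple[Path, Path]] = []
    taken = set()
    for src, name in sorted(new_names.items(), key=lambda item: str(item[0])):
        if name is None:
            continue  # no Bates number found; leave the file alone
        dst = src.with_name(name)
        if dst == src:
            continue  # already named correctly
        if dst in taken or dst.exists():
            print(f"skip {src.name}: {name} already exists or is claimed")
            continue
        taken.add(dst)
        plan.append((src, dst))
    return plan


def apply_renames(plan: List[Tuple[Path, Path]], dry_run: bool = True) -> None:
    """Print each planned rename; only touch the disk when dry_run is False."""
    for src, dst in plan:
        print(f"{'[dry run] ' if dry_run else ''}{src.name} -> {dst.name}")
        if not dry_run:
            src.rename(dst)
```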

Options

  • --prefix: The Bates number prefix to search for (default: "FWS-")
  • --digits: Number of digits after the prefix (default: 6)
  • --ocr: Enable OCR for scanned documents
  • --dry-run: Test extraction without renaming files
  • --name-prefix: Prefix to use when renaming files
  • --log: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
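For reference, the options above map naturally onto Python's argparse. This is a sketch, not the script's actual argument parser: the positional `folder` argument matches the usage examples, but the empty default for --name-prefix and the INFO default for --log are assumptions.

```python
import argparse
import logging
from typing import List, Optional

LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Extract Bates numbers from PDFs and optionally rename the files.")
    parser.add_argument("folder", help="folder of PDFs to process")
    parser.add_argument("--prefix", default="FWS-", help="Bates prefix to search for")
    parser.add_argument("--digits", type=int, default=6, help="digits after the prefix")
    parser.add_argument("--ocr", action="store_true", help="enable OCR for scanned PDFs")
    parser.add_argument("--dry-run", action="store_true", help="test extraction without renaming")
    parser.add_argument("--name-prefix", default="", help="prefix for renamed files (assumed default)")
    parser.add_argument("--log", default="INFO", choices=LEVELS, help="logging level (assumed default)")
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Map the level name (e.g. "DEBUG") to the logging module's constant
    logging.basicConfig(level=getattr(logging, args.log))
    return args
```

argparse turns `--dry-run` and `--name-prefix` into `args.dry_run` and `args.name_prefix`, so the hyphenated flags need no extra wiring.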

Notes

  • Always test with --dry-run first
  • Original filenames are preserved in Finder comments (macOS only)
  • OCR is disabled by default to keep things fast
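One common way to set a Finder comment from Python is to run AppleScript through `osascript`. The sketch below assumes that approach; bates.py may do it differently, and `finder_comment_script` / `set_finder_comment` are hypothetical helpers. It does nothing off macOS, in line with the note that comments are macOS only.

```python
import os
import subprocess
import sys


def finder_comment_script(path: str, comment: str) -> str:
    """Build an AppleScript one-liner that sets a file's Finder comment."""
    def quote(s: str) -> str:
        # AppleScript string literals escape backslashes and double quotes with a backslash
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return (f'tell application "Finder" to set comment of '
            f'(POSIX file {quote(path)} as alias) to {quote(comment)}')


def set_finder_comment(path: str, comment: str) -> None:
    """Store `comment` (e.g. the original filename) as the file's Finder comment."""
    if sys.platform != "darwin":
        return  # Finder comments only exist on macOS
    script = finder_comment_script(os.path.abspath(path), comment)
    subprocess.run(["osascript", "-e", script], check=True)
```

Call it after the rename, passing the new path and the old filename, so the comment lands on the file that now exists.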