Pdftable is a Python module and command-line utility that analyzes the XML output of the program pdftohtml to extract tables from PDF files and write them out as CSV data. It makes it easier to automate the parsing of tabular data in reports, ledgers, and other data sets that are published only as PDF.
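To make the idea concrete, here is a minimal sketch of the general approach, not Pdftable's own code or API. It assumes the XML has the shape that pdftohtml produces in XML mode: page elements containing text elements with top and left position attributes. The sample XML is hand-written for illustration, and the function names rows_from_xml and to_csv are hypothetical. Text fragments that share the same top coordinate are treated as one table row and ordered by their left coordinate. Real tables usually need fuzzier matching than this.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample in the style of pdftohtml's XML output (assumption:
# <page> elements containing <text> elements with top/left attributes).
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="50" width="40" height="12" font="0">Date</text>
<text top="100" left="200" width="60" height="12" font="0">Amount</text>
<text top="120" left="50" width="70" height="12" font="0">2009-01-05</text>
<text top="120" left="200" width="40" height="12" font="0">12.50</text>
</page>
</pdf2xml>"""


def rows_from_xml(xml_text):
    """Group text fragments into rows by exact 'top' value (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        by_top = defaultdict(list)
        for el in page.iter("text"):
            # itertext() also picks up text nested in <b> or <i> children.
            content = "".join(el.itertext()).strip()
            if content:
                by_top[int(el.get("top"))].append((int(el.get("left")), content))
        # Rows run top to bottom; cells within a row run left to right.
        for top in sorted(by_top):
            rows.append([cell for _, cell in sorted(by_top[top])])
    return rows


def to_csv(rows):
    """Serialize rows as CSV text (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(rows_from_xml(SAMPLE_XML)), end="")
```

Grouping on exact coordinates keeps the sketch short. A practical extractor would also have to tolerate small vertical offsets, cells that wrap over several lines, and empty cells, none of which this example handles.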
Example application:
Written by Kyle Cronan. Please let me know if you find this program useful! Send your feedback to <kyle at pbx org>