Pdftable is a Python module and command-line utility that analyzes the XML output of the pdftohtml program to extract tables from PDF files and write them out as CSV data. It makes it easier to automate the parsing of tabular data in reports, ledgers, or other data sets that are published only as PDF.
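The general idea can be illustrated with a minimal sketch. This is not pdftable's own code or API. It only assumes the XML layout that `pdftohtml -xml` produces, where each run of text is a `<text>` element carrying `top` and `left` coordinates inside a `<page>` element. The function names and the `row_tolerance` parameter are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_from_pdftohtml_xml(xml_text, row_tolerance=3):
    """Group <text> elements from pdftohtml-style XML into table rows.

    Assumption: every <text> element has numeric 'top' and 'left' attributes.
    Elements whose 'top' is within row_tolerance of the first element of the
    current row are treated as cells of that row (hypothetical heuristic).
    """
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        items = []
        for t in page.iter("text"):
            # itertext() also collects text nested in tags such as <b> or <i>
            content = "".join(t.itertext()).strip()
            if content:
                top = int(float(t.get("top")))
                left = int(float(t.get("left")))
                items.append((top, left, content))
        items.sort()  # top-to-bottom, then left-to-right

        page_rows = []  # each entry: (top of first cell, [(left, content), ...])
        for top, left, content in items:
            if page_rows and top - page_rows[-1][0] <= row_tolerance:
                page_rows[-1][1].append((left, content))
            else:
                page_rows.append((top, [(left, content)]))

        for _, cells in page_rows:
            rows.append([content for _, content in sorted(cells)])
    return rows


def to_csv(rows):
    """Serialize rows with the standard csv module (default '\\r\\n' line endings)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

This sketch only groups text into rows and orders cells by horizontal position. It does not infer column boundaries, so a row with an empty cell would shift its remaining values to the left; a practical table extractor also needs to line cells up against the detected columns.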

Written by Kyle Cronan. Please let me know if you find this program useful! Send your feedback to <kyle at pbx org>