- From: John Erickson <olyerickson@gmail.com>
- Date: Thu, 4 Apr 2013 06:18:11 -0400
- To: Hatem Ben Yacoub <hatemben@gmail.com>
- Cc: "eGov IG (Public)" <public-egov-ig@w3.org>
Hatem, this is an extremely interesting tool!

Note to everyone: even though Mozilla was one of the supporters, it works in all browsers. Or, at least also Chrome ;)

A couple suggestions:

1. In addition to enabling the user to download and copy the selected table segment, please provide a way (or at least start thinking about a way) for there to be a permanent/re-usable/reliable URL to the selected content. The reason is, some of us have RDF conversion workflows that document the provenance, starting with the download URL of the source CSV.

2. I can understand how headers present a problem, but it would be extremely useful to have them working! Maybe you can extract them first, then associate them with selected table segments on a follow-up pass. But you'll need to have created a URL for the selected header cells ;)

NOTE: One compromise is to only do COMPLETE tables if the headers are to be included.

3. Related to the above, you really need to encode provenance (see W3C PROV) for this to really be useful to people using extracted tabular data "in anger."

Thanks again for this good work!

John

On Wed, Apr 3, 2013 at 4:25 PM, Hatem Ben Yacoub <hatemben@gmail.com> wrote:
> Hi all,
>
> One of the problems that many Open Government data projects faces is
> the availability of tons of old documents in PDF format, which is not
> open and reusable format. Today, Mozilla announced Tabula, a new tool
> to help liberate tables trapped in PDFs.
>
> The online demo is amazing : http://tabula.nerdpower.org/
>
> To use it simply make a rectangular selection over tables on the PDF
> pages. (Avoid headers)
>
> Sources https://github.com/jazzido/tabula
>
> Official announcement :
> http://source.mozillaopennews.org/en-US/articles/introducing-tabula/
>
>
> Best,
> --
> Eng. Hatem Ben Yacoub
> ICT & eGOV Consultant
> http://hbyconsultancy.com
>
> http://twitter.com/hatem
> http://facebook.com/hatemben
>

--
John S. Erickson, Ph.D.
Director, Web Science Operations
Tetherless World Constellation (RPI)
<http://tw.rpi.edu> <olyerickson@gmail.com>
Twitter & Skype: olyerickson
Received on Thursday, 4 April 2013 10:18:47 UTC