
Re: Introducing Tabula (PDF to CSV conversion tool)

From: Paola Di Maio <paola.dimaio@gmail.com>
Date: Thu, 4 Apr 2013 16:24:39 +0530
Message-ID: <CAMXe=Spn9xM4i-h1X=3zPidkmg3DLk8DW-OUbwkDkRng7r3WOQ@mail.gmail.com>
To: John Erickson <olyerickson@gmail.com>
Cc: Hatem Ben Yacoub <hatemben@gmail.com>, "eGov IG (Public)" <public-egov-ig@w3.org>
Indeed, it looks like a good balance of simplicity and useful functionality. Nice.

It reminds me of the 'tabulator' concept, a bit more trimmed.

I wonder why there is no conversion to RDF. Can we not also have a CSV to
RDF button?
Would that not make sense?
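
To make the idea concrete, here is a rough sketch of what a CSV to RDF step might do. It is only an illustration and not part of Tabula: the function name csv_to_ntriples, the base URI, and the row/column URI scheme are all assumptions made up for the example. It emits plain N-Triples, one literal per non-empty cell, plus a single prov:wasDerivedFrom triple pointing at the source URL of the CSV.

```python
import csv
import io
from urllib.parse import quote

# Real W3C PROV-O property; everything else below is a hypothetical scheme.
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"


def escape_literal(value):
    """Escape the characters that N-Triples forbids inside a quoted literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, base_uri, source_url):
    """Turn CSV text (first row = headers) into N-Triples.

    base_uri and the row/column URI layout are assumptions for illustration.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    dataset = f"<{base_uri}dataset>"
    # Record where the data came from before emitting any cell values.
    lines = [f"{dataset} <{PROV_WAS_DERIVED_FROM}> <{source_url}> ."]
    for index, row in enumerate(reader, start=1):
        subject = f"<{base_uri}row/{index}>"
        for column, value in row.items():
            # Skip overflow cells (no header), missing cells, and empty cells.
            if column is None or value is None or value == "":
                continue
            predicate = f"<{base_uri}column/{quote(column.strip(), safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "name,amount\nRoads,100\nSchools,\n"
    print(csv_to_ntriples(sample, "http://example.org/",
                          "http://example.org/source.csv"))
```

A real button would want typed literals and a proper vocabulary rather than per-column predicates, but even this much shows why a stable source URL matters for the provenance triple.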



PDM



On Thu, Apr 4, 2013 at 3:48 PM, John Erickson <olyerickson@gmail.com> wrote:

> Hatem, this is an extremely interesting tool! Note to everyone: even
> though Mozilla was one of the supporters, it works in all browsers.
> Or, at least also Chrome ;)
>
> A couple suggestions:
> 1. In addition to enabling the user to download and copy the selected
> table segment, please provide a way (or at least start thinking about
> a way) for there to be a permanent/re-usable/reliable URL to the
> selected content. The reason is, some of us have RDF conversion
> workflows that document the provenance, starting with the download URL
> of the source CSV.
> 2. I can understand how headers present a problem, but it would be
> extremely useful to have them working! Maybe you can extract them
> first, then associate them with selected table segments on a follow-up
> pass. But you'll need to have created a URL for the selected header
> cells ;) NOTE: One compromise is to only do COMPLETE tables if the
> headers are to be included.
> 3. Related to the above, you really need to encode provenance (see W3C
> PROV) for this to really be useful to people using extracted tabular
> data "in anger."
>
> Thanks again for this good work!
>
> John
>
> On Wed, Apr 3, 2013 at 4:25 PM, Hatem Ben Yacoub <hatemben@gmail.com>
> wrote:
> > Hi all,
> >
> > One of the problems that many Open Government data projects face is
> > the availability of tons of old documents in PDF format, which is not
> > an open and reusable format. Today, Mozilla announced Tabula, a new tool
> > to help liberate tables trapped in PDFs.
> >
> > The online demo is amazing: http://tabula.nerdpower.org/
> >
> > To use it simply make a rectangular selection over tables on the PDF
> > pages. (Avoid headers)
> >
> > Source code: https://github.com/jazzido/tabula
> >
> > Official announcement :
> > http://source.mozillaopennews.org/en-US/articles/introducing-tabula/
> >
> >
> > Best,
> > --
> > Eng. Hatem Ben Yacoub
> > ICT & eGOV Consultant
> > http://hbyconsultancy.com
> >
> > http://twitter.com/hatem
> > http://facebook.com/hatemben
> >
>
>
>
> --
> John S. Erickson, Ph.D.
> Director, Web Science Operations
> Tetherless World Constellation (RPI)
> <http://tw.rpi.edu> <olyerickson@gmail.com>
> Twitter & Skype: olyerickson
>
>
Received on Thursday, 4 April 2013 10:55:10 UTC
