- From: Owen Ambur <Owen.Ambur@verizon.net>
- Date: Thu, 04 Apr 2013 12:03:50 -0400
- To: "'Gannon Dick'" <gannon_dick@yahoo.com>, <paoladimaio10@googlemail.com>, "'John Erickson'" <olyerickson@gmail.com>
- Cc: "'Hatem Ben Yacoub'" <hatemben@gmail.com>, "'eGov IG \(Public\)'" <public-egov-ig@w3.org>
- Message-id: <002b01ce314e$00085800$00190800$@Ambur@verizon.net>
Gannon, yes, of course, I am interested in seeing strategic and performance plans and reports rendered in open, standard, machine-readable StratML format whenever possible. It would be much better if the original, authoritative sources were in StratML (XML) format so that PDF and other renditions could automatically be rendered therefrom. However, to the degree that may not occur, it will be good to see how far tools like this can take us in automating an otherwise backward process.

Owen

From: Gannon Dick [mailto:gannon_dick@yahoo.com]
Sent: Thursday, April 04, 2013 10:20 AM
To: paoladimaio10@googlemail.com; John Erickson
Cc: Hatem Ben Yacoub; eGov IG (Public)
Subject: Re: Introducing Tabula (PDF to CSV conversion tool)

A StratML wrapper would make a lot of sense too, I think. The XFORMS construction methods are already largely in place. CSV imports could be aggregated, marked up, and classified in a more targeted way, without precluding conversion to RDF at a later time. The Journal Publishing Suite (NIH) as well as various LOC citation schemes (MADS, MODS, etc.) use this strategy.

Owen?

_____

From: Paola Di Maio <paola.dimaio@gmail.com>
To: John Erickson <olyerickson@gmail.com>
Cc: Hatem Ben Yacoub <hatemben@gmail.com>; eGov IG (Public) <public-egov-ig@w3.org>
Sent: Thursday, April 4, 2013 5:54 AM
Subject: Re: Introducing Tabula (PDF to CSV conversion tool)

Indeed, it looks like a good balance of simplicity and useful functionality. Nice, and it reminds me of the 'tabulator' concept, a bit more trimmed.

I wonder why there is no conversion to RDF? Can we not also have a CSV to RDF button? Would that not make sense?

PDM

On Thu, Apr 4, 2013 at 3:48 PM, John Erickson <olyerickson@gmail.com> wrote:

Hatem, this is an extremely interesting tool!

Note to everyone: even though Mozilla was one of the supporters, it works in all browsers. Or, at least also Chrome ;)

A couple of suggestions:

1. In addition to enabling the user to download and copy the selected table segment, please provide a way (or at least start thinking about a way) for there to be a permanent/re-usable/reliable URL to the selected content. The reason is, some of us have RDF conversion workflows that document the provenance, starting with the download URL of the source CSV.

2. I can understand how headers present a problem, but it would be extremely useful to have them working! Maybe you can extract them first, then associate them with selected table segments on a follow-up pass. But you'll need to have created a URL for the selected header cells ;)

NOTE: One compromise is to only do COMPLETE tables if the headers are to be included.

3. Related to the above, you really need to encode provenance (see W3C PROV) for this to really be useful to people using extracted tabular data "in anger."

Thanks again for this good work!

John

On Wed, Apr 3, 2013 at 4:25 PM, Hatem Ben Yacoub <hatemben@gmail.com> wrote:
> Hi all,
>
> One of the problems that many Open Government data projects face is
> the availability of tons of old documents in PDF format, which is not
> an open and reusable format. Today, Mozilla announced Tabula, a new tool
> to help liberate tables trapped in PDFs.
>
> The online demo is amazing: http://tabula.nerdpower.org/
>
> To use it, simply make a rectangular selection over tables on the PDF
> pages. (Avoid headers)
>
> Sources: https://github.com/jazzido/tabula
>
> Official announcement:
> http://source.mozillaopennews.org/en-US/articles/introducing-tabula/
>
>
> Best,
> --
> Eng. Hatem Ben Yacoub
> ICT & eGOV Consultant
> http://hbyconsultancy.com
>
> http://twitter.com/hatem
> http://facebook.com/hatemben
>

--
John S. Erickson, Ph.D.
Director, Web Science Operations
Tetherless World Constellation (RPI)
<http://tw.rpi.edu> <olyerickson@gmail.com>
Twitter & Skype: olyerickson
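The provenance-aware CSV-to-RDF workflow raised in the thread (a stable source URL plus W3C PROV) might look roughly like the sketch below. It is only an illustration under stated assumptions, not anything Tabula provides: the function name `csv_to_ntriples`, the `example.org` URLs, and the row/column IRI layout are all hypothetical. The only real vocabulary term used is `prov:wasDerivedFrom` from the W3C PROV-O namespace.

```python
import csv
import io
from urllib.parse import quote

# W3C PROV-O namespace (real); everything else below is illustrative.
PROV = "http://www.w3.org/ns/prov#"


def csv_to_ntriples(csv_text, source_url, base_iri):
    """Turn CSV text into N-Triples lines, linking the extracted dataset
    and every row back to the URL the CSV was downloaded from."""
    reader = csv.DictReader(io.StringIO(csv_text))
    dataset = f"<{base_iri}dataset>"
    triples = [f"{dataset} <{PROV}wasDerivedFrom> <{source_url}> ."]

    for index, row in enumerate(reader, start=1):
        subject = f"<{base_iri}row/{index}>"
        triples.append(f"{subject} <{PROV}wasDerivedFrom> <{source_url}> .")
        for header, value in row.items():
            if header is None:
                # Extra cells with no matching header: skipped in this sketch.
                continue
            # Hypothetical IRI scheme: one predicate per column header.
            predicate = f"<{base_iri}column/{quote(header.strip(), safe='')}>"
            literal = (
                (value or "")
                .replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


if __name__ == "__main__":
    sample = "name,count\nAlpha,3\n"
    for line in csv_to_ntriples(
        sample,
        "http://example.org/source.csv",   # hypothetical permanent URL
        "http://example.org/extract/",     # hypothetical base IRI
    ):
        print(line)
```

The sketch depends on the header row being present in the CSV, which is why a permanent URL for the extracted segment and working header extraction matter to such a workflow.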
Received on Thursday, 4 April 2013 16:05:05 UTC