- From: Gannon Dick <gannon_dick@yahoo.com>
- Date: Thu, 4 Apr 2013 07:19:59 -0700 (PDT)
- To: "paoladimaio10@googlemail.com" <paoladimaio10@googlemail.com>, John Erickson <olyerickson@gmail.com>
- Cc: Hatem Ben Yacoub <hatemben@gmail.com>, "eGov IG (Public)" <public-egov-ig@w3.org>
- Message-ID: <1365085199.31469.YahooMailNeo@web122903.mail.ne1.yahoo.com>
A StratML wrapper would make a lot of sense too, I think. The XForms construction methods are already largely in place. CSV imports could be aggregated, marked up, and classified in a more targeted way without precluding conversion to RDF at a later time. The Journal Publishing Suite (NIH), as well as various LOC citation schemes (MADS, MODS, etc.), use this strategy.

Owen?

________________________________
From: Paola Di Maio <paola.dimaio@gmail.com>
To: John Erickson <olyerickson@gmail.com>
Cc: Hatem Ben Yacoub <hatemben@gmail.com>; eGov IG (Public) <public-egov-ig@w3.org>
Sent: Thursday, April 4, 2013 5:54 AM
Subject: Re: Introducing Tabula (PDF to CSV conversion tool)

Indeed, it looks like a good balance of simplicity and useful functionality. Nice, and it reminds me of the 'tabulator' concept, a bit more trimmed.

I wonder why there is no conversion to RDF? Can we not also have a CSV to RDF button? Would that not make sense?

PDM

On Thu, Apr 4, 2013 at 3:48 PM, John Erickson <olyerickson@gmail.com> wrote:

>Hatem, this is an extremely interesting tool! Note to everyone: even
>though Mozilla was one of the supporters, it works in all browsers.
>Or, at least also Chrome ;)
>
>A couple suggestions:
>1. In addition to enabling the user to download and copy the selected
>table segment, please provide a way (or at least start thinking about
>a way) for there to be a permanent/re-usable/reliable URL to the
>selected content. The reason is, some of us have RDF conversion
>workflows that document the provenance, starting with the download URL
>of the source CSV.
>2. I can understand how headers present a problem... but it would be
>extremely useful to have them working! Maybe you can extract them
>first, then associate them with selected table segments on a follow-up
>pass. But you'll need to have created a URL for the selected header
>cells ;) NOTE: One compromise is to only do COMPLETE tables if the
>headers are to be included.
>3. Related to the above, you really need to encode provenance (see W3C
>PROV) for this to really be useful to people using extracted tabular
>data "in anger."
>
>Thanks again for this good work!
>
>John
>
>
>On Wed, Apr 3, 2013 at 4:25 PM, Hatem Ben Yacoub <hatemben@gmail.com> wrote:
>> Hi all,
>>
>> One of the problems that many Open Government data projects face is
>> the availability of tons of old documents in PDF format, which is not
>> an open and reusable format. Today, Mozilla announced Tabula, a new tool
>> to help liberate tables trapped in PDFs.
>>
>> The online demo is amazing: http://tabula.nerdpower.org/
>>
>> To use it, simply make a rectangular selection over tables on the PDF
>> pages. (Avoid headers.)
>>
>> Sources: https://github.com/jazzido/tabula
>>
>> Official announcement:
>> http://source.mozillaopennews.org/en-US/articles/introducing-tabula/
>>
>>
>> Best,
>> --
>> Eng. Hatem Ben Yacoub
>> ICT & eGOV Consultant
>> http://hbyconsultancy.com
>>
>> http://twitter.com/hatem
>> http://facebook.com/hatemben
>>
>
>
>
>--
>John S. Erickson, Ph.D.
>Director, Web Science Operations
>Tetherless World Constellation (RPI)
><http://tw.rpi.edu> <olyerickson@gmail.com>
>Twitter & Skype: olyerickson
>
Received on Thursday, 4 April 2013 14:20:39 UTC