- From: Leonard Rosenthol <lrosenth@adobe.com>
- Date: Thu, 8 Sep 2016 05:15:35 +0000
- To: Larry Masinter <masinter@adobe.com>, "public-pdf-open-data@w3.org" <public-pdf-open-data@w3.org>
Yes, the PDF Fragment syntax has already been extended to support file attachments for exactly this type of use case.

Leonard

On 9/7/16, 12:11 PM, "Larry Masinter" <masinter@adobe.com> wrote:

I had a good discussion yesterday with Gregg Kellogg (new group member) and I thought I would report it.

Gregg worked on the CSV-on-the-web working group, and in some ways we’re trying to do for PDF what the CSV group did for CSV: find a way of letting PDF data be five star.

There is all kinds of data one might want to get out of a PDF, but for lots and lots of use cases, the important data is in tables. CSV (comma-separated-values) is a common, simple way of communicating values in a table, can be read into a spreadsheet directly. The CSVW group defined a way of representing the metadata you need to know to transform the data in the CSV file into RDF triples.
https://www.w3.org/standards/techs/csv

Gregg developed a Note about embedding CSV inside HTML
http://www.w3.org/TR/csvw-html/
for the same kinds of reasons… keep the data with the report that describes it, keep existing workflows which have grown up around having a single file.

So: suppose, for each table in a PDF file with (useful) data in it, we add an attachment of CSV and JSON metadata. (There’s some question of which points to which, or if you could have multiple CSV fragments for one table, and some issue of what URL to use to get to parts of the CSV file, but these seem workable.)

Gregg has some web utilities that do useful things with RDF and CSV. One takes a URI of a CSV and produces other formats.

The CSVW github repo has a lot of examples and test cases (under http://w3c.github.io/csvw/). Lots of samples were based on “palo alto trees” (one of the CSVW use cases). For example
https://raw.githubusercontent.com/w3c/csvw/gh-pages/examples/tree-ops.csv-metadata.json
has the metadata for tree-ops.
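
To make the "CSV plus JSON metadata" pairing above concrete, here is a minimal sketch in Python of how a consumer might use the metadata to interpret the CSV once both attachments have been pulled out of the PDF. The table contents, the file name `table1.csv`, and the function name `read_annotated_rows` are hypothetical and only for illustration; the property names (`url`, `tableSchema`, `columns`, `name`, `titles`, `datatype`) are the CSVW ones. The sketch assumes each `titles` value is a single string and only handles a few datatypes, so it is nowhere near a full CSVW processor, and it does not touch the PDF attachment step at all.

```python
import csv
import io
import json

# Hypothetical metadata for an illustrative table (not taken from the CSVW repo).
METADATA_JSON = """
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "table1.csv",
  "tableSchema": {
    "columns": [
      {"name": "id", "titles": "ID", "datatype": "integer"},
      {"name": "street", "titles": "On Street", "datatype": "string"},
      {"name": "species", "titles": "Species", "datatype": "string"}
    ]
  }
}
"""

# Hypothetical CSV content that would travel alongside the metadata.
CSV_TEXT = (
    "ID,On Street,Species\n"
    "1,FIRST ST,Quercus agrifolia\n"
    "2,SECOND ST,Acer rubrum\n"
)

# Simplification: only a handful of CSVW datatypes are mapped to Python types.
CASTS = {"integer": int, "number": float, "string": str}


def read_annotated_rows(csv_text, metadata):
    """Return rows keyed by the metadata column names, with values cast by datatype."""
    columns = metadata["tableSchema"]["columns"]
    # Assumption: "titles" is a plain string that matches the CSV header cell.
    by_title = {col["titles"]: col for col in columns}
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for title, value in record.items():
            col = by_title[title]
            cast = CASTS.get(col.get("datatype", "string"), str)
            row[col["name"]] = cast(value)
        rows.append(row)
    return rows


rows = read_annotated_rows(CSV_TEXT, json.loads(METADATA_JSON))
for row in rows:
    print(row)
```

The point of keying rows by `name` rather than by header text is that the metadata, not the CSV header, then decides how each cell is identified, which is what a later RDF conversion would build on.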

I am curious as to whether the PDF fragment identifier syntax could be extended to allow pointers into file attachments. See
https://tools.ietf.org/html/draft-hardy-pdf-mime-04.pdf

We’re going to get together Friday afternoon in San Jose and I hope we can talk about the charter and see how far we can get with the PDFData tools
https://github.com/Aiybe/PDFData
If you’d like to join, let me know.

Larry
--
http://larry.masinter.net

Received on Thursday, 8 September 2016 05:16:09 UTC