thoughts on file attachments from Larry Masinter on 2016-09-11 (public-pdf-open-data@w3.org from September 2016)

From: Larry Masinter <LMM@acm.org>
Date: Sun, 11 Sep 2016 12:27:30 -0700
To: <public-pdf-open-data@w3.org>
Message-ID: <D3D8E7EB-D5EA-4075-B1ED-5345BA7D6674@acm.org>

I met with Gregg Friday; I think we made some progress on the “embed CSV for tabular data” case.

We talked about where to put metadata for each table, and just 

I think we came to a preference for attaching multiple files for each table: the CSV(s) and a metadata file.

It looks like there are lots of utilities for manipulating (adding, extracting, deleting) PDF file attachments; you don’t need Acrobat. (All seem to deal with them only at the top level?)

Anyway, let’s say we give data-metadata files a special pattern:
METADATA-<n>-<descriptive name>.json
The CSV files can be named anything. They’re linked from the metadata files.
(The data doesn’t even have to be in the PDF!)
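As a rough sketch of how a tool might recognize that naming pattern: the regular expression below assumes <n> is a decimal number and the descriptive name is any non-empty string. The message doesn't pin either of those down, and the function name is just made up for illustration.

```python
import re

# Assumed reading of the pattern: "METADATA-", a decimal number,
# "-", a non-empty descriptive name, then ".json".
METADATA_NAME = re.compile(r"^METADATA-(\d+)-(.+)\.json$")


def parse_metadata_name(filename):
    """Return (number, descriptive_name), or None if the name doesn't match."""
    match = METADATA_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

Anything that doesn't match (the CSV files, for instance) is simply not treated as metadata.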

To extract data, just run a “pull out file attachments” utility, or use something fancier.
Look for METADATA files, and use them to manipulate the data.
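A minimal sketch of that second step, assuming some attachment utility has already unpacked everything into one directory. The function name and the glob are illustrative only, and nothing is assumed here about what the JSON contains.

```python
import json
from pathlib import Path


def load_metadata_files(attachment_dir):
    """Map each METADATA-*.json filename in a directory to its parsed JSON."""
    found = {}
    for path in sorted(Path(attachment_dir).glob("METADATA-*.json")):
        with path.open(encoding="utf-8") as handle:
            found[path.name] = json.load(handle)
    return found
```

Other attachments in the same directory are left alone; they only matter if a metadata file points at them.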

Embedded files can use relative URLs to talk about other embedded files.
You don’t need to set base.

If you unpack all the attachments, they’ll work relative to ‘file’.
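To illustrate the relative-URL point, here is a sketch that resolves a reference found in an unpacked metadata file against that file's own file: URL, using only the Python standard library. The function name and the example paths are hypothetical.

```python
from pathlib import Path
from urllib.parse import urljoin


def resolve_reference(metadata_path, reference):
    """Resolve a URL reference against the metadata file's own file: URL."""
    # The unpacked metadata file's location serves as the base; no explicit
    # base needs to be declared inside the file itself.
    base = Path(metadata_path).resolve().as_uri()
    return urljoin(base, reference)
```

A relative reference such as "budget.csv" ends up pointing at the sibling file in the same directory, while an absolute URL passes through unchanged, which covers the case where the data isn't in the PDF at all.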

I’m being a little terse so I hope what I’m saying is clear.
I’ll make some examples.

Received on Sunday, 11 September 2016 19:28:00 UTC