Re: linked open data and PDF

On 2015-01-20 18:28, Larry Masinter wrote:
> There's some background that you might find helpful
> in the discussion.
>
> PDF is now defined by ISO 32000.
> PDF has profiles, including PDF/A-3
> http://www.digitalpreservation.gov/formats/fdd/fdd000360.shtml
> ISO 19005-3. PDF/A-3 defines how to add arbitrary
> file attachments to PDF.
>
> XMP http://en.wikipedia.org/wiki/Extensible_Metadata_Platform
> is (as of 2012) also an ISO standard, ISO 16684-1, a
> format-independent metadata representation
> that uses a restricted RDF/XML framework, but
> not arbitrary RDF/XML.
>
> A design from scratch today might make different
> choices, of course. But for those whose
> goal is deployment and integration
> with existing workflows, reuse of what is widely
> deployed seems like a path worth investigating.
>
> And XMP is widely implemented not just for PDF but
> also for images, as a way of extending metadata
> beyond EXIF or IPTC.
>
> Putting linked data in compact form (CSV, for example)
> might make sense, perhaps as a PDF/A-3 file attachment,
> if a document is a carrier of tabular data.
>
> Image formats like JPEG and PNG (for which there
> is support for XMP) don't have a standard, uniform
> way of attaching other files, though, so allowing
> data (or a pointer to external data) in the XMP
> would broaden the applicability.
>
> In choosing how to make five star open data work
> for file formats other than HTML, what other choices
> are there?

I would argue that declarative programs are most suitable. Others may 
disagree. AFAIK, there is no single widely accepted view on this.

re: "existing workflows", would you mind sharing your thoughts on how 
the 4th star, "use URIs to denote things, so that people can point at 
your stuff", may be achieved? Say we have:

http://example.org/foo.pdf

and that we go with XMP out of the box, irrespective of the RDF 
serialization it embeds. How can the 3rd LD design principle, "when 
someone looks up a URI, provide useful information, using the standards 
(RDF*, SPARQL)", be satisfied?

Example: I want to discover the variables that are declared in the 
hypothesis of papers.

What would the PDF/XMP look like?

How can I extract the information (without breaking my head) using
off-the-shelf *open* tools?

> Sure, not all PDFs have good quality XMP metadata,
> but not all HTML has quality RDFa or metadata either.

I can agree to that. We can also look at it this way: the majority of
Web pages are essentially "broken", yet the Web somehow "just works".
How would/does a PDF look or work on the Web if a non-trivial byte is
off, never mind the XMP?

-Sarven
http://csarven.ca/#i

Received on Wednesday, 21 January 2015 14:24:25 UTC