RE: pdf and the semantic web from Jeremy Carroll on 2009-02-11 (semantic-web@w3.org from February 2009)

From: Jeremy Carroll <jeremy@topquadrant.com>
Date: Wed, 11 Feb 2009 10:53:26 -0800
To: "'Hammond, Tony'" <t.hammond@nature.com>, "'Alexander Garcia Castro'" <alexgarciac@gmail.com>, <semantic-web@w3.org>
Message-ID: <004101c98c7a$07454a10$15cfde30$@com>

[[

> annotating PDFs, as in tagging not the file but the information within the file, is not possible by means different from those provided by ADOBE.

Not so. The standard means of annotating PDFs, i.e. adding metadata, is to use XMP, the Extensible Metadata Platform [2], an initiative from Adobe for labelling arbitrary binary (and text) files.
 [2] http://www.adobe.com/products/xmp/

]]

My understanding is that the following method generally works for reading XMP within an arbitrary file (e.g. a PDF file).

Scan the file looking for "<rdf:RDF " and then invoke an RDF/XML parser on everything from that point until the closing </rdf:RDF>.
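
A minimal Python sketch of that scan, offered as an illustration rather than a tested recipe. The function names are hypothetical, and it assumes the XMP packet is stored uncompressed and in a byte-compatible encoding such as UTF-8 (a UTF-16 packet or a filtered PDF stream would not be found this way). For self-containment it uses the standard library's ElementTree, which is a plain XML parser and not a full RDF/XML parser; a real RDF toolkit would be substituted at that step.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
START = b"<rdf:RDF "
END = b"</rdf:RDF>"


def extract_rdf_blocks(data: bytes) -> list[bytes]:
    """Return every <rdf:RDF ...>...</rdf:RDF> span found in the raw bytes."""
    blocks = []
    pos = 0
    while True:
        start = data.find(START, pos)
        if start == -1:
            break
        end = data.find(END, start)
        if end == -1:
            break  # unterminated block: stop rather than guess
        end += len(END)
        blocks.append(data[start:end])
        pos = end
    return blocks


def description_properties(block: bytes) -> dict[str, str]:
    """Collect simple literal properties of rdf:Description elements.

    Assumption: ElementTree stands in for an RDF/XML parser here, so only
    flat element-with-text properties are picked up.
    """
    root = ET.fromstring(block)
    props = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for child in desc:
            if child.text and child.text.strip():
                props[child.tag] = child.text.strip()
    return props
```

Reading a file with open(path, "rb").read() and passing the bytes to extract_rdf_blocks gives the candidate blocks; each one can then be handed to whichever RDF/XML parser is available.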

Not necessarily perfect (it is unclear, for example, how the metadata and the data relate), but ...

If I have ever actually used this method, it was several years ago (and it is not lodged in my memory; I sort of have a vague recollection ...).
In the RDF Core WG we took care to ensure that RDF 2004 was compatible with XMP, which was based on RDF 1999.

Jeremy

Received on Wednesday, 11 February 2009 18:54:06 UTC