Re: pdf and the semantic web

From: John Graybeal <graybeal@mbari.org>
Date: Wed, 11 Feb 2009 16:06:56 -0800
Cc: "'Hammond, Tony'" <t.hammond@nature.com>, "'Alexander Garcia Castro'" <alexgarciac@gmail.com>, <semantic-web@w3.org>
Message-Id: <D0BEDD10-F68D-4481-AE3D-6F6A350B2AD7@mbari.org>
To: Jeremy Carroll <jeremy@topquadrant.com>

None of the responses so far seems to address the thrust of the  
request, which is tagging *atomic content* within the PDF (not tagging the  
whole document).

XMP being a single separate component of the document, I don't see how  
it helps, unless there is an obvious way to refer to any element  
within the document.  But it would be nice to know of a way (other  
than "learn how to read/write PDF") that atomic PDF elements could be  
tagged.

john

--------------
John Graybeal   <mailto:graybeal@mbari.org>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org

On Feb 11, 2009, at 10:53 AM, Jeremy Carroll wrote:

>
> [[
>
>> annotating PDFs, as in tagging not the file but the information  
>> within the file, is not possible by means different from those  
>> provided by ADOBE.
>
> Not so. The standard means of annotating PDFs, i.e. adding metadata,  
> is to use XMP, the Extensible Metadata Platform [2], an initiative  
> from Adobe for labelling arbitrary binary (and text) files.
> [2] http://www.adobe.com/products/xmp/
>
> ]]
>
> My understanding is that the following method generally works for  
> reading XMP within an arbitrary file (e.g. a PDF file).
>
> Scan the file looking for "<rdf:RDF " and then invoke an RDF/XML  
> parser (until the closing </rdf:RDF>).
>
> Not necessarily perfect - unclear how the metadata and the data  
> relate for example, but ...
>
> If I have ever actually used this method it was several years ago  
> (and not lodged in my memory, I sort of have a vague recollection ...).
> In RDF Core WG we took care to ensure that RDF 2004 was compatible  
> with XMP which was based on RDF 1999.
>
> Jeremy
>
>
>
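
The scan-and-parse method quoted above can be sketched roughly as follows, in Python with only the standard library. This is an illustration under stated assumptions, not a tested XMP reader: it assumes the packet is stored uncompressed and in a UTF-8 compatible encoding, that the RDF namespace is bound to the prefix "rdf", and that every namespace used is declared inside the <rdf:RDF> element itself. The function names extract_rdf_blocks and list_properties are made up for this example. Note also that ElementTree is a plain XML parser, so a real RDF/XML parser would still be needed to obtain triples.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def extract_rdf_blocks(data: bytes) -> list:
    """Return each byte span running from '<rdf:RDF' through '</rdf:RDF>'.

    Assumption: the metadata is not compressed or encrypted, so the tags
    appear literally in the file. The search omits the trailing space of
    "<rdf:RDF " so that a newline after the element name is tolerated.
    """
    start_tag, end_tag = b"<rdf:RDF", b"</rdf:RDF>"
    blocks = []
    pos = 0
    while True:
        start = data.find(start_tag, pos)
        if start == -1:
            break
        end = data.find(end_tag, start)
        if end == -1:
            break  # unterminated block: give up rather than guess
        end += len(end_tag)
        blocks.append(data[start:end])
        pos = end
    return blocks


def list_properties(block: bytes) -> list:
    """List (qualified tag, text) pairs found under each rdf:Description.

    This only walks the XML tree; it does not interpret RDF semantics.
    Properties written as XML attributes on rdf:Description are skipped.
    """
    root = ET.fromstring(block)
    props = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        for child in desc:
            props.append((child.tag, (child.text or "").strip()))
    return props
```

To try it, read a file in binary mode and pass the bytes to extract_rdf_blocks, then pass each block to list_properties. As the quoted message itself notes, this recovers only document-level metadata and says nothing about how that metadata relates to individual elements inside the PDF.
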
Received on Thursday, 12 February 2009 00:07:42 GMT