Re: pdf and the semantic web

None of the responses to date seems to address the thrust of the
request, which is tagging *atomic content* within the PDF (not tagging
the whole document).

Since XMP is a single, separate component of the document, I don't see
how it helps unless there is an obvious way for it to refer to any
element within the document.  But it would be nice to know of a way
(other than "learn how to read/write PDF") to tag atomic PDF elements.

john

--------------
John Graybeal   <mailto:graybeal@mbari.org>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org

On Feb 11, 2009, at 10:53 AM, Jeremy Carroll wrote:

>
> [[
>
>> annotating PDFs, as in tagging not the file but the information  
>> within the file, is not possible by means different from those  
>> provided by ADOBE.
>
> Not so. The standard means of annotating PDFs, i.e. adding metadata,
> is to use XMP, the Extensible Metadata Platform [2], an initiative
> from Adobe for labelling arbitrary binary (and text) files.
> [2] http://www.adobe.com/products/xmp/
>
> ]]
>
> My understanding is that the following method generally works for  
> reading XMP within an arbitrary file (e.g. a PDF file).
>
> Scan the file looking for "<rdf:RDF " and then invoke an RDF/XML
> parser (until the closing </rdf:RDF>).
>
> This is not necessarily perfect (it is unclear how the metadata and
> the data relate, for example), but ...
>
> If I have ever actually used this method, it was several years ago
> (and it is not lodged in my memory; I sort of have a vague
> recollection ...).
> In the RDF Core WG we took care to ensure that RDF 2004 was compatible
> with XMP, which was based on RDF 1999.
>
> Jeremy
>
>
>
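
For illustration, the scan-and-parse method quoted above might look
roughly like the following Python sketch. The function names
(find_rdf_blocks, parse_rdf_block) are hypothetical, and the sketch
rests on several assumptions not stated in the thread: the RDF is
stored uncompressed in the file, it is encoded as UTF-8, and the rdf:
namespace prefix is declared on the rdf:RDF element itself. It uses a
plain XML parser from the standard library in place of a real RDF/XML
parser, so it yields an element tree, not triples.

```python
from typing import List
import xml.etree.ElementTree as ET

# Byte markers delimiting an embedded RDF/XML block.
# The start marker omits the trailing space so that "<rdf:RDF>" and
# "<rdf:RDF\n" are matched too (an assumption, not from the thread).
START = b"<rdf:RDF"
END = b"</rdf:RDF>"


def find_rdf_blocks(data: bytes) -> List[bytes]:
    """Return every byte span from START through the next END."""
    blocks = []
    pos = 0
    while True:
        start = data.find(START, pos)
        if start == -1:
            break
        end = data.find(END, start)
        if end == -1:
            break  # unterminated block: give up instead of guessing
        end += len(END)
        blocks.append(data[start:end])
        pos = end
    return blocks


def parse_rdf_block(block: bytes) -> ET.Element:
    """Parse one extracted block as XML (assumes UTF-8, and that all
    namespace prefixes used are declared inside the block)."""
    return ET.fromstring(block)
```

Reading a file would then be a matter of passing its raw bytes, e.g.
open(path, "rb").read(), to find_rdf_blocks and parsing each result.
As noted in the quoted message, this says nothing about which part of
the document a given piece of metadata describes.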

Received on Thursday, 12 February 2009 00:07:42 UTC