Re: pdf and the semantic web

From: Alexander Garcia Castro <alexgarciac@gmail.com>
Date: Thu, 12 Feb 2009 01:17:47 +0100
Message-ID: <1ba2d5730902111617o3a0ba940qac3b6339710ef10c@mail.gmail.com>
To: John Graybeal <graybeal@mbari.org>
Cc: Jeremy Carroll <jeremy@topquadrant.com>, "Hammond, Tony" <t.hammond@nature.com>, semantic-web@w3.org
Thanks to all of you for your replies. Thanks John: tagging the atomic
content, not the PDF as a whole, is exactly what I would like to do. How is
this related to the SW? Easy: papers contain concepts, concepts are defined
in ontologies, and ontologies can point to resources capable of consuming
those concepts. This is particularly true in the Life Sciences.

The actual "why" for my email: I am doing research on the intersection
between folksonomies and the semantic web in digital libraries. So far, I
have not found a realistic way to use a PDF in an open manner, similar to
the way one could use a LaTeX file. All those libraries, APIs, XML formats,
etc. are great, and some of them make whatever one wants to do with the PDF
a lot easier. But so far, IMHO, the PDF remains not so open, and also IMHO
it is not part of what we could classify as generative technology, which is
what could make the difference in the success of the SW; see
futureoftheinternet.org/ for generative tech.

Again, thanks a lot to all of you.

On Thu, Feb 12, 2009 at 1:06 AM, John Graybeal <graybeal@mbari.org> wrote:

> All the responses to date do not seem to address the thrust of the request,
> which is tagging *atomic content* of the PDF (not tagging the whole
> document).
> XMP being a single separate component of the document, I don't see how it
> helps, unless there is an obvious way to refer to any element within the
> document.  But it would be nice to know of a way (other than "learn how to
> read/write PDF") that atomic PDF elements could be tagged.
> john
> --------------
> John Graybeal   <mailto:graybeal@mbari.org>  -- 831-775-1956
> Monterey Bay Aquarium Research Institute
> Marine Metadata Interoperability Project: http://marinemetadata.org
> On Feb 11, 2009, at 10:53 AM, Jeremy Carroll wrote:
>> [[
>>> annotating PDFs, as in tagging not the file but the information within
>>> the file, is not possible by means different from those provided by ADOBE.
>> Not so. The standard means of annotating PDFs, i.e. adding metadata, is to
>> use XMP, the Extensible Metadata Platform [2], an initiative from Adobe for
>> labelling arbitrary binary (and text) files.
>> [2] http://www.adobe.com/products/xmp/
>> ]]
>> My understanding is that the following method generally works for reading
>> XMP within an arbitrary file (e.g. a PDF file).
>> Scan the file looking for "<rdf:RDF " and then invoke an RDF/XML parser
>> (until the closing </rdf:RDF>).
>> Not necessarily perfect - unclear how the metadata and the data relate for
>> example, but ...
>> If I have ever actually used this method it was several years ago (and it
>> is not lodged in my memory; I sort of have a vague recollection ...).
>> In RDF Core WG we took care to ensure that RDF 2004 was compatible with
>> XMP which was based on RDF 1999.
>> Jeremy
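
The scan-and-parse method Jeremy describes above could be sketched roughly
as follows. This is only an illustration under several assumptions: the XMP
packet is stored uncompressed and unencrypted in the file, it is encoded as
UTF-8, and the rdf: prefix is declared on the <rdf:RDF> element itself. The
function names find_rdf_blocks and parse_descriptions are made up for this
sketch, and the standard-library XML parser stands in for a full RDF/XML
parser. Note that it only reads document-level metadata; it does not solve
the problem of tagging atomic content inside the PDF.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Non-greedy match from an opening <rdf:RDF ...> tag to the first closing tag.
RDF_BLOCK = re.compile(rb"<rdf:RDF[\s>].*?</rdf:RDF>", re.DOTALL)


def find_rdf_blocks(data: bytes) -> list[bytes]:
    """Return every raw <rdf:RDF>...</rdf:RDF> span found in the bytes."""
    return RDF_BLOCK.findall(data)


def parse_descriptions(block: bytes) -> list[dict[str, str]]:
    """Parse one RDF/XML block and collect simple literal properties.

    Only attribute-valued properties and plain-text child elements of each
    rdf:Description are collected. Nested structures (rdf:Alt, rdf:Seq, ...)
    are skipped to keep the sketch short.
    """
    root = ET.fromstring(block)  # raises ET.ParseError on malformed input
    results = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        props = {}
        # Properties written as XML attributes, excluding rdf:about etc.
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):
                props[name] = value
        # Properties written as child elements with plain text content.
        for child in desc:
            if len(child) == 0 and child.text and child.text.strip():
                props[child.tag] = child.text.strip()
        results.append(props)
    return results
```

To try it, one might read a file as bytes (for example
open("paper.pdf", "rb").read(), where paper.pdf is a placeholder name) and
pass each block from find_rdf_blocks to parse_descriptions. Property names
come back in ElementTree's {namespace-uri}localname form.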

Alexander Garcia
Received on Thursday, 12 February 2009 00:18:25 UTC
