
Re: pdf and the semantic web

From: Hammond, Tony <t.hammond@nature.com>
Date: Wed, 11 Feb 2009 16:18:51 +0000
To: Alexander Garcia Castro <alexgarciac@gmail.com>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <C5B8AAEB.17ED2%t.hammond@nature.com>
Hi Alexander:

> The PDF format is closed;

Not so. PDF is now an ISO International Standard (ISO 32000-1); see last
year's press release [1].

PDF is thus an "open" format, albeit a complex one to deal with.

> annotating PDFs, as in tagging not the file but the information within the
file, is not possible by means different from those provided by ADOBE.

Not so. The standard means of annotating PDFs, i.e. adding metadata, is to
use XMP, the Extensible Metadata Platform [2], an initiative from Adobe for
labelling arbitrary binary (and text) files.
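
As a rough illustration only (not Adobe's XMP Toolkit and not a complete implementation of the XMP specification), the Python sketch below builds a minimal XMP packet using just the standard library: an RDF/XML description wrapped in `x:xmpmeta` and bracketed by the `<?xpacket ?>` processing instructions. The function name `build_xmp_packet` is hypothetical, and restricting it to the Dublin Core `dc:title` and `dc:creator` properties is an assumption for brevity; real publisher packets carry more.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP (x:), RDF (rdf:) and Dublin Core (dc:).
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Make ElementTree serialise these URIs with their conventional prefixes.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Build a Clark-notation qualified name such as {uri}local."""
    return f"{{{NS[prefix]}}}{local}"


def build_xmp_packet(title, creators):
    """Hypothetical helper: return a minimal XMP packet (str) with dc:title and dc:creator."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # In XMP, dc:title is a language alternative (rdf:Alt) with an x-default entry.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered array (rdf:Seq) of names.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name

    body = ET.tostring(xmpmeta, encoding="unicode")
    # The xpacket wrapper (BOM in begin=, fixed id) lets tools locate the packet in any file.
    return ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
            + body + '\n<?xpacket end="w"?>')
```

Keeping the packet as plain RDF/XML is the design point: once extracted from the binary container, it is ordinary Semantic Web data.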

While the Adobe tools have the best support for reading and writing XMP,
there are third-party tools. One such example is the Perl-based ExifTool [3],
which allows reading and writing of XMP packets in PDFs.
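
ExifTool is the robust route. To show that an embedded packet is also reachable without any Adobe software, here is a Python sketch that simply scans the raw bytes for the `<?xpacket begin ... ?>` and `<?xpacket end ... ?>` markers and reads the default-language `dc:title`. It assumes the metadata stream is stored uncompressed; a packet inside a Flate-compressed stream would be missed, and a file with incremental updates may yield several packets (hence the list). The helper names `find_xmp_packets` and `dc_titles` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Capture everything between the xpacket header and trailer (non-greedy, across lines).
PACKET_RE = re.compile(rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL)


def find_xmp_packets(data: bytes):
    """Yield the XML body of each uncompressed XMP packet found in the raw bytes."""
    for match in PACKET_RE.finditer(data):
        yield match.group(1).strip()  # drop the whitespace padding packets often carry


def dc_titles(data: bytes):
    """Return the default-language dc:title of every parsable packet that has one."""
    titles = []
    for body in find_xmp_packets(data):
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            continue  # skip packets that are not well-formed XML
        li = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
        if li is not None and li.text:
            titles.append(li.text)
    return titles
```

Typical use would be reading a file in binary mode and passing its contents, e.g. `dc_titles(open("article.pdf", "rb").read())`, where the filename is only a placeholder.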

> Isn't the PDF obsolete within this context?

Not so. We are beginning to see academic publishers (among others) adding in
structured metadata (as XMP packets) into their PDFs. For example, Nature
magazine is now routinely serving up such information [4]. Nature is one of
(at least) two major scientific publishers now adding in XMP to their PDFs.

It may be more difficult to work with binary files than with simple text
markup, but that's not sufficient reason to reject such formats. Binary files
are tractable, if "difficult".

Cheers,

Tony

[1] http://www.iso.org/iso/pressrelease.htm?refid=Ref1141
[2] http://www.adobe.com/products/xmp/
[3] http://www.sno.phy.queensu.ca/~phil/exiftool/
[4] http://blogs.nature.com/wp/nascent/2008/12/xmp_labelling_for_nature.html


On 11/2/09 15:43, "Alexander Garcia Castro" <alexgarciac@gmail.com> wrote:

> I would like to know how applicable the PDF format could be within the context
> of the Semantic Web. The PDF format is closed; annotating PDFs, as in tagging
> not the file but the information within the file, is not possible by means
> different from those provided by ADOBE. For instance, if I wanted to tag a
> word or an image inside a PDF, I would have to do it with my Acrobat Reader
> (the latest version). But if I wanted to facilitate such an operation via the
> Web, I could only do it if I had the XSLT so I could transform the PDF into
> XML. This limitation is, IMHO, a huge one within the context of the Semantic
> Web, where we should be able to define links and use them. Furthermore, being
> forced to have a third-party application just for displaying a file that
> should be displayed directly by the browser is not a nice feature. If PDF were
> open, it could be rendered by the browser. Aren't closed formats such as PDF
> viable within the context of the SW? After all, PDF was a solution within the
> context of portability and exchange of information; the main problem it was
> solving was a simple one: "I want my document to look the same everywhere, on
> display and once printed" and "I want people to be able to read my documents
> without losing the format of the document and without having to consider the
> OS". Isn't the PDF obsolete within this context?



Received on Wednesday, 11 February 2009 16:20:04 UTC
