RE: PDF file conundrum / A way to think about IRs from Booth, David (HP Software - Boston) on 2008-06-10 (public-awwsw@w3.org from June 2008)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Tue, 10 Jun 2008 16:59:08 +0000
To: 'Jonathan Rees' <jar@creativecommons.org>, "public-awwsw@w3.org" <public-awwsw@w3.org>
Message-ID: <184112FE564ADF4F8F9C3FA01AE50009FCF27C4BD5@G1W0486.americas.hpqcorp.net>
> From: Jonathan Rees
>
> Just to reiterate what I was saying on yesterday's call:
>
> David has said that an IR (what I call a "Boothian IR") is a function.
> Tim has said that an IR (what I call a "Timothian IR") is an abstract
> document.
> Both have said that PDF files are IRs.

I may have said that, but more specifically what I meant is that one can reasonably think of them that way.

To address this conundrum, let's for the moment put aside the precise definition of IR and just assume that there is one, and an IR has certain characteristics.  Using the definition, we now look at various entities in your challenge, such as

>    journal article
>    DNA sequence
>    home page
>    blog
>    gzip file
>    number
>    form  (e.g. http://random.org/integers/ )
>    pubmed record
>    awww:Representation
>    web site

and think about whether those entities have the characteristics of an IR.  In doing so, we certainly have some latitude in how we choose to think of each entity.  In some cases, it seems very reasonable to think of the entity as having the characteristics of an IR.  For example, if we only cared about the basic information content of a journal article, then it seems reasonable to think of it as having the characteristics of an IR.  In other cases it seems less reasonable, but still plausible.  For example, we *could* think of a DNA sequence as nothing more than some information.  On the other hand, we might also want to think of it as a class of physical molecules that exist in the real world.

Now, for a given entity, let's suppose we have decided how we choose to think of it and consider the three possibilities:

 1. the entity does not have the characteristics of an IR.  It is lacking at least one characteristic of an IR.

 2. the entity has all of the characteristics of an IR, but it also has other characteristics.

 3. the entity has exactly the characteristics of an IR and no more.

In case 1, the entity clearly is not an IR.  In case 3, the entity clearly is an IR.  But what about case 2?  In case 2 we have an entity that is *both* an IR *and* something else: it has characteristics of both.
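To make the three cases concrete, here is a minimal sketch that treats them as a set comparison.  It is only an illustration: the trait names are hypothetical placeholders, since I am deliberately not fixing the definition of IR here, and `classify` is a name I made up for this example.

```python
from enum import Enum


class Case(Enum):
    NOT_AN_IR = 1      # case 1: lacks at least one IR characteristic
    IR_AND_MORE = 2    # case 2: all IR characteristics, plus others
    EXACTLY_AN_IR = 3  # case 3: exactly the IR characteristics


# Hypothetical placeholder; the real list depends on the definition of IR.
IR_TRAITS = frozenset({"information-content"})


def classify(entity_traits, ir_traits=IR_TRAITS):
    """Compare how we chose to think of an entity against the IR traits."""
    entity = frozenset(entity_traits)
    required = frozenset(ir_traits)
    if not required <= entity:
        return Case.NOT_AN_IR
    if entity > required:  # proper superset: IR traits and something else
        return Case.IR_AND_MORE
    return Case.EXACTLY_AN_IR


# Two ways of thinking about a DNA sequence, and one journal article.
dna_as_molecules = classify({"physical-molecule"})
dna_as_information = classify({"information-content"})
article_with_author = classify({"information-content", "has-author"})
```

The result depends entirely on which traits we chose to ascribe to the entity, which is the point: the same name ("DNA sequence") can land in different cases depending on how we think of it.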

Case 2 is the same situation as the AKT example in
http://dbooth.org/2007/splitting/#akt
and, from an architectural perspective, can be analyzed the same way.  In essence, the fact that the entity is both an IR and something else may be inconsequential to many applications, but it may matter to other applications that need to be more precise.

Finally, even if one person has chosen to think of a particular entity one way (as having certain characteristics), someone else may wish to think of it slightly differently, with finer or coarser granularity or otherwise slightly different characteristics.  In such a case, from an architectural perspective that other person is really thinking of a *different* but related resource.

This means that it is pointless to present a list of candidate entities like the one above and ask whether each one *is* an IR: without a precise definition of each entity, it is impossible to determine.  However, a group could reasonably ask whether it wishes to *think* of a particular entity as an IR, and then devise a precise definition of that entity for the group to use.

What if an HTTP 200 response is returned?  When you mint a URI, if you configure your server to return a 200 OK response when the URI is dereferenced, then by issuing that 200 response your server has declared the URI as denoting an IR.  Indeed, if you believe the inference rule in lines 298-325 of
http://lists.w3.org/Archives/Public/public-awwsw/2008Apr/att-0003/rules.n3.txt
that is *all* the URI denotes, because that rule only contains two assertions about resource ?r in its conclusion:

318.            ?r a awww:InformationResource .
319.            ?r uri:hasURI ?u .

Those two assertions become the core assertions for that URI declaration.  Thus if someone nonetheless uses the URI as though it denotes something *more* than just an IR, such as using http://markbaker.ca to denote Mark Baker the person, then that person is using the URI as though it denotes a resource with ambiguous identity, which may conflict with how others use that URI.  Architecturally this is no different in principle from any other ancillary assertions that someone might make about a resource.
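As a rough illustration of the distinction between core and ancillary assertions, here is a sketch in Python.  It is not a translation of the N3 rule: only the two conclusion triples quoted above come from the rules file, and the premise is simplified to "the status code is 200".  The function names, the `_:r` placeholder for the denoted resource, and the `foaf:Person` assertion are assumptions made up for this example.

```python
def core_assertions(uri, status_code):
    """Core assertions a 200 OK response would license, per the quoted rule."""
    if status_code != 200:
        return set()  # the rule says nothing in this case
    r = "_:r"  # placeholder for whatever resource the URI denotes
    return {
        (r, "a", "awww:InformationResource"),
        (r, "uri:hasURI", uri),
    }


def ancillary_assertions(usage, core):
    """Assertions someone makes about the resource beyond the core ones."""
    return set(usage) - set(core)


core = core_assertions("http://markbaker.ca", 200)

# Hypothetical usage: treating the same resource as a person as well.
usage = core | {("_:r", "a", "foaf:Person")}
extra = ancillary_assertions(usage, core)
```

Here `extra` contains only the person assertion, which is exactly the part that may conflict with how others use the URI.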

What if one *wanted* to mint a URI to denote both an IR and something else?  For example, what if one wanted a URI that denotes an IR but also has other characteristics -- a journal article, for example?  If you believe the inference rule above and you believe in the notion of URI declaration, then that cannot be done with a URI that dereferences to a 200 OK response.  However, it could be done with a 303 URI or a hash URI such as http://example/myarticle#this that leads to a URI declaration having core assertions something like:

        <http://example/myarticle#this> a awww:InformationResource .
        <http://example/myarticle#this> uri:hasURI
                    <http://example/downloads/myarticle> .
        <http://example/myarticle#this> dc:author "David Booth" .

In fact, depending on one's interpretation of owl:sameAs, the URI declaration might even say:

        <http://example/myarticle#this> owl:sameAs
                    <http://example/downloads/myarticle> .
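For the hash URI case, it may help to recall that a client strips the fragment before making the HTTP request, so any 200 OK response is given for http://example/myarticle and not for http://example/myarticle#this.  The sketch below shows only that splitting step using Python's standard library; it makes no network request, and how one then locates the URI declaration in the retrieved document is left out.

```python
from urllib.parse import urldefrag

hash_uri = "http://example/myarticle#this"

# Split off the fragment; only the first part is sent to the server.
retrieval_uri, fragment = urldefrag(hash_uri)

# retrieval_uri is what a 200 OK response would be about;
# hash_uri itself is never dereferenced directly.
```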




David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent the official views of HP unless explicitly stated otherwise.
Received on Tuesday, 10 June 2008 17:00:09 UTC