Re: comments on Web Architecture First Edition from Pat Hayes on 2004-05-07 (public-webarch-comments@w3.org from May 2004)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 7 May 2004 18:30:43 -0500
To: Dan Connolly <connolly@w3.org>
Cc: public-webarch-comments@w3.org, w3c-rdfcore-wg@w3.org
Message-Id: <p06001f14bcc009f7e3b8@[10.0.100.76]>
>On Wed, 2004-05-05 at 15:00, Pat Hayes wrote:
>>  > On Wed, 2004-03-17 at 16:38, Pat Hayes wrote:
>[...]
>>  > > Which could be paraphrased as "A resource can be anything, and
>>  > > everything is a resource".
>>  >
>>  > yes, quite.
>>
>>
>>  Well, then, it is hard to resist asking the question, why did y'all
>>  feel obliged to (mis)-use a word when there already were perfectly
>>  good words you could have used, such as "entity" or even the plainer
>>  "thing" ?
>
>Fair question...
>
>The term 'resource' was chosen as the universal set in web developer
>discussions so old that I can't even find the archives.
>The usage goes back to at least 1994, with
>discussion of URLs, URIs, and even URCs (a precursor to RDF).
>   http://lists.w3.org/Archives/Public/uri/1994Dec/
>
>The XLink spec followed suit, to some extent:
>
>[[
>The notion of resources is universal to the World Wide Web. [Definition:
>As discussed in [IETF RFC 2396], a resource is any addressable unit of
>information or service.] Examples include files, images, documents,
>programs, and query results.
>]]
>  -- http://www.w3.org/TR/xlink/#N789
>
>I suppose that just re-raises the question of whether 'resource'
>is the universal set or something smaller.

Seems pretty clear to me that it is much smaller, by that term 
'addressable unit'. Most things in heaven and earth are not 
addressable units of anything.

That definition makes sense. It is clearly the C sense in my C/D 
contrast, and it fits with the now venerable discussions of 
hyperlinking in Engelbart's stuff, cited in the architecture 
document. So why not just stick with that? Then it all makes perfect 
sense as an architecture document. All you need to do is elide or 
modify the few places where the text goes off on some kind of riff 
about resources being anything and everything being a resource. None 
of these have any bearing on the technical content of the document in 
any case: everybody knows you can't send a book or a galaxy (or the 
weather in Oaxaca) over a network, in fact, so who are y'all trying 
to kid?

>I suppose we could choose 'thing' as the universal set, but that
>would be a pretty major educational undertaking in at least some
>communities. Maybe those communities are smaller than the ones
>that use 'resource' as the universal set.

Well, re-education is better than confusion. I don't think, frankly, 
that "resource" was EVER supposed to mean the universal set. I'm not 
sure now where the historical roots of that idea come from, but they 
can't be found in the older material anywhere Ive looked.  For 
example, with the understanding of "resource"  outlined in the Xlink 
spec and the 1994 discussions you cite, which notably seem to think 
of the Web as a kind of world-wide *library* , that choice of word 
makes perfect sense: the things in a library, even an extended 
world-wide hyperlinked electronic library, can indeed be reasonably 
thought of as resources, things that are accessible for use, things 
that provide functionality. The mistake arises when the library is 
identified with the entire universe.

But OK, if you really want "resource" to mean the universal set, now, 
then please say so very clearly and explicitly, but then read over 
the document with that meaning in mind very critically, because with 
that reading, much of what it says about resources really does not 
make sense. You can't send weather over a network; grains of sand 
don't have (or need) URIs, most resources have no owners, etc..

[later]

Maybe I have found the source.

A bit more reading found this, 
http://ftp.ics.uci.edu/pub/ietf/uri/rfc1737.txt  written by Sollins & 
Masinter in Dec 1994, defining the URN idea and possibly one source 
of this confusion that I am trying to root out. It has some nice 
prose in it, let me number the sections for later comment:

1.
The requirements for Uniform Resource Names fit within the overall 
architecture of Uniform Resource Identification.  In order to build 
applications in the most general case, the user must be able to 
discover and identify the information, objects, or what we will call 
in this architecture resources, on which the application is to 
operate. Beyond this statement, the URI architecture does not define 
"resource."
2.
  A URN identifies a resource or
    unit of information.  It may identify, for example, intellectual
    content, a particular presentation of intellectual content, or
    whatever a name assignment authority determines is a distinctly
    namable entity.  A URL identifies the location or a container for an
    instance of a resource identified by a URN. The resource identified
    by a URN may reside in one or more locations at any given time, may
    move, or may not be available at all.  Of course, not all resources
    will move during their lifetimes, and not all resources, although
    identifiable and identified by a URN will be instantiated at any
    given time.  As such a URL is identifying a place where a resource
    may reside, or a container, as distinct from the resource itself
    identified by the URN.
3.
  Requirements for functional capabilities

    These are the requirements for URNs' functional capabilities:

    o Global scope: A URN is a name with global scope which does not
      imply a location.  It has the same meaning everywhere.

    o Global uniqueness: The same URN will never be assigned to two
      different resources.

    o Persistence: It is intended that the lifetime of a URN be
      permanent.  That is, the URN will be globally unique forever, and
      may well be used as a reference to a resource well beyond the
      lifetime of the resource it identifies or of any naming authority
      involved in the assignment of its name.

    o Scalability: URNs can be assigned to any resource that might
      conceivably be available on the network, for hundreds of years.
....

4.
    One of the ways in which naming authorities, the assigners of names,
    may choose to make themselves distinctive is by the algorithms by
    which they distinguish or do not distinguish resources from each
    other.  For example, a publisher may choose to distinguish among
    multiple printings of a book, in which minor spelling and
    typographical mistakes have been made, but a library may prefer not
    to make that distinction.  Furthermore, no one algorithm for testing
    for equality is likely to applicable to all sorts of information.
    For example, an algorithm based on testing the equality of two books
    is unlikely to be useful when testing the equality of two
    spreadsheets.  Thus, although this document requires that any
    particular naming authority use one algorithm for determining whether
    two resources it is comparing are the same or different, each naming
    authority can use a different such algorithm and a naming authority
    may restrict the set of resources it chooses to identify in any way
    at all.

Now, let me chew on this for a while. I claim that (1) makes sense 
only under the C reading, since it refers to resources as things on 
which *applications* can *operate*. You cannot operate on Santa 
Clause or unborn children or remote galaxies or sodium atoms, or 
indeed on the weather in Oaxaca, unless maybe you have a fleet of 
cloud-seeding aircraft. So "object" here must be understood as 
referring to a computational sense of "object", rather than entities 
in general: things that (have an identity, but) can be stored in RAM, 
sent over network connections, manipulated by software, tested for 
equality under various criteria defined by methods inherited from 
their defining class, etc.. These may well be objects in the sense of 
"object" used in OOP, for example, but this is still a very small 
(and highly special) subset of the universe of possible objects in 
the broader, non-computational sense.

(That last sentence, BTW, seems to be an early example of the almost 
obsessive stonewalling that the TAG group insists on doing when asked 
to define its terminology. Do y'all think that there is some actual 
intellectual merit in refusing to say what you mean in this way? 
Maybe it makes it all feel more like pure mathematics, or something?)

Let me use 'computational object' , or CO, for what this paragraph 
seems to be talking about, and 'object' more generally for, well, 
anything in or under heaven.

The wording of (2) starts to get weird. We could take it either way. 
It would be fine for URNs to be able to name anything, not just COs, 
of course: sure, names can name anything. But for most nameable 
things, the idea of their having an address, aka container, is 
strange, to put it mildly: in many cases it isn't even coherent. (I 
take it that this means 'address' as in location in memory or file 
system, not like 'street address': but its a damned odd thing to say 
in either sense about most things in the universe.) So this focus on 
things changing their address or being moved from place to place 
seems to be an implicit restriction of the topic to COs. And then 
there is that curious remark about 'instances' of resources. What can 
that possibly mean? Many of things we talk about are already concrete 
particulars, which have no instances: what would be an 'instance' of, 
say, you or me, or of the great nebula in Orion? But for COs it does 
have a way of being interpreted in many cases, since COs are often 
thought of as instances of a computational definition or 
specification (the basic intuition underlying the OOP model in many 
cases, where identity is determined by the class which provides the 
'methods' of manipulation.) So again, if this means anything then it 
seems to mean something in the context of COs rather than objects 
more generally.

Turning to (3), this could at first be read in either sense, though 
read in the wider, D, sense it seems almost insanely ambitious (there 
shall be only One true Name for any object for Ever and Ever... 
sounds like something from the Wizard of Earthsea) until you get to 
the fourth condition, which refers to resources being "available on 
the network". I can make sense of this only if "resource" is 
understood in the computational C sense: things like galaxies are 
not, in  my experience, usually available on networks. Certainly the 
things that my OWL quantifiers range over are not all on a network, 
in any sense of 'on' and 'network' that I can understand: for 
example, they may never (ever) have a identifier. But I don't think 
that this is what the authors intended.

Section (4), from later in the document, seems to me to be muddled. 
Is it talking about criteria for identity or about *algorithms* for 
determining identity? The text uses the latter terminology but the 
examples only make sense in the former. In the usual sense of 
equality used outside computer science, the idea of an algorithm for 
testing for equality is incoherent: equality simply means being the 
same thing. (What kind of algorithm could test me for equality with 
myself? What would need to be tested? Here I am, and there is one of 
me: what else needs to be said or done?) This whole discussion makes 
sense only if one thinks of the actual world as being a kind of 
algebraic object, like a giant datastructure, whose pieces look 
different depending on which category you think of them as belonging 
to. This is a fatal muddling of reality with description, a confusing 
of the map with the territory, of the model with the modelled.

I think I see what the authors are getting at: the concept of 'book' 
can be interpreted as meaning an abstraction of the actual world at 
different levels: physical objects on a shelf, print runs, editions, 
manuscripts, etc., all might be called "a book" and might have 
systematic identifiers assigned to them by various naming 
authorities: yes, indeed. But this only translates into criteria for 
*identity* if one thinks of these abstractions as *being* the 
reality: and that is a fatal slip, like identifying mathematical 
abstractions with the reality that they might model, or the Platonic 
world of ideals with the actual world of concrete referents, or (as I 
think in this case) the computational descriptions of something with 
the things described. And once that mistaken identification is made, 
it seems trivial to identify reference with linking, and naming with 
access. The whole world now seems to be made of the stuff that is 
sent over network links and processed by applications running on 
computers.

>As to 'entity'... the web specs (MIME, HTTP) use "entity" to mean
>a generalization of the RFC822 header/body structure.

So they do. Sigh.  OK, so just say 'thing'.  Plain language is 
usually best in any case.

>"entity
>         The information transferred as the payload of a request or
>         response."
>   -- http://www.w3.org/Protocols/rfc2616/rfc2616-sec1.html#sec1
>
>Hmm... that spec seems to use 'resource' for something smaller
>than the universal set...
>
>"resource
>         A network data object or service that can be identified by a
>         URI, as defined in section 3.2."

You omitted the next sentence, which is even more telling:

"Resources may be available in multiple representations (e.g. 
multiple languages, data formats, size, and resolutions) or vary in 
other ways."


>then in 3.2:
>
>"As far as HTTP is concerned, Uniform Resource Identifiers are simply
>formatted strings which identify--via name, location, or any other
>characteristic--a resource."
>

This is the familiar circular recursion that one finds in so many of 
the W3C glossaries. Foodles are whatever are identified by bazzies, 
and bazzies identify foodles. Thanks, that is SO helpful :-)

Pat

>[... much elided ...]
>
>--
>Dan Connolly, W3C http://www.w3.org/People/Connolly/
>see you at the WWW2004 in NY 17-22 May?


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 7 May 2004 19:30:54 UTC