Re: WebArch Ambiguity about Objects, PLUS Suggested Major Replacement from Sandro Hawke on 2003-01-04 (www-tag@w3.org from January 2003)

From: Sandro Hawke <sandro@w3.org>
Date: Sat, 04 Jan 2003 00:10:24 -0500
To: Tim Bray <tbray@textuality.com>
cc: www-tag@w3.org
Message-Id: <200301040510.h045AOI26251@wadimousa.hawke.org>
> Sandro Hawke wrote:
> > The "mind trick" in object-oriented design is to conflate some
> > "object" (which might be a person
> 
> Hm... the part of my brain that deals with using O-O classes to 
> systematize building procedural logic is a different part than the one 
> that writes text with <'s in it to describe instances of, well... 
> anything.   So most of Sandro's prefatory arguments are lost on me.

The heart of my message is that sometimes people use http URIs to
identify documents, and sometimes they use them to talk about the
things discussed or described in the documents.

One sign of this is that in hypertext (e.g. [1]) sometimes the link-text
names the thing described ("W3C", "Tim Berners-Lee") and other times
the link-text names a document ("issues list", "TAG charter", "IRC
log").  In natural language this isn't a hard and fast distinction,
and there's no need to indicate which kind of identification is being
done.  I never really noticed it before today.

But if you try to pick one correct and formal style, things get
sticky.  If you really ask, "What is denoted by the value of an
href?", you probably won't get the same kind of answer for every href.
So I'm glad there's been no consensus.   Still, it would be nice
to get more clarity and maybe some decent terminology.

Maybe httpRange-14 should be closed as the wrong question and replaced
with "How should we, in our writings, clearly distinguish between the
ways in which URIs (especially http URIs) are used to identify things?"
After a lot of thought, I picked one approach for some RDF writing
today [2], but even I'm not really happy with it.   (Still it's better
than marking everything with a 33 or 102.  I think.  :-)

> I don't understand what you mean by "problem-domain object", and I don't 
> think I should have to in order to write Web software.  Resources are 
> things that have URIs and can be dereferenced to get representations. 
> If it helps you in your applications to think of them as problem-domain 
> objects that's fine, but not necessary.

This whole distinction only comes up when, as in RDF, web software
really starts to deal with things that have nothing to do with the
web.  As long as you're dealing with files, timestamps, certificates,
external entities, fonts, etc, etc, you're fine.  But when your
metadata (which started as simple information about web pages)
suddenly turns into information about people, places, mythical
animals, logical paradoxes, and infinite sets of web pages, ... then
it starts to matter.

> The Web is not at all like a set of linked in-memory data structures. 
> Attempts to think of it that way lead to the implementation of broken 
> software.
... 
> You're not the first to rail against the extreme generality of the 
> URI/resource/representation formalism.  However, industrial software 
> like you find in browsers and servers and caches and robots seems to 
> work as the architecture says it should without any angst.

My impression is that the web has been successful in spite of this
formalism, not because of it.  At what levels of web expertise do you
see these terms being adopted?  I think that by the time people hear
these terms they already have a detailed (if not entirely consistent!)
mental model of web addresses, pages, sites, fragments, etc, and are
not much damaged.  The main advantage of the terminology is that it's
so nebulous it allowed people to reach a sense of consensus.


> >      If something is important, there should be information about it
> >      on the web.  If you're creating or defining something, especially
> >      something conceptual and related to the web, you should pick a
> >      web address where information about it can be maintained for as
> >      long as the thing might be of interest.
> > 
> >      That web address can also be used to unambiguously identify the
> >      thing itself: people can say things like "My data is in the
> >      format defined at http://sample.org/format7."
> > 
> >      This secondary use of web addresses to identify things described
> >      on the web can be very attractive to designers of protocols and
> >      data formats.  Traditionally, designers have assigned names and
> >      numbers to identify elements of their system.  If the system was
> >      open, the assignments had to be managed through a public
> >      institution like IANA or ISO, or they could use UUIDs.  URIs make
> >      an excellent alternative because (1) they are cheap and easy to
> >      obtain, and (2) they readily lead people (and even machines) to
> >      more information.
> > 
> >      [These] designers should be careful, however, to distinguish
> >      between places where a web address is used to directly
> >      identify a web page and those where it is used in this
> >      indirect manner to identify something described on the web
> >      page.  (This is true regardless of the use of fragment
> >      identifiers in web addresses; they simply involve a portion
> >      of a web page.)

> The language in the first paragraph is good, but the argument runs into 
> trouble and falls apart.  At some point in time, I might want to name 
> something with a http:-class URI and not have any representations for 
> it; this works just fine in practice, there's a huge number of XML 
> namespaces like this.  Later on, I might decide that providing some 
> representations might be useful.  The existing architecture's 
> agnosticism about the different kinds of URIs makes this trivially easy, 
> but the proposal that URIs be sorted into the class that has 
> representations and the class that doesn't really gets in the way. -Tim

Ah, I wasn't clear in the second paragraph.  Does this new version
make more sense?   (I've also added a bit about fragments in the first
paragraph and dropped the last paragraph.)

      If something is important, there should be information about it
      on the web.  If you're creating or defining something,
      especially something conceptual and related to the web, you
      should pick a web address where information about it can be
      maintained for as long as the thing might be of interest or have
      users.  If you are defining a group of related things, you may
      want to assign them different addresses on the same page, using
      fragment identifiers.

      In addition to helping you communicate about your work,  
      the web address you pick can be used to unambiguously identify the
      thing itself.  You can refer to your thing as "the one thing
      which is (or may be) defined at" the address you have picked.
      Informally, this is like saying "My data is in the format
      defined at http://sample.org/format7," but it's a little more
      precise and flexible.  You don't actually need to provide text
      at the given address, and you still have a unique identifier.
 
      This secondary use of web addresses to identify things described
      on the web can be very attractive to designers of protocols and
      data formats.  Traditionally, designers have assigned names and
      numbers to identify elements of their system.  If the system was
      open, the assignments had to be managed through a public
      institution like IANA or ISO, or they could use UUIDs.  URIs make
      an excellent alternative because (1) they are cheap and easy to
      obtain, and (2) they readily lead people (and even machines) to
      more information.
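
As a rough illustration of the fragment idea in the first paragraph of
the text above, here is a minimal Python sketch.  It assumes a group of
related things defined on the http://sample.org/format7 page; the
fragment names (header, record, footer) and the helper page_of are made
up for this example.  Nothing is fetched, which matches the point that
the identifiers stay unique whether or not any text exists at the
address yet.

```python
from urllib.parse import urldefrag

# Hypothetical identifiers for a group of related things, all assigned
# addresses on the same page by using fragment identifiers.
identifiers = [
    "http://sample.org/format7#header",
    "http://sample.org/format7#record",
    "http://sample.org/format7#footer",
]


def page_of(identifier):
    """Return the address of the page where the thing is (or may be) defined."""
    page, _fragment = urldefrag(identifier)
    return page


# Every identifier leads to the same page for more information...
pages = {page_of(i) for i in identifiers}

# ...while each identifier is still distinct from the others.
distinct = len(set(identifiers))

print(pages, distinct)
```

The sketch only compares strings, so it does not settle whether a given
use of one of these addresses means the page or the thing defined there;
that still has to be made clear by whoever uses the address.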
 
My private hope is that I can help the TAG output be much more
concise and in doing so will make up for my overly long e-mail.  :-)

    -- sandro

[1] http://www.w3.org/2001/tag/
[2] http://www.w3.org/2002/12/rdf-identifiers/
Received on Saturday, 4 January 2003 00:14:53 UTC