WebArch Ambiguity about Objects, PLUS Suggested Major Replacement from Sandro Hawke on 2002-12-30 (www-tag@w3.org from December 2002)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 30 Dec 2002 12:32:02 -0500
To: www-tag@w3.org
Message-Id: <200212301732.gBUHW2s13105@wadimousa.hawke.org>
The "mind trick" in object-oriented design is to conflate some
"object" (which might be a person, place, or conceptual entity, not
just a physical object) with a program data structure which holds
information about it.  The data structure itself is called the
"object" by OO programmers, even though it only *represents* the
original object.  The original object is in the "problem domain" or
"domain of discourse", while the data structure object is usually not.
The information about the problem-domain object, stored in the
data-structure object, is called the object's "state".

This conflation can be very useful and is done willfully.  It's often
much simpler for the programmer to think about maintaining a list of
students than to think about maintaining a list of data structures,
each of which stores information about a student.  Good
object-oriented programming involves making sure the analogy between
the problem-domain objects and the data-structure objects continues to
hold as the system is developed.
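
A minimal sketch of this conflation in Python. The Student class, the roster, and the promote function are made up for illustration; nothing here comes from the WebArch draft.

```python
from dataclasses import dataclass


@dataclass
class Student:
    """Data-structure object: holds state *about* a student.

    It is not the student; it only represents one.
    """
    name: str
    year: int


# The programmer thinks of this as "a list of students", although it is
# really a list of data structures, each storing information about one.
roster = [Student("Alice", 2), Student("Bob", 1)]


def promote(student: Student) -> None:
    """Update the stored state; the real person is of course untouched."""
    student.year += 1


promote(roster[0])
```

Keeping the analogy intact means the stored year has to change whenever the real student advances, and only then.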

Reading the WebArch draft I see this conflation occurring repeatedly,
and sometimes it's a problem.  It certainly confused me until I
recognized it, and in some cases it still left me unclear what was
being recommended.  In the web world, objects are called "resources"
but the conflation is the same.  Sometimes you're talking about
problem-domain resources (people, cars, coffee pots) and sometimes
you're talking about data-structure resources (collections of
information).

Just to be clear, I'm not talking about "representations".  What you
call a "representation" is what the distributed systems literature
calls an "external data representation", "externalization", or
"serialization" and the knowledge representation literature calls a
"statement", "formula", "sentence", or "expression".  It's something
built out of a data structure by a sender, transmitted across a data
network, and used by a receiver to construct a second data structure
which is essentially identical to the first.
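
A rough sketch of that round trip, assuming JSON as the external data representation. The CoffeePot class and its fields are hypothetical, chosen only to echo the examples in this message.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CoffeePot:
    """Data-structure object holding state about some coffee pot."""
    location: str
    cups_left: int


def serialize(pot: CoffeePot) -> bytes:
    """Sender side: build a representation out of the data structure."""
    return json.dumps(asdict(pot)).encode("utf-8")


def deserialize(data: bytes) -> CoffeePot:
    """Receiver side: construct a second data structure from the bytes."""
    return CoffeePot(**json.loads(data.decode("utf-8")))


sender_copy = CoffeePot(location="kitchen", cups_left=3)
wire = serialize(sender_copy)        # what travels across the network
receiver_copy = deserialize(wire)    # equal state, but a distinct object
```

The bytes in `wire` are the "representation"; neither `sender_copy` nor `receiver_copy` is, and neither is the coffee pot.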


Let's look at the text.  In the interest of brevity, I'm just going to
look at uses of the word "resource".

> Agents identify objects in the system (called "resources") with
> Uniform Resource Identifiers (URIs), defined in [RFC2396].

Sounds like data-structure objects, since they are "in the system".

> Agents represent resources using a nonexclusive set of data formats,
> separately or in combination (e.g., XHTML, CSS, PNG, XLink, RDF/XML,
> SMIL animation).

Sure, agents serialize and transmit the contents of data structures
(the state of data-structure objects) using ....  Okay.

> All important resources SHOULD be identified by a URI.

This could be read either way.   More on this later.

> Owners of important resources SHOULD make available representations
> that describe the nature and purpose of those resources.

I'm not sure how to translate this.  I know one example here is media
types; you want IANA to publish information on the web about each
media type.  A media type is a problem-domain object, the information
about it is a data-structure object.  Are you saying the owner of the
media type itself should publish the information?  Does a media type
have an owner?  What if we're talking about the natural language
identifiers like "en-US"... who owns that?  Surely no one owns
English, but maybe someone owns that identifier.  Yes, that's it:
URIs have owners, data-structure-objects have maintainers, and
problem-domain objects can have anything, depending on the object.

I think you mean something like:

  Owners of URIs, especially ones which may appear in a public
  context, SHOULD make available representations [information] that
  describe[s] the nature and purpose of those URIs.

> The Web is a universe of resources. 

This must mean data-structure objects.  The universe is a universe of
problem-domain objects.  The web is about information about those
problem-domain objects.

But the very next sentence:

> A resource is defined by [RFC2396] to be anything that has
> identity. Examples include documents, files, menu items, machines, and
> services, as well as people, organizations, and concepts.

Obviously these are problem-domain objects.

> One can append a fragment identifier to a URI to yield an identifier
> for part of, or a view of, a resource

Now we're back to data-structure objects.  It makes perfect sense to
talk about a view or fragment of a collection of information.
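
As a small illustration, Python's standard urllib.parse.urldefrag splits a URI into the address of the whole collection of information and the fragment naming a part of it. The example.org address is invented.

```python
from urllib.parse import urldefrag

# Hypothetical address: a report plus a fragment naming one part of it.
uri = "http://example.org/report#section2"

# urldefrag returns a (url, fragment) named tuple.
base, fragment = urldefrag(uri)
```

Here `base` addresses the whole document and `fragment` picks out a portion of it, which only makes sense for a data-structure resource.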

> When one resource refers to another via a URI, a link is formed.

Sounds like a data-structure object containing a pointer to another
data-structure object.

> When many resources are linked this way, the large-scale effect is a
> shared information space, where resources are identifiable by URI.

Sure, just like a program which has lots of objects in memory, all
linked together.  A URI (or web address) is just like an object
reference (which is really a memory address).
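
A toy sketch of that analogy, using a dictionary as a stand-in for the web. The addresses, the record layout, and both function names are assumptions made for illustration.

```python
# Hypothetical in-memory "web": each URI maps to a record of state plus
# the URIs it links to, much as objects hold references to other objects.
web = {
    "http://example.org/a": {
        "title": "Page A",
        "links": ["http://example.org/b"],
    },
    "http://example.org/b": {
        "title": "Page B",
        "links": [],
    },
}


def dereference(uri: str) -> dict:
    """Follow a URI to the data-structure resource it addresses."""
    return web[uri]


def linked_titles(uri: str) -> list:
    """Titles of every resource that the given resource links to."""
    return [dereference(target)["title"] for target in dereference(uri)["links"]]
```

Calling `linked_titles("http://example.org/a")` follows one link, the way a program follows one pointer.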

> The value of the Web increases with the number of resources identified
> by URI; this is due to the "network effect."

Yeah, the more data you can reach and work with, the more ... uh, data
you can work with.   Sounds like a good thing.   :-)

> In turn, resources are more valuable when they are identifiable on the
> Web. 

Hm.  Earlier you said "A resource is defined by [RFC2396] to be
anything that has identity."  Do you mean that problem-domain objects
are more valuable when information about them is available on the web?
(That's probably true as a rule of thumb, although surely it depends
on the information!  I'm not sure if my mother would become more
valuable if you were to make a fan site about her, but maybe!)  Or do
you mean that data-structure objects are more valuable when they can
be accessed via the web?  (That's true, yes.)  So this is probably
data-structure resources, but the issue is accessibility, not
identifiability.  Of course a resource does need to be identifiable
before it can be accessed.

> Hence:
>    Use URIs: All important resources SHOULD be identified by a URI.

Data-structure resources: yes, absolutely, as above.

Problem-domain resources, like people, places, concepts, and physical
objects?   Eh, I'm not so sure.

....

It goes on, but I'm sure we're all getting tired of it.

I wish you didn't use the term URI both for http URIs, which clearly
(IMHO) identify information repositories (data-structure resources),
and for strings like "mailto:nobody@example.org" and
"urn:oasis:SAML:1.0", which can identify anything.  The term "URL" is
almost right for the first
group, but not quite.  I guess my favorite is "web address".  URI is
okay for the second group, except that lots of people turn "URI" into
"web address" in their head, because that's how it's usually used.
How about "internet identification string"?   [ Just joking. ]

Sometimes I think the TAG's mission would be much better served by
issuing a few simple statements instead of this ... big document.
Here's a statement that I think conveys about 50% of what the document
is trying to say.  I understand that simple statements like this don't
fit very well into the W3C process.

I started from 

> Use URIs: All important resources SHOULD be identified by a URI

and went on from there.  It's probably still too terse, but I bet half
the audience would understand it immediately, even if they might not
agree with it.

     If something is important, there should be information about it
     on the web.  If you're creating or defining something, especially
     something conceptual and related to the web, you should pick a
     web address where information about it can be maintained for as
     long as the thing might be of interest.

     That web address can also be used to unambiguously identify the
     thing itself: people can say things like "My data is in the
     format defined at http://sample.org/format7."

     This secondary use of web addresses to identify things described
     on the web can be very attractive to designers of protocols and
     data formats.  Traditionally, designers have assigned names and
     numbers to identify elements of their system.  If the system was
     open, the assignments had to be managed through a public
     institution like IANA or ISO, or the designers could use UUIDs.
     URIs make an excellent alternative because (1) they are cheap
     and easy to obtain, and (2) they readily lead people (and even
     machines) to more information.

     Designers should be careful, however, to distinguish between
     places where a web address is used to directly identify a web
     page and those where it is used in this indirect manner to
     identify something described on the web page.  (This is true
     regardless of the use of fragment identifiers in web addresses;
     they simply involve a portion of a web page.)  

I wonder how much of this statement the TAG agrees with.....   I
wonder how the RDF community would feel about that last paragraph.
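
One way a data format might make the distinction in that last paragraph explicit is to record, next to each web address, which of the two uses is intended. This is only a sketch; the Use and Reference names are hypothetical and do not come from any existing specification.

```python
from dataclasses import dataclass
from enum import Enum


class Use(Enum):
    """How a web address is being used in a given slot of a format."""
    PAGE = "the web page at this address"
    DESCRIBED_THING = "the thing described by the page at this address"


@dataclass(frozen=True)
class Reference:
    """A web address together with the intended kind of identification."""
    address: str
    use: Use


# Same address, two different referents.
page_ref = Reference("http://sample.org/format7", Use.PAGE)
format_ref = Reference("http://sample.org/format7", Use.DESCRIBED_THING)
```

With the use recorded, "My data is in the format defined at http://sample.org/format7" and "I last edited http://sample.org/format7" can no longer be mistaken for statements about the same thing.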

 Yours truly for a better Web,
   -- sandro                         http://www.w3.org/People/Sandro/
Received on Monday, 30 December 2002 12:35:47 UTC