- From: Sandro Hawke <sandro@w3.org>
- Date: Mon, 30 Dec 2002 12:32:02 -0500
- To: www-tag@w3.org
The "mind trick" in object-oriented design is to conflate some "object" (which might be a person, place, or conceptual entity, not just a physical object) with a program data structure which holds information about it. The data structure itself is called the "object" by OO programmers, even though it only *represents* the original object. The original object is in the "problem domain" or "domain of discourse", while the data-structure object usually is not. The information about the problem-domain object, stored in the data-structure object, is called the object's "state".

This conflation can be very useful and is done willfully. It's often much simpler for the programmer to think about maintaining a list of students than to think about maintaining a list of data structures, each of which stores information about a student. Good object-oriented programming involves making sure the analogy between the problem-domain objects and the data-structure objects continues to hold as the system is developed.

Reading the WebArch draft, I see this conflation occurring repeatedly, and sometimes it's a problem. It certainly confused me until I recognized it, and in some cases it still left me unclear about what was being recommended.

In the web world, objects are called "resources", but the conflation is the same. Sometimes you're talking about problem-domain resources (people, cars, coffee pots) and sometimes you're talking about data-structure resources (collections of information).

Just to be clear, I'm not talking about "representations". What you call a "representation" is what the distributed systems literature calls an "external data representation", "externalization", or "serialization", and what the knowledge representation literature calls a "statement", "formula", "sentence", or "expression".
It's something built out of a data structure by a sender, transmitted across a data network, and used by a receiver to construct a second data structure which is essentially identical to the first.

Let's look at the text. In the interest of brevity, I'm just going to look at uses of the word "resource".

> Agents identify objects in the system (called "resources") with
> Uniform Resource Identifiers (URIs), defined in [RFC2396].

Sounds like data-structure objects, since they are "in the system".

> Agents represent resources using a nonexclusive set of data formats,
> separately or in combination (e.g., XHTML, CSS, PNG, XLink, RDF/XML,
> SMIL animation).

Sure, agents serialize and transmit the contents of data structures (the state of data-structure objects) using .... Okay.

> All important resources SHOULD be identified by a URI.

This could be read either way. More on this later.

> Owners of important resources SHOULD make available representations
> that describe the nature and purpose of those resources.

I'm not sure how to translate this. I know one example here is media types; you want IANA to publish information on the web about each media type. A media type is a problem-domain object; the information about it is a data-structure object. Are you saying the owner of the media type itself should publish the information? Does a media type have an owner? What if we're talking about natural language identifiers like "en-US"... who owns that? Surely no one owns English, but maybe someone owns that identifier. Yes, that's it: URIs have owners, data-structure objects have maintainers, and problem-domain objects can have anything, depending on the object.

I think you mean something like:

  Owners of URIs, especially ones which may appear in a public context, SHOULD make available representations [information] that describe[s] the nature and purpose of those URIs.

> The Web is a universe of resources.

This must mean data-structure objects.
The universe is a universe of problem-domain objects. The web is about information about those problem-domain objects. But the very next sentence:

> A resource is defined by [RFC2396] to be anything that has
> identity. Examples include documents, files, menu items, machines, and
> services, as well as people, organizations, and concepts.

Obviously these are problem-domain objects.

> One can append a fragment identifier to a URI to yield an identifier
> for part of, or a view of, a resource

Now we're back to data-structure objects. It makes perfect sense to talk about a view or fragment of a collection of information.

> When one resource refers to another via a URI, a link is formed.

Sounds like a data-structure object containing a pointer to another data-structure object.

> When many resources are linked this way, the large-scale effect is a
> shared information space, where resources are identifiable by URI.

Sure, just like a program which has lots of objects in memory, all linked together. A URI (or web address) is just like an object reference (which is really a memory address).

> The value of the Web increases with the number of resources identified
> by URI; this is due to the "network effect."

Yeah, the more data you can reach and work with, the more ... uh, data you can work with. Sounds like a good thing. :-)

> In turn, resources are more valuable when they are identifiable on the
> Web.

Hm. Earlier you said "A resource is defined by [RFC2396] to be anything that has identity." Do you mean that problem-domain objects are more valuable when information about them is available on the web? (That's probably true as a rule of thumb, although surely it depends on the information! I'm not sure my mother would become more valuable if you were to make a fan site about her, but maybe!) Or do you mean that data-structure objects are more valuable when they can be accessed via the web? (That's true, yes.)
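The pointer analogy in the quoted passages on links can be sketched in a few lines of Python. This is a minimal illustration, not anything from the WebArch draft. The `web` dictionary, the `dereference` helper, and the example.org addresses are all hypothetical. Treating a fragment identifier as selecting one field of a resource is a simplifying assumption. Each entry is a data-structure resource holding information about a problem-domain object, and a link is just a stored web address that can be followed like an object reference.

```python
from urllib.parse import urldefrag

# Hypothetical in-memory "web" (the example.org addresses are made up).
# Each web address maps to a data-structure resource: a dict holding
# information *about* something. The dict at .../alice describes Alice,
# a problem-domain person; it is not Alice herself.
web = {
    "http://example.org/alice": {
        "name": "Alice",
        # A link: this resource stores the web address of another resource,
        # the way an object field stores a reference to another object.
        "advisor": "http://example.org/bob",
    },
    "http://example.org/bob": {"name": "Bob"},
}


def dereference(uri):
    """Follow a web address to the data-structure resource it names.

    Any fragment identifier is split off first. As a simplifying
    assumption, the fragment selects one field of the resource (a part
    of, or view of, it) rather than naming a separate entry.
    """
    base, fragment = urldefrag(uri)
    resource = web[base]
    return resource[fragment] if fragment else resource


alice = dereference("http://example.org/alice")
advisor = dereference(alice["advisor"])  # following a link, like a pointer
print(advisor["name"])                               # Bob
print(dereference("http://example.org/alice#name"))  # Alice
```

Nothing in the sketch can dereference Alice the person. Following `http://example.org/alice` only ever yields the information stored about her, which is the distinction between the two kinds of resource.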
So the "more valuable" sentence is probably about data-structure resources, but the issue is accessibility, not identifiability. Of course, you do need to be identifiable first.

> Hence:
> Use URIs: All important resources SHOULD be identified by a URI.

Data-structure resources: yes, absolutely, as above. Problem-domain resources, like people, places, concepts, and physical objects? Eh, I'm not so sure.

....

It goes on, but I'm sure we're all getting tired of it.

I wish you didn't use the term URI both for http URIs, which clearly (IMHO) identify information repositories (data-structure resources), and for strings like "mailto:nobody@example.org" and "urn:oasis:SAML:1.0", which can identify anything. The term "URL" is almost right for the first group, but not quite. I guess my favorite is "web address". URI is okay for the second group, except that lots of people turn "URI" into "web address" in their heads, because that's how it's usually used. How about "internet identification string"? [ Just joking. ]

Sometimes I think the TAG's mission would be much better served by issuing a few simple statements instead of this ... big document. Here's a statement that I think conveys about 50% of what the document is trying to say. I understand that simple statements like this don't fit very well into the W3C process. I started from

> Use URIs: All important resources SHOULD be identified by a URI

and went on from there. It's probably still too terse, but I bet half the audience would understand it immediately, even if they might not agree with it.

  If something is important, there should be information about it on the web. If you're creating or defining something, especially something conceptual and related to the web, you should pick a web address where information about it can be maintained for as long as the thing might be of interest.
  That web address can also be used to unambiguously identify the thing itself: people can say things like "My data is in the format defined at http://sample.org/format7."

  This secondary use of web addresses to identify things described on the web can be very attractive to designers of protocols and data formats. Traditionally, designers have assigned names and numbers to identify elements of their system. If the system was open, the assignments had to be managed through a public institution like IANA or ISO, or they could use UUIDs. URIs make an excellent alternative because (1) they are cheap and easy to obtain, and (2) they readily lead people (and even machines) to more information.

  Designers should be careful, however, to distinguish between places where a web address is used to directly identify a web page and those where it is used in this indirect manner to identify something described on the web page. (This is true regardless of the use of fragment identifiers in web addresses; they simply involve a portion of a web page.)

I wonder how much of this statement the TAG agrees with..... I wonder how the RDF community would feel about that last paragraph.

Yours truly for a better Web,
-- 
sandro    http://www.w3.org/People/Sandro/
Received on Monday, 30 December 2002 12:35:47 UTC