- From: Frank Manola <fmanola@acm.org>
- Date: Wed, 10 Mar 2004 17:14:03 -0500
- To: public-webarch-comments@w3.org
- Cc: w3c-rdfcore-wg@w3.org
The following are some (personal) comments on http://www.w3.org/TR/2003/WD-webarch-20031209/

Sorry they're late.

--Frank

====
Section 1:

[[A travel scenario is used throughout this document to illustrate typical behavior of Web agents -- people or software (on behalf of a person, entity, or process) acting on this information space.]]

This sentence defines "Web agents" as including both people and software (as opposed to just software). However, the usage of terms like "agent", "user agent", etc. throughout the document isn't always consistent with including people in addition to software in the definition (and sometimes, but not always, the phrase "software agent" is used explicitly in places where only software is clearly meant). Further comments will illustrate this point in specific instances.

====
[[This scenario illustrates the three architectural bases of the Web that are discussed in this document: 1. Identification. Each resource is identified by a URI. In this travel scenario, the resource is about the weather in Oaxaca and the URI is "http://weather.example.com/oaxaca".]]

It would be more consistent with the rest of the example if "the resource is about the weather in Oaxaca" read "the resource is *a report* about the weather in Oaxaca."

Further, given the potential generality of the things URIs can identify, it would be helpful if there were some discussion somewhere about distinguishing between URIs identifying such distinct things as:

a. a report about the weather in Oaxaca
b. another report about the weather in Oaxaca produced by a second organization
c. "the weather in Oaxaca" (which no one "owns", but which may be described in multiple reports created (and owned) by multiple independent parties)
d. data on the weather in Oaxaca reported by on-line weather instrumentation which may be accessible to anyone, and which the creators of (a) and (b) combine with additional information (satellite photos, radar sweeps) to produce the reports (a) and (b).

More on this later.

====
Section 1.1:

[[ * The addition a conformance section is not likely to increase the utility of the document. ]]

The addition *of* a conformance section.

====
Section 1.1.2, first para:

[[The section on Architectural Specifications includes references.]]

This sentence seems to end abruptly. References to what?

====
Section 1.1.3:

[[Authors of protocol specifications in particular should invest time in understanding the REST model and consider the role to which of its principles could guide their design: statelessness, clear assignment of roles to parties, uniform address space, and a limited, uniform set of verbs.]]

This sentence has an "interesting" structure. For one thing, "the role to which of its principles could guide their design" seems to mix several more usual constructions, e.g., either "the role its principles could [or should] *play* in their designs" or "the *extent* to which each of its principles could [or should] guide their designs". For another, it seems as if the list of principles should follow "principles" rather than "design", as in something like:

"Authors of protocol specifications in particular should invest time in understanding the REST model and consider the role its principles -- statelessness, clear assignment of roles to parties, uniform address space, and a limited, uniform set of verbs -- could play in their designs."

====
Section 1.2.1, second para:

[[...The fact, for example, that the an image can be identified...]]

"an image" rather than "the an image"

====
[[...PNG and SVG to evolve independent of...]]

independently of

====
Section 1.2.2, second last para:

[[For example, from early on in the Web, HTML agents followed the convention of ignoring unknown elements.]]

"from early on in the Web" seems a little clumsy. How about "For example, HTML agents have historically followed..."?

====
Section 1.2.3:

[[A user agent acts on behalf of the user and therefore is expected to help the user understand the nature of errors, and possibly overcome them. User agents that correct errors without the consent of the user are not acting on the user's behalf.]]

Is "user agent" intended to be *any* kind of "agent" (human or software, as previously defined) acting on behalf of someone else (the "user", so far undefined), or just *software* that acts on behalf of a human? Also, the text seems to equate "act on behalf of the user" with that action necessarily being helpful, which is not necessarily the way "act on behalf of" is always interpreted. The real point would seem to be that user agents that correct errors in this way may in some sense be acting on the user's behalf, but they aren't helping the user by doing it.

====
Same section, third bullet:

[[* An agent that encounters unrecognized content...]]

Given the context, this seems a bit ambiguous, since it might be taken to refer to "user agent", as well as more generally to "agent" (assuming these are different; are they?)

====
Section 2: First para:

[[Parties who wish to communicate must agree upon a shared set of identifiers and on their meanings.]]

"identifiers (names for things)"?

====
Second para:

[[It follows that a resource should be assigned a URI if a third party might reasonably want to link to it...]]

Why "a third party" (what are the first two parties)?

====
Third para:

[[Resources exist before URIs...]]

This sounds like it should be "Resources exist independently of URIs".

====
[[Designers should expect that it will prove useful to be able to share a URI across applications, even if that utility is not initially evident.]]

Why is the reference here to "designers"? Designers of what? This sounds like it should instead be "resource owners", as mentioned in the principle "URI assignment" just below.

However, there is also a related issue, which is that the terms "resource owner", "URI owner", and "URI producer" are all used in further discussion. I can imagine situations in which these (or at least the first two) might have distinct meanings, but they don't necessarily seem to be used that way. If they are distinct, it would help to have precise definitions. If they aren't distinct, it would help to pick one term and use it consistently (along with some further explanation of why they are the same).

In particular, "Resource owner" and "URI owner" would appear to be equivalent when discussing URIs that refer to resources with retrievable representations. However, given that URIs can be created to refer to other kinds of resources, it would seem that multiple URIs might be created to refer to the same resource, and those URIs would have different owners.

For example, suppose I (the person Frank Manola) am the resource. It seems to me I can reasonably claim to be the owner of that resource (whether I have assigned myself a URI or not; recall that resources can have zero URIs). Independently, other people may create resources with retrievable representations (e.g., reports) that refer to me and, perhaps not knowing the URI I have assigned to myself (even if I *have* assigned one), can create URIs to refer to me (say, in RDF statements). It seems to me those other people can reasonably claim to be the owners of those latter URIs (e.g., they determine that those URIs denote me), even though they don't own the resource the URIs identify (me). Moreover, these other people (more effectively) can create URIs for the resources with retrievable representations (reports referring to, among other things, Frank Manola), and those resources (the reports) are distinct from the resource Frank Manola. In this case, it seems to me that those other people are the owners of both the resources (the reports) and the URIs that identify them.

====
[[Principle: URI assignment: A resource owner SHOULD assign a URI to each resource that others will expect to refer to.]]

This seems as if it should read "A resource owner SHOULD assign a URI to each resource that the owner expects others will want to refer to." (How can others expect to refer to resources they don't necessarily know about?)

====
Section 2.1:

[[URI producers should be conservative about the number of different URIs they produce for the same resource. For example, the parties responsible for weather.example.com should not use both "http://weather.example.com/Oaxaca" and "http://weather.example.com/oaxaca" to refer to the same resource; agents will not detect the equivalence relationship by following specifications. On the other hand, there may be good reasons for creating similar-looking URIs. For instance, one might reasonably create URIs that begin with "http://www.example.com/tempo" and "http://www.example.com/tiempo" to provide access to resources by users who speak Italian and Spanish.]]

Why does the first sentence refer to "URI producers" that "produce" URIs rather than "resource owners" that "create" them (which would be more consistent with earlier text)? I also note that the words "assign", "create", and "produce" (and possibly others) are all used for what seems to be the same idea.

Also, the rest of this illustration seems to have a funny interaction with the URI opacity principle in Section 2.5 (especially the discussion there about the travel example), since the Section 2.1 text above seems to suggest there is value in being able to convey information to an accessing "agent" (a human in this case) via the form of the URI itself (i.e., if URIs are to be totally opaque to the "agent", why would there be value in using one language over another?). Of course, this may be just another problem in allowing "agent" to refer to people.
However, the problem seems somewhat more acute if the result of dereferencing URIs in different languages is the retrieval of the report in the corresponding languages because, while this kind of makes sense, it also invites determining the language of the report from the language of the URI.

====
Section 2.2:

[[The requirement for URIs to be unambiguous demands that different agents do not assign the same URI...]]

Now we have *agents* assigning URIs rather than, e.g., resource or URI owners. It's not clear that this is consistent with prior discussion.

====
[[The concept of URI ownership is especially visible in the case of the HTTP protocol, which enables the URI owner to serve authoritative representations of a resource.]]

This text is pertinent to the point raised earlier about resource vs. URI ownership, and might be expanded on a bit to clarify that relationship. In particular, when dealing with URIs that have retrievable representations, it is straightforward to demonstrate ownership; non-owners can't determine what is returned when dereferencing such URIs, while owners can.

====
Section 2.3:

[[URI ambiguity should not be confused with ambiguity in natural language. The English statement "'http://www.example.com/moby' identifies 'Moby Dick'" is ambiguous because one could understand the phrase "Moby Dick" to refer to distinct resources: a particular printing of this work, or the work itself in an abstract sense, or the fictional white whale, or a particular copy of the book on the shelves of a library (via the Web interface of the library's online catalog), or the record in the library's electronic catalog which contains the metadata about the work, or the Gutenberg project's online version]]

This example illustrates an ambiguous natural language statement, but it's not clear that it doesn't also illustrate an ambiguous URI, since the text doesn't say anything about how example.com, or other parties citing http://www.example.com/moby, actually interpret it.

====
Section 2.3.1:

[[In Web architecture, URIs identify resources. Outside the bounds of Web architecture specifications, URIs can be useful for other purposes, for example, as database keys...]]

It seems to me this paragraph mixes a few things. Just because a URI is used as a database key doesn't necessarily mean it's being used for a different purpose. If a URI is used as a key in a relational table that associates metadata with the Web resources identified by those keys, and does so correctly (i.e., distinguishes between metadata about Nadia and metadata about her mailbox), it seems as if this is the *same* use of the URI (to identify a Web resource), even though it may also be used in the database to identify a distinct row in the table. Moreover, the database might exhibit URI ambiguity in the same way the Web might, e.g., by mixing metadata about both Nadia and her mailbox in the same row. At the same time, the use of "mailto:nadia@example.com" as an identifier for Nadia rather than her mailbox seems just as likely to occur in a Web context as in this database one (people seem to want to do it in RDF, for example; or is this not the part of the Web you're talking about?).

Also, in the same paragraph:

[[URI ambiguity arises a URI is used to identify two different Web resources.]]

URI ambiguity arises *when* a URI...

====
Section 2.4:

[[Because of these costs, if a URI scheme exists that meets the needs of an application, designers should use it rather than invent one.]]

This sentence refers to "designers", but the "Good Practice" point below refers to "authors of specifications". Shouldn't the same terms be used in both places?

====
Section 2.5:

[[It is tempting to guess the nature of a resource by inspection of a URI that identifies it.
However, the Web is designed so that agents communicate resource state through representations, not identifiers.]]

This is another place where including people in the definition of "agents" seems to create a possible difficulty. If agents include people, then people quite frequently communicate information about the nature of a resource by inspection of URIs, and it's very helpful. For example, "http://weather.example.com/oaxaca" certainly suggests that the resource it identifies has something to do with the weather in Oaxaca (as is noted further on), and that's very useful information (e.g., when people pass those URIs around). That's certainly information about "the nature of a resource", and Internet Media Types aren't the only things relevant to people. This all, of course, reads much better if "agents" are restricted to software.

Pursuing this point in the subsequent text:

[[Agents making use of URIs MUST NOT attempt to infer properties of the referenced resource except as licensed by relevant specifications.]]

This is good practice for software "agents". For people "agents", given the "must not", how do you propose to stop them?

Further to this point, the text goes on:

[[The example URI used in the travel scenario ("http://weather.example.com/oaxaca") suggests that the identified resource has something to do with the weather in Oaxaca. A site reporting the weather in Oaxaca could just as easily be identified by the URI "http://vjc.example.com/315". And the URI "http://weather.example.com/vancouver" might identify the resource "my photo album."]]

This is certainly true.
But while it's good practice for software to treat URIs opaquely, it seems to me that given the discussion in Section 2.1, which seems to license creating "descriptive" URIs in different languages to enable people speaking those languages to more easily access a resource (and which reflects the use of text in URIs as a means for conveying information to people), you might want to suggest that, given this "dual purpose" of URIs, it's *not* good practice to use the URI "http://weather.example.com/vancouver" to identify the resource "my photo album", even though one could, and it would be irrelevant to software.

====
Section 2.7.2:

Reference [RDF10] cites the RDF M&S Recommendation, rather than the new Recommendation set, and should be updated (the OWL reference should be updated as well). If *one* of the new RDF documents is to be cited, I would suggest Concepts.

====
Section 3: First sentence:

[[Communication between agents over a network about resources...]]

Too many qualifying phrases here (e.g., "about resources" presumably qualifies "Communication", rather than "a network", but it's kind of distant...)

====
Section 3.3.1:

[[Note that one can use a URI with a fragment identifier even if one does not have a representation available for interpreting the fragment identifier (one can compare two such URIs, for example). Parties that draw conclusions about the interpretation of a fragment identifier without retrieving a representation do so at their own risk; such interpretations are not authoritative.]]

This is a place where some qualifying context about the nature of the Web to which this architecture applies would have been helpful. For example, suppose I have a collection of RDF or OWL statements having as subjects the URI "http://www.example.com/images/nadia#hat", and the RDF/OWL statements assert that the subject is of class "Hat" in some ontology, that it's blue, and so on.
On one hand, it seems as if one could reasonably draw conclusions about the interpretation of this fragment identifier (or rather the whole URI including it) *from the RDF/OWL* without dereferencing the URI (using the URI to retrieve a representation, whose media type specifies the authoritative interpretation), assuming that the RDF/OWL itself is from a sufficiently "authoritative" (in some sense) representation somewhere. Saying "such interpretations are not authoritative" without any further qualification or discussion, while it makes perfect sense given the way the Web works now, doesn't seem to take such additional usage (which, after all, is described in W3C Recommendations) into account.

====
Section 3.3.2:

It would be helpful if the text following the "story" explicitly answered the question posed in the "story". For example, if the idea here is that the fragment should always identify Nadia's hat in any graphic representation provided by dereferencing the URI containing the fragment, it would help to say that.

====
Section 3.4: First para:

[[To give these parties the confidence that they are all talking about the same thing when they refer to "the resource identified by the following URI ..." the design choice for the Web is, in general, that the owner of a resource assigns the authoritative interpretation of representations of the resource.]]

The text "owner of a resource" links to Section 2.2 titled "URI Ownership". So why say "owner of a resource" rather than "owner of a URI"? Also, Section 3.3 just got through telling us that if the URI contains a fragment identifier, then the Internet Media Type of the retrieved representation specifies the authoritative interpretation of the fragment identifier.
I realize that in one case it's the authoritative interpretation *of the fragment* and in the other it's the authoritative interpretation *of representations of the resource*, but the use of "authoritative interpretation" in both places (particularly when they're so close together) seems potentially confusing.

====
Section 3.4.1:

[[User agents should detect such inconsistencies but should not resolve them without involving the user.]]

Now the term is "user agent" rather than "agents". Is there some particular reason for distinguishing between these terms?

====
Section 3.5:

[[Nadia's retrieval of weather information (an example of a read-only query or lookup) qualifies as a "safe" interaction; a safe interaction is one where the agent does not incur any obligation beyond the interaction. An agent may incur an obligation through other means (such as by signing a contract). If an agent does not have an obligation before a safe interaction, it does not have that obligation afterwards]]

Here, "agent" is used in a sense where it might well be a person ("signing a contract"). Can software agents "incur obligations" in the sense used here?

====
[[Other Web interactions resemble orders more than queries.]]

Is this "orders" in the sense of "placing an order", "that's an order, soldier", or both?

====
[[Principle: Safe retrieval Agents do not incur obligations by retrieving a representation.]]

Shouldn't this be "should not" rather than "do not"? "Do not" suggests that it doesn't happen, rather than that it's incorrect if it does (as described in the next sentence).

====
Section 3.6:

Following the story appears:

[[The usefulness of a resource depends on good management by its owner. As is the case with many human interactions, confident interactions with a resource depend on stability and predictability. The value of a URI increases with the predictability of interactions using that URI.
Avoiding unnecessary URI aliases is one aspect of proper resource management.]]

While the last sentence above certainly seems true, it's not clear what it has to do with the story, since the sentence refers to URI aliases, but the story describes multiple uses of the *same* URI returning wildly different results. Is this supposed to refer to "URI ambiguity" instead?

It's also not clear that the problem illustrated in the story is necessarily "inconsistent representation". Given the diagram in Section 1, which distinguishes the URI from the resource it identifies, it seems possible to distinguish (at least conceptually) between:

(a) returning pages about weather information and auto insurance as inconsistent representations of the same resource, and
(b) returning the same two pages as indicating that the URI owner has changed the resource the URI identifies.

====
Section 3.6.3:

It might help clarify the point made in this section if some examples of mistaken attempts to restrict the use of URIs were given, rather than just the building security analogy. Also, it's not clear whether or not the principle described here (and the further discussion in the "Deep Linking" finding) deals with all possible situations of this sort. For example, it certainly used to be the case (and may be the case now) that US Defense Department documents could not only have a security classification, but their *titles* might also have a security classification (that is, the *existence* of the document was classified). A classified document with an unclassified title could be referenced in the usual way, but a reader without the necessary clearance would be unable to access the referenced document (this would correspond to the situations already described). On the other hand, classifying the title of the document would prevent the reader from even seeing the reference without the necessary clearance. How would you suggest handling this situation (admittedly, opaqueness of URIs would help!)?

====
Section 4: First para:

[[The first data format used on the Web was HTML. Since then, data formats have grown in number. ]]

The second sentence should probably say "...*Web* data formats..." (since data formats in general were growing well before HTML).

====
Second para:

[[ Below we describe some characteristics of a data format make it easier to integrate into the Web architecture. ]]

"...*that* make it easier..."

====
Section 4.2.1:

[[There is typically a (long) transition period during which multiple versions of a format, protocol, or agent are simultaneously in use.]]

This is another passage that reads strangely if "agent" is considered to include people (I know I'm not the same person I once was, but multiple versions simultaneously in use?!).

====
[[Good practice: Version information Format designers SHOULD provide for version information in language instances]]

What are "language instances"?

====
Section 4.2.2:

[[The policy sets expectations that the Working Group responsible for the namespace may modify it in any way until a certain point in the process ("Candidate Recommendation") at which point W3C constrains the set possible changes to the namespace in order to promote stable implementations.]]

"...constrains the set *of* possible changes..."

====
Section 4.2.3:

[[As part of defining an extensibility mechanism, a specification should set expectations about agent behavior in the face of unrecognized extensions.]]

The following good practice then says

[[Language designers SHOULD specify agent behavior in the face of unrecognized extensions.]]

It's not clear that a specification "setting expectations about" agent behavior is the same as it "specifying" it. Why the difference in wording?

====
Section 4.2.4: Third bullet:

[[ * RDF allows well-defined mixing of vocabularies, and allows text and XML to be used as a data type values within a statement having clearly defined semantics. ]]

"...allows text and XML to be used as data type values..."
(delete the "a")? Within the same statement? What does "having clearly defined semantics" modify? Should this be "...within statements having clearly defined semantics"?

====
Section 4.5.2:

[[XLink allows links to have multiple ends and to be expressed either inline or in "link bases" stored external to any or all of the resources identified by the links it contains.]]

This sentence seems a tad complicated.

====
Section 4.5.3:

[[How do the application designers ensure that there are no naming conflicts when they combine elements from different formats (for example, suppose that the "p" element is defined in two or more XML formats)? "Namespaces in XML" [XMLNS] provides a mechanism for establishing a globally unique name that can be understood in any context.]]

I'd suggest rewriting this to avoid the rhetorical question. Also, this may or may not be the right place to cite this example, but the way the XML Schema data types define a namespace for those types, allowing them to be used in RDF and OWL, might also be cited. (This example illustrates the need sometimes to explicitly describe how URIs identifying language elements should be constructed using those namespace names).

====
Section 4.5.5:

The link for "rdfmsQnameUriMapping-6" needs a fragment to identify the specific issue.

====
Received on Wednesday, 10 March 2004 17:18:12 UTC