- From: Graham Klyne <gk@ninebynine.org>
- Date: Fri, 05 Mar 2004 14:27:57 +0000
- To: public-webarch-comments@w3.org
Reviewing: http://www.w3.org/TR/2003/WD-webarch-20031209/
Modified: 08 December 2003 22:07:31

Generally, I think the document is looking in very good shape. Most of my comments are just editorial in nature. There are a very few comments that I regard as possibly more substantive, concerning:
  Section 3.4:
  Section 3.6.2:
  Section 4.2.4:

I have included some suggested revisions not because I think they're necessarily better than the text already used, but to illustrate the points I am trying to raise.

...

Section 1.2.4:

[[ It is common for programmers working with the Web to write code that generates and parses these messages directly. It is less common, but not unusual, for end users to have direct exposure to these messages. This leads to the well-known "view source" effect, whereby users gain expertise in the workings of the systems by direct exposure to the underlying protocols. ]]

It was not clear to me what is the intended significance of this with respect to Web Architecture. Suggest: explain the significance or drop this paragraph.
[minor editorial]

...

Section 2:

[[ A URI must be assigned to a resource in order for agents to be able to refer to the resource. It follows that a resource should be assigned a URI if a third party might reasonably want to link to it, make or refute assertions about it, retrieve or cache a representation of it, include all or part of it by reference into another representation, annotate it, or perform other operations on it. ]]

"or perform other operations on it" suggests a resource should be a very concrete thing. Suggest "or refer to it in some other way".
[minor editorial]

...

Section 2:

[[ When a representation uses a URI (instead of a local identifier) as an identifier, then it gains great power from the vastness of the choice of resources to which it can refer. The phrase the "network effect" describes the fact that the usefulness of the technology is dependent on the size of the deployed Web.
]]

The comment about "network effect" in the first para seems somewhat disjoint. What does it tell us about Web architecture? Suggest: "This vastness of choice gives rise to a "network effect", which refers to a technology's usefulness increasing more rapidly than the size of the network across which it is deployed"
[minor editorial]

...

Section 2:

[[ A URI must be assigned to a resource in order for agents to be able to refer to the resource. It follows that a resource should be assigned a URI if a third party might reasonably want to link to it, make or refute assertions about it, retrieve or cache a representation of it, include all or part of it by reference into another representation, annotate it, or perform other operations on it.

[...]

Resources exist before URIs; a resource may be identified by zero URIs. However, there are many benefits to assigning a URI to a resource, including linking, bookmarking, caching, and indexing by search engines. Designers should expect that it will prove useful to be able to share a URI across applications, even if that utility is not initially evident. ]]

There seems to be some overlap between these paragraphs. And I found the first sentence of the second paragraph to be potentially confusing. Suggest: a re-arrangement:

[[ A URI must be assigned to a resource in order for agents to be able to refer to the resource. It follows that a resource should be assigned a URI if a third party might reasonably want to link to it, make or refute assertions about it, retrieve or cache a representation of it, include all or part of it by reference into another representation, annotate it, or refer to it in some other way. A resource may exist independently of whether or not it has a URI; one or more URIs may be used to identify a given resource.

[...as before...]

There are many benefits to assigning a URI to a resource, as noted above.
Designers should expect that it will prove useful to be able to share a URI across applications, even if that utility is not initially evident. ]]

[editorial]

...

Section 2:

[[ The scope of a URI is global; the resource identified by a URI does not depend on the context in which the URI appears (see also the section about URIs in other roles). Of course, what an agent does with a URI may vary. The TAG finding "URIs, Addressability, and the use of HTTP GET and POST" discusses additional benefits and considerations of URI addressability. ]]

The term "global" here is not defined or qualified. Suggest "global across the Web".
[editorial]

...

Section 2.1:

[[ ... For example, the parties responsible for weather.example.com should not use both "http://weather.example.com/Oaxaca" and "http://weather.example.com/oaxaca" to refer to the same resource; agents will not detect the equivalence relationship by following specifications. ... ]]

and

[[ ... Agents should not assume, for example, that "http://weather.example.com/Oaxaca" and "http://weather.example.com/oaxaca" identify the same resource, since none of the specifications involved states that the path part of an "http" URI is case-insensitive. ]]

While correct, I felt this was potentially a little confusing. The first example did not seem well chosen to reflect the point I think is being made. Suggest:

[[ ... For example, the parties responsible for weather.example.com should not use both "http://weather.example.com/Oaxaca" and "http://weather.example.com/Mexico?city=Oaxaca" to refer to the same resource; agents will not detect the equivalence relationship by following specifications. ... ]]

Hmmm... maybe there's a third point to be made here, namely that the party responsible for some domain should avoid using different URIs with small, easily overlooked differences?
[editorial]

...

Section 2.2:

[[ Hierarchical delegation of authority.
This approach, exemplified by the "http" and "mailto" schemes, allows the assignment of a part of URI space to one party, reassignment of a piece of that space to another, and so forth. ]]

While technically correct, I don't think 'mailto' is a useful example of hierarchical delegation of naming authority within a URI structure. I'd suggest 'ftp:' or 'urn:' or 'file:' or 'ldap:'
[minor editorial]

...

Section 2.3:

[[ URI ambiguity should not be confused with ambiguity in natural language. ]]

I'm not sure what this sentence is trying to say (what is meant here by "confused with"). From what follows, I think the intent is to say something like "justified by", in which case I think something like:

[[ URIs should not be permitted the ambiguity that occurs in natural language. [...existing text...] This flexibility is not available to URIs, which should be defined to refer to a single concept. ]]

[later] I ran across this from TimBL in one of the Tag IRC logs, which seems to capture the point more effectively.

[[ Suggested text for 2.6: Whereas human communication tolerates such ambiguity, machine processing does not. Strictly, the above URI as identifies the information resource, some hypertext document. RDF applications which use it for describing properties of that page are in order; those who use its URL to directly assert properties of the whale are using it inconsistently. ]]
-- http://www.w3.org/2003/07/22-tagmem-irc.html 22:06:17

[editorial]

...

Section 2.4.1:

[[ The use of unregistered URI schemes is discouraged for a number of reasons: ]]

This doesn't seem to be strong enough. Suggest:

[[ The use of unregistered URI schemes is not a permitted part of the Web architecture, for a number of reasons: ]]

[substantial]

...

Section 2.5:

[[ Resource state may evolve over time. Requiring resource owners to change URIs to reflect resource state would lead to a significant number of broken links.
For robustness, Web architecture promotes independence between an identifier and the identified resource. ]]

I think a link to orthogonality (section 1.2.1) may be appropriate about here.
[minor editorial]

...

Section 3.1:

[[ Although many URI schemes are named after protocols, this does not imply that use of such a URI will result in access to the resource via the named protocol. Even when an agent uses a URI to retrieve a representation, that access might be through gateways, proxies, caches, and name resolution services that are independent of the protocol associated with the scheme name. ]]

As phrased, I find this to be at odds with the text that follows, cf. numbered items 4/5/6. Suggest replace "... use of such a URI will result ..." with "... use of such a URI will necessarily result ..."
[editorial]

...

Section 3.3.2:

[[ For a given resource, an agent may have the choice between representation data in more than one data format (through HTTP content negotiation, for example). Since different data formats may define different fragment identifier semantics, it is important to note that by design, the secondary resource identified by a URI with a fragment identifier is expected to be the same across all representations. Thus, if a fragment has defined semantics in any one representation, the fragment is identified for all of them, even though a particular data format may not be able to represent it. ]]

The term "by design" seems rather odd here. It seems to me that the (technical) design specifically does not achieve "the secondary resource identified by a URI with a fragment identifier is ... the same across all representations". I think the clause "by design" could be dropped without loss (or, maybe, replaced with something like "by intent").
[minor editorial]

...

Section 3.2:

[[ On the other hand, it is considered an error if the semantics of the fragment identifiers used in two representations of a secondary resource are inconsistent.
]]

This seems a rather odd statement to make (specifically: "it is considered an error ..."), because there is no specific way to determine if the would-be erroneous condition actually arises. Suggest: drop this paragraph; the intent is clear enough from the following good practice point.
[editorial]

...

Section 3.4:

[[ Successful communication between two parties using a piece of information relies on shared understanding of the meaning of the information. Arbitrary numbers of independent parties can identify and communicate about a Web resource. To give these parties the confidence that they are all talking about the same thing when they refer to "the resource identified by the following URI ..." the design choice for the Web is, in general, that the owner of a resource assigns the authoritative interpretation of representations of the resource. ]]

I recall that TimBL and Pat Hayes had a lengthy debate about something rather like this. Thread starting:
http://lists.w3.org/Archives/Public/www-tag/2003Jul/0022.html
with some indication of consensus around:
http://lists.w3.org/Archives/Public/www-tag/2003Jul/0316.html
http://lists.w3.org/Archives/Public/www-tag/2003Jul/0344.html

I am not sure that the above text really captures the subtlety of this discussion. As Pat Hayes noted:

[[
>Note though that other non-RDF systems may and do use URIs. So the
>principle can must be a general one of web architecture. Names are global in scope. OK, though (in the other branch of the discussion) I don't think this is going to be feasible, myself, if taken strictly. Still, I agree, its not a bad place to start, as long as we understand that we will eventually have to replace it with something more sophisticated.
]]
-- http://lists.w3.org/Archives/Public/www-tag/2003Jul/0344.html

[significant/editorial]

...

Section 3.6:

[[ Since Nadia finds the Oaxaca weather site useful, she emails a review to her friend Dirk recommending that he check out 'http://weather.example.com/oaxaca'.
Dirk clicks on the link in the email he receives and is surprised to see his browser display a page about auto insurance. Dirk confirms the URI with Nadia, and they both conclude that the resource is unreliable. Although the managers of Oaxaca have chosen the Web as a communication medium, they have lost two customers due to ineffective resource management. ]]

I think that "the managers of Oaxaca" should be "the managers of http://weather.example.com/".
[editorial]

...

Section 3.6.2:

[[ There are strong social expectations that once a URI identifies a particular resource, it should continue indefinitely to refer to that resource; this is called URI persistence. URI persistence is a matter of policy and commitment on the part of authorities servicing URIs. The choice of a particular URI scheme provides no guarantee that those URIs will be persistent or that they will not be persistent. ]]

The terminology "authorities servicing URIs" seems to be not consistent with that used elsewhere; e.g. "authority responsible for a resource" at the start of section 3.6.1., and "URI producers" in section 2.1.

As I draft this, I think there's maybe a deeper omission here: a lack of separation between the owner or authority responsible for a resource, and the authority for a particular part of URI space that may be used to identify a resource. (cf. also my previous comment above.) If not clarified, I think this could be a source of continuing miscommunication.
[significant/editorial]

...

Section 3.6.2:

[[ Inconsistent representations served. Note the difference between a resource owner changing representations predictably in light of the nature of the resource (the changing weather of Oaxaca) and the owner changing representations arbitrarily. ]]

The term "predictably" here seems an odd choice given the nature of the illustrative example (thinks... butterflies flapping in Beijing, etc.). Suggest: rationally.
[minor editorial]

...
Section 3.6.2:

[[ Improper use of content negotiation, such as serving two images as equivalent through HTTP content negotiation, where one image represents a square and the other a circle. ]]

This doesn't seem like a particularly helpful example, because in some contexts a circle and square may be genuinely different representations of a common underlying concept (e.g. alternative GraphViz presentations of an RDF graph). Suggest: "... such as serving two images as equivalent through HTTP content negotiation, where one image represents a weather map of Oaxaca and the other a street map of Chihuahua"
[minor editorial]

...

Section 3.6.2:

I made a note to myself at the end of this section: "Maybe add a comment about metadata consistency and problems that may occur if a resource is not persistent" but now I'm not sure what it is I meant by this.

I think I may have been thinking about a case where RDF is used to describe some resource, but the resource whose representation is served at a given URI is allowed to change over time. Then, any RDF that uses said URI to describe the resource at some point in time becomes completely incorrect if the URI is assigned to a different resource. Is it worth trying to make a point that the value of RDF descriptions depends to a considerable extent on the stability/persistence of the URIs used?
[Significant?]

...

Section 4.*, esp. 4.2.*:

I notice that in this section, the terminology used slips from "data format" or just "format" to "language", without any explanation that they mean pretty much the same thing in this context (or, if they don't, without any explanation of the difference).

...

Section 4.2.4:

[[ RDF allows well-defined mixing of vocabularies, and allows text and XML to be used as a data type values within a statement having clearly defined semantics. ]]

I couldn't figure precisely what this was trying to say.
[editorial]

...
Section 4.2.4:

[[ Note however, that for general XML there is no semantic model that defines the interactions within XML documents with elements and/or attributes from a variety of namespaces. Each application must define how namespaces interact and what effect the namespace of an element has on the element's ancestors, siblings, and descendants. ]]

I think that there may be an important point to be made here about the relationship of the "Semantic Web" with what I might call the "Hypertext Web" upon which it is built, that the "Semantic Web" provides a well-defined way to combine statements that draw upon an arbitrary number of different namespaces. (I regard this as one of the more important contributions of the Semantic Web.) Maybe this is what the subject of my previous comment was trying to say?
[significant]

...

Section 4.3:

[[ Note that when content, presentation, and interaction are separated by design, agents need to recombine them. There is a recombination spectrum, with "client does all" at one end and "server does all" at the other. There are advantages to each: recombination on the server allows the server to send out generally smaller amounts of data that can be tailored to specific devices (such as mobile phones). However, such data will not be readily reusable by other clients and may not allow client-side agents to perform useful tasks unanticipated by the author. When a client does the work of recombination, content is likely to be more reusable by a broader audience and more robust. However, such data may be of greater size and may require more computation by the client. ]]

I think there are also some scalability concerns that might be mentioned here; e.g. an application is, in general, more likely to operate at Internet scale if as much processing as possible is performed by user agents (often, clients) rather than centralized processing agents (often, servers).

...
Section 4.4:

[[ Language designers SHOULD incorporate hypertext links into a data format if hypertext is the expected user interface paradigm. ]]

I found this statement a bit puzzling: many data formats have nothing to do with a user interface; the preceding text says "What agents do with a hypertext link is not constrained by Web architecture and may depend on application context". So what is this trying to say?

...

Section 4.1.1:

I found the text of this section less clear than was offered in an email from TimBL:

[[ It is important to distinguish between the string which identifies something and the BNF for a string in a document which is used to specify the first string. The first is an identifier. The second has been called a "reference". A reference can use a relative form. ]]
-- http://lists.w3.org/Archives/Public/www-tag/2002Sep/0043.html

[editorial]

...

Section 4.5:

[[ ... While it is directed at Internet applications with specific reference to protocols, the discussion is generally applicable to Web scenarios as well. ]]

I am uneasy with this phrasing, as it seems to suggest the Web is somehow apart from the Internet. Suggest:

[[ ... While it is directed at Internet applications with specific reference to protocols, the discussion is also applicable to Web application formats. ]]

[minor editorial]

...

Section 4.5.1:

Another reference with discussion relating to this topic of choosing to use XML can be found here: http://www.ietf.org/rfc/rfc3117.txt , section 5.1
[for information]

...

Section 4.5.7:

[[ These Internet Media Types create two problems: First, for data identified as "text/*", Web intermediaries are allowed to "transcode", i.e., convert one character encoding to another. Transcoding may make the self-description false or may cause the document to be not well-formed. ]]

The statement "Web intermediaries are allowed to "transcode" ..." seemed to me to be rather broadly applied here. Is there a specification that asserts this in general?
If not, I think the comment should be constrained to something like "in some Web applications, intermediaries are allowed to transcode ..."
[editorial]

...

Section 4.5.7:

[[ Second, representations whose Internet Media Types begin with "text/" are required, unless the charset parameter is specified, to be considered to be encoded in US-ASCII. Since the syntax of XML is designed to make documents self-describing, it is good practice to omit the charset parameter, and since XML is very often not encoded in US-ASCII, the use of "text/" Internet Media Types effectively precludes this good practice. ]]

I found this confusing, in that I wasn't clear what it was that was being said, and I couldn't see how it relates to the good practice point that immediately follows it.
[editorial]

...

That's all, folks!

#g

------------
Graham Klyne
For email: http://www.ninebynine.org/#Contact
Received on Friday, 5 March 2004 09:38:10 UTC