- From: Ian B. Jacobs <ij@w3.org>
- Date: Tue, 02 Nov 2004 00:05:22 -0600
- To: Norman.Walsh@Sun.COM
- Cc: www-archive@w3.org
- Message-Id: <1099375521.26465.160.camel@seabright>
Norm,

Here are my comments about the 28 October 2004 Editor's Draft
of Webarch:
 http://www.w3.org/2001/tag/2004/webarch-20041028/

I think all of these are editorial even if some changes involve
changing more bytes than others. I look forward to discussing
them with you.

 - Ian

---------------
Global comments
---------------

s/web/Web
s/re-use/reuse/ [Both are used; I picked the more common one]

---------------
Status section
---------------

See draft:
 http://lists.w3.org/Archives/Member/tag/2004Nov/0006

|1. Introduction

[snip]

|This scenario illustrates the three architectural bases of
|the Web that are discussed in this document:
|
| 1.
|
|    Identification (§2). URIs are used to identify
|    resources. In this travel scenario, the resource is a
|    periodically updated report on the weather in Oaxaca, and
|    the URI is “http://weather.example.com/oaxaca”.
| 2.
|
|    Interaction (§3). Web agents communicate using
|    standardized protocols that enable interaction through
|    the exchange of messages that adhere to a defined syntax

s/that/which (to avoid two "that"s).

|    and semantics. By entering a URI into a retrieval dialog
|    or selecting a hypertext link, Nadia tells her browser to
|    perform a retrieval action for the resource identified
|    by the URI. In this example, the browser sends an HTTP
|    GET request

add: "(part of the HTTP protocol)"

|    to the server at "weather.example.com",
|    via TCP/IP port 80, and the server sends back a message
|    containing what it determines to be a representation
|    of the resource as of the time that representation was
|    generated. Note that this example is specific to hypertext
|    browsing of information—other kinds of interaction
|    are possible, both within browsers and through the use
|    of other types of Web agent; our example is intended
|    to illustrate one common interaction, not define the
|    range of possible interactions or limit the ways in
|    which agents might use the Web.
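
As an aside on the Interaction item quoted above, here is a minimal
Python sketch of how the scenario's URI might map onto the request
Nadia's browser sends. The function name build_get_request is
hypothetical, HTTP/1.1 is assumed (per [RFC2616]), and nothing is
sent over the network; the sketch only derives the host, the default
TCP port 80, and the text of the GET request.

```python
from urllib.parse import urlsplit


def build_get_request(uri):
    """Return (host, port, request_text) for a plain "http" URI.

    Hypothetical helper for illustration only; it opens no connection.
    """
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the 'http' scheme")
    host = parts.hostname
    port = parts.port or 80      # 80 is the default port for "http"
    path = parts.path or "/"     # an empty path is requested as "/"
    if parts.query:
        path += "?" + parts.query
    # Request line, Host header, then a blank line to end the headers.
    request_text = f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
    return host, port, request_text


host, port, request_text = build_get_request("http://weather.example.com/oaxaca")
```

What the server sends back (its representation of the weather
report) is outside the scope of this sketch.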

[snip]

|The following illustration shows the relationship between
|identifier, resource, and representation.
|
|A resource (Oaxaca Weather Info) is identified by a particular
|URI and is represented by pseudo-HTML content
|
|In the remainder of this document, we highlight important
|architectural points regarding Web identifiers, protocols, and
|formats. We also discuss some important general architectural
|principles (§5)

How general are they? Are they general software architecture
principles? Building architecture? I propose to limit to "some
important general software architecture principles"

|in the context of the Web.

Change "in the context of the Web" to "and how they apply to the
Web."

|1.1. About this Document

|This document describes the properties we desire of the
|Web and the design choices that have been made to achieve
|them. It promotes re-use

I think it should be "promotes the reuse".

|of existing standards when suitable,
|and gives guidance on how to innovate in a manner consistent
|with Web architecture.

[snip]

|1.1.2. Scope of this Document
|
|This document presents the general architecture of the
|Web. Other groups inside and outside W3C also address
|specialized aspects of Web architecture, including
|accessibility, quality assurance, internationalization, device
|independence, and Web Services. The section on Architectural
|Specifications (§7.1) includes references to these related
|specifications.
|
|This document strives for a balance between brevity and
|precision while including illustrative examples. TAG findings
|are informational documents that complement the current document
|by providing more detail about selected topics. This document
|includes some excerpts from the findings. Since the findings
|evolve independently, this document also includes references

I think s/also/only in the above sentence.

|to approved TAG findings.

For readability, change the whole thing to "this document only
includes references to findings that the TAG has approved."

| For other TAG issues covered by this
|document but without an approved finding, references are to
|entries in the TAG issues list.
|
|Many of the examples in this document that involve human
|activity suppose the familiar Web interaction model where

To tie this back to the introduction, after model add
"(illustrated at the beginning of the Introduction)".

|a person follows a link via a user agent, the user agent
|retrieves and presents data, the user follows another link,
|etc. This document does not discuss in any detail other
|interaction models such as voice browsing (see, for example,
|[VOICEXML2]).

The transition between the previous sentence and the next is too
abrupt. I propose to add the following filler: "The choice of
interaction model may have an impact on expected agent behavior."

[snip]

|2. Identification
|
|In order to communicate internally, a community agrees (to a
|reasonable extent) on a set of terms and their meanings. Since
|its inception, one goal of the Web has been to build a global
|community in which any party can share information with any
|other party.

I believe the previous sentence has an unclear subject, since it
begins "since its inception" and then continues "one goal". I
propose the following: "Since the inception of the Web, many
people have sought to make it a global community in which any
party can share information with any other party."

[snip]

| 2.1. Benefits of URIs
|
|The choice of syntax for global identifiers is somewhat
|arbitrary; it is their global scope that is important. The
|Uniform Resource Identifier, [URI], has been successfully
|deployed since the creation of the Web.
|There are substantial
|benefits to participating in the existing network of URIs,
|including linking, bookmarking, caching, and indexing by search
|engines, and there are substantial costs to creating a new
|identification system that has the same properties as URIs.
|
|Good practice: Identify with URIs
|
|To benefit from and increase the value of the World Wide Web,
|agents should provide URIs as identifiers for resources.
|
|A resource should have an associated URI if another party

s/associated URI/assigned URI/

[snip]

| 2.2. URI/Resource Relationships
|
|By design a URI identifies one resource.

Add: The phrase "URI assignment" refers to the association of a
URI with the resource it identifies. Below we discuss how this is
accomplished. Then start a new paragraph.

[snip]

|[URI] is an agreement about how the Internet community
|allocates names and associates them with the resources
|they identify. URIs are divided into schemes (§2.4) that
|define, via their scheme specification, the mechanism
|by which scheme-specific identifiers are associated with
|resources.

s/associated with resources/assigned to resources/

| For example, the "http" URI scheme ([RFC2616]) uses
|DNS and TCP-based HTTP servers for the purpose of identifier
|allocation and resolution. As a result, identifiers such as
|"http://example.com/somepath#someFrag" often take on meaning
|through the community experience of performing an HTTP GET
|request on the identifier and, if given a successful response,
|interpreting the response as a representation of the identified
|resource. (See also Fragment Identifiers (§2.6).) Of course,
|a retrieval action like GET is not the only way to obtain
|information about a resource. One might also publish a document
|that purports to define the meaning of a particular URI. These
|other sources of information may suggest meanings for such
|identifiers, but it's a local policy decision whether those
|suggestions should be heeded.
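
A small illustration of the quoted paragraph above, as a hedged
Python sketch using the standard urllib.parse module. It splits the
example identifier into the parts the "http" scheme relies on (a DNS
name and a path) and separates the fragment. Treating the fragment as
something the client keeps rather than sends in the GET request is an
assumption drawn from the story in section 3.2.1 of the draft; the
variable names are illustrative only.

```python
from urllib.parse import urldefrag, urlsplit

uri = "http://example.com/somepath#someFrag"

# The scheme selects the specification that governs allocation
# and resolution of the identifier.
parts = urlsplit(uri)
scheme = parts.scheme      # "http"
dns_name = parts.hostname  # the name that would be resolved through DNS
path = parts.path          # the part the HTTP server interprets

# Separate the fragment: the GET request would be made on the URI
# without it, and the fragment is interpreted against the response.
request_uri, fragment = urldefrag(uri)
```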
|
|Just as one might wish to refer to a person by different names
|(by full name, first name only, sports nickname, romantic
|nickname, and so forth), Web architecture allows the association
|of more than one URI with a resource. URIs that identify the
|same resource are called URI aliases. The section on URI aliases
|(§2.3.1) discusses some of the potential costs of creating
|multiple URIs for the same resource.
|
|Many sections of this document

s/Many sections of this document/Several of the sections that follow/

|address questions about the relationship between URIs and resources,
|including:
| * How much can I tell about a resource by inspection of
|   a URI that identifies it? See in particular the sections
|   on URI schemes (§2.4) and URI opacity (§2.5).
| * Who determines what resource a URI identifies? See the section
|   on URI allocation (§2.2.2).
| * Can the resource identified by a URI change over time? See in
|   particular the sections on URI persistence (§3.5.1) and
|   representation management (§3.5).
| * Since more than one URI can identify the same resource, how do
|   I know which URIs identify the same resource? See in particular
|   the sections on URI comparison (§2.3) and assertions that two
|   URIs identify the same resource (§2.7.2).

|2.2.1. URI collision

[The following rewrite of these sections is a modification of that
proposed here:
 http://lists.w3.org/Archives/Member/tag/2004Nov/0008.html]

As discussed above, a URI identifies one resource. Using the same
URI to directly identify different resources produces a URI
collision. Collision often imposes a cost in communication due to
the effort required to resolve ambiguities.

Suppose, for example, that one organization makes use of a URI to
refer to the movie The Sting, and another organization uses the
same URI to refer to a discussion forum about The Sting. To a third
party, aware of both organizations, this collision creates
confusion about what the URI identifies, undermining the value of
the URI.
If one wanted to talk about the creation date of the resource
identified by the URI, for instance, it would not be clear whether
this meant "when the movie was created" or "when the discussion
forum about the movie was created."

One way to help avoid URI collisions is to communicate effectively
what resource a URI identifies. The following section examines
approaches for establishing the authoritative source of
information about what resource a URI identifies.

URIs are sometimes used for indirect identification (§2.2.3). This
does not necessarily lead to URI collisions.

2.2.2. URI allocation

Social and technical solutions have been devised to help avoid URI
collisions. The success or failure of these different approaches
depends on the extent to which there is consensus in the Internet
community on abiding by the defining specifications.

One approach for helping to avoid URI collisions is to divide up
URI space (and the associated rights and responsibilities) among
different social entities, be they people, organizations, or
specifications. This is called URI allocation. Once an entity has
been allocated URIs, that entity is said to be the "owner" of those
URIs.

URI ownership gives the relevant social entity certain rights,
including the right to:

 1. delegate ownership of some or all owned URIs to another owner;
 2. assign owned URIs to resources. Note that the owner of a URI
    is not necessarily the owner of the resource identified by
    the URI.
 3. associate a namespace policy with the URI if the URI is a
    namespace URI, and to publish the authoritative namespace
    document.

In the case where the social entity that owns a URI is a
specification, ownership ultimately lies with the community that
maintains the specification.

2.2.2.1 URI allocation in practice

By social convention, URI allocation starts with the IANA URI
scheme registry [IANASchemes], itself a social entity.
This registry divides URI space according to URI scheme and
delegates ownership to IANA-registered URI scheme specifications.
Some URI scheme specifications further delegate ownership to
subordinate registries or to other nominated owners, who may
further delegate ownership.

Some allocation approaches adopted by URI scheme specifications
include the following:

 1) http URI scheme. The approach taken for the "http" URI scheme,
    for example, follows the pattern whereby the Internet community
    delegates authority, via the IANA URI scheme registry and the
    DNS, over a set of URIs with a common prefix to one particular
    owner. One consequence of this approach is the Web's heavy
    reliance on the central DNS registry.
 2) URN Syntax scheme. The URN Syntax scheme [RFC2141] delegates
    ownership of portions of URN space to URN Namespace
    specifications which themselves are registered in an
    IANA-maintained registry of URN Namespace Identifiers.
 3) data URL scheme. The specification for the data URL (sic)
    scheme [RFC2397] specifies that the resource identified by a
    data scheme URI has only one possible representation. The
    representation data makes up the URI that identifies that
    resource. Thus, the specification itself determines how data
    URIs are allocated; no delegation is possible.
 4) Other schemes (such as "news:comp.text.xml") rely on a social
    process.

URI owners are responsible for avoiding URI collisions. Thus, if a
URI scheme specification does provide for the delegation of
individual or organized sets of URIs, it should take pains to
ensure that ownership ultimately resides in the hands of a single
social entity. Allowing multiple owners increases the likelihood of
URI collisions.

URI owners may organize or deploy infrastructure to ensure that
representations of associated resources are available and, where
appropriate, interaction with the resource is possible through the
exchange of representations.
There are social expectations for responsible representation
management (§3.5) by URI owners, discussed below. Additional social
implications of URI ownership are not discussed here. See TAG issue
siteData-36, which concerns the expropriation of naming authority.

| 2.2.3. Indirect Identification
|
|To say that the URI "mailto:nadia@example.com" identifies both
|an Internet mailbox and Nadia, the person, introduces a URI
|collision. However, we can use the URI to indirectly identify
|Nadia. Identifiers are commonly used in this way.
|
|Listening to a news broadcast, one might hear a report on
|Britain that begins, "Today, 10 Downing Street announced
|a series of new economic measures." Generally, "10 Downing
|Street" identifies the official residence of Britain's Prime
|Minister. In this context, the news reporter is using it (as
|English rhetoric allows) to indirectly identify the British
|government. Similarly, URIs identify resources, but they can
|also be used in many constructs to indirectly identify other
|resources. Globally adopted assignment policies make some
|URIs appealing as general-purpose identifiers. Local policy
|establishes what they indirectly identify.
|
|Suppose that nadia@example.com is Nadia's email address. The
|organizers of a conference Nadia attends might use
|"mailto:nadia@example.com" to refer indirectly to her (e.g.,
|using

s/using/by using

| the URI as a database key in their database of conference
|participants). This does not introduce a URI collision.

|2.3. URI Comparisons

[snip]

|2.3.1. URI aliases
|
|Although there are benefits (such as naming flexibility) to
|URI aliases, there are also costs. URI aliases are harmful
|when they divide the Web of related resources. A corollary
|of Metcalfe's Principle (the "Network Effect")

In the intro, it appears as "network effect"; pick one. It may
also be possible to delete the parenthetical and have "Metcalfe's
Principle" link back to the explanation in the intro.

|is that the
|value of a given resource can be measured by the number and
|value of other resources in its network neighborhood, that is,
|the resources that link to it.

[snip]

|2.7.1. Internationalized identifiers
|
|The integration of internationalized identifiers (i.e.,
|composed of characters beyond those allowed by [URI]) into the
|Web architecture is an important and open issue. See TAG issue
|IRIEverywhere-27 for discussion about work going on in this
|area.

Should something be said here about the advancement of the IRI
spec? Or should IRIEverywhere-27 be where the status is reported?

[snip]

| 3. Interaction

[snip]

|3.1. Using a URI to Access a Resource

[snip]

|Many URI schemes define a default interaction protocol for
|attempting access to the identified resource. That interaction
|protocol is often the basis for allocating identifiers within

Make "allocating identifiers" a link back to the section on
allocation of URIs.

|that scheme, just as "http" URIs are defined in terms of
|TCP-based HTTP servers. However, this does not imply that
|all interaction with such resources is limited to the default
|interaction protocol. For example, information retrieval systems
|often make use of proxies to interact with a multitude of
|URI schemes, such as HTTP proxies being used to access "ftp"
|and "wais" resources. Proxies can also to provide enhanced
|services, such as annotation proxies that combine normal
|information retrieval with additional metadata retrieval
|to provide a seamless, multidimensional view of resources
|using the same protocols and user agents as the non-annotated
|Web. Likewise, future protocols may be defined that encompass
|our current systems, using entirely different interaction
|mechanisms, without changing the existing identifier schemes.

Add a cross-reference to the general architecture section on
orthogonality.

|3.1.1. Details of retrieving a representation
|
|Dereferencing a URI generally involves a succession of steps
|as described in multiple specifications and implemented by
|the agent. The following example illustrates the series
|of specifications that govern the process when a user

Add an "s" to govern. I think we are saying "illustrates the
series that governs the process..."

|agent is instructed to follow a hypertext link (§4.4)
|that is part of an SVG document. In this example, the URI is
|"http://weather.example.com/oaxaca" and the application context
|calls for the user agent to retrieve and render a representation
|of the identified resource.

[snip]

| 7. Section 1.4
|    of [RFC2616] states "HTTP communication usually takes place
|    over TCP/IP connections." This example does not address
|    that step in the process, or other steps such as Domain
|    Name System (DNS) resolution.

s/or other/nor other/

|3.2. Representation Types and Internet Media Types
|
|A Representation is data that encodes information about resource
|state. Representations do not necessarily describe the resource,
|or portray a likeness of the resource, or represent the resource
|in other senses of the word "represent".
|
|Representations of a resource may be sent or received using
|interaction protocols. These protocols in turn determine the
|form in which representations are conveyed on the Web. HTTP,
|for example, provides for transmission of representations as
|octet streams typed using Internet media types [RFC2046].
|
|Just as it is important to reuse existing schemes whenever
|possible, there are significant benefits to using media
|typed octet streams for representations even in the unusual
|case where a new scheme and associated protocol is to be
|defined.

It is unclear to me at this point in reading what type of scheme
we are talking about. I propose to change both instances to "URI
scheme".

[snip]

|3.2.1. Representation types and fragment identifier semantics
|
|The Internet Media Type defines the syntax and semantics of the
|fragment identifier, if any, that may be used in conjunction
|with a representation.

After "of the fragment identifier" add "(introduced in §2.6)".
That will allow the deletion at the end of this section of the
orphan "See also Fragment Identifiers (§2.6)."

|Story
|
|In one of his XHTML pages, Dirk creates a hypertext
|link to an image that Nadia has published on
|the Web. He creates a hypertext link with <a
|href="http://www.example.com/images/nadia#hat">Nadia's
|hat</a>. Emma views Dirk's XHTML page in her Web browser
|and follows the link. The HTML implementation in her
|browser removes the fragment from the URI and requests
|the image "http://www.example.com/images/nadia". Nadia
|serves an SVG representation of the image (with
|Internet media type "image/svg+xml"). Emma's Web browser
|starts up an SVG implementation to view the image. It
|passes it the original URI including the fragment,
|"http://www.example.com/images/nadia#hat" to this
|implementation, causing a view of the hat to be displayed
|rather than the complete image.
|
|Note that the HTML implementation in Emma's browser did not
|need to understand the syntax or semantics of the SVG fragment
|(nor does the SVG implementation have to understand HTML,
|WebCGM, RDF ... fragment syntax or semantics; it merely had to
|recognize the # delimiter from the URI syntax [URI] and remove
|the fragment when requesting the resource).

Change "requesting the resource" to "accessing the resource" or
"dereferencing the URI". I don't believe we ever say "request a
resource".

|See also Fragment Identifiers (§2.6).

Delete the preceding orphan as proposed above.

[snip]

| 3.3. Inconsistencies between Representation Data and Metadata
|
|Successful communication between two parties depends
|on a reasonably shared understanding of the semantics of
|exchanged messages, both data and metadata.
|At times, there
|may be inconsistencies between a message sender's data and
|metadata. For instance, examples that have been observed in
|practice of inconsistencies between representation data and
|metadata include:

"For instance, examples" sounds like one too many
instances/examples. I propose instead: "Examples, observed on the
Web, of inconsistencies between representation data and metadata
include:" The commas aren't strictly necessary but they may help
people avoid parsing the sentence as "observed on the Web of
inconsistencies..."

[snip]

|The TAG finding "Authoritative Metadata" discusses in more
|detail how to handle this type of inconsistency and how server
|configuration can be used to avoid it.

Change "this type of" to "data/metadata".

|3.4. Safe Interactions

[snip]

|The fact that URI retrieval is safe does not imply that all
|safe interactions must be done through URI retrieval.

I don't believe we ever talk about "URI retrieval". Change to:
"The fact that following a hypertext link is safe does not imply
that all safe interactions must be done through hypertext links."
That's not exactly the same thing, but it is a much better
transition in light of the preceding text. Another possibility is
to generalize and say: "The fact that some protocol methods are
commonly used for safe interactions (such as HTTP GET) does not
mean that all safe interactions must be carried out via these
methods."

[snip]

|3.4.1. Unsafe interactions and accountability
|
|Story
|
|Nadia pays for her airline tickets online (through a POST
|interaction as described above). She receives a Web page with
|confirmation information and wishes to bookmark it so that she
|can refer to it when she calculates her expenses. Although
|Nadia can print out the results, or save them to a file,
|she would also like to bookmark them.
|
|Transaction requests and results are valuable resources,
|and like all valuable resources, it is useful to be able
|to refer to them with a persistent URI (§3.5.1).
|However,
|in practice, Nadia cannot bookmark her commitment to pay
|(expressed via the POST request) or the airline company's
|acknowledgment and commitment to provide her with a flight
|(expressed via the response to the POST).

I think the previous sentence should explain more why this is the
case. What about something as simple as: "However, because of how
most deployed agents operate, Nadia cannot bookmark her commitment
to pay (expressed via the POST request) or the airline company's
acknowledgment and commitment to provide her with a flight
(expressed via the response to the POST)." Also, change "or the
airline company's" to "nor the airline company's".

[snip]

|3.5. Representation Management
|
|Story
|
|Since Nadia finds the Oaxaca weather site useful, she emails
|a review to her friend Dirk recommending that he check out
|'http://weather.example.com/oaxaca'. Dirk clicks on the
|resulting hypertext link in the email he receives and is
|frustrated by a 404 (not found). Dirk tries again the next day
|and receives a representation with "news" that is two-weeks
|old. He tries one more time the next day only to receive
|a representation that claims that the weather in Oaxaca is
|sunny, even though his friends in Oaxaca tell him by phone
|that in fact it is raining. Dirk and Nadia conclude that the
|URI owners are unreliable or unpredictable. Although the URI
|owner has chosen the Web as a communication medium, the owner
|has lost two customers due to ineffective resource management.

s/resource management/representation management. While "resource
management" makes some sense, the title of the section is
"Representation Management" and we don't say "resource management"
anywhere else in the document.

[snip]

|An application developer or specification author SHOULD NOT
|require networked retrieval to representations each time they
|are referenced.

s/to/of/ Also, I think it would be clearer if rewritten as: "An
application developer or specification author SHOULD NOT require
networked retrieval of a representation each time the
representation is referenced." This is the first time I've heard of
"referencing a representation" but I guess it's ok.

[snip]

|3.5.3. Supporting Navigation

[snip]

|Interactions conducted with HTTP POST (where HTTP GET could
|have been used) also limit navigation possibilities. The user
|cannot create a bookmark or share the URI with others since
|the URI seen by the user does not change as the user moves
|from page to page.

The preceding sentence is awkward and does not clearly state that
what it's talking about is what the user perceives in the GUI. I
propose a more functional explanation: "Because most server
managers do not assign URIs to HTTP POST results, a user cannot
bookmark the results or share them with others by pasting a URI in
an email message."

[snip]

s/evolves/evolve in 4.2 at the end of the second paragraph.

|4.2.2. Versioning and XML namespace policy
|
|
|Note that since namespace names are URIs, the owner of a
|namespace URI has the authority to decide the namespace
|change policy.

Note that I have included this as an example of a URI owner's
right in the rewrite of section 2.2.2.

[snip]

|4.3. Separation of Content, Presentation, and Interaction

[snip]

|Of course, it may be desirable to limit the audience.

The word "audience" comes as a surprise here. I propose instead
(which ties into earlier text): "Of course, it may be desirable not
to reach the widest possible audience."

|Designers
|should consider appropriate technologies, such as encryption
|and access control (§3.5.2), for limiting the audience.

s/limiting the audience/limiting who has access to content/

[snip]

|4.4. Hypertext
|
|A defining characteristic of the Web is that it allows
|embedded references to other resources via URIs.
|The
|simplicity of creating hypertext links using absolute URIs (<a
|href="http://www.example.com/foo">) and relative URI references
|(<a href="foo"> and <a href="foo#anchor">) is partly (perhaps
|largely) responsible for the success of the hypertext Web as
|we know it today.
|
|When one resource (representation) refers to another
|resource with a URI, this constitutes a link between the two
|resources. Additional metadata may also form part of the link
|(see [XLink10], for example). Note: In this document, the
|term "link" generally means "relationship", not "physical
|connection".
|
|Good practice: Link identification
|
|A specification SHOULD provide ways to identify links to other
|resources and to secondary resources (via fragment identifiers).

I think "to other resources and to secondary resources (via
fragment identifiers)" is confusing. I propose instead: "A
specification SHOULD provide ways to identify links to other
resources, including to secondary resources (via fragment
identifiers)."

|Formats that allow content authors to use URIs instead of
|local identifiers promote the network effect: the value of
|these formats grows with the size of the deployed Web.
|
|Good practice: Web linking
|
|A specification SHOULD allow Web-wide linking, not just internal
|document linking.
|
|Good practice: Generic URIs
|
|A specification SHOULD allow content authors to use URIs
|without constraining them to a limited set of URI schemes.
|
|What agents do with a hypertext link is not constrained by
|Web architecture and may depend on application context. Users
|of hypertext links expect to be able to navigate links among
|representations.

Change "to navigate links among representations" to "to navigate
among representations by following hyperlinks".

[snip]

|4.5.2. Links in XML
|
|Sophisticated linking mechanisms have been invented for XML
|formats. XPointer allows links to address content that does
|not have an explicit, named anchor.
|[XLink] is an appropriate
|specification for representing links in hypertext (§4.4) XML
|applications. XLink allows links to have multiple ends and to
|be expressed either inline or in "link bases" stored external to
|any or all of the resources identified by the links it contains.
|
|Designers of XML-based formats should consider using XLink and,
|for defining fragment identifier syntax, using the XPointer
|framework and XPointer element() Schemes.
|
|XLink is not the only linking design that has been proposed
|for XML, nor is it universally accepted as a good design. See
|also TAG issue xlinkScope-23.

Merge the preceding two paragraphs (and make other changes
discussed at the 1 Nov transition call).

|4.5.3. XML namespaces

|The purpose of an XML namespace (defined in [XMLNS]) is to
|allow the deployment of XML vocabularies (in which element and
|attribute names are defined) in a global environment and to
|reduce the risk of name collisions in a given document when
|vocabularies are combined. For example, the MathML and SVG
|specifications both define the set element. Although XML data
|from different formats such as MathML and SVG can be combined
|in a single document, in this case there could be ambiguity
|about which set element was intended. XML namespaces reduce
|the risk of name collisions by taking advantage of existing
|systems for allocating globally scoped names:

Link "allocating globally scoped names" to section 2.2.1.

[snip]

|4.5.4. Namespace documents

[snip]

|Another benefit of using URIs to build XML namespaces is that
|the namespace URI can be used to identify an information
|resource that contains useful information, machine-usable
|and/or human-usable, about terms in the namespace. This type
|of information resource is called a namespace document. When
|a namespace URI owner provides a namespace document, it is
|authoritative for the namespace.

Note that I have included this as an example of a URI owner's
right in the rewrite of section 2.2.2.

--
Ian Jacobs (ij@w3.org)   http://www.w3.org/People/Jacobs
Tel:                     +1 718 260-9447

Received on Tuesday, 2 November 2004 06:05:32 UTC