- From: Henrik Frystyk Nielsen <frystyk@microsoft.com>
- Date: Mon, 19 Jun 2000 13:14:32 -0700
- To: <XML-uri@w3.org>
- Cc: "Andrew Layman" <andrewl@microsoft.com>, "David Turner" <dturner@microsoft.com>
This is a somewhat lengthy mail with two parts:

* Properties of URIs vs URI spaces
* Relationship to the proposal [5] that we sent out

The reason is that I think we have to internalize the choices that we have and what they mean before we can get to any agreement, so here goes:

URIs and URI spaces
-------------------

Many if not most of the discussions on this list talk about desired properties of a URI space used for identifying XML namespaces. In order for us to make progress in this discussion, it is essential to keep the properties of the URI syntax separate from the properties of the URI space that it is used to encode.

URIs don't define whether a URI space has properties like supporting indirection, being case sensitive or not, supporting relative URIs etc. RFC 2396 defines the *syntax* for encoding common properties but that doesn't mean that URIs *define* these properties.

As for what you want from a name, there are really only five choices:

1) You can use centrally assigned identifiers that don't support indirection (TCP port numbers and centrally agreed upon MIME header fields)

2) You can use decentrally assigned, non-unique identifiers that don't support indirection (general MIME header fields)

3) You can use decentrally assigned, unique identifiers that don't support indirection (GUIDs)

4) You can use decentrally assigned, unique identifiers that do support indirection (DNS hostnames) but only provide one result of the indirection (an IP address)

5) You can use decentrally assigned, unique identifiers that do support indirection and an open-ended set of results from that indirection (a document). Examples of URI spaces that support this are "http:" and "ftp:".

URIs allow us to encode names from *all* these categories - but again, that doesn't mean that URIs *force* which one you have to use:

* If you want a centralized name, pick 1)
* If you want a decentralized name, pick 2)-5)
* If you don't want indirection, pick 2) or 3)
* If you don't want case sensitivity, pick 3)
* If you do want indirection on the Internet, pick 5)
* If you don't want relative URIs, pick one that doesn't support them (for example GUIDs)

However, once you have picked, you have to live with the properties of that namespace, and that has nothing to do with whether you encode it as URIs or not.

The problem Daniel brings up is *not* a basic property of relative URIs but can happen in any decentralized system that supports indirection. It is inherently impossible to guarantee that violations of the rule in section 5.3 about uniqueness of attributes are detected in all cases. Take for example this slightly different version of Daniel's example:

-----------------
<x xmlns:n1="http://www.example.org/a"
   xmlns:n2="http://www.example.com/a">
  <test n1:y="1" n2:y="2"/>
</x>
-----------------

This looks like a completely valid example, but let's say that I go to "http://www.example.org/a" and it gives back a redirect to "http://www.example.com/a". This is the exact same problem that Daniel pointed out, but in this scenario it doesn't depend on the location of the document.

Does this mean that my document suddenly is invalid, or is it even something that we should expect to ever be detected? Clearly it isn't. Instead of using the uniqueness of attributes as a binary decision between whether a document is correct or not, we should note that inconsistencies can happen and that yes, these are faults, but that they may not be detected.

The discussion of whether to limit the properties of the URI spaces that URIs can be used to encode (forbid relative URIs, forbid indirection etc.) is really a discussion of what properties you want for identifying your XML namespace. For a basic consumer of these names, the only thing that is needed on top of octet-by-octet comparison is to know about relative URIs, so the difference on the consumer side is very small. This leaves the producer with a simple choice: pick your namespace with the properties you want.

James Clark [1] has pointed out problems with the clarity of the algorithm for comparing URIs, and I think we need to think carefully about this and fix the URI spec where it is not clear. However, we should *not* try to design namespaces thinking that if we avoid URIs we avoid the problems of a decentralized system.

Relations to Proposal
---------------------

The proposal that we sent out [5] makes the choice very clear - the namespace identifier is a URI - end of story. It furthermore clarifies that in order to use relative URIs, you need to take into account the context you are working within. Let me clarify what is meant by "context":

The common URI syntax has specific mechanisms for encoding some commonly used properties like naming authority and relative identifiers, but not for others. For example, there is no common way to encode persistence properties of an identifier or when it was created: "this identifier used Microsoft as of June 2000 as naming authority".

For the specific case of relative URIs, the context is given by the rules defined in RFC 2396 section 5.1. What I think this section fails to point out is that it may not be necessary to determine a base URI in order to use relative URIs as identifiers if they are dealt with within the same context. This was the reason for the specific wording in the proposal.

For other properties, the context is defined by the URI space itself and may not be explicit in the URI. Therefore, in order to know and use these properties of a name, it is necessary to know the context (i.e. the properties) imposed by that URI space.
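
To make the consumer side concrete, here is a minimal sketch in Python of the two things discussed above: plain string comparison of namespace names, and a relative reference taking its meaning from the context it is resolved in. The helper name same_namespace and the two base URIs are made up for illustration, and urljoin from the Python standard library stands in for the resolution rules of RFC 2396; none of this is part of the proposal.

```python
from urllib.parse import urljoin

def same_namespace(a: str, b: str) -> bool:
    """Hypothetical helper: compare two namespace names as plain strings.

    Nothing is dereferenced, so a redirect between the two names
    can never be noticed here.
    """
    return a == b

# The two names from the example: different strings, so a consumer
# treats n1:y and n2:y as different attributes.
n1 = "http://www.example.org/a"
n2 = "http://www.example.com/a"
print(same_namespace(n1, n2))  # False

# The same relative reference resolved in two assumed contexts.
rel = "a"
base_one = "http://www.example.org/docs/x.xml"  # assumed base URI
base_two = "http://www.example.com/docs/x.xml"  # assumed base URI
abs_one = urljoin(base_one, rel)
abs_two = urljoin(base_two, rel)
print(abs_one)  # http://www.example.org/docs/a
print(abs_two)  # http://www.example.com/docs/a

# Within one context the relative name compares equal to itself;
# across contexts the resolved names differ.
print(same_namespace(abs_one, abs_two))  # False
```

The comparison never touches the network, which is why the redirect case above stays undetected, and the relative reference only becomes ambiguous once it is carried from one context into another.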

In addition to this clarification, I have noted two other clarifications for the proposed wording:

* We should encourage people generating documents to be consistent about the use of URIs so that simple mistakes are avoided [3]
* We should ensure that the algorithm for comparing URIs, which currently is in the HTTP spec, is moved to the URI spec [1]

We should work on this but not lose track of the problem space we are designing for.

Henrik Frystyk Nielsen,
mailto:frystyk@microsoft.com

[1] http://lists.w3.org/Archives/Public/xml-uri/2000Jun/0619.html
[2] http://lists.w3.org/Archives/Public/xml-uri/2000Jun/0667.html
[3] http://lists.w3.org/Archives/Public/xml-uri/2000Jun/0678.html
[4] http://lists.w3.org/Archives/Public/xml-uri/2000May/0282.html
[5] http://lists.w3.org/Archives/Public/xml-uri/2000Jun/0406.html
Received on Monday, 19 June 2000 16:16:13 UTC