- From: Ray Whitmer <ray@xmission.com>
- Date: Fri, 19 May 2000 00:34:38 -0600 (MDT)
- To: Tim Berners-Lee <timbl@w3.org>
- cc: xml-uri@w3.org
On Thu, 18 May 2000, Tim Berners-Lee wrote:

> This is your rhetoric. Retrieval is useful but identity is important
> too. However, many schemes such as uuid: and mid: identify primarily,
> and any URI-based retrieval is added on afterward.

URIs simply do not have the ability to identify resources. They locate
them.

> You suspect wrong.

Yes, as I admitted a day or so ago, I got that example wrong. I
apologize again for not reading the spec beforehand. The principle is
still true. Perhaps you would like to play a game of trying to guess
whether several URIs really identify the same or different resources.
URIs are interpreted by servers, which have a lot of freedom to
establish many-to-many relationships between URLs and the resources
they retrieve.

> I don't know why you should think that users understood practice and
> the RFC should be different. The RFC defines how things actually work,
> which is what users are all used to (and happens to basically match
> algorithms used in Unix for decades).

In some cases yes, in some cases no. It encourages case insensitivity,
for example -- it wasn't clear to me that it mandated it. Unix does not
allow for it. This means that the client does not know whether case may
distinguish different resources or not. The RFC does not dictate a
character set for escaped characters, though upper/lower-case
equivalence could be established if it clearly did. Saying http: versus
https: on the same server may or may not identify equivalent resources;
generally it only changes the procedure for accessing the resource. The
RFC does not identify the equivalence of symbolic links and other Unix
features that may make two URIs equivalent. It does not deal with
caches, which may mean that the location a file is retrieved from is
not its true identity, and so on. It represents a retrieval algorithm,
not a unique identifier.

> I assume this idea of "partial absolutization" is based on the
> misconception above.
> There is no such thing defined. There is only absolutization.

On the contrary, it came up many times during the discussion how much
the client would be required to do.

> Absolutely not! It is essential that the absolutization can be done as
> a simple string function without any knowledge of any specific
> schemes.

Requiring that does not give enough knowledge of the specifics of a
protocol to make it possible.

> You are using the term absolutization in a manner different from the
> way it has been used on this list. I have seen no one argue for
> involving the server in the process. Many URI schemes don't have a
> concept of a server.

The point was: without the server to do some canonicalization, the
absolutization described in the RFC is not sufficient to say whether
two URIs identify the same resource or different resources. This point
was brought up in several discussions I had, and was always represented
as a trivial form of absolutization that should occur, not as
performing most of the mappings a server does to identify the actual
resource to return.

> On the contrary, very often systems allow separate access to the
> actual attribute value and to the interpreted object. XHTML editors
> typically preserve the choice of relative or absolute URI used in an
> A HREF, but adjust the value when a copy of the document is saved to a
> new URI. I would suggest that this behaviour be the norm.

I should have said: the primary / winning meaning. As I went on to
describe, we already have one such duality in namespaces, which causes
a fair amount of complexity and disagreement -- the prefix versus the
URI.

> > What is preserved when nodes are moved within the hierarchy -- the
> > absolutized concept of type that is so important for specialized DOM
> > implementations, or the relative info? You could choose:
>
> You are talking about an XML node moving within the DOM tree? Or a
> document being moved through URI space? In all cases the absolute URI
> should be preserved.
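To make the two halves of this concrete: the absolutization the RFC
defines really is a pure string function, but no string function can
decide identity. A rough sketch in Python (the URIs are hypothetical,
and urllib's urljoin stands in here for the RFC algorithm):

```python
from urllib.parse import urljoin

# RFC-style absolutization: a pure string operation, with no
# scheme-specific knowledge and no server involvement.
base = "http://example.org/pub/doc.xml"   # hypothetical base URI
print(urljoin(base, "schema.xsd"))        # http://example.org/pub/schema.xsd
print(urljoin(base, "../other/x"))        # http://example.org/other/x

# What it cannot do is establish identity. These three may well
# retrieve the same resource (scheme and host are case-insensitive,
# and %7E is an escape for '~'), yet no string comparison the RFC
# defines will ever say so:
candidates = [
    "http://example.org/~ray",
    "HTTP://EXAMPLE.ORG/~ray",
    "http://example.org/%7Eray",
]
print(len(set(candidates)))               # 3 -- all distinct as strings
```

Whether any pair of these denotes one resource or three is up to the
server, which is exactly the knowledge the string function lacks.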
> To preserve the choice of relative URI or absolute URI is important in
> practice too. I am hoping that the same code is going to be used for
> all URIs of course.

But when you serialize, only one can be preserved, unless they happen
to exactly coincide. What is the point of having used relative URIs, if
programmatically moving a document to a new location does not cause the
resolution to be different? I thought that was the point, no?

> > a. The DOM parser absolutizes, and DOM effectively does not permit
> > relative URIs, except as a convenience.
> >
> > b. Relative? Do we really intend to have the validation rules and
> > implementing classes change as a node moves in the hierarchy, or as
> > they are saved in a different location? It is very common to tie the
> > implementing class and validation rules to the type. Type will now
> > be in flux, whereas DOM has traditionally considered the type
> > constant for these reasons. Are you willing to frequently
> > re-absolutize the name so it can be compared?
> >
> > c. Absolute(ized), with relative preserved? How is the relative path
> > preserved in the nodes? The relative specifier would be at best like
> > the prefix, which is considered syntactic sugar and may be replaced
> > by the serializer. In this case, a serialized document would likely
> > lose the relative nature of namespaces, so the relative properties
> > of the namespace specifications could not be relied on for essential
> > functionality (just as a document becomes invalid according to DTDs
> > when saved in canonical form). All the problems of prefixes arise
> > again for relative URIs; for example, an entity needs to know the
> > base URI of the position in which it is referenced, or an
> > absolutized URI cannot be computed -- and that base may be different
> > in different references, so entity hierarchies would have no
> > absolutized URIs. Do we really want this additional complexity on
> > namespaces?
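The serialization point is easy to make concrete: the same relative
reference denotes different resources under different base URIs, so
saving a copy elsewhere silently rebinds every relative name. A small
sketch (hypothetical URIs, with urljoin standing in for RFC
resolution):

```python
from urllib.parse import urljoin

# The same relative reference, resolved from two document locations.
ref = "ns/schema"                                   # hypothetical relative name
old_base = "http://example.org/wg/draft/doc.xml"    # where it was authored
new_base = "http://example.org/archive/doc.xml"     # where a copy is saved

print(urljoin(old_base, ref))   # http://example.org/wg/draft/ns/schema
print(urljoin(new_base, ref))   # http://example.org/archive/ns/schema

# A serializer can preserve the relative string or the absolutized one,
# but not both -- unless the two happen to coincide.
```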
> > If so, perhaps another group should make a different naming standard
> > for those who just want namespaces, which could look like Java
> > packages so no one gets so confused as to marry an arbitrary
> > resource to the type declaration.
>
> Please -- if a server returns something when queried for the namespace
> resource, then that is not an arbitrary resource. The server is
> controlled by the owner of the name, and the document returned is
> therefore definitive. It is not arbitrary.

The point of XML to me (and SGML before that) is many processing models
for the same information. Schemas, DTDs, and other syntactic controls
tell nothing about the meaning, and may compete as other standards do.
It is unreasonable, IMO, to expect to find one particular one at the
end of a namespace declaration, or to expect most applications to pay
attention to it. Only extremely general-purpose applications might look
at it. Most only care about transforming in and out of objects with
more specific / proprietary / local meaning. If someone changes the
syntax, they can decide what to do with the unrecognized part: ignore
it, pass it along, or raise an error. These non-general,
specific-purpose applications are the ones that interpret the syntax
with meaning. Hence, the transform is far more important to the
processing recipient than the schema, and the proper transform to lend
local meaning will be different for different recipients.

> To use Java class names, I say what I said about FPIs. If you really
> think they solve problems other URI schemes don't solve, then you can
> propose that they be a new URI scheme. Then all new designs will be
> able to take advantage of them.

But then you might insist on absolutizing them, or otherwise insist
that they represent a retrieval pointer rather than an identity.

> > And what do you do when multiple purposes conflict?
>
> What multiple purposes, and can you give an example of conflict?

I just did. The schema is largely irrelevant.
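A toy sketch of what I mean by competing transforms (every name here is
hypothetical): two recipients consume the same element, and it is their
transforms -- not any schema hanging off the namespace -- that lend the
data its local meaning:

```python
import xml.etree.ElementTree as ET

# One document, two recipients: each applies its own transform to give
# the same markup its local, application-specific meaning.
doc = ET.fromstring('<order qty="3" sku="A-17"/>')  # hypothetical markup

def to_inventory_delta(e):
    # A warehouse application: the order is a decrement against stock.
    return {"sku": e.get("sku"), "delta": -int(e.get("qty"))}

def to_invoice_line(e):
    # A billing application: the same order is a line item to price.
    return {"item": e.get("sku"), "count": int(e.get("qty"))}

print(to_inventory_delta(doc))  # {'sku': 'A-17', 'delta': -3}
print(to_invoice_line(doc))     # {'item': 'A-17', 'count': 3}
```

No schema could have told either application which of these objects the
element "means"; that knowledge lives entirely in the transform.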
It is the transformation that is important, and the schema gives little
insight into it. Different processing models require different
transformations. There are some who would like to dictate a single
processing model for all data. They have been doing it for years -- AKA
the war of proprietary formats. But they have failed to make their
solutions reusable, because no two sets of requirements are equal.

> Oh, you mean that one schema document should not contain
> syntax-related and semantic information? That doesn't seem to be a
> problem to me. Specification documents do. Of course one could
> separate them and refer to one from the other.

I am not convinced that one document can even represent all desired
syntactical restrictions. People keep inventing new microparsers for
things that XML is considered too verbose for -- XPath, SVG, many URL
protocols; even things like currency and dates are microsyntaxes, and
could easily be reduced to simpler things like integer attributes if
people were not so inventive of shorthand syntaxes. Are you
anticipating creating a meta-schema that transforms all microsyntaxes
into XML, and then somehow makes all forms equivalent? Equivalent to
what? And even then, you are a long way from giving things meaning.

Since no one here has apparently had his fill of analogies ;-) let me
recite one I heard with respect to Java Beans: people are trying to
make a set of definitions that automatically snap together -- like a
jigsaw puzzle, which is fine if you are satisfied with the picture on
the front of the box. To go much beyond that -- a dictatorship of ideas
-- the approach is quite different. The pieces are more
individualistic, simple, and make no pretense of perfectly fitting the
surrounding pieces. You use "glue" -- manually created mappings -- to
hold it together and mold it into the local meaning and relevance.
The pieces are made to work together in many new settings and
situations that were not anticipated, because the pieces are expected
to be manually mapped into relationships.

> Could you please indicate the standards groups which have run
> screaming from URIs?

Look around you. I will not point to them publicly right now.

> I would point out that the web is humming with relative URIs which
> lend manageability to huge amounts of the data. Our own website would
> be very much more difficult to manage without them.

W3C would have to make do like everyone else if the website supported
dynamic data relying on multidimensional parameterized URI syntax,
which is where much of the web is going, rather than file-system-like
URIs. Many go to great lengths writing relative navigation schemes that
have nothing to do with the URI RFC, because it deals only with trivial
name hierarchies and discards parameters. For example, you might have
different parameters describing:

- the working group
- the stage of the work
- the publicness of the work
- the type of resource
- the user's desired medium (HTML, XML, cell-phone markup, etc.)
- the user's language
- the user's locale

It is generally desirable to be able to switch any of these relatively
while holding the others constant. But a URI scheme which forced
relativism to occur within a single hierarchy would force you to give
each item a priority, rather than allowing them to be equal dimensions,
and to discard everything of lesser priority. This makes RFC-based
relativism unworkable for common uses. It was designed for legacy file
systems, which sometimes can capture part of the identity of a
resource.

Ray Whitmer
ray@xmission.com
Received on Friday, 19 May 2000 02:34:44 UTC