- From: W. E. Perry <wperry@fiduciary.com>
- Date: Mon, 15 May 2000 15:47:32 -0400
- To: Tim Berners-Lee <timbl@w3.org>, xml-uri@w3.org, xml-dev@xml.org
Tim Berners-Lee wrote:

> There are those who would maintain that a namespace should have no
> semantics, but I would say that then documents will have no semantics, and
> will be useless to man or machine. [You can go through the philosophical
> process of defining all semantics in terms of syntactic operations, of
> course, in which case the pedant can take the view that all is syntax, but
> it is not a helpful view when making this decision, as it leaves the main
> points the same and just gives a more difficult framework for most people
> to think of it].

I do not flatter myself that the Director reads my postings, but I have argued for many months (e.g. http://xml.org/archives/xml-dev/2000/03/0380.html ) that semantics are local to the node where instance markup is processed and that the XML family of specifications could (should!) aspire to no more than the specification of syntax.

In an Internet topology, the effective definition of a process is the form of its execution, on a particular occasion, at a 'client-side' node. Some processes--the html browser's processing of a link; the node's request to DNS to resolve a name not in its host table--are performed in standard ways by common software which first had to be distributed to and installed individually on 200 million plus nodes. A goal (or, if stating it that way is now seen as historical revisionism, then an advantage) of XML from the start was that interoperability would not require updating software on those 200 million plus nodes to conform to some new procedure. The trivial example was that a new bit of browser behavior would require neither another iteration of the html spec nor new non-standard vendor-specific code, either of which would then have had to be distributed out onto all those nodes. The chief question raised--and left unanswered--in XML 1.0 was how the specific local functionality, required to implement the behavior implied by new markup, would be implemented.
One very early book on XML essentially assumed that Java code would need to be written at every node to realize the functionality described by every new use of markup. In retrospect, this may have shown a clearer understanding than recent XML specifications of the nature of--and the place of XML markup in--a decentralized peer-to-peer Internet topology. At least it was clear that the implementation of behavior was idiosyncratic and local to the node. Over the past two years, consensus opinion has come to imply that there will be standard XML processors at each node--'standard' in this case implying that a processor implements features of the XML family of specifications in a predictable and non-self-contradictory manner. On that assumption, XML specifications have been written, at least since the 'Namespaces in XML' Recommendation, not simply as syntactic prescription, but with very definite opinions of how defined syntactic structures of markup should be processed at the local node, and of what the semantic outcome of that processing should predictably be.

In my (admittedly heretical) opinion, the burden of semantic expectation upon XML specifications has increased exponentially from the days of the original PI namespace processing to the current Schema draft, and will very likely do so again by the time Query reaches PR status. It is therefore little surprise that the XML community has reached the current "coordination" hassle. The definition of equivalence in namespaces as simple character-by-character matching is a vestigial remnant of the time when the acknowledged purview of XML was text, and it is embarrassingly primitive to those who have since gone on to specify much of the arbitrariness of text out of XML. Let us honestly admit that the fundamental objection to simple character matching is that it is insufficiently dense in semantics for the current taste and practice in specification making.
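The character-matching rule is easy to state concretely. A minimal sketch (the URIs below are hypothetical): under the Namespaces Recommendation, two namespace names denote the same namespace only when they are identical as character strings, even where, as URIs, they would plausibly identify the same resource:

```python
# Namespace "equivalence" under the Namespaces in XML Recommendation is
# nothing more than character-by-character string comparison: no case
# folding, no %-escape handling, no URI normalization of any kind.
ns = "http://example.org/schema"  # hypothetical namespace name

print(ns == "http://example.org/schema")   # True: identical characters
print(ns == "HTTP://example.org/schema")   # False: scheme case differs
print(ns == "http://example.org/schema/")  # False: trailing slash differs
```

Whether this literalness is a defect or a virtue is, of course, exactly the point at issue.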
That admission recognizes the trend which has reduced regexp tools to a decidedly second-class status in the XML world, precisely because prescribed markup, and content, now bears semantic meaning well beyond the reach of mere text-manipulation tools. The coordination hassle, however, is not confined to namespaces, and if it is sufficient to halt work while the contradictions it has introduced into namespaces are resolved, it should be worth all of our while to take this time to look at the general form of the problem and to consider its general solution.

I have spent all of the year thus far wrestling with the problems of designing a system in which excerpts of running text, some quite long, must be committed to an XML-specific database. Within that database, the text may not be BLOB'ed but must always remain directly accessible as running text. At the same time, either the user who originally commits that text, or any other user of it, may, by embedding markup into it, note his own understanding of the significance and internal relationships of that text, or of its relationships to external database objects, some of them also arbitrary text, which might not be accessible to any other users of the original text. In other words, the text must both remain simple text and also serve as the vehicle for whatever semantics a particular user may have an interest in. Notice that such an interest is effectively expressed only in the use which a particular user might make of the semantics (introduced by his own markup, or by the markup of others to which he might have access) and of the text itself, for a particular purpose on a particular occasion. In order to effect that use, the user must apply processing, and that processing must be in large part idiosyncratic.
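The design constraint described above can be sketched in miniature. Assume, hypothetically, two users who annotate the same running text with their own markup: each document carries one user's semantics, yet the shared text remains recoverable from either by simply stripping the tags:

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: the same running text carries different users'
# markup, while remaining recoverable as plain text for everyone.
base_text = "The committee approved the merger on Tuesday."

# User A marks what she considers the significant entities...
user_a = ('<t>The <org>committee</org> approved the '
          '<event>merger</event> on Tuesday.</t>')
# ...while user B relates the same text to an object in his own database
# (the ref attribute and its value are invented for illustration).
user_b = ('<t>The committee <action ref="db:obj42">approved the '
          'merger</action> on Tuesday.</t>')

for doc in (user_a, user_b):
    # itertext() discards each user's idiosyncratic markup,
    # recovering the shared running text unchanged.
    assert "".join(ET.fromstring(doc).itertext()) == base_text
```

Neither user's processing of his own markup can be assumed to make sense of the other's; that is the point of the paragraphs that follow.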
He cannot simply invoke the processes of another user to handle the semantics introduced by that user, because in combining that user's semantics with his own, and with those of others whom that user knows nothing of, he may well have altered that user's semantics beyond what that user's processes ever contemplated, or could handle. Often the problem is not even as complex as that, but it is still a problem: the user on any particular occasion will often have an entirely different intent for the semantics, or even for the simple text, of another user. That difference of intent utterly alters what processing must be applied, and alters it in a way which can only be known at the specific node, in the specific instance. We already have an example of this general problem in the specific case of namespaces: the question of whether anything should be retrievable from a URI when that URI is used solely as a namespace reference. Frankly, I don't see how we can adequately resolve the coordination hassle without revisiting that question as well.

> A document is a communication between publisher and reader. Its
> significance is the product of its contents and the definitions of the
> terms it uses. As we have increasingly powerful schema languages, we can
> say now syntactic (with xml-schema) and later semantic things about those
> terms, until eventually we will, in a machine-processable document, be
> able to relate one XML namespace to others in a web so as to allow machine
> conversion between systems using different namespaces, and searches and
> inference across many different applications. There is, therefore a great
> deal to be said for using, for namespaces, a URI which allows one to look
> up some definitive (or non-definitive) information about it. This applies
> the power of the web at a new level: in bootstrapping from one language
> into another.
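On the retrievability question, it is worth observing what common XML processors actually do: the namespace name serves as an opaque identifier for qualifying names, and nothing is ever fetched from it. A small illustration with Python's ElementTree (the URI is deliberately fictitious and unreachable):

```python
import xml.etree.ElementTree as ET

# The namespace URI below need not resolve to anything at all; the
# parser uses it only as an opaque string to qualify element names.
doc = '<p:note xmlns:p="http://example.invalid/never-fetched">hello</p:note>'
root = ET.fromstring(doc)

print(root.tag)   # {http://example.invalid/never-fetched}note
print(root.text)  # hello
```

No network access occurs at any point; whatever semantics the URI's owner intended play no part in the parse.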
The significance of a document is the product of its contents and the definitions of the terms it uses *as applied in the instance, by the reader, to the document*. Where the reader fetches those definitions from is decidedly secondary to the process by which they are applied in the instance, and to the outcome of that process. The framer of one set of such definitions (be it a W3C WG or a vertical industry consortium defining an industry transactional data vocabulary) cannot know the specifics of that instance unless it exercises a cartel power to prescribe the circumstances in which its definitions are permitted to be used. Let us assume that we are committed to openness and extensibility, and so rule out reliance on that restrictive cartel power. If, then, the framers of definitions cannot know the specific circumstances in which those definitions will be applied, they cannot predicate their design of those definitions on the expected semantic outcome of their use.

That change of perspective would alter utterly not only the terms of the present coordination hassle, but the dozens of analogous hassles which lie hidden in specifications whose syntax is designed to effect an expected semantic result. It is a much bigger solution than, I suspect, was wanted when this problem was opened for discussion, but it does provide an intellectually defensible way out. Might we debate, now, the specifics of how processing is to be implemented at the individual autonomous node, so that the semantic intentions of the definers do not matter and any or all of the allowed syntactic forms might be successfully processed?

Respectfully,

Walter Perry
Received on Monday, 15 May 2000 15:47:37 UTC