- From: W. E. Perry <wperry@fiduciary.com>
- Date: Sat, 03 Jun 2000 18:08:46 -0400
- To: xml-uri@w3.org
"Simon St.Laurent" wrote: > At 01:04 AM 6/3/00 -0400, Tim Berners-Lee wrote (in reply to Simon St.Laurent): > SSL>It doesn't mean that I'm right and you're wrong, but I think you have a > SSL>fundamentally different perspective of what a namespace is than a lot of > > SSL>people on this list. > > > TBL>That may be so. However, as without that perspective there can be nothing > > TBL>built on XML, for me it is important. > > Okay... let's step back a little, since that perspective is not universally > shared. If you want to build further work on that perspective, you need to > formalize and develop that perspective, getting buy-in and participation on > both the overall perspective and the details from the larger communities > where you mean to deploy that perspective. More than that, all of us should realize that the alternative perspective (in my, admittedly broad, terms: XML as syntax rather than XML as agreed or expected semantics) is not going to go away. It is the basis of pre-Infoset XML 1.0, and many of us have taken that REC at its (literal) word and built production applications on it. Now in the terms that John Cowan has phrased the 'moral' question, my syntax-centric software is expendable because it is documents which are precious. But I suspect that as a practical (and maybe even a moral) matter, my software will continue to be useful, and used, because it performs commercially necessary processes in an efficient and predictable way and has an auditable history of producing correct results. In the world of production commercial software those are precious qualities. At the same time, I agree absolutely with John that documents are precious. 
So much so, in fact, that I'll assert I treat documents--even what others might regard as the ephemera with which I work: securities orders, execution reportings, cashiering tickets--with greater deference than do those who care more about the meaning, the semantics, the intent of documents than about their syntax and literal content. I have to: it is the literal indications of quantity, price, terms of execution, identity of the trade counterparty, identity of the custody account, etc. on which all of my processing and all of my particular semantic elaborations must depend. If a transaction is questioned, the investigation and determination of that challenge will turn ultimately on its literal terms as literally expressed in the particular syntax of an instance document, not on the semantics I elaborated from that syntax, nor my assumptions of its 'intent', however usual or industry-standard my semantic expectations might be.

Add to this that in the past fifteen years the business that I do has become thoroughly global. One result is that what might once have been a reasonably defensible argument that expected industry practice is to elaborate particular semantics from given syntax would now most charitably be considered quaintly provincial. The phenomenon which financial services has had to confront sooner than other industries is that things with similar names (after natural language translation) and apparently similar properties do get processed in very different ways in different places. That assertion should look very familiar to this list: it is, after all, the fundamental rationale for namespaces.

The problem is that, in my empirical experience, XML namespaces attempt to solve this problem the wrong way around. The semantically-elaborated understanding of namespaces advocated by Tim Berners-Lee, Dan Connolly, et
al., even before it progresses to the expectation of dereferencing those namespaces to schemas, and perhaps later to 'standard' processing methods or who-knows-what-else, is mistaking the data for the uses of the data.

If I am a cash settlement processing node in Thailand, I know how to perform the locally-expected process for paying or receiving the cash side of a securities transaction, and inherent (and probably encapsulated) in that process I know what data I require to do it. If you are an order ticket or an execution reporting on the trade which I am processing the cash settlement for, you should not even know that I am using you as one of my several inputs to this process. Considered semantically, you were designed or intended to convey a particular message: 'execute a trade on these terms' if you are an order ticket, or 'close this order (or portion of it) on the enclosed terms of execution' if you are an execution reporting. However, when you as one such document are routed to me because I am an interested party downstream in the pipeline of process from your initial function, your original intent and the semantics elaborated to convey it are meaningless. As a document you are now simply a container for the stark facts, the atomic data which you convey.

Let me emphasize that you know nothing about me: you were created to convey data to a particular process (and perhaps semantically-understood, to command that process) which has now completed; as one outcome of the successful completion of that process you were routed to me. You shouldn't even know that I am in Thailand, or specific to the processing of Thai cash settlements. Properly designed as, say, an order ticket in XML, you should convey the ontology of an order as understood by whoever places it, without regard to the particular presentation of an order as expected in Thailand.
In fact, as a properly designed XML order ticket, you should not exhibit the particular presentation of an order as the data structure expected by anyone, not only because good design of XML seeks to separate ontology from presentation, but also because you want your order ticket to be generally applicable, to all of the national markets in which you do business as well as to those in which you one day might do business, but of whose practices and expectations you do not yet know anything.

Therefore, if you as a document are sent out conveying a hard-coded namespace which dereferences to a schema, or any other statement of your intent or of your expected presentation, it is reasonable to ask for whom that might possibly be intended. The first place where you as an order ticket are sent might be your firm's own trading desk, whose semantic expectations you might know very well and with whom you might be comfortable in presenting a schema of your own data structure or an invocation of the processing you expect. Nevertheless, the processing which that trading desk node performs is entirely its affair. It must be able to extend and alter its procedures as its particular circumstances dictate, without looking for the agreement of the order-writing process or any other node with which it interacts. As the previous example indicates, that approval would be impossible to obtain anyway, since any processing node knows nothing of other nodes two or three steps downstream in the pipeline of process, to which its outputs might eventually be routed.

The issue here is precisely the abstraction of data which XML was supposed to facilitate: the physical instantiation of any datum must be as the processing node requires and can effect for its own purposes. Yes, a processing nodeX may need to distinguish the form of a <price> arriving from nodeA from that of a <price> arriving from nodeB. That is, however, not something that either nodeA or nodeB can do for nodeX, nor even give it much help with.
A is unlikely even to know of B's existence, and vice versa. For this particular problem, the only nexus of A and B is X. Only X is in a position to distinguish the process by which it instantiates, for its own particular processing, the contents of A's <price> from that by which it instantiates the contents of B's. More importantly, only X is in a position to determine that *for the purposes of its own processing* A's <price> and B's <price> are ontologically equivalent, whatever their particular GIs may be and however their particular content may be differently presented! This means that the controlling authority for the schemas of A's price and of B's price at X will be X's processing needs against X's experience of the form in which relevant data arrives from A and B, not any schema which either A or B might assert.

All this said, it is certainly possible that a schema presented by a document from A may be helpful to X in deciding how to instantiate, out of the entire data structure the document exhibits, the particular items of interest to X in this processing instance. It is, however, X's decision precisely because the purpose of the particular data instantiation is to serve the needs of X's processing. As a practical matter, that decision is not taken away from X even when X and A both nominally subscribe to a vertical industry standardized data vocabulary.
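To make the nodeX example concrete, here is a minimal sketch in Python of X keeping its own rule, per sending node, for instantiating a price. Everything specific in it is an assumption made for illustration: the two instance fragments, the element and attribute names (`price`/`currency`, `prix`/`devise`), and the function names are hypothetical and belong to no real vocabulary. The only point shown is that the table of rules lives at X and is keyed by the node a document arrived via, so X alone decides that the two differently named, differently presented items are equivalent for its purposes.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical instance fragments; neither sender knows how X will use them.
DOC_FROM_A = '<order><price currency="USD">101.25</price></order>'
DOC_FROM_B = '<ordre><prix devise="EUR">101,25</prix></ordre>'


def price_from_a(root):
    """X's rule for nodeA: a <price> child, dot as integer/fraction separator."""
    el = root.find("price")
    return el.get("currency"), Decimal(el.text.strip())


def price_from_b(root):
    """X's rule for nodeB: a <prix> child, comma as integer/fraction separator."""
    el = root.find("prix")
    return el.get("devise"), Decimal(el.text.strip().replace(",", "."))


# X's own table of instantiation rules, keyed by the node a document came via.
RULES_AT_X = {"nodeA": price_from_a, "nodeB": price_from_b}


def instantiate_price(via, xml_text):
    """Instantiate a price for X's processing; no sender-asserted schema is consulted."""
    root = ET.fromstring(xml_text)
    return RULES_AT_X[via](root)


currency_a, amount_a = instantiate_price("nodeA", DOC_FROM_A)
currency_b, amount_b = instantiate_price("nodeB", DOC_FROM_B)
```

`Decimal` is used rather than a fixed-width number so that the sketch makes no assumption about the magnitude of either the integer or the fractional part. A schema supplied by A could still inform how `price_from_a` is written, but the rule remains X's to change.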
Even if that standard vocabulary is utterly comprehensive and is kept complete through constant update--which none are, because

1) it takes time to get assent after the fact, even if it is possible (witness the premise of this very discussion!), for changes apparently required by unforeseen developments or by previously unappreciated contradictions; so

2) standard data vocabularies (the better ones, anyway) are designed to be tools general enough to express any data which should need to be communicated within their field of specialization, without obviating the possibility of presentational change to accommodate foreseeable possibilities: a <price> field, for example, might have a currency indication, an integer part, a fractional part, and a defined (or referenced) integer/fraction separator, but without specifying the decimal size of either the integer or the fraction, lest either economic hyperinflation or hypercontraction distort either to a magnitude unimagined at the time of its definition; but

3) the problem with such neither-general-nor-specific vocabularies is that they cannot form the basis of schemas which can direct, by themselves, the instantiation of data at the point of processing, since they must defer to the processor itself to render the data, however such properties as its numeric magnitude are presented, into a form which is computationally manageable for the processing node; so

4) the determinative evaluation of the data is supplied in the instance by the processor, as it would have done even in the absence of an agreed industry data vocabulary; and

5) this does not even contemplate the case--both fostered and promised by the Internet topology itself--that the universe of those who might transact within a given vertical market is expanding at an increasing rate, as the expansion of the network opens connections to participants not easily accessible before, while at the same time nodes previously unaware of that vertical market realize that they might traffic in it, but have no history of the shared assumptions which previously characterized it and, indeed, expect [and act straightway upon the expectation!--consider the vertical markets which have changed utterly since the collapse of the Soviet Union introduced new players who from day one simply did business in a way it had not been done before] that their own very different assumptions and practices will be accommodated by that market;

which leads, if the standardized vocabularies try to keep up at all, back to (1). . .

The point is that schemas, standardized data structures, and other agreed semantic baggage are not of themselves final nor absolute, but depend upon the outcome of process against instance data in a particular environment. The same is true of XML namespaces. Whether in relative or absolute form from the point of view of the document which presents them, namespaces from the point of view of the processor which must act on them are relative to a third party viewpoint, that of the document. David Carlisle has already succinctly illustrated that truth with his point:

> If you decide that the DTDs base URI should be used (somehow) then
> you have the fun deciding what is the base URI is for a DTD found
> via a PUBLIC identifier.

The processor must, of course, engage in a process which in many real world circumstances includes resolving namespaces, even ones presented in apparently absolute form. And after that resolution the processor must regard those resolved namespaces as 'via nodeA', as distinct from 'via nodeB' or, more completely, as 'via nodeA/datastructure "foo"/instance #uniqueID'. The canon of name usage, in other words, is not the namespace+GI which a document might present, but the historical database at the processing node of the namespace/datastructure/GI/instance form asserted by the document, paired with the actual form of its instantiation by the processor on that occasion.
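The historical database described above could be sketched, under loud assumptions, roughly as follows in Python. The class names, field names, and the `precedents` lookup are all hypothetical, invented here for illustration; the only thing taken from the text is the shape of a record: the via-node/namespace/datastructure/GI/instance form a document asserted, paired with what the processor actually made of it on that occasion.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssertedName:
    """The form a document asserted, qualified by the route it arrived on."""
    via: str            # the sending node, e.g. "nodeA"
    namespace: str      # the namespace text as resolved on this occasion
    datastructure: str  # e.g. "foo"
    gi: str             # generic identifier (element type name)
    instance_id: str    # unique ID of the instance document


@dataclass
class NameHistory:
    """Append-only record pairing each asserted name with its instantiation."""
    entries: list = field(default_factory=list)

    def record(self, name, instantiated_as):
        # Store the asserted form together with what the processor made of it.
        self.entries.append((name, instantiated_as))

    def precedents(self, via, gi):
        """Earlier instantiations of this GI in documents arriving via this node."""
        return [made for name, made in self.entries
                if name.via == via and name.gi == gi]


history = NameHistory()
history.record(
    AssertedName("nodeA", "urn:example:orders", "foo", "price", "0001"),
    "decimal amount, USD",
)
```

Here `urn:example:orders` is a placeholder namespace, not one taken from any real vocabulary. The lookup is deliberately keyed on the sending node first: the same namespace and GI arriving via nodeB would have no precedent until X had processed one.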
I have gone on at great length, but I have particular ongoing processing experience which is utterly at odds with the Director's apparent vision of what is possible for absolute identification and schematic description in XML, but which I sincerely hope is useful in this discussion.

On the particular question of whether namespace identifiers are simply text of a given syntactic form, or have an independent identity of more elaborate semantics, I realize that I have given an answer larger than the question. I believe that the vision of a namespace either absolutely specified or permanently and uniquely identified is unattainable in the environment which it is intended to serve: the place and moment of processing instance XML. Simple text identifiers which require resolution--either a character-by-character evaluation in the simpler case, or the instance resolution of a relative URI in the more complex one--lie close to the nature of XML processing, in their inherent acknowledgment that text must be processed in the instance to yield a uniquely instance semantics.

And in the end that is the larger message which I have gotten from this discussion, and would try to persuade others of: the power of XML as a vehicle for data is the abstraction of that data from a particular instance form, and from its elaboration with particular semantics, until the moment of processing. Attempts to absolutize (horrid word!) that content before that moment, to specify schematic forms and fixed data structures, or to heap semantics, particularly of processing intent, upon syntax by pre-arrangement are contrary to the nature of XML and vitiate its power and uniqueness.

Respectfully,

Walter Perry
Received on Saturday, 3 June 2000 18:08:54 UTC