Re: fundamental difference? from W. E. Perry on 2000-06-03 (xml-uri@w3.org from June 2000)

From: W. E. Perry <wperry@fiduciary.com>
Date: Sat, 03 Jun 2000 18:08:46 -0400
To: xml-uri@w3.org
Message-ID: <393981ED.7688415D@fiduciary.com>
"Simon St.Laurent" wrote:

> At 01:04 AM 6/3/00 -0400, Tim Berners-Lee wrote (in reply to Simon St.Laurent):
>     SSL>It doesn't mean that I'm right and you're wrong, but I think you have a
>     SSL>fundamentally different perspective of what a namespace is than a lot of
>     SSL>people on this list.
>
>     TBL>That may be so.  However, as without that perspective there can be nothing
>     TBL>built on XML, for me it is important.
>
> Okay... let's step back a little, since that perspective is not universally
> shared.  If you want to build further work on that perspective, you need to
> formalize and develop that perspective, getting buy-in and participation on
> both the overall perspective and the details from the larger communities
> where you mean to deploy that perspective.

More than that, all of us should realize that the alternative perspective (in my,
admittedly broad, terms:  XML as syntax rather than XML as agreed or expected
semantics) is not going to go away. It is the basis of pre-Infoset XML 1.0, and
many of us have taken that REC at its (literal) word and built production
applications on it. Now in the terms that John Cowan has phrased the 'moral'
question, my syntax-centric software is expendable because it is documents which
are precious. But I suspect that as a practical (and maybe even a moral) matter,
my software will continue to be useful, and used, because it performs commercially
necessary processes in an efficient and predictable way and has an auditable
history of producing correct results. In the world of production commercial
software those are precious qualities.

At the same time, I agree absolutely with John that documents are precious. So
much so, in fact, that I'll assert I treat documents--even what others might
regard as the ephemera with which I work:  securities orders, execution
reportings, cashiering tickets--with greater deference than do those who care more
about the meaning, the semantics, the intent of documents than about their syntax
and literal content. I have to:  it is the literal indications of quantity, price,
terms of execution, identity of the trade counterparty, identity of the custody
account, etc. on which all of my processing and all of my particular semantic
elaborations must depend. If a transaction is questioned, the investigation and
determination of that challenge will turn ultimately on its literal terms as
literally expressed in the particular syntax of an instance document, not on the
semantics I elaborated from that syntax, nor my assumptions of its 'intent',
however usual or industry-standard my semantic expectations might be. Add to this
that in the past fifteen years the business that I do has become thoroughly
global. One result is that what might once have been a reasonably defensible
argument that expected industry practice is to elaborate particular semantics from
given syntax would now most charitably be considered quaintly provincial.

The phenomenon which financial services has had to confront sooner than other
industries is that things with similar names (after natural language translation)
and apparently similar properties do get processed in very different ways in
different places. That assertion should look very familiar to this list:  it is,
after all, the fundamental rationale for namespaces. The problem is that, in my
empirical experience, XML namespaces attempt to solve this problem the wrong way
around. The semantically-elaborated understanding of namespaces advocated by Tim
Berners-Lee, Dan Connolly, et al., even before it progresses to the expectation
of dereferencing those namespaces to schemas, and perhaps later to 'standard'
processing methods or who-knows-what-else, is mistaking the data for the uses of
the data. If I am a cash settlement processing node in Thailand, I know how to
perform the locally-expected process for paying or receiving the cash side of a
securities transaction, and inherent (and probably encapsulated) in that process I
know what data I require to do it. If you are an order ticket or an execution
reporting on the trade which I am processing the cash settlement for, you should
not even know that I am using you as one of my several inputs to this process.
Considered semantically, you were designed or intended to convey a particular
message:  'execute a trade on these terms' if you are an order ticket, or 'close
this order (or portion of it) on the enclosed terms of execution' if you are an
execution reporting. However, when you as one such document are routed to me
because I am an interested party downstream in the pipeline of process from your
initial function, your original intent and the semantics elaborated to convey it
are meaningless. As a document you are now simply a container for the stark facts,
the atomic data which you convey. Let me emphasize that you know nothing about
me:  you were created to convey data to a particular process (and perhaps
semantically-understood, to command that process) which has now completed; as one
outcome of the successful completion of that process you were routed to me. You
shouldn't even know that I am in Thailand, or specific to the processing of Thai
cash settlements. Properly designed as, say, an order ticket in XML, you should
convey the ontology of an order as understood by whoever places it, without regard
to the particular presentation of an order as expected in Thailand. In fact, as a
properly designed XML order ticket, you should not exhibit the particular
presentation of an order as the data structure expected by anyone, not only
because good design of XML seeks to separate ontology from presentation, but also
because you want your order ticket to be generally applicable, to all of the
national markets in which you do business as well as to those in which you one day
might, though you do not yet know anything of their practices and expectations.

Therefore, if you as a document are sent out conveying a hard-coded namespace
which dereferences to a schema, or any other statement of your intent or of your
expected presentation, it is reasonable to ask for whom that might possibly be
intended. The first place where you as an order ticket are sent might be your
firm's own trading desk, whose semantic expectations you might know very well and
with whom you might be comfortable in presenting a schema of your own data
structure or an invocation of the processing you expect. Nevertheless, the
processing which that trading desk node performs is entirely its affair. It must
be able to extend and alter its procedures as its particular circumstances
dictate, without looking for the agreement of the order-writing process or any
other node with which it interacts. As the previous example indicates, that
approval would be impossible to obtain anyway, since any processing node knows
nothing of other nodes two or three steps downstream in the pipeline of process,
to which its outputs might eventually be routed.

The issue here is precisely the abstraction of data which XML was supposed to
facilitate:  the physical instantiation of any datum must be as the processing
node requires and can effect for its own purposes. Yes, a processing nodeX may
need to distinguish the form of a <price> arriving from nodeA from that of a
<price> arriving from nodeB. That is, however, not something that either nodeA or
nodeB can do for nodeX, nor even give it much help with. A is unlikely even to
know of B's existence, and vice versa. For this particular problem, the only nexus
of A and B is X. Only X is in a position to distinguish the process by which it
instantiates, for its own particular processing, the contents of A's <price> from
that by which it instantiates the contents of B's. More importantly, only X is in
a position to determine that *for the purposes of its own processing* A's <price>
and B's <price> are ontologically equivalent, whatever their particular GIs may
be and however their particular content may be differently presented! This means
that the controlling authority for the schemas of A's price and of B's price at X
will be X's processing needs against X's experience of the form in which relevant
data arrives from A and B, not any schema which either A or B might assert.
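
As a rough illustration of the point above, here is a minimal Python sketch of how
nodeX might keep its own per-source rules for instantiating a price. The document
shapes, element names, and function names are all hypothetical, chosen only to show
that the reading rules live at X and that X alone decides the two forms are
equivalent for its purposes.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical documents: nodeA and nodeB each present a price in their own form.
DOC_FROM_A = "<order><price currency='USD'>101.25</price></order>"
DOC_FROM_B = "<execution><px><whole>101</whole><frac>25</frac></px></execution>"


def price_from_a(root):
    # X's reading of A's form: a decimal string as element content.
    return Decimal(root.findtext("price"))


def price_from_b(root):
    # X's reading of B's form: separate whole and fractional parts.
    whole = root.findtext("px/whole")
    frac = root.findtext("px/frac")
    return Decimal(f"{whole}.{frac}")


# X alone maintains this table, keyed by where a document arrived from.
READERS_AT_X = {"nodeA": price_from_a, "nodeB": price_from_b}


def instantiate_price(source, xml_text):
    """Instantiate a price for X's own processing, by X's own rules."""
    root = ET.fromstring(xml_text)
    return READERS_AT_X[source](root)
```

Neither document needs to know that X exists. Adding a third source means adding one
more reader at X, with no agreement required from A or B.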

All this said, it is certainly possible that a schema presented by a document from
A may be helpful to X in deciding how to instantiate, out of the entire data
structure the document exhibits, the particular items of interest to X in this
processing instance. It is, however, X's decision precisely because the purpose of
the particular data instantiation is to serve the needs of X's processing. As a
practical matter, that decision is not taken away from X even when X and A both
nominally subscribe to a vertical industry standardized data vocabulary. Even if
that standard vocabulary is utterly comprehensive and is kept complete through
constant update--which none are, because  1) it takes time to get assent after the
fact, even if it is possible (witness the premise of this very discussion!), for
changes apparently required by unforeseen developments or by previously
unappreciated contradictions;  so  2) standard data vocabularies (the better ones,
anyway) are designed to be tools general enough to express any data which should
need to be communicated within their field of specialization, without obviating
the possibility of presentational change to accommodate foreseeable
possibilities:  a <price> field, for example, might have a currency indication, an
integer part, a fractional part, and a defined (or referenced) integer/fraction
separator, but without specifying the decimal size of either the integer or the
fraction, lest either economic hyperinflation or hypercontraction distort either
to a magnitude unimagined at the time of its definition;  but  3) the problem with
such neither-general-nor-specific vocabularies is that they cannot form the basis
of schemas which can direct, by themselves, the instantiation of data at the point
of processing, since they must defer to the processor itself to render the data,
however such properties as its numeric magnitude are presented, into a form which
is computationally manageable for the processing node;  so  4) the determinative
evaluation of the data is supplied in the instance by the processor, as it would
have done even in the absence of an agreed industry data vocabulary;  and  5) this
does not even contemplate the case--both fostered and promised by the Internet
topology itself--that the universe of those who might transact within a given
vertical market is expanding at an increasing rate, as the expansion of the
network opens connections to participants not easily accessible before, while at
the same time nodes previously unaware of that vertical market realize that they
might traffic in it, but have no history of the shared assumptions which
previously characterized it and, indeed, expect [and act straightway upon the
expectation!--consider the vertical markets which have changed utterly since the
collapse of the Soviet Union introduced new players who from day one simply did
business in a way it had not been done before] that their own very different
assumptions and practices will be accommodated by that market;  which leads, if
the standardized vocabularies try to keep up at all, back to (1). . .
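
Points (2) through (4) can be sketched in Python under stated assumptions. The
<price> layout below (a currency attribute, a separator attribute, and integer and
fraction children) is a hypothetical rendering of the kind of field described
above, not taken from any real vocabulary. The vocabulary leaves the size of each
part open, so the choice of a computationally manageable form falls to the
processor.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical instance of a vocabulary-conformant price; the element and
# attribute names are assumptions, not taken from any real standard.
PRICE_XML = (
    "<price currency='THB' separator=','>"
    "<integer>1250000000000</integer><fraction>0625</fraction>"
    "</price>"
)


def render_price(xml_text):
    """Render a price into the form this processor computes with.

    The vocabulary fixes neither the length of the integer part nor that of
    the fraction, so this processor chooses an arbitrary-precision Decimal.
    """
    elem = ET.fromstring(xml_text)
    integer = elem.findtext("integer")
    fraction = elem.findtext("fraction")
    # The separator attribute is presentational only; it is ignored here.
    return elem.get("currency"), Decimal(f"{integer}.{fraction}")
```

A different processing node might just as reasonably render the same instance into
scaled integers or a fixed-width field. That decision is made in the instance, by
the processor.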

The point is that schemas, standardized data structures, and other agreed semantic
baggage are not of themselves final nor absolute, but depend upon the outcome of
process against instance data in a particular environment. The same is true of XML
namespaces. Whether in relative or absolute form from the point of view of the
document which presents them, namespaces from the point of view of the processor
which must act on them are relative to a third party viewpoint, that of the
document. David Carlisle has already succinctly illustrated that truth with his
point:

> If you decide that the DTDs base URI should be used (somehow) then
> you have the fun deciding what is the base URI is for a DTD found
> via a PUBLIC identifier.
>
The processor must, of course, engage in a process which in many real world
circumstances includes resolving namespaces, even ones presented in apparently
absolute form. And after that resolution the processor must regard those resolved
namespaces as 'via nodeA', as distinct from 'via nodeB' or, more completely, as
'via nodeA/datastructure "foo"/instance #uniqueID'. The canon of name usage, in
other words, is not the namespace+GI which a document might present, but the
historical database at the processing node of the
namespace/datastructure/GI/instance form asserted by the document, paired with the
actual form of its instantiation by the processor on that occasion.
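
One way to picture such a historical database is the following minimal Python
sketch. The class and field names are hypothetical. It pairs the
namespace/datastructure/GI/instance form a document asserted with what the
processor actually instantiated on that occasion.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssertedName:
    """The form a document asserted, as seen from the processing node."""
    via: str            # the node the document arrived from, e.g. "nodeA"
    namespace: str      # namespace name exactly as presented
    datastructure: str  # e.g. the document's root element type
    gi: str             # generic identifier (element type name)
    instance_id: str    # unique ID of the instance document


@dataclass
class NameHistory:
    """Hypothetical history kept at the processing node."""
    records: list = field(default_factory=list)

    def record(self, asserted, instantiated_as):
        # Pair what the document asserted with what the processor actually did.
        self.records.append((asserted, instantiated_as))

    def usages(self, via, gi):
        # Past instantiations of a given GI as received via a given node.
        return [done for name, done in self.records
                if name.via == via and name.gi == gi]
```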

I have gone on at great length, but I have particular ongoing processing
experience which is utterly at odds with the Director's apparent vision of what is
possible for absolute identification and schematic description in XML, but which I
sincerely hope is useful in this discussion. On the particular question of whether
namespace identifiers are simply text of a given syntactic form, or have an
independent identity of more elaborate semantics, I realize that I have given an
answer larger than the question. I believe that the vision of a namespace either
absolutely specified or permanently and uniquely identified is unattainable in the
environment which it is intended to serve:  the place and moment of processing
instance XML. Simple text identifiers which require resolution--either a
character-by-character evaluation in the simpler case, or the instance resolution
of a relative URI in the more complex one--lie close to the nature of XML
processing, in their inherent acknowledgment that text must be processed in the
instance to yield semantics unique to that instance. And in the end that is the larger
message which I have gotten from this discussion, and would try to persuade others
of:  the power of XML as a vehicle for data is the abstraction of that data from a
particular instance form, and from its elaboration with particular semantics,
until the moment of processing. Attempts to absolutize (horrid word!) that content
before that moment, to specify schematic forms and fixed data structures, or to
heap semantics, particularly of processing intent, upon syntax by pre-arrangement
are contrary to the nature of XML and vitiate its power and uniqueness.
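
The two kinds of resolution mentioned above can be sketched in a few lines of
Python. The function names and example URIs are hypothetical; `urljoin` is the
standard library's relative-reference resolver and stands in here for whatever
resolution a given processor actually performs.

```python
from urllib.parse import urljoin


def same_by_text(ns1, ns2):
    # Simpler case: character-by-character comparison of the identifiers.
    return ns1 == ns2


def same_by_resolution(ns1, base1, ns2, base2):
    # More complex case: resolve each identifier against the base URI of the
    # instance that presented it, then compare the results.
    return urljoin(base1, ns1) == urljoin(base2, ns2)
```

Under the first function, two documents that both declare the namespace "foo"
match. Under the second, they match only if their base URIs resolve "foo" to the
same place. Either way the answer is produced at the moment of processing.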


Respectfully,

Walter Perry
Received on Saturday, 3 June 2000 18:08:54 UTC