Re: XML Schemas: the wrong name from Steven R. Newcomb on 2000-10-25 (www-xml-schema-comments@w3.org from October to December 2000)

From: Steven R. Newcomb <srn@coolheads.com>
Date: Wed, 25 Oct 2000 13:06:16 -0500
To: clbullar@ingr.com
CC: elharo@metalab.unc.edu, xml-dev@xml.org, www-xml-schema-comments@w3.org
Message-Id: <200010251806.NAA03898@bruno.techno.com>

[Len Bullard:]
> [a bunch of interesting words about layered systems
> that I didn't really understand]

> ...Perhaps you could detail the concept of 
> ready to run information.

Consider the interchange form (an XML message) of a purchase order.
It would be a bad idea to include a total amount to be paid, since an
explicit total would be redundant.  If somebody changed a line item in
the interchange form without updating the total, the total would be
inconsistent with the rest of the message, and there would probably be
no easy way for the recipient to determine which information is
invalid.  In general, then, it's a bad
idea to include redundant information in interchange messages, not
just because it uses bandwidth, but more importantly because it is
very likely to cause ambiguity.

Consider the form of the purchase order when it is "ready to run" --
when an API is provided to the information it contains.  It's very
reasonable to provide such an API with a "total()" method.  Redundancy
in APIs is good; APIs are supposed to be convenient to use.  total()
gives access to an "emergent property" (as opposed to an explicit
syntactic property) of the information set found in purchase orders.
Of course, while total() makes sense for purchase orders, it doesn't
apply to many other kinds of XML messages.
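
To make the distinction concrete, here is a minimal sketch in Python.
The purchase-order vocabulary (the "purchaseOrder" and "item" elements
and their attributes) and the PurchaseOrder class are invented for
illustration; they are not taken from any real schema or library.  The
interchange form carries no total, and the "ready to run" form
computes it on demand.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical interchange form: deliberately carries no total,
# so there is nothing that can disagree with the line items.
PO_XML = """\
<purchaseOrder>
  <item sku="A-1" quantity="2" unitPrice="9.50"/>
  <item sku="B-7" quantity="1" unitPrice="20.00"/>
</purchaseOrder>
"""


class PurchaseOrder:
    """A 'ready to run' view of the interchange form (illustrative only)."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def total(self):
        # Emergent property: derived from the line items each time it is
        # asked for, never stored in the message itself.
        return sum(
            (
                Decimal(item.get("quantity")) * Decimal(item.get("unitPrice"))
                for item in self._root.iter("item")
            ),
            Decimal("0"),
        )


po = PurchaseOrder(PO_XML)
print(po.total())
```

Because total() is derived rather than transmitted, editing a line
item in the message changes the answer consistently; there is no
second copy of the figure to fall out of step.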

The grove paradigm fully recognizes that information can have multiple
levels of interpretation applied to it, and that such interpretations
have the effect of making implicit information explicit.

* When an XML document is processed, a grove of the syntax is the
  result (we usually call this grove a "DOM tree").  In grove-land,
  each node is an addressable information component.  The tree
  structure that was implicit in the interchange form of the
  information has become explicit.

* But wait, there's more.  Vocabularies are used in XML documents,
  and, depending on the semantics of those vocabularies, there can be
  properties that "emerge" from the information, when the information
  is understood in terms of the intended semantics of the
  vocabularies.  In grove-land, these emergent properties appear in
  additional groves, and those properties, too, are reliably
  addressable.  Thus, the "total" property of a purchase order can
  become explicit and addressable, even though it was only implicit in
  the interchange form of the information.
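
The two levels described above can be sketched roughly as follows.
This is only an illustration of the idea, not the grove model as the
standards define it: the addressing scheme (tuples of child
positions), the function names, and the purchase-order vocabulary are
all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical purchase order; element and attribute names are invented.
PO_XML = (
    "<purchaseOrder>"
    "<item quantity='2' unitPrice='9.50'/>"
    "<item quantity='1' unitPrice='20.00'/>"
    "</purchaseOrder>"
)


def syntax_grove(root):
    """Level 1: give every element node an explicit address.

    An address is a tuple of 1-based child positions from the root,
    so () is the root and (2,) is its second child.
    """
    grove = {}

    def walk(node, address):
        grove[address] = node
        for position, child in enumerate(node, start=1):
            walk(child, address + (position,))

    walk(root, ())
    return grove


def semantic_grove(syntax):
    """Level 2: make emergent properties explicit and addressable.

    Keys are (address of the owning node, property name).  Only the
    order total is derived here.
    """
    root = syntax[()]
    total = sum(
        (
            Decimal(n.get("quantity")) * Decimal(n.get("unitPrice"))
            for n in root.iter("item")
        ),
        Decimal("0"),
    )
    return {((), "total"): total}


syntax = syntax_grove(ET.fromstring(PO_XML))
semantics = semantic_grove(syntax)

print(syntax[(2,)].get("unitPrice"))   # a syntactic property
print(semantics[((), "total")])        # an emergent property
```

The point of keeping the second mapping separate is that the derived
"total" gets an address of its own without being written back into the
syntax-level tree.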

The purchase order example is a trivial one that's good for teaching
purposes, but it's not very compelling, I think.

I find the example of topic map processing much more compelling.  The
syntactic components of a topic map document are not, and they can
never be, fully indicative of their own significance.  They can only
be fully understood in terms of their connections to many other things
whose syntactic whereabouts are necessarily arbitrary.  The *whole*
topic map document must be understood -- processed -- before the
significance of any of it can be fully and reliably understood.  Topic
map processing causes topic map documents to become things that
resemble ready-to-run semantic nets.  (Groves are one way to think
about these nets -- a way that has the advantage of offering reliable
addressability based on international standards -- but the truth is
that groves are just one way.)  The reason you create topic map
documents is to allow these semantic net-like things to be
interchanged and merged with one another by their end users and by
people who wish to add more value to them in various ways.  The nets
don't and can't resemble the interchange documents, because the nets
are highly interconnected and interdependent, whereas an
interchangeable document is nothing more or less than a sequence of
characters.
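
A toy sketch of why the whole document has to be processed first.  The
markup below is a deliberately simplified, made-up vocabulary, not the
actual topic map syntax, and build_net() is a hypothetical helper.
Associations may appear before or after the topics they mention, so
nothing can be resolved until everything has been read.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented, simplified vocabulary.  Note that the first association
# refers to topics that are declared only later in the document.
TM_XML = """\
<topicMap>
  <association type="born-in">
    <member ref="puccini"/>
    <member ref="lucca"/>
  </association>
  <topic id="puccini" name="Giacomo Puccini"/>
  <topic id="lucca" name="Lucca"/>
  <association type="composed">
    <member ref="puccini"/>
    <member ref="tosca"/>
  </association>
  <topic id="tosca" name="Tosca"/>
</topicMap>
"""


def build_net(xml_text):
    """Turn the character sequence into a small net keyed by topic id."""
    root = ET.fromstring(xml_text)

    # First pass over the WHOLE document: collect every topic.
    names = {t.get("id"): t.get("name") for t in root.iter("topic")}

    # Second pass: only now can each association be resolved, wherever
    # its members happen to sit in the document.
    net = defaultdict(list)
    for assoc in root.iter("association"):
        members = [m.get("ref") for m in assoc.iter("member")]
        for ref in members:
            others = [names[o] for o in members if o != ref]
            net[ref].append((assoc.get("type"), others))
    return names, dict(net)


names, net = build_net(TM_XML)
print(net["puccini"])
```

What a topic such as "puccini" signifies here is the set of
connections gathered from all over the document, which is exactly the
information no single syntactic component carries by itself.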

So, I repeat what I said in my earlier note: 

  There is this common wisdom out there that the structure of
  interchanged information should also be, in effect, the API to that
  same information.  But, in fact, it's only true for a simple subset
  of the kinds of information that need to be interchanged, and to
  which APIs must be provided.

Len, does this speak to what you were saying about layered systems?

-Steve

--
Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

405 Flagler Court
Allen, Texas 75013-2821 USA

"We're not exactly anti-schema, but we're sure pro-DTD."
 -- doctypes.org
Received on Wednesday, 25 October 2000 14:07:26 UTC