XML observation

Thinking about the issue we have been discussing, it occurs to me 
that XML has been holding a tiger by the tail and is now getting 
bitten, and this debate is a symptom of that.

XML started life as a generalized text-markup system, and for that 
purpose it is wonderful. But it has been touted and used as something 
much more than just text markup: it has been announced as a kind of 
universal solvent for transmitting any kind of structure, a universal 
general-purpose structure-description system. Unfortunately, several 
of its features (most notably the restriction of attribute values to 
strings, cf http://www.waterlang.org/doc/trouble_with_xml.htm) are 
clearly serious design faults when seen from this more general point 
of view.  But more to the present point, the use of a *language* to 
describe structures requires us to clearly distinguish the text of 
the description from the thing - the structure - being described. 
Making a distinction like this is so second-nature to programmers, 
mathematicians, logicians and linguists - in fact anyone who uses 
technical languages professionally - that it takes a while in dealing 
with XML to realize that XML conspicuously fails to make it, and that 
in fact the entire design of XML is predicated on denying it. 
XML documents describe structure by *displaying* it, in effect: they 
*are* the structure they describe. And of course this is entirely 
appropriate for a markup language: it is the very essence of markup 
that the markup *labels* the text it is the markup 'of'.

To put the same point another way, markup is inherently indexical: 
what it means depends on where it is. If you write <title>The Way 
Things Were</title>, what the enclosing markup says, in effect, is: 
'THIS enclosed text is a title'.  The same piece of markup 
surrounding some other piece of text will implicitly refer to that 
other piece: its meaning - what it is talking *about* - depends on 
where in the text the markup occurs. Its location in the text is 
part of its meaning. When markup is used with no text to mark up, 
simply as a structural description language, this indexicality is 
retained in the *descriptive* conventions of the resulting language. 
So XML as a structural description convention has a built-in 
confusion between describing structure and displaying or exhibiting 
it, a built-in ambiguity between being a description and being a kind 
of diagram or map, a built-in tendency to confuse use and mention.
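
A small sketch of the indexicality point, in Python using the standard 
xml.etree.ElementTree module. The second title string, the identifier 
"doc1" and the triple layout are made up purely for illustration; they 
are not part of any XML or RDF convention.

```python
import xml.etree.ElementTree as ET

# Indexical markup: the tag says, in effect, "THIS enclosed text is a title".
a = ET.fromstring("<title>The Way Things Were</title>")
b = ET.fromstring("<title>Some Other Text</title>")

# Identical markup, different referent: what each tag is talking about
# is fixed only by the text it happens to enclose.
assert a.tag == b.tag
assert a.text != b.text

# A non-indexical description names its subject explicitly, so it says
# the same thing wherever it is written down.
# (Hypothetical layout: subject, property, value.)
description = ("doc1", "title", "The Way Things Were")
```

The markup carries no name for what it is about; the description does, 
which is why it can be moved around without changing its meaning.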

This is clearly seen in the discussion we have been having. Martin 
(view X) sees a piece of RDF/XML as being a kind of XML text, and the 
resulting document as *displaying* the RDF structure in the XML. He 
expects that RDF/XML will satisfy the textual scoping mechanisms 
which arise naturally in any kind of layout display: in particular, 
attributes should apply to all of the items which are in the 
*textual* scope of the XML element.  That is the XML 'structure as 
textual display' assumption, of course.  Patrick (view G) sees a 
structural description language rendered (in a fairly ad-hoc way) 
into XML syntax; the actual XML document is of relatively little 
importance: on this view, it is the structure described by the 
document that defines the significant, meaningful notions of scope 
and context.  And the RDF/XML conventions clearly isolate the XML 
'inside' a parseType-attributed element from the XML surrounding the 
element, so it is 'obvious' that the lang tags that may be relevant 
to the outer context do not apply to the inner one.
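
The two readings can be put side by side in a rough Python sketch. The 
sample document, the ex: namespace and the function names lang_view_x 
and lang_view_g are all invented for illustration, and view G is 
modelled under the assumption that the parseType-attributed element 
itself is the boundary. This is not a rendering of what any RDF/XML 
specification requires, only of the two intuitions described above.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_PARSETYPE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}parseType"

# Hypothetical RDF/XML fragment: xml:lang is set on the outermost element,
# and an ex:em element sits inside a parseType="Literal" property element.
DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#"
         xml:lang="en">
  <rdf:Description rdf:about="http://example.org/doc">
    <ex:note rdf:parseType="Literal"><ex:em>hello</ex:em></ex:note>
  </rdf:Description>
</rdf:RDF>
"""

def path_to(root, tag, trail=()):
    """Return the tuple of elements from root down to the first match."""
    trail = trail + (root,)
    if root.tag == tag:
        return trail
    for child in root:
        found = path_to(child, tag, trail)
        if found:
            return found
    return None

def lang_view_x(path):
    """View X, textual scope: the nearest enclosing xml:lang applies."""
    for element in reversed(path):
        if XML_LANG in element.attrib:
            return element.attrib[XML_LANG]
    return None

def lang_view_g(path):
    """View G, structural scope: stop at a parseType-attributed element."""
    for element in reversed(path):
        if RDF_PARSETYPE in element.attrib:
            return None  # outer context is cut off here
        if XML_LANG in element.attrib:
            return element.attrib[XML_LANG]
    return None

root = ET.fromstring(DOC)
path = path_to(root, "{http://example.org/ns#}em")
print(lang_view_x(path))  # en
print(lang_view_g(path))  # None
```

Same document, same element, two different answers: the disagreement 
is about which notion of scope counts, not about the XML text.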

In my earlier metaphor, Patrick here is the teeth of the tiger. Once 
XML is sold, and bought, as a general-purpose structural description 
language, and is used as such by professionals who are familiar with 
the conventions of such languages, the XML scoping conventions which 
are inherited from its role as a markup language are no longer 
appropriate: in fact, they are *ludicrous*: they are like a 
children's toy in an engineering shop.  Expecting professional 
programmers to conform to descriptive conventions defined by 
text-markup languages is whistling at the wind.  Programmers have 
been using more sophisticated scoping conventions for over half a 
century; not because they didn't know better, but because they 
*needed* to.  You can't display recursion using indexical markup, for 
a start.

The XML publicists have bitten off more than they know how to chew. 
If the result is XML that disobeys the XML 'conventions' and is 
unreadable by non-programmers, should anyone be surprised?

Pat Hayes
-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes

Received on Thursday, 3 July 2003 20:01:48 UTC