Re: XML observation

Hi Martin

>I would like to answer in more detail, and hope to do so next week,

I look forward to it (seriously).  A few responses to give you an 
idea of where the arguments lie:

>but here just a few points.
>
>- Yes, the usage of XML for both textual markup and data is confusing
>   to many, even many who are working on the specs.

That wasn't quite my point; it was more that XML is inherently poorly 
designed for (some kinds of) data representation, in part *because* 
of its inheritance from textual markup; and that this XML-literal 
debate we are engaged in is a very neat illustration of this.

>- Please note that for parseType="Literal", we are actually mostly
>   dealing with textual markup, not with data, and it is not only not
>   adequate, but plain backwards to raise the 'data issue' to suppress
>   the conventions that XML has for textual markup.

But the point here is that it is the *surrounding* RDF/XML text that 
is causing the problem, not the XML inside the parseType="Literal". 
Everyone agrees that the XML inside that is what it is, taken in 
isolation, and that it is textual in nature. All the trouble arises 
from seeing it as being in some sense 'enclosed' inside the 
surrounding XML (where it would be expected, on view X, to inherit 
such attributes as any other piece of XML would in that situation, 
such as lang tags.) In RDF/XML , on view G, that textual inclusion is 
merely a lexical accident. On this view, for RDF/XML to take XML 
scoping across the parseType="Literal" boundary marker would be an 
error, and the 'naturalness' of the attribute inheritance rules in 
the examples you give is a bug. The point is that the XML inside the 
parseType="Literal" is indeed marked-up text in the graph, but in a 
strong sense it is not part of the surrounding XML in  the RDF/XML 
document.

RDF/XML uses XML as a structural description notation most everywhere 
except inside parseType="Literal" where, as the name implies, it 
reverts to being literal, and treats that XML as simply marked-up 
text. The XML in the literal is being displayed: the rest of the XML 
is being *used*.  These two uses of XML are incommensurate and obey 
different natural scoping conventions, because RDF's syntax isn't 
marked-up text: it is a graph, not a nested bracket structure. Nested 
scoping just doesn't work properly when applied to a textual 
description of a graph (or an Sexpression, or lambda-notation, or 
almost any OO language code, or any language which calls remotely 
defined procedures, or which uses link structures, or relies on 
continuations to define dynamic scopes, or uses complex tabular data; 
or to cartographic representations; or many diagrammatic conventions, 
or....)

My proposed quick fix, by the way, only works because RDF scoping is 
so rudimentary that it is kind of harmless to let the error go 
through and accept the bug as a feature. But this would be a rather 
big change to make at this stage (so is unlikely to happen, is my 
estimate); and in any case Patrick's basic point still holds: the XML 
textual scoping rules really have got almost nothing to do with the 
RDF graph syntax, and this is really just patching over the cracks. 
So maybe it is best after all if users are forced into thinking about 
this seriously. They will have to at some point, no matter how we 
wriggle to make things seem better.

>- I myself see the fact that XML can deal with both text and data
>   as a great potential, but a potential that has to be worked on
>   quite a bit to be achieved.

I see little potential in trying to deal with complex data using a 
notation which is so obviously inadequate to the task, particularly 
when we have about half a century's accumulated experience in better 
ways to do it. Don't get me wrong: XML is great at doing what it is 
good at doing, and that can be used for a large part of the world's 
business.  But there are some things that it is inherently unsuitable 
for in its present form. We KNOW this: we know it in detail, we know 
why, and it isn't hard to invent extensions to XML which can handle 
things better.  But to try to make XML 1.0 into the answer to every 
maiden's prayer seems like a losing task to me.

>- Whereas the XML conventions for real datatypes in many ways can be
>   taken as just a notational convention for abstract concepts such as
>   'integer' that RDF treats as abstract concepts, in the case of
>   XML literals, we are dealing with marked-up text, and so there the
>   abstraction we are dealing with is XML, not just the notation.

See above.

>   (if RDF would want to create their own abstraction of marked-up
>   text, that would be a different thing, but currently, it doesn't)
>- XML Schema is confusing in many ways, because it was created by a
>   group of people that contained both people with a great deal of
>   expertize in textual markup, and people with excellent experience
>   in traditional datatypes, but did not spend too much effort to
>   combine these things. There is a top-level split into simple and
>   complex types, and then two separate worlds. The question 'of which
>   simple type is the text content in complex types' was never asked.
>   Also, it was not realized that 'string' is very special, because it
>   is the simple type for 'everything for which we don't have a type yet'
>   and also the type of the actual representation of all the other
>   datatypes.

Right, we have noticed these stress lines in XSD. It also suffers 
from a disastrous confusion between unions and disjoint unions, but I 
gather that will likely be corrected in the next draft.

Pat


>
>Regards,    Martin.
>
>
>At 19:01 03/07/03 -0500, pat hayes wrote:
>>Thinking about the issue we have been discussing, it occurs to me 
>>that XML has been holding a tiger by the tail and is now getting 
>>bitten, and this debate is a symptom of that.
>>
>>XML started life as a generalized text-markup system, and for that 
>>purpose it is wonderful. But it has been touted and used as 
>>something much more that just text markup: it has been announced as 
>>a kind of universal solvent for transmitting any kind of structure, 
>>a universal general-purpose structure-description system. 
>>Unfortunately, several of its features (most notably the 
>>restriction of attribute values to strings, cf 
>>http://www.waterlang.org/doc/trouble_with_xml.htm) are clearly 
>>serious design faults when seen from this more general point of 
>>view.  But more to the present point, the use of a *language* to 
>>describe structures requires us to clearly distinguish the text of 
>>the description from the thing - the structure - being described. 
>>Making a distinction like this is so second-nature to programmers, 
>>mathematicians, logicians and linguists - in fact anyone who uses 
>>technical languages professionally - that it takes a while in 
>>dealing with XML to realize that XML conspicuously fails to make 
>>it, and that in fact that the entire design of XML is predicated on 
>>denying it. XML documents describe structure by *displaying* it, in 
>>effect: they *are* the structure they describe. And of course this 
>>is entirely appropriate for a markup language: it is the very 
>>essence of markup that the markup *labels* the text it is the 
>>markup 'of'.
>>
>>To put the same point another way, markup is inherently indexical: 
>>what it means depends on where it is. If you write <title>The Way 
>>Things Were</title>, what the enclosing markup says, in effect, is: 
>>'THIS enclosed text is a title'.  The same piece of markup 
>>surrounding some other piece of text will implicitly refer to that 
>>other piece: its meaning - what it is talking *about* - depends on 
>>where in the text the markup occurs. It's location in the text is 
>>part of its meaning; and when it is used with no text to mark up, 
>>simply as a structural description language, this indexicality is 
>>retained in the *descriptive* conventions of the resulting 
>>language: so XML as a structural description convention has a 
>>built-in confusion between describing structure and displaying or 
>>exhibiting it, a built-in ambiguity between being a description and 
>>being a kind of diagram or map, a built-in tendency to confuse use 
>>and mention.
>>
>>This is clearly seen in the discussion we have been having. Martin 
>>(view X) sees a piece of RDF/XML as being a kind of XML text, and 
>>the resulting document as *displaying* the RDF structure in the 
>>XML. He expects that RDF/XML will satisfy the textual scoping 
>>mechanisms which arise naturally in any kind of layout display: in 
>>particular, attributes should apply to all of the items which are 
>>in the *textual* scope of the XML element.  That is the XML 
>>'structure as textual display' assumption, of course.  Patrick 
>>(view G) sees a structural description language rendered (in a 
>>fairly ad-hoc way) into XML syntax; the actual XML document is of 
>>relatively little importance: on this view, it is the structure 
>>described by the document that defines the significant, meaningful 
>>notions of scope and context.  And the RDF/XML conventions clearly 
>>isolate the XML 'inside' a parseType-attributed element from the 
>>XML surrounding the element, so it is 'obvious' that the lang tags 
>>that may be relevant to the outer context do not apply to the inner 
>>one.
>>
>>In my earlier metaphor, Parick here is the teeth of the tiger. Once 
>>XML is sold, and bought, as a general-purpose structural 
>>description language, and is used as such by professionals who are 
>>familiar with the conventions of such languages, the XML scoping 
>>conventions which are inherited from its role as a markup language 
>>are no longer appropriate: in fact, they are *ludicrous*: they are 
>>like a children's toy in an engineering shop.  Expecting 
>>professional programmers to conform to descriptive conventions 
>>defined by text-markup languages is whistling at the wind. 
>>Programmers have been using more sophisticated scoping conventions 
>>for over half a century; not because they didn't know better, but 
>>because they *needed* to.  You can't display recursion using 
>>indexical markup, for a start.
>>
>>The XML publicists have bitten off more than they know how to chew. 
>>If the result is XML that disobeys the XML 'conventions' and is 
>>unreadable by non-programmers, should anyone be surprised?
>>
>>Pat Hayes
>>--
>>---------------------------------------------------------------------
>>IHMC    (850)434 8903 or (650)494 3973   home
>>40 South Alcaniz St.    (850)202 4416   office
>>Pensacola                       (850)202 4440   fax
>>FL 32501                        (850)291 0667    cell
>>phayes@ihmc.us       http://www.ihmc.us/users/phayes


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes

Received on Friday, 4 July 2003 15:45:24 UTC