RE: XML observation from Patrick.Stickler@nokia.com on 2003-07-04 (w3c-rdfcore-wg@w3.org from July 2003)

From: <Patrick.Stickler@nokia.com>
Date: Fri, 4 Jul 2003 10:41:38 +0300
To: <phayes@ihmc.us>, <w3c-rdfcore-wg@w3.org>, <duerst@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B5FBBEA@trebe006.europe.nokia.com>
Thank you, Pat. A lot. Very well put.

This was one of the key points I was trying
to get across to Martin.

RDF/XML is a means to an end. Not the end. And there are many,
many examples of using XML for data interchange which disregard
much if not most of the machinery that XML offers to those for
whom the XML is the final and primary form of consumption.

Years back, when XML was first coming on the scene, I noted
that there was arising (and still exists, well entrenched)
a tension between two very different uses of XML, as you point
out, (a) use as a markup formalism for document modelling and
(b) use as a markup formalism for data structures. The former (a)
are extensions to the XML framework, where it is expected that
the full richness of the XML machinery applies. The latter (b) use 
XML to essentially "tunnel" data between systems through XML-savvy
processes and tools, and the XML machinery used for such interchange 
relates to the encapsulation of the data, not the abstract model by 
which the data itself is ultimately interpreted. I.e. the XML
machinery applies only to the semantics of the encapsulation, not
to the content encapsulated.

These tensions come to a head in cases such as RDF/XML when you 
use XML as a markup formalism for a data structure which encapsulates 
data that itself uses XML as a markup formalism for document modelling, 
since, as Pat points out, XML itself fails to provide any 
distinction between the encapsulation and the encapsulated.

In RDF/XML, as Pat again points out, rdf:parseType="Literal"
indicates that boundary between the data structure markup scope
and some internal, literal-specific, document markup scope;
between the encapsulation and the encapsulated.
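A minimal sketch of that boundary, using only Python's standard library. It is an illustration, not how any RDF parser is specified to behave. The sample document, the `ex:` namespace, and the helper name `lang_in_scope` are all hypothetical. The walk carries xml:lang down through the data structure markup and stops at any element carrying rdf:parseType="Literal", treating what lies inside as the encapsulated content.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_PARSETYPE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}parseType"


def lang_in_scope(elem, inherited=None, out=None):
    """Collect (tag, language) pairs for the data structure markup only.

    Descent stops at rdf:parseType="Literal": the element content is
    treated as the encapsulated literal and is not visited, so the outer
    xml:lang is never attributed to it.
    """
    if out is None:
        out = []
    lang = elem.get(XML_LANG, inherited)
    out.append((elem.tag, lang))
    if elem.get(RDF_PARSETYPE) == "Literal":
        return out  # boundary between encapsulation and encapsulated
    for child in elem:
        lang_in_scope(child, lang, out)
    return out


# Hypothetical sample document, for illustration only.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#" xml:lang="en">
  <rdf:Description rdf:about="http://example.org/doc">
    <ex:title>The Way Things Were</ex:title>
    <ex:summary rdf:parseType="Literal"><p>Some <em>markup</em></p></ex:summary>
  </rdf:Description>
</rdf:RDF>"""

scoped = lang_in_scope(ET.fromstring(DOC))
for tag, lang in scoped:
    print(tag, lang)
```

The `p` and `em` elements never appear in the result: they belong to the literal, not to the structure that carries it.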

An RDF literal is precisely that, a *literal*. It should not
be infected with the contextual characteristics of the language
used to describe the data structures which encapsulate it.

I've long disliked lang tags on any of the literals, including
plain literals, and have long hoped that a standardized solution
for general scoping of statements, including language scope as
simply one of an infinite number of possible kinds of scopes,
would emerge.

I myself have been working on using reification for this (and
for other essential needs such as indicating source, authority,
etc.). There are challenges there too, of course. But all in all
it is an approach that is far more in the spirit of RDF than
tacking on a lang tag to a literal in the abstract syntax.

And, Martin, in case you get the impression that I am simply 
callous to multilingual issues, let me point out that (a) I have
a Master's Degree in computational linguistics, (b) am multilingual,
(c) work with over 5 million pages of modular documentation and
billions of individually managed resources which represent over
40 different languages, and (d) that language scoping is a key
component of most of what I do. So when I disregard xml:lang in
RDF/XML, it is because doing so is both a sound engineering decision
and a practical management decision.

The bottom line, as I see it, is that literals should have
been untidy and allowed to be subjects, which would have allowed 
one to directly qualify their occurrences for language, datatype,
whatever. If I could go back in time two years, knowing what I
know now, I would have been brutally relentless in my efforts to 
have made that happen rather than trusting that the obvious optimal
solution would emerge on its own. Well, c'est la vie.

Now, alas, we are left with the constraint that literals are 
only objects and we do indeed need to accommodate effective support
(not just the bare minimal support) for the qualification of 
language for literals. Fair enough. My answer to this is:

(a) the proper way to do language scoping is to use generic RDF
machinery such as reification, which is applicable to any statement,
and hence any literal. The only major challenge to doing this is
the awkwardness of the RDF/XML when edited manually in a text 
editor, but the same can be said for table markup in XML, and just
as XML editors alleviate the pain of table markup so too will RDF
editors alleviate the pain of statement qualification.

(e.g. InferED, http://www.intellidimension.com does this)

(b) the presence of lang tags on plain literals is a legacy feature
from M&S and is mandated by existing applications. I consider this
a design error in the original M&S that we are simply stuck with.

In my own (internal) applications, I plan to infer reification based
language scoping from any lang tags and deal with language qualification
in a proper RDF and OWL manner. I.e. 

{
  _:subj _:pred "LLL"@XX .
}
implies
{
  _:statement rdf:subject   _:subj .
  _:statement rdf:predicate _:pred .
  _:statement rdf:object    "LLL" .
  _:statement xml:lang      "XX"^^xsd:language .
}

where xml:lang is (correctly) interpreted as a language scope,
within which the statement is asserted.
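A minimal sketch of that inference in plain Python, with no RDF library involved. The `Literal` tuple, the function name `reify_lang`, and the fixed `_:statement` blank node label are hypothetical choices for illustration. The datatype is written as xsd:language, the XML Schema name for language codes. As noted, none of this is part of the RDF standard.

```python
from typing import List, NamedTuple, Optional, Tuple, Union


class Literal(NamedTuple):
    """A literal: lexical form plus optional lang tag or datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


Term = Union[str, Literal]
Triple = Tuple[str, str, Term]


def reify_lang(subj: str, pred: str, obj: Literal,
               stmt_id: str = "_:statement") -> List[Triple]:
    """Return the reification implied by a lang-tagged literal object.

    The lang tag is moved off the literal and onto the reified statement
    as an xml:lang property. Literals without a lang tag imply nothing.
    """
    if obj.lang is None:
        return []
    return [
        (stmt_id, "rdf:subject", subj),
        (stmt_id, "rdf:predicate", pred),
        (stmt_id, "rdf:object", Literal(obj.lexical)),
        (stmt_id, "xml:lang", Literal(obj.lang, datatype="xsd:language")),
    ]


triples = reify_lang("_:subj", "_:pred", Literal("LLL", lang="XX"))
for t in triples:
    print(t)
```

A real application would mint a fresh blank node per statement rather than reuse one label; the fixed default here only mirrors the example above.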

No, that's not part of the RDF standard. Oh well... 

It probably would have been better for the parser to autogenerate 
the above reification based on the xml:lang scope in the first place
rather than having lang tags in the abstract syntax, but we're way
beyond modifying the design now...

(c) the present modelling of XML literals as typed literals is 
forward looking and provides the basis for full support of all
XML Schema types in RDF models. Reverting to a treatment of XML
literals as other than typed literals is a step backwards, not
forwards.

--

So, as someone who has worked on real-world applications of document
models from pre-SGML days, including full scale IETMs, data conversion,
document modelling, and large scale modular documentation, has a
large amount of experience regarding multilingual markup and management,
has 16+ years as a software engineer, and who has at least a reasonable
understanding of RDF, I consider the present solution to be
optimal, all things considered.

No, it's not perfect. Things rarely are.

I'll say no more on this topic, unless the WG decides to vote on
reopening this issue. Until then, I'll not devote any more time to it.

Regards,

Patrick

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com



> -----Original Message-----
> From: ext pat hayes [mailto:phayes@ihmc.us]
> Sent: 04 July, 2003 03:02
> To: w3c-rdfcore-wg@w3.org; Martin Duerst
> Subject: XML observation
> 
> 
> 
> Thinking about the issue we have been discussing, it occurs to me 
> that XML has been holding a tiger by the tail and is now getting 
> bitten, and this debate is a symptom of that.
> 
> XML started life as a generalized text-markup system, and for that 
> purpose it is wonderful. But it has been touted and used as something 
> much more than just text markup: it has been announced as a kind of 
> universal solvent for transmitting any kind of structure, a universal 
> general-purpose structure-description system. Unfortunately, several 
> of its features (most notably the restriction of attribute values to 
> strings, cf http://www.waterlang.org/doc/trouble_with_xml.htm) are 
> clearly serious design faults when seen from this more general point 
> of view.  But more to the present point, the use of a *language* to 
> describe structures requires us to clearly distinguish the text of 
> the description from the thing - the structure - being described. 
> Making a distinction like this is so second-nature to programmers, 
> mathematicians, logicians and linguists - in fact anyone who uses 
> technical languages professionally - that it takes a while in dealing 
> with XML to realize that XML conspicuously fails to make it, and that 
> in fact the entire design of XML is predicated on denying it. 
> XML documents describe structure by *displaying* it, in effect: they 
> *are* the structure they describe. And of course this is entirely 
> appropriate for a markup language: it is the very essence of markup 
> that the markup *labels* the text it is the markup 'of'.
> 
> To put the same point another way, markup is inherently indexical: 
> what it means depends on where it is. If you write <title>The Way 
> Things Were</title>, what the enclosing markup says, in effect, is: 
> 'THIS enclosed text is a title'.  The same piece of markup 
> surrounding some other piece of text will implicitly refer to that 
> other piece: its meaning - what it is talking *about* - depends on 
> where in the text the markup occurs. Its location in the text is 
> part of its meaning; and when it is used with no text to mark up, 
> simply as a structural description language, this indexicality is 
> retained in the *descriptive* conventions of the resulting language: 
> so XML as a structural description convention has a built-in 
> confusion between describing structure and displaying or exhibiting 
> it, a built-in ambiguity between being a description and being a kind 
> of diagram or map, a built-in tendency to confuse use and mention.
> 
> This is clearly seen in the discussion we have been having. Martin 
> (view X) sees a piece of RDF/XML as being a kind of XML text, and the 
> resulting document as *displaying* the RDF structure in the XML. He 
> expects that RDF/XML will satisfy the textual scoping mechanisms 
> which arise naturally in any kind of layout display: in particular, 
> attributes should apply to all of the items which are in the 
> *textual* scope of the XML element.  That is the XML 'structure as 
> textual display' assumption, of course.  Patrick (view G) sees a 
> structural description language rendered (in a fairly ad-hoc way) 
> into XML syntax; the actual XML document is of relatively little 
> importance: on this view, it is the structure described by the 
> document that defines the significant, meaningful notions of scope 
> and context.  And the RDF/XML conventions clearly isolate the XML 
> 'inside' a parseType-attributed element from the XML surrounding the 
> element, so it is 'obvious' that the lang tags that may be relevant 
> to the outer context do not apply to the inner one.
> 
> In my earlier metaphor, Patrick here is the teeth of the tiger. Once 
> XML is sold, and bought, as a general-purpose structural description 
> language, and is used as such by professionals who are familiar with 
> the conventions of such languages, the XML scoping conventions which 
> are inherited from its role as a markup language are no longer 
> appropriate: in fact, they are *ludicrous*: they are like a 
> children's toy in an engineering shop.  Expecting professional 
> programmers to conform to descriptive conventions defined by 
> text-markup languages is whistling at the wind.  Programmers have 
> been using more sophisticated scoping conventions for over half a 
> century; not because they didn't know better, but because they 
> *needed* to.  You can't display recursion using indexical markup, for 
> a start.
> 
> The XML publicists have bitten off more than they know how to chew. 
> If the result is XML that disobeys the XML 'conventions' and is 
> unreadable by non-programmers, should anyone be surprised?
> 
> Pat Hayes
> -- 
> ---------------------------------------------------------------------
> IHMC	(850)434 8903 or (650)494 3973   home
> 40 South Alcaniz St.	(850)202 4416   office
> Pensacola			(850)202 4440   fax
> FL 32501			(850)291 0667    cell
> phayes@ihmc.us       http://www.ihmc.us/users/phayes
> 
> 
>
Received on Friday, 4 July 2003 03:41:41 UTC