- From: <Patrick.Stickler@nokia.com>
- Date: Fri, 4 Jul 2003 10:41:38 +0300
- To: <phayes@ihmc.us>, <w3c-rdfcore-wg@w3.org>, <duerst@w3.org>
Thank you, Pat. A lot. Very well put.

This was one of the key points I was trying to get across to Martin. RDF/XML is a means to an end. Not the end. And there are many, many examples of using XML for data interchange which disregard much if not most of the machinery that XML offers to those for whom the XML is the final and primary form of consumption.

Years back, when XML was first coming on the scene, I noted that there was arising (and still exists, well entrenched) a tension between two very different uses of XML, as you point out: (a) use as a markup formalism for document modelling and (b) use as a markup formalism for data structures.

The former (a) are extensions to the XML framework, where it is expected that the full richness of the XML machinery applies. The latter (b) use XML to essentially "tunnel" data between systems through XML-savvy processes and tools, and the XML machinery used for such interchange relates to the encapsulation of the data, not the abstract model by which the data itself is ultimately interpreted. I.e. the XML machinery applies only to the semantics of the encapsulation, not to the content encapsulated.

These tensions come to a head in cases such as RDF/XML when you use XML as a markup formalism for a data structure which encapsulates data that itself uses XML as a markup formalism for document modelling, since, as Pat points out, XML itself fails to provide any distinction between the encapsulation and the encapsulated.

In RDF/XML, as Pat again points out, rdf:parseType="Literal" indicates that boundary between the data structure markup scope and some internal, literal-specific, document markup scope; between the encapsulation and the encapsulated. An RDF literal is precisely that, a *literal*. It should not be infected with the contextual characteristics of the language used to describe the data structures which encapsulate it.
I've long disliked lang tags on any of the literals, including plain literals, and have long hoped that a standardized solution for general scoping of statements, including language scope as simply one of an infinite number of possible kinds of scopes, would emerge. I myself have been working on using reification for this (and for other essential needs such as indicating source, authority, etc.). There are challenges there too, of course. But all in all it is an approach that is far more in the spirit of RDF than tacking on a lang tag to a literal in the abstract syntax.

And, Martin, in case you get the impression that I am simply callous to multilingual issues, let me point out that (a) I have a Master's Degree in computational linguistics, (b) am multilingual, (c) work with over 5 million pages of modular documentation and billions of individually managed resources which represent over 40 different languages, and (d) that language scoping is a key component of most of what I do. So when I disregard xml:lang in RDF/XML, it is because doing so is both a sound engineering decision and a practical management decision.

The bottom line, as I see it, is that literals should have been untidy and allowed to be subjects, which would have allowed one to directly qualify their occurrences for language, datatype, whatever. If I could go back in time two years, knowing what I know now, I would have been brutally relentless in my efforts to have made that happen rather than trusting that the obvious optimal solution would emerge on its own. Well, c'est la vie.

Now, alas, we are left with the constraint that literals are only objects and we do indeed need to accommodate effective support (not just the bare minimal support) for the qualification of language for literals. Fair enough. My answer to this is:

(a) the proper way to do language scoping is to use generic RDF machinery such as reification, which is applicable to any statement, and hence any literal.
The only major challenge to doing this is the awkwardness of the RDF/XML when edited manually in a text editor, but the same can be said for table markup in XML, and just as XML editors alleviate the pain of table markup so too will RDF editors alleviate the pain of statement qualification. (e.g. InferED, http://www.intellidimension.com does this)

(b) the presence of lang tags on plain literals is a legacy feature from M&S and is mandated by existing applications. I consider this a design error in the original M&S that we are simply stuck with. In my own (internal) applications, I plan to infer reification-based language scoping from any lang tags and deal with language qualification in a proper RDF and OWL manner. I.e.

   { _:subj _:pred "LLL"@XX . }

implies

   {
      _:statement rdf:subject _:subj .
      _:statement rdf:predicate _:pred .
      _:statement rdf:object "LLL" .
      _:statement xml:lang "XX"^^xsd:lang .
   }

where xml:lang is (correctly) interpreted as a language scope, within which the statement is asserted. No, that's not part of the RDF standard. Oh well... It probably would have been better for the parser to autogenerate the above reification based on the xml:lang scope in the first place rather than having lang tags in the abstract syntax, but we're way beyond modifying the design now...

(c) the present modelling of XML literals as typed literals is forward looking and provides the basis for full support of all XML Schema types in RDF models. Reverting to a treatment of XML literals as other than typed literals is a step backwards, not forwards.
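The inference described in (b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the internal application mentioned above: the Literal type, the blank_nodes generator and the infer_language_scope function are hypothetical names, terms are plain strings rather than objects from any RDF library, and the property and datatype names (xml:lang, xsd:lang) are copied as written in the message (the XML Schema datatype itself is named xsd:language).

```python
from itertools import count
from typing import Iterator, List, NamedTuple, Optional, Tuple


class Literal(NamedTuple):
    """Hypothetical literal: lexical form plus optional lang tag or datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


Triple = Tuple[object, str, object]


def blank_nodes() -> Iterator[str]:
    """Yield fresh blank node labels: _:statement1, _:statement2, ..."""
    return (f"_:statement{i}" for i in count(1))


def infer_language_scope(subj: str, pred: str, obj: object,
                         fresh: Iterator[str]) -> List[Triple]:
    """Return the reification implied by a lang-tagged literal.

    A triple whose object carries no lang tag is returned unchanged.
    """
    if not isinstance(obj, Literal) or obj.lang is None:
        return [(subj, pred, obj)]
    stmt = next(fresh)  # one blank node stands for the reified statement
    return [
        (stmt, "rdf:subject", subj),
        (stmt, "rdf:predicate", pred),
        # The literal itself is stripped of its lang tag ...
        (stmt, "rdf:object", Literal(obj.lexical)),
        # ... and the language becomes a property of the statement.
        (stmt, "xml:lang", Literal(obj.lang, datatype="xsd:lang")),
    ]


if __name__ == "__main__":
    tagged = Literal("LLL", lang="XX")
    for triple in infer_language_scope("_:subj", "_:pred", tagged, blank_nodes()):
        print(triple)
```

Passing the blank node generator in explicitly keeps the function free of global state, so each parse can mint its own statement identifiers.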
--

So, as someone who has worked on real-world applications of document models from pre-SGML days, including full scale IETMs, data conversion, document modelling, and large scale modular documentation, has a large amount of experience regarding multilingual markup and management, has 16+ years as a software engineer, and who has at least a reasonable understanding of RDF, I consider the present solution to be optimal, all things considered. No, it's not perfect. Things rarely are.

I'll say no more on this topic, unless the WG decides to vote on reopening this issue. Until then, I'll not devote any more time to it.

Regards,

Patrick

--
Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com

> -----Original Message-----
> From: ext pat hayes [mailto:phayes@ihmc.us]
> Sent: 04 July, 2003 03:02
> To: w3c-rdfcore-wg@w3.org; Martin Duerst
> Subject: XML observation
>
>
>
> Thinking about the issue we have been discussing, it occurs to me
> that XML has been holding a tiger by the tail and is now getting
> bitten, and this debate is a symptom of that.
>
> XML started life as a generalized text-markup system, and for that
> purpose it is wonderful. But it has been touted and used as something
> much more than just text markup: it has been announced as a kind of
> universal solvent for transmitting any kind of structure, a universal
> general-purpose structure-description system. Unfortunately, several
> of its features (most notably the restriction of attribute values to
> strings, cf http://www.waterlang.org/doc/trouble_with_xml.htm) are
> clearly serious design faults when seen from this more general point
> of view. But more to the present point, the use of a *language* to
> describe structures requires us to clearly distinguish the text of
> the description from the thing - the structure - being described.
> Making a distinction like this is so second-nature to programmers,
> mathematicians, logicians and linguists - in fact anyone who uses
> technical languages professionally - that it takes a while in dealing
> with XML to realize that XML conspicuously fails to make it, and that
> in fact the entire design of XML is predicated on denying it.
> XML documents describe structure by *displaying* it, in effect: they
> *are* the structure they describe. And of course this is entirely
> appropriate for a markup language: it is the very essence of markup
> that the markup *labels* the text it is the markup 'of'.
>
> To put the same point another way, markup is inherently indexical:
> what it means depends on where it is. If you write <title>The Way
> Things Were</title>, what the enclosing markup says, in effect, is:
> 'THIS enclosed text is a title'. The same piece of markup
> surrounding some other piece of text will implicitly refer to that
> other piece: its meaning - what it is talking *about* - depends on
> where in the text the markup occurs. Its location in the text is
> part of its meaning; and when it is used with no text to mark up,
> simply as a structural description language, this indexicality is
> retained in the *descriptive* conventions of the resulting language:
> so XML as a structural description convention has a built-in
> confusion between describing structure and displaying or exhibiting
> it, a built-in ambiguity between being a description and being a kind
> of diagram or map, a built-in tendency to confuse use and mention.
>
> This is clearly seen in the discussion we have been having. Martin
> (view X) sees a piece of RDF/XML as being a kind of XML text, and the
> resulting document as *displaying* the RDF structure in the XML.
> He expects that RDF/XML will satisfy the textual scoping mechanisms
> which arise naturally in any kind of layout display: in particular,
> attributes should apply to all of the items which are in the
> *textual* scope of the XML element. That is the XML 'structure as
> textual display' assumption, of course. Patrick (view G) sees a
> structural description language rendered (in a fairly ad-hoc way)
> into XML syntax; the actual XML document is of relatively little
> importance: on this view, it is the structure described by the
> document that defines the significant, meaningful notions of scope
> and context. And the RDF/XML conventions clearly isolate the XML
> 'inside' a parseType-attributed element from the XML surrounding the
> element, so it is 'obvious' that the lang tags that may be relevant
> to the outer context do not apply to the inner one.
>
> In my earlier metaphor, Patrick here is the teeth of the tiger. Once
> XML is sold, and bought, as a general-purpose structural description
> language, and is used as such by professionals who are familiar with
> the conventions of such languages, the XML scoping conventions which
> are inherited from its role as a markup language are no longer
> appropriate: in fact, they are *ludicrous*: they are like a
> children's toy in an engineering shop. Expecting professional
> programmers to conform to descriptive conventions defined by
> text-markup languages is whistling at the wind. Programmers have
> been using more sophisticated scoping conventions for over half a
> century; not because they didn't know better, but because they
> *needed* to. You can't display recursion using indexical markup, for
> a start.
>
> The XML publicists have bitten off more than they know how to chew.
> If the result is XML that disobeys the XML 'conventions' and is
> unreadable by non-programmers, should anyone be surprised?
>
> Pat Hayes
> --
> ---------------------------------------------------------------------
> IHMC (850)434 8903 or (650)494 3973 home
> 40 South Alcaniz St. (850)202 4416 office
> Pensacola (850)202 4440 fax
> FL 32501 (850)291 0667 cell
> phayes@ihmc.us http://www.ihmc.us/users/phayes

Received on Friday, 4 July 2003 03:41:41 UTC