RE: XML observation from pat hayes on 2003-07-10 (w3c-rdfcore-wg@w3.org from July 2003)

From: pat hayes <phayes@ihmc.us>
Date: Thu, 10 Jul 2003 17:28:34 -0500
To: Martin Duerst <duerst@w3.org>
Message-Id: <p06001209bb33350b6106@[10.0.100.11]>
>At 14:43 03/07/08 -0500, pat hayes wrote:
>>>At 11:59 03/07/07 -0500, pat hayes wrote:
>
>>>>In fact, the very existence of RDF/XML illustrates this. Like it 
>>>>or not, RDF/XML is legal XML, so can itself be enclosed in an RDF 
>>>>XML literal; but one would not expect that RDF/XML to inherit any 
>>>>attributes of the outer RDF/XML.
>>>
>>>Yes, you can. But that's not the primary goal of XML literals, and
>>>that's not what they are usually used for. Let's not design things
>>>so that we can make a point, but so that they are most useful for
>>>what they are most used for.
>>
>>Well, point taken, but we really have to design the semantics so 
>>that they are at least internally coherent for *all* uses, not just 
>>the currently popular ones. If RDF only gets used for things that 
>>it is usually used for right now then it will have been rather a 
>>failure.
>
>Understood. But as far as I'm aware of, nobody has claimed that XML
>literals with language tags would be in any kind of serious conflict
>with these other uses.

But surely it is obvious that in this case one would not expect that 
the 'enclosed' RDF/XML would inherit the XML attributes - in 
particular the lang tag - from the enclosing outer XML document, 
right? Which was my point. For example, suppose that the RDF/XML in 
the XML literal were part of an OWL imported ontology originally 
transmitted from France as RDF/XML, but the ontology which imported 
it was in the USA.  Now consider a literal in the original ontology 
which inherited its lang tag from *its* enclosing XML, but has now 
been inserted into an XML document with a different lang tag. (BTW, 
it seems to me that exactly the same reasoning applies to arbitrary 
XML markup. Enclosed quoted XML should inherit the language 
information associated with its original source, not that used by the 
descriptive element in which it happens to appear.)
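
The inheritance effect described above can be illustrated with a small 
Python sketch. The element names (ontology, label, import) and the helper 
effective_langs are made up for illustration; only the xml:lang scoping 
behaviour comes from XML itself. The sketch assumes the fragment carries 
no xml:lang of its own and is moved as-is into a new parent.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_langs(elem, inherited=None, out=None):
    """Map each element tag to the language it has in scope (own or inherited)."""
    if out is None:
        out = {}
    lang = elem.get(XML_LANG, inherited)
    out[elem.tag] = lang
    for child in elem:
        effective_langs(child, lang, out)
    return out

# Hypothetical French source: the label inherits "fr" from its parent.
french_source = ET.fromstring('<ontology xml:lang="fr"><label>chat</label></ontology>')
label = french_source.find("label")
before = effective_langs(french_source)["label"]

# Hypothetical importing document tagged "en": the same element, once
# enclosed here, now appears to be English.
us_doc = ET.fromstring('<import xml:lang="en"/>')
us_doc.append(label)
after = effective_langs(us_doc)["label"]

print(before, after)
```

The same text is reported as "fr" in its source and as "en" after being 
enclosed elsewhere, which is the kind of silent change at issue.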

There were serious conflicts which arose when we treated XML literals 
as datatyped literals with lang tags. The issue is uniformity. In 
extensions of RDF (such as OWL) which allow identity statements 
between resources, one wants to be able to infer that if two types 
are identical, then they can be used interchangeably. However, 
declaring something to be a datatype requires that the lexical spaces 
of its literals conform to some general model of lexical spaces. Now 
we have a choice.  Either *all* lexical spaces allow language tags, 
or *none* do. We tried the first option, but it rapidly gets 
unmanageable, since for example XSD requires that lang tags be 
ignored in applying datatyping rules; and I gather (though I am not 
the local guru on such matters) that there are, er, W3C philosophical 
grounds for trying to keep issues of language tagging and structural 
description separate. It certainly became semantically and 
operationally unwieldy.  For example, at one point we had 
the situation where lang tags were allowed in typed literal forms, 
but there had to be explicit inference rules stated for all datatypes 
except rdf:XMLLiteral which require them to be effectively ignored; 
and then we run into the equality problem, since if someone using OWL 
asserts that
  ex:hisdatatype owl:sameAs rdf:XMLLiteral .
then the typed literals

"<ex>foo</ex>"^^ex:hisdatatype
"<ex>foo</ex>"^^rdf:XMLLiteral

must be treated identically; but the first is not an XML literal so 
must obey non-XML lang tagging rules.
So we abandoned that, and decided for the second alternative.
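
The equality problem can be sketched as a toy model in Python. This is a 
simplification under stated assumptions, not the WG's actual inference 
rules: TypedLiteral and value_key are hypothetical names, and the rule 
"lang tags matter only for rdf:XMLLiteral" is keyed on the datatype name, 
as in the abandoned design described above.

```python
from typing import NamedTuple, Optional

XML_LITERAL = "rdf:XMLLiteral"

class TypedLiteral(NamedTuple):
    lexical: str
    datatype: str
    lang: Optional[str] = None

def value_key(lit: TypedLiteral):
    """Assumed rule: the lang tag is significant only for rdf:XMLLiteral."""
    if lit.datatype == XML_LITERAL:
        return (lit.lexical, lit.lang)
    return (lit.lexical, None)  # every other datatype ignores the tag

def same_value(a: TypedLiteral, b: TypedLiteral) -> bool:
    return value_key(a) == value_key(b)

his_en = TypedLiteral("<ex>foo</ex>", "ex:hisdatatype", "en")
his_fr = TypedLiteral("<ex>foo</ex>", "ex:hisdatatype", "fr")
xml_en = TypedLiteral("<ex>foo</ex>", XML_LITERAL, "en")
xml_fr = TypedLiteral("<ex>foo</ex>", XML_LITERAL, "fr")

# Under ex:hisdatatype the tag is ignored, so "en" and "fr" coincide.
ignored_for_his = same_value(his_en, his_fr)
# Under rdf:XMLLiteral the tag is kept, so they differ.
ignored_for_xml = same_value(xml_en, xml_fr)

# If ex:hisdatatype owl:sameAs rdf:XMLLiteral holds, the two answers
# ought to agree; under the name-keyed rule they do not.
conflict = ignored_for_his != ignored_for_xml
print(ignored_for_his, ignored_for_xml, conflict)
```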

In  my 'wet fish' message I discussed the alternative route of 
treating XML literals as not being typed at all: as you may have 
noticed, that idea met with some resistance.

>>>And by the way, coming back to one of the main points, plain literals
>>>do inherit language information from the context (if there is such
>>>information),
>>
>>True; that functionality was explicitly requested by one of our 
>>user communities who needed it for deployed large systems.
>
>Very interesting. Any pointers?

As I recall that was a gentleman from Reuters, who made a passionate 
defense of literal lang tagging in a comment to the WG made after a 
plenary meeting maybe 2 years (?) ago, referring to Reuters' use of 
RDF to attach information to paragraphs of news text encoded as RDF 
literals, where of course the language tagging is of critical 
importance. I cannot now recall his name or find the exact message in 
the archives.

>>We supplied it as requested, but with some misgivings.
>
>Does this mean that you (personally or as a group) did not like
>the idea of attaching language information to literals?

I personally and several others in the group (at that time). I have 
since seen the utility of such tags for applications where literals 
are being treated as text (with or without markup), so my personal 
misgivings on this score have been transferred to the idea of 
treating text as structural data.

I note that XSD datatyping is curiously ambivalent about language 
tagging of strings, which are arguably merely a form of text (a case 
which has been made very strongly by some members of Webont and the 
RDF WG, who feel that plain literals and literals typed with 
xsd:string should be indistinguishable.) I think we - that is the 
entire world, not just the RDF WG -

All of this trouble comes from including the lang tag as in some 
sense a 'part' of the literal itself, so that the same string with a 
different lang tag is a different literal.  IMO, a much better 
design, one that unfortunately was not available to us for legacy and 
charter reasons, would have been to have allowed literals as subjects 
and to have treated lang tags simply as an RDF property of the 
literal.  (This was considered unworkable in large part because the 
limitations of XML made it impossible to represent such RDF graphs in 
RDF/XML, by the way.) In this case, we could have insisted that plain 
literals were simply strings and therefore indistinguishable from 
xsd:strings, and could even have treated plain text and marked-up text 
without any markup in it as identical.  I note that the resulting 
RDF/XML rendering would have been even less readable as XML, however, 
so might not have appealed to your i18n sensibilities.
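
A rough Python sketch of that alternative design follows. It is purely 
illustrative: the Literal class, the property name ex:lang, and the 
example triples are all hypothetical, and nothing like this was adopted. 
It shows only the two claims made above: the literal's identity no longer 
depends on a tag, and the language is recorded as an ordinary triple with 
the literal as subject.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union

@dataclass(frozen=True)
class Literal:
    """A literal is just its string; no lang tag is part of its identity."""
    text: str

Node = Union[str, Literal]          # str stands in for a URI reference
Triple = Tuple[Node, str, Node]

LANG = "ex:lang"  # hypothetical property name, not a real RDF term

def langs_of(graph: Set[Triple], literal: Literal) -> Set[str]:
    """Collect the languages asserted of a literal via the LANG property."""
    return {
        o.text
        for (s, p, o) in graph
        if s == literal and p == LANG and isinstance(o, Literal)
    }

graph: Set[Triple] = {
    ("ex:story1", "ex:headline", Literal("chat")),
    (Literal("chat"), LANG, Literal("fr")),   # literal used as a subject
}

plain = Literal("chat")          # a plain literal
as_xsd_string = Literal("chat")  # the same string viewed as an xsd:string
same_literal = plain == as_xsd_string
langs = langs_of(graph, Literal("chat"))
print(same_literal, langs)
```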

>  Could
>that mean that you were in some way just happy to find a reason
>(or excuse) to remove them from XML literals when some people
>complained about some problems?

No, we tried valiantly to keep them attached to the XML literals, but 
could not find a workable mechanism for doing so coherently that 
satisfied everyone. In case you were thinking otherwise, we did not 
set out to mischievously kill off the lang tags. I think that I may 
have written close to 20 versions of the RDF model theory document 
which differ materially only in how they handled lang tags and/or XML 
literals; this entire issue has been a thorn in our side for many 
months.

In retrospect I now think (personal opinion) that the older design 
which I outlined in my recent 'wet fish' message might have been 
better, in which XML literals are seen as basically similar to plain 
literals with an extra 'XML bit' added, rather than being subsumed 
under the datatyping rules.  However, I also see that it is rather 
late in the day to introduce such a major change to the RDF design, 
particularly as this creates new syntactic categories in the RDF 
graph and hence breaks deployed code; and since this aspect of the 
design has been in place now for over a year and has attracted no 
hostile comments until now.  I also note that such a decision to 
revert to an older design would almost certainly spark hostile 
comments from other user communities, most notably Webont, since it 
would have knock-on effects on the design of OWL in parallel ways; 
and they have (after a *great* deal of discussion) expressed 
satisfaction with our current design.

Pat

PS. I (personally) also now think that this entire XMLliteral mess, 
and several other messes we were left with and found ourselves unable 
to fully clean up (literals as subjects, range datatyping, clarifying 
literal categories) and that the XML Schema group have been mired in 
(status of strings as a datatype, identity conditions on datatype 
value spaces) are all symptoms of a basic inadequacy of XML as a 
structural specification language. They all follow pretty directly 
from XML's inability to treat its own expressions as objects - the 
fact that one cannot naturally talk about XML using XML - and this, 
IMO, can be traced fairly directly to failure of the XML designers to 
provide for the distinction between displaying text and describing 
it.  It's not as if this was difficult or revolutionary: the idea of 
using quotation as a text-objectifying device has been a standard 
part of the typesetter's art for several centuries now and the 
use/mention distinction has been part of the undergraduate training 
of every linguist, philosopher or logician since about 1940.  If XML 
had an elementary quoting mechanism, for example, or if XML 
attributes could themselves contain XML markup, then having literals 
as subjects would have been as trivial as it always should have been 
and none of these problems would ever have arisen. Perhaps this is 
all best left for a future XML WG to deal with.


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 10 July 2003 18:28:35 UTC