- From: pat hayes <phayes@ihmc.us>
- Date: Thu, 10 Jul 2003 17:28:34 -0500
- To: Martin Duerst <duerst@w3.org>
- Cc: :
>At 14:43 03/07/08 -0500, pat hayes wrote:
>>>At 11:59 03/07/07 -0500, pat hayes wrote:
>
>>>>In fact, the very existence of RDF/XML illustrates this. Like it
>>>>or not, RDF/XML is legal XML, so can itself be enclosed in an RDF
>>>>XML literal; but one would not expect that RDF/XML to inherit any
>>>>attributes of the outer RDF/XML.
>>>
>>>Yes, you can. But that's not the primary goal of XML literals, and
>>>that's not what they are usually used for. Let's not design things
>>>so that we can make a point, but so that they are most useful for
>>>what they are most used for.
>>
>>Well, point taken, but we really have to design the semantics so
>>that they are at least internally coherent for *all* uses, not just
>>the currently popular ones. If RDF only gets used for things that
>>it is usually used for right now then it will have been rather a
>>failure.
>
>Understood. But as far as I'm aware of, nobody has claimed that XML
>literals with language tags would be in any kind of serious conflict
>with these other uses.

But surely it is obvious that in this case one would not expect that the 'enclosed' RDF/XML would inherit the XML attributes - in particular the lang tag - from the enclosing outer XML document, right? Which was my point. For example, suppose that the RDF/XML in the XML literal were part of an OWL imported ontology originally transmitted from France as RDF/XML, but the ontology which imported it was in the USA. Now consider a literal in the original ontology which inherited its lang tag from *its* enclosing XML, but has now been inserted into an XML document with a different lang tag.

(BTW, it seems to me that exactly the same reasoning applies to arbitrary XML markup. Enclosed quoted XML should inherit the language information associated with its original source, not that used by the descriptive element in which it happens to appear.)

There were serious conflicts which arose when we treated XML literals as datatyped literals with lang tags.
The issue is uniformity. In extensions of RDF (such as OWL) which allow identity statements between resources, one wants to be able to infer that if two types are identical, then they can be used interchangeably. However, declaring something to be a datatype requires that the lexical spaces of its literals conform to some general model of lexical spaces. Now we have a choice. Either *all* lexical spaces allow language tags, or *none* do.

We tried the first option, but it rapidly got unmanageable, since for example XSD requires that lang tags be ignored in applying datatyping rules; and I gather (though I am not the local guru on such matters) that there are, er, W3C philosophical grounds for trying to keep issues of language tagging and structural description separate. It certainly became semantically and operationally unwieldy. For example, at one point we had the situation where lang tags were allowed in typed literal forms, but there had to be explicit inference rules stated for all datatypes except rdf:XMLLiteral which required them to be effectively ignored; and then we ran into the equality problem, since if someone using OWL asserts that

   ex:hisdatatype owl:sameAs rdf:XMLLiteral .

then the typed literals

   "<ex>foo</ex>"^^ex:hisdatatype
   "<ex>foo</ex>"^^rdf:XMLLiteral

must be treated identically; but the first is not an XML literal so must obey non-XML lang tagging rules. So we abandoned that, and decided on the second alternative.

In my 'wet fish' message I discussed the alternative route of treating XML literals as not being typed at all: as you may have noticed, that idea met with some resistance.

>>>And by the way, coming back to one of the main points, plain literals
>>>do inherit language information from the context (if there is such
>>>information),
>>
>>True; that functionality was explicitly requested by one of our
>>user communities who needed it for deployed large systems.
>
>Very interesting. Any pointers?

As I recall that was a gentleman from Reuters, who made a passionate defense of literal lang tagging in a comment to the WG made after a plenary meeting maybe 2 years (?) ago, referring to Reuters' use of RDF to attach information to paragraphs of news text encoded as RDF literals, where of course the language tagging is of critical importance. I cannot now recall his name or find the exact message in the archives.

>>We supplied it as requested, but with some misgivings.
>
>Does this mean that you (personally or as a group) did not like
>the idea of attaching language information to literals?

I personally and several others in the group (at that time). I have since seen the utility of such tags for applications where literals are being treated as text (with or without markup), so my personal misgivings on this score have been transferred to the idea of treating text as structural data. I note that XSD datatyping is curiously ambivalent about language tagging of strings, which are arguably merely a form of text (a case which has been made very strongly by some members of Webont and the RDF WG, who feel that plain literals and literals typed with xsd:string should be indistinguishable.) I think we - that is the entire world, not just the RDF WG -

All of this trouble comes from including the lang tag as in some sense a 'part' of the literal itself, so that the same string with a different lang tag is a different literal. IMO, a much better design, one that unfortunately was not available to us for legacy and charter reasons, would have been to have allowed literals as subjects and to have treated lang tags simply as an RDF property of the literal. (This was considered unworkable in large part because the limitations of XML made it impossible to represent such RDF graphs in RDF/XML, by the way.)
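
The contrast between the two designs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the property name ex:lang, the toy graph of string triples, and the helper names are all hypothetical and come from no RDF specification or library.

```python
from typing import NamedTuple, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TaggedLiteral(NamedTuple):
    """Design as specified: the lang tag is part of the literal's identity."""
    lexical: str
    lang: Optional[str] = None


def literals_identical(a: TaggedLiteral, b: TaggedLiteral) -> bool:
    # Same string with a different tag compares unequal.
    return a == b


# Hypothetical property used only for this sketch.
EX_LANG = "ex:lang"


def langs_of(graph: Set[Triple], literal: str) -> Set[str]:
    """Alternative design: the literal is a bare string used as a subject,
    and its language is whatever the graph asserts about it."""
    return {o for (s, p, o) in graph if s == literal and p == EX_LANG}


fr = TaggedLiteral("chat", "fr")
en = TaggedLiteral("chat", "en")
plain = TaggedLiteral("chat")

graph: Set[Triple] = {
    ("ex:doc1", "ex:title", "chat"),  # the literal as an object
    ("chat", EX_LANG, "fr"),          # the literal as a subject
}

print(literals_identical(fr, en))     # False
print(literals_identical(fr, plain))  # False
print(langs_of(graph, "chat"))        # {'fr'}
```

In the first model the tag changes which literal one has; in the second the string stays the same object and the tag is just one more statement about it.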
In this case, we could have insisted that plain literals were simply strings and therefore indistinguishable from xsd:strings, and could even have treated plain text and marked-up text without any markup in it as identical. I note that the resulting RDF/XML rendering would have been even less readable as XML, however, so might not have appealed to your i18n sensibilities.

> Could
>that mean that you were in some way just happy to find a reason
>(or excuse) to remove them from XML literals when some people
>complained about some problems?

No, we tried valiantly to keep them attached to the XML literals, but could not find a workable mechanism for doing so that coherently satisfied everyone. In case you were thinking otherwise, we did not set out to mischievously kill off the lang tags. I think that I may have written close to 20 versions of the RDF model theory document which differ materially only in how they handle lang tags and/or XML literals; this entire issue has been a thorn in our side for many months.

In retrospect I now think (personal opinion) that the older design which I outlined in my recent 'wet fish' message might have been better, in which XML literals are seen as basically similar to plain literals with an extra 'XML bit' added, rather than being subsumed under the datatyping rules. However, I also see that it is rather late in the day to introduce such a major change to the RDF design, particularly as this creates new syntactic categories in the RDF graph and hence breaks deployed code; and since this aspect of the design has been in place now for over a year and has attracted no hostile comments until now. I also note that such a decision to revert to an older design would almost certainly spark hostile comments from other user communities, most notably Webont, since it would have knock-on effects on the design of OWL in parallel ways; and they have (after a *great* deal of discussion) expressed satisfaction with our current design.

Pat

PS.
I (personally) also now think that this entire XMLliteral mess, and several other messes we were left with and found ourselves unable to fully clean up (literals as subjects, range datatyping, clarifying literal categories) and that the XML Schema group have been mired in (status of strings as a datatype, identity conditions on datatype value spaces) are all symptoms of a basic inadequacy of XML as a structural specification language. They all follow pretty directly from XML's inability to treat its own expressions as objects - the fact that one cannot naturally talk about XML using XML - and this, IMO, can be traced fairly directly to failure of the XML designers to provide for the distinction between displaying text and describing it.

It's not as if this was difficult or revolutionary: the idea of using quotation as a text-objectifying device has been a standard part of the typesetter's art for several centuries now and the use/mention distinction has been part of the undergraduate training of every linguist, philosopher or logician since about 1940. If XML had an elementary quoting mechanism, for example, or if XML attributes could themselves contain XML markup, then having literals as subjects would have been as trivial as it always should have been and none of these problems would ever have arisen. Perhaps this is all best left for a future XML WG to deal with.

-- 
---------------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32501                (850)291 0667   cell
phayes@ihmc.us          http://www.ihmc.us/users/phayes
Received on Thursday, 10 July 2003 18:28:35 UTC