- From: pat hayes <phayes@ihmc.us>
- Date: Wed, 13 Aug 2003 22:10:13 -0700
- To: Sandro Hawke <sandro@w3.org>
- Cc: www-archive@w3.org
>I'm trying to understand and be able to explain why the Last Call
>design for XML Literals doesn't work. But I just don't see it.

The problem is that we have to treat all datatypes uniformly, in the sense that either they all can have language tags or none of them can (or else OWL will break). Allowing them everywhere means that all datatypes except rdf:XMLLiteral have to have inference rules that allow language tags to be added or omitted without making any difference to anything. This is inelegant, to say the least, but it is also inefficient, and it is kind of irrational. The part that breaks is when OWL asserts that rdf:XMLLiteral is equal to some other datatype; then this one datatype has to be both tag-sensitive and tag-insensitive at the same time.

>Jeremy gave an apparent paradox [1], but it seems to me to be based on
>the faulty assumption that any unknown datatype is distinct from
>rdf:XMLLiteral.
>
>That is,
>
>(1) <eg:a> <eg:p> "foo"@en^^<eg:d> .
>
>does NOT entail
>
>(2) <eg:a> <eg:p> "foo"@fr^^<eg:d> .
>
>where he said it did.

It does in the LC design, since datatypes (other than rdf:XMLLiteral) are *required* to be language-tag-insensitive. In fact the only reason the tags are there at all is that they were felt to be needed for XML and that we have to treat all datatypes uniformly.

>
>I would say that
>
>(3a) <eg:a> <eg:p> "foo"@en^^<eg:d> .
>(3b) <eg:d> owl:differentFrom <rdf:XMLLiteral>.
>
>does entail (2)

Why? Maybe <eg:d> is language-sensitive as well, right? (If not, then the problem arises when you are told that it is equal to rdf:XMLLiteral.)

>Of course that entailment only holds in OWL Full, but the spirit of it
>-- that its valid to infer (2) only if you somehow know the datatype
>is distinct from rdf:XMLLiteral -- makes perfect sense in simple RDF.

The issue arises the other way round: if you know that <eg:dd> owl:sameAs rdf:XMLLiteral, then it ought to be the case that you can intersubstitute one for the other; but they obey different rules regarding language tagging, since <eg:dd> ignores language tagging but rdf:XMLLiteral does not (in the LC design, that is).

>I don't see anything counter-intuitive or problematic here. When
>reasoning about datatypes about which one has no knowledge, one will
>not be able to naively discard language tagging, but I really don't
>see the problem with that.

I see many problems with that. Take the XSD types, for example: are they lang-tag-sensitive? The spec just says that the lexical spaces are sets of strings; it doesn't mention languages. Is that enough to know that they are tag-insensitive? In fact, there are requirements on integer representations that they *not* be language-sensitive. For example, things like using a comma instead of a dot to indicate the decimal point, often done in Germany, are considered to be locale rather than language issues. We discussed this in the WG long ago and were told fairly firmly that it would be a fundamental mistake to consider lexical forms of datatypes to be language-sensitive, and that we should definitely not consider things like

  "12.34"@en^^xsd:decimal
  "12,34"@ge^^xsd:decimal

to be equivalent. Thus rdf:XMLLiteral was always an exception in this regard.

>Personally, I would rather see rdf:XMLLiteral be considered one
>instance of a class of language-sensitive datatypes

It is unique in that class. In fact, one could reasonably argue that it is best not considered a datatype *for this very reason*; but that is not a direction that is now open to us without controversy.

>, so that instead
>of (3b) we'd have something like
>
>(4b) <eg:d> rdf:type <rdf:LanguageInsensitiveDatatype>.
>
>which would be a part of the theory for <eg:d>. The theory for
>rdf:XMLLiteral would of course say it was an instance of
>rdf:LanguageSensitiveDatatype.
>
>Do either of these designs work for you?

Not really. That is, they might work for ME, but I am sure that there will be many others for whom they will not work. (My own preference would be to revert to an even older design in which rdf:XMLLiteral is not considered a datatype at all, and XML literals are a distinct literal form closely similar to plain literals, which could then have language tags without causing confusion. This idea, however, was resoundingly rejected by other WG members.)

> The first has the tremendous
>advantage of differing from the Last Call semantics only as much as
>needed to fix the error.

Well, but it seems to me (and I was already rather feeling this at LC, but it was too great a change to make in a short time) that the error was allowing language tags in typed literals in the first place. We corrected that error, I am glad to say.

>The second is perhaps a greater change, but
>it's hard to imagine anyone objecting, and it avoids the potential
>disaster of someday finding another language-sensitive datatype.

If I thought there was the slightest chance of this ever happening I might be more inclined to take the idea seriously, but I do not. The whole discussion has become warped by confusing two different issues: the representation of data, and the representation of text. XML embodies this confusion in its very design, but I am confident that the world will get this business sorted out reasonably clearly in the relatively near future.

Pat

>
>    -- sandro
>
>[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Apr/0314.html

-- 
---------------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32501                (850)291 0667   cell
phayes@ihmc.us          http://www.ihmc.us/users/phayes
Received on Thursday, 14 August 2003 01:10:09 UTC