Re: (kinda urgent) cheap solution to XML Literals problem... from pat hayes on 2003-08-14 (www-archive@w3.org from August 2003)

From: pat hayes <phayes@ihmc.us>
Date: Wed, 13 Aug 2003 22:10:13 -0700
To: Sandro Hawke <sandro@w3.org>
Cc: www-archive@w3.org
Message-Id: <p06001a09bb60c1cfedcd@[10.0.1.2]>
>I'm trying to understand and be able to explain why the Last Call
>design for XML Literals doesn't work.   But I just don't see it.

The problem is that we have to treat all datatypes uniformly in the 
sense that either they all can have language tags or none of them can 
(or else OWL will break). Allowing them everywhere means that all 
datatypes except rdf:XMLLiteral have to have inference rules that 
allow language tags to be added or omitted and it makes no difference 
to anything. This is inelegant, to say the least, but it is also 
inefficient, and it is kind of irrational.

The part that breaks is when OWL asserts that rdf:XMLLiteral is equal 
to some other datatype; then this one datatype has to be both 
tag-sensitive and tag-insensitive at the same time.

>Jeremy gave an apparent paradox [1], but it seems to me to be based on
>the faulty assumption that any unknown datatype is distinct from
>rdf:XMLLiteral.
>
>That is,
>
>(1)    <eg:a> <eg:p> "foo"@en^^<eg:d> .
>
>does NOT entail
>
>(2)    <eg:a> <eg:p> "foo"@fr^^<eg:d> .
>
>where he said it did.

It does in the LC design since datatypes (other than rdf:XMLLiteral) 
are *required* to be language-tag-insensitive.  In fact the only 
reason the tags are there at all is because they were felt to be 
needed for XML and because we have to treat all datatypes uniformly.
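To make the Last Call rule concrete, here is a minimal Python sketch. It assumes a typed literal can be modelled as a (lexical form, language tag, datatype URI) triple, and it uses the lexical form itself as a placeholder for a non-XML datatype's value. `TypedLiteral`, `denotation`, `same_value` and the `http://example.org/d` URI standing in for `<eg:d>` are all hypothetical illustrations, not part of any RDF library.

```python
from typing import NamedTuple, Optional

RDF_XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
EG_D = "http://example.org/d"  # hypothetical stand-in for <eg:d>


class TypedLiteral(NamedTuple):
    """A typed literal as the Last Call design allowed it: tag included."""
    lexical: str
    lang: Optional[str]
    datatype: str


def denotation(lit: TypedLiteral):
    """Hypothetical LC-style denotation: only rdf:XMLLiteral keeps the tag."""
    if lit.datatype == RDF_XML_LITERAL:
        return (lit.lexical, lit.lang)
    # Every other datatype is required to be tag-insensitive, so the tag is dropped.
    # Placeholder: a real datatype would apply its lexical-to-value mapping here.
    return lit.lexical


def same_value(a: TypedLiteral, b: TypedLiteral) -> bool:
    """True if swapping object a for object b in a triple preserves its meaning."""
    return a.datatype == b.datatype and denotation(a) == denotation(b)


# (1) "foo"@en^^<eg:d> versus (2) "foo"@fr^^<eg:d>
print(same_value(TypedLiteral("foo", "en", EG_D),
                 TypedLiteral("foo", "fr", EG_D)))  # True
# The same pair of tags on rdf:XMLLiteral
print(same_value(TypedLiteral("foo", "en", RDF_XML_LITERAL),
                 TypedLiteral("foo", "fr", RDF_XML_LITERAL)))  # False
```

Under this reading (1) entails (2) for any datatype other than rdf:XMLLiteral, simply because the tag never reaches the value.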

>
>I would say that
>
>(3a)   <eg:a> <eg:p> "foo"@en^^<eg:d> .
>(3b)   <eg:d> owl:differentFrom <rdf:XMLLiteral>.
>
>does entail (2)

Why? Maybe <eg:d> is language-sensitive as well, right? (If not, then 
the problem arises when you are told that it is equal to 
rdf:XMLLiteral.)

>Of course that entailment only holds in OWL Full, but the spirit of it
>-- that it's valid to infer (2) only if you somehow know the datatype
>is distinct from rdf:XMLLiteral -- makes perfect sense in simple RDF.

The issue arises the other way round: if you know that <eg:dd> 
owl:sameAs rdf:XMLLiteral then it ought to be the case that you can 
intersubstitute one for another; but they obey different rules 
regarding language tagging, since <eg:dd> ignores language tagging 
but rdf:XMLLiteral does not (in the LC design, that is.)
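The substitution failure can be sketched in a few lines of Python, assuming (as in the LC rules) that tag sensitivity is decided by which name a datatype goes by. `tag_sensitive`, `en_fr_coincide` and the `http://example.org/dd` URI standing in for `<eg:dd>` are hypothetical names chosen for illustration.

```python
RDF_XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
EG_DD = "http://example.org/dd"  # hypothetical stand-in for <eg:dd>


def tag_sensitive(datatype_name: str) -> bool:
    # LC-style rule: sensitivity is decided by the datatype's *name*.
    return datatype_name == RDF_XML_LITERAL


def en_fr_coincide(datatype_name: str) -> bool:
    """Does "foo"@en^^D denote the same value as "foo"@fr^^D?"""
    return not tag_sensitive(datatype_name)


# OWL tells us <eg:dd> owl:sameAs rdf:XMLLiteral: two names, one datatype.
same_as_pairs = [(EG_DD, RDF_XML_LITERAL)]

# Substituting equals for equals should never change an answer; collect the
# pairs where it does.
clashes = [
    (left, right)
    for left, right in same_as_pairs
    if en_fr_coincide(left) != en_fr_coincide(right)
]
print(clashes)
```

The single `owl:sameAs` pair ends up in `clashes`: the same question gets two answers depending on which name is used.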

>I don't see anything counter-intuitive or problematic here.  When
>reasoning about datatypes about which one has no knowledge, one will
>not be able to naively discard language tagging, but I really don't
>see the problem with that.

I see many problems with that.  Take the XSD types, for example: are 
they lang-tag-sensitive? The spec just says that the lexical spaces 
are sets of strings; it doesn't mention languages: is that enough to 
know that they are tag-insensitive?

In fact, there are requirements on integer representations that they 
*not* be language sensitive, for example: things like using a comma 
instead of a dot to indicate the decimal point, as is often done in Germany, 
are considered to be locale rather than language issues. We discussed 
this in the WG long ago and were told fairly firmly that it would be 
a fundamental mistake to consider lexical forms of datatypes to be 
language-sensitive, and that we should definitely not consider things 
like

"12.34"@en^^xsd:decimal
"12,34"@de^^xsd:decimal

to be equivalent. Thus rdf:XMLLiteral was always an exception in this regard.
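A small Python sketch shows why a language tag has nothing to contribute for xsd:decimal. The regular expression is our transcription of the xsd:decimal lexical space from XML Schema 1.1 Part 2, where only '.' acts as a decimal indicator. `in_lexical_space` and `decimal_value` are hypothetical helpers, and the `lang` parameter exists only to show that it is never consulted.

```python
import re
from decimal import Decimal
from typing import Optional

# xsd:decimal lexical space (as transcribed from XSD 1.1 Part 2):
# optional sign, digits, and '.' as the only decimal indicator.
XSD_DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def in_lexical_space(lexical: str) -> bool:
    return XSD_DECIMAL_LEXICAL.fullmatch(lexical) is not None


def decimal_value(lexical: str, lang: Optional[str] = None) -> Decimal:
    """Hypothetical lexical-to-value mapping; `lang` is accepted but never used."""
    if not in_lexical_space(lexical):
        raise ValueError(f"not an xsd:decimal lexical form: {lexical!r}")
    return Decimal(lexical)


print(decimal_value("12.34", "en"))   # 12.34
print(in_lexical_space("12,34"))      # False, whatever tag is attached
print(decimal_value("12.34", "de") == decimal_value("12.34", "en"))  # True
```

"12,34" is simply not a lexical form of xsd:decimal, so no language tag can make it equivalent to "12.34"; the comma convention is a locale matter to be handled before the data reaches RDF.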

>Personally, I would rather see rdf:XMLLiteral be considered one
>instance of a class of language-sensitive datatypes

It is unique in that class. In fact, one could reasonably argue that 
it is best not considered a datatype *for this very reason*; but that 
is not a direction that is now open to us without controversy.

>, so that instead
>of (3b) we'd have something like
>
>(4b)   <eg:d> rdf:type <rdf:LanguageInsensitiveDatatype>.
>
>which would be a part of the theory for <eg:d>.   The theory for
>rdf:XMLLiteral would of course say it was an instance of
>rdf:LanguageSensitiveDatatype.
>
>Do either of these designs work for you?

Not really. That is, they might work for ME, but I am sure that there 
will be many others for whom they will not work. (My own preference 
would be to revert to an even older design in which rdf:XMLLiteral is 
not considered a datatype at all, and XML literals are a distinct 
literal form closely similar to plain literals, which could then have 
language tags without causing confusion. This idea however was 
resoundingly rejected by other WG members.)
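The two designs Sandro proposes in the quoted text, (3b) and (4b), can be contrasted in a short Python sketch. It assumes a reasoner's knowledge can be reduced to two sets of datatype URIs. `Knowledge`, `retag_ok_design1` and `retag_ok_design2` are hypothetical, and rdf:LanguageInsensitiveDatatype is only a class name floated in this thread, not part of the RDF vocabulary.

```python
from dataclasses import dataclass, field
from typing import Set

EG_D = "http://example.org/d"  # hypothetical stand-in for <eg:d>


@dataclass
class Knowledge:
    # Design 1: datatypes asserted owl:differentFrom rdf:XMLLiteral, as in (3b).
    different_from_xml: Set[str] = field(default_factory=set)
    # Design 2: datatypes typed rdf:LanguageInsensitiveDatatype, as in (4b).
    # That class is only proposed in this thread; it is hypothetical.
    language_insensitive: Set[str] = field(default_factory=set)


def retag_ok_design1(datatype: str, kb: Knowledge) -> bool:
    """May (1) "foo"@en^^D be rewritten as (2) "foo"@fr^^D under design 1?"""
    return datatype in kb.different_from_xml


def retag_ok_design2(datatype: str, kb: Knowledge) -> bool:
    """The same question under design 2."""
    return datatype in kb.language_insensitive


empty = Knowledge()
print(retag_ok_design1(EG_D, empty), retag_ok_design2(EG_D, empty))  # False False

informed = Knowledge(different_from_xml={EG_D}, language_insensitive={EG_D})
print(retag_ok_design1(EG_D, informed), retag_ok_design2(EG_D, informed))  # True True
```

In both designs a reasoner that knows nothing about a datatype must keep its tags, which is the cost Sandro says he is willing to accept.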

>  The first has the tremendous
>advantage of differing from the Last Call semantics only as much as
>needed to fix the error.

Well, but it seems to me (and I was already rather feeling this at 
LC, but it was too great a change to make in a short time) that the 
error was allowing language tags in typed literals in the first 
place. We corrected that error, I am glad to say.
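The corrected design, in which typed literals carry no language tag at all, can be sketched as two distinct literal shapes. This is a minimal Python illustration assuming plain literals keep an optional tag and typed literals pair a lexical form with a datatype URI only; `PlainLiteral`, `TypedLiteral` and `make_literal` are hypothetical names.

```python
from typing import NamedTuple, Optional, Union

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


class PlainLiteral(NamedTuple):
    lexical: str
    lang: Optional[str] = None  # only plain literals may carry a language tag


class TypedLiteral(NamedTuple):
    lexical: str
    datatype: str  # no language tag field at all


Literal = Union[PlainLiteral, TypedLiteral]


def make_literal(lexical: str, lang: Optional[str] = None,
                 datatype: Optional[str] = None) -> Literal:
    """Hypothetical constructor that refuses the LC-era form "foo"@en^^<D>."""
    if datatype is not None and lang is not None:
        raise ValueError("a typed literal cannot carry a language tag")
    if datatype is not None:
        return TypedLiteral(lexical, datatype)
    return PlainLiteral(lexical, lang)


print(make_literal("chat", lang="fr"))
print(make_literal("12.34", datatype=XSD_DECIMAL))
```

With no tag slot on typed literals, the question of whether a datatype is tag-sensitive never arises, so no datatype needs rules for adding or dropping tags.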

>The second is perhaps a greater change, but
>it's hard to imagine anyone objecting, and it avoids the potential
>disaster of someday finding another language-sensitive datatype.

If I thought there was the slightest chance of this ever happening I 
might be more inclined to take the idea seriously, but I do not. The 
whole discussion has become warped by confusing two different issues: 
the representation of data, and the representation of text. XML 
embodies this confusion in its very design, but I am confident that 
the world will get this business sorted out reasonably clearly in the 
relatively near future.

Pat

>
>      -- sandro
>
>[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Apr/0314.html


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 14 August 2003 01:10:09 UTC