Re: Datatype disjointness implemented

Trimming Chris from the discussion.

On 18 Mar 2009, at 09:18, Alan Ruttenberg wrote:

> Hi Bijan,
>
> I want there to be an explicit treatment of this in our specification.

There is one already. The notion of copy characterizes it pretty
accurately, in a way that's sufficient (and natural, in my
experience) for implementors. Users don't need to know about the
underlying representation as it's an implementation detail. They do  
need to know that they are disjoint, but such disjointness is common  
in programming languages and other numeric contexts, so I don't think  
giving special further attention to it is appropriate or likely to be  
more helpful than confusing.

For example, the tagged treatment you proposed is *exactly* a  
treatment that relatively unsophisticated people will be confused by.
Indeed, in your very email proposing it you proposed it as a "fix" to  
a "contradiction" in the schema spec that would have *consequences*  
on whether we thought we were aligning with the schema spec.

So, I would oppose any treatment that smuggled in irrelevant,  
essentially implementation, details into the semantics. Our model  
theory is a *presentation* of the semantics. There are many  
equivalent presentations (e.g., translation to FOL, axiomatic, game  
theoretic, etc.) and we should not allow mere details of the  
presentation to be read as *part* of the specification.

> I'll note that there are other comments that seem to ask for this
> level of concern - for example Boris recent note on datetime makes
> reference to Xpath functions on dates that aren't applicable to OWL as
> we don't specify their behavior. Or consider the care in defining the
> value space of rdf:text.

I object to the idea that somehow less care has gone into the  
numerics or more care has gone into rdf:text. That's just wrong.

For example, it's clear now that some people will read the value  
space as requiring pairs per se. (E.g., in an implementation.) There  
is, of course, no requirement for that. I can implement it as a  
string with a certain syntax. I can even reason about the value space  
by reasoning about such a set of strings, since they are isomorphic.
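
To make the isomorphism point concrete, here is a minimal sketch (an illustration only, not part of any spec). It assumes the value space is presented as (text, language tag) pairs, that the string encoding is text@tag, and that language tags never contain "@". The names encode and decode are hypothetical.

```python
def encode(text: str, lang: str) -> str:
    """Map a (text, language tag) pair to a single string 'text@lang'."""
    # Assumption: a language tag never contains "@", so the last "@"
    # in the encoded string always marks the boundary.
    if "@" in lang:
        raise ValueError("language tag must not contain '@'")
    return f"{text}@{lang}"


def decode(encoded: str) -> tuple[str, str]:
    """Invert encode(): split on the last '@' to recover the pair."""
    text, lang = encoded.rsplit("@", 1)
    return (text, lang)


pair = ("user@example", "en")
as_string = encode(*pair)

# Round-tripping in both directions shows the mapping is a bijection,
# so equality on the strings mirrors equality on the pairs.
print(decode(as_string) == pair)
print(encode(*decode("hello@")) == "hello@")
```

Either representation supports the same reasoning about identity of values; nothing in the semantics depends on which one an implementation picks.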

This is totally standard in the Semantic Web Stack. It's well  
understood that you don't have to literally represent data literals  
as a triple (type, lexical form, value form). Furthermore the order  
of that triple doesn't matter, etc. etc.

If you are reading the rdf:text in a way that suggests otherwise,  
then there is a potential problem with the rdf:text spec.

Second, the first criterion of a spec is correctness and unambiguity.  
The current spec has those. Next, I would argue, is intelligibility  
to the key audience. The key audience for this level of detail is
implementors and semanticists (e.g., those designing new services,
etc.). The current spec is more than adequate. Then, it's great if the
general public can follow without problem. But there will always be
parts that require acquiring more sophistication.

> Moreover while I agree that there are a number of ways to define the
> value space such that disjointness is achieved,

The simplest way is to just say it. "They're disjoint". That's
sufficient for XML Schema and, I think, sufficient for us.

> and that within
> certain circles this is well understood, experience discussing this
> with a number of colleagues confirms that it can not be assumed that
> the average reader will read the XML schema specification with the
> level of sophistication that you do.

But we don't need to presume that. As long as the spec is correct and  
there is sufficient common understanding (esp among implementors)  
then we are in good shape.

To properly avoid *all* confusion would require a discussion like my  
prior email. There's just too much to swap in.

But, perhaps this will suffice?

"Please note that the value space of float is a subset of a set of  
entities which are isomorphic to the decimals between ... and ... but  
numerically distinct from them. As a consequence, 1.0^double and  
1.0^decimal denote analogous numbers with respect to many arithmetic  
operations (e.g., addition) but which are, in fact, not the identical  
number."
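
For implementors, one way to picture "isomorphic but numerically distinct" is the sketch below. It is only an illustration under stated assumptions: the Tagged class is hypothetical, Fraction stands in for the underlying numbers, and floating-point rounding is ignored. The tag is an implementation device here, not part of the semantics.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Tagged:
    """A number living in one datatype's own copy of the numbers."""
    datatype: str      # which copy the value belongs to (hypothetical tag)
    value: Fraction    # stand-in for the underlying number

    def __add__(self, other: "Tagged") -> "Tagged":
        # Arithmetic stays inside one copy; mixing copies is an error here.
        if self.datatype != other.datatype:
            raise TypeError("cannot add values from different datatypes")
        return Tagged(self.datatype, self.value + other.value)


one_float = Tagged("float", Fraction(1))
one_decimal = Tagged("decimal", Fraction(1))

# The two values are not identical, so the value spaces stay disjoint.
print(one_float == one_decimal)

# Addition behaves analogously within each copy.
print((one_float + one_float).value == (one_decimal + one_decimal).value)
```

Any other device that keeps the two copies apart would serve equally well; the only thing the specification fixes is that the value spaces are disjoint.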

> Therefore I consider it
> appropriate to have this specified.

This is a critical point of disagreement. You are conflating a  
*style* of specification (which you, reasonably, find more  
comfortable) with the *fact* of the specification. In point of fact,  
it is fully specified. In point of presentation, you would prefer a  
rdf:text style presentation. I would be *fine* with that as an  
editorial matter *iff* you hadn't imported facts about the *style of  
presentation* into the specification itself. (E.g., by suggesting  
that it would or should affect decisions of the working group.) If  
the rdf:text style of presentation is confusing people into thinking  
that the non-relevant structural aspects of the objects of the value  
space are *significant* to the semantics, then we do have a problem.

> I take your point about not
> wanting to commit to a particular value space if it is not necessary,
> and if there is a way to have a normative specification

So, my earlier point holds. We have that now. It is fully specified  
to an appropriate level of detail. If you prefer the presentation I  
gave above, I'm happy with that too.

> that avoids
> doing so then I'm interested in having a look.

See above.

> I'd rather not go back and forth repeating what we've said again. I
> believe that I understood what you were saying the first time you said
> it,

There were two points: 1) You claimed that the XML Schema spec was  
literally inconsistent so we have to repair that inconsistency. This  
was based on a faulty reading of "number" that is not just at odds  
with fundamental facts about mathematics but also the overwhelming  
consensus of people who have read and work with the spec (see, e.g.,  
the SWBP note by two acknowledged, long-standing experts in RDF and
OWL datatypes). Rob Shearer *misread* the spec because the
disjointness point is *lexically far away* from the initial value
space presentation. But when the two bits of text were brought
together, he immediately said, roughly, "oh! they are colored? Eww".
There was no *confusion* or *contradiction*, just a distaste for  
*what* was specified. I understand you share that distaste, but the  
fact of a distaste should not lead us to conclude that there's a  
contradiction.

Since there is no contradiction, there is no need to *repair* the  
specification. That is, the language we have defined is perfectly  
consistent and coherent and, really, quite standard.

2) It is a totally separate issue whether you find the presentation
sufficiently easy to follow. It's clear that the XML Schema document  
is a bear (not because it doesn't *sufficiently* specify, but because  
the presentation requires reading the whole document instead of just  
the obvious point of definition), but it's not clear that we need to
do more than mitigate the presentation. We have. Boris and Rob *both*
didn't get that the primitive types *were* disjoint (by fiat). But  
when the text was pointed out, they both immediately *understood* the  
spec.

> and I simply disagree with you that it need not be specified in a
> clearer way.

I think no one will read our spec and have any doubt as to whether  
the types are disjoint.  Please give an example of someone who made  
that error. Thus, I believe our specification is adequate. It says  
what we meant it to say without ambiguity.

Personally, I'd be very surprised if there was a significant base of
readers who are, on the one hand, sophisticated enough to demand that
the disjointness be a *consequence* of a structural feature of the
value space (that is, to require us to have a single mathematical
structure in which there is a univocal set of numbers and any
distinction has to come from structured objects) *and* whom, on the
other hand, we cannot reasonably require to acquire more sophistication.

Catering for such an in-between group seems quite imprudent, esp. if
it leads them into other errors. But I offer a variant of my text above.

Cheers,
Bijan.

Received on Wednesday, 18 March 2009 11:18:21 UTC