- From: Rob Shearer <rob.shearer@comlab.ox.ac.uk>
- Date: Sat, 5 Jul 2008 10:40:47 +0100
- To: public-webont-comments@w3.org
- Message-Id: <4D463620-93BC-47FF-8876-7F16DBCA0F00@comlab.ox.ac.uk>
>>> Putting aside the issue of whether or not it's possible to use (only) the XML Schema datatypes to represent meaningful and implementable OWL datatype value spaces, I expect that there is consensus that when users were writing `xsd:float` and `xsd:double` without values in OWL 1.0, what they really meant was "any number".

> I don't know what users meant :) I would think that they should use xsd:decimal if that was their intent (or perhaps the new owl:rational/real).

I'm providing you with my experience: every user I've ever spoken to about this topic has wanted the real number line. They are used to using the xsd datatypes `float` and `double` to represent number values, so they use these without values in OWL to mean "some number".

My experience is that the use of xsd datatypes as value spaces in OWL 1.0 causes users to write what they don't mean. My experience is that *every* ontology using `xsd:float` and `xsd:double` without values would be better off using `xsd:decimal`, but that the user intent was "some real number" (and I should note that I'm against requiring support for `xsd:decimal` values). And my expectation is that users would be much less confused if this distinction between the types used for specific values and the types used for value spaces were clear.

To repeat: as an implementor, I did willfully implement semantics contradictory to the spec, and I will do so again for OWL 2.0 if the spec is "broken" in the same way.

> When I am working as a user, I generally, both in programming languages and in kbs, am very careful about computational types and numerical methods. It's easy to find extensive discussions in programming language circles about the pitfalls of floats. All things being equal, it doesn't seem to be that difficult to recommend that they use a more suitable type such as decimal. Indeed, that is what's been happening as more and more programming languages bundled in decimal types.
I am also a very careful programmer, and am familiar with the details of the IEEE spec. All the good programmers I've ever worked with are aware of the basic problems with floats but almost always use them when they mean "any real number" anyway. The mental model I use, and that I encouraged among junior programmers, was that floats were "real numbers, but assume that they wiggle around a little all the time". Not technically correct, but a safe and useful mental model for programming. The point being that density of the number line is *not* an issue programmers encounter as a matter of course, and one for which their natural intuition might well be wrong.

>>> No user ever intended to restrict the semantic space to a nowhere-dense number line. If the OWL spec presupposes that most of our users would prefer a number line which does not include 1/3, my choice as an implementor would be to once again ignore the spec and be intentionally non-compliant.

> An alternative choice is to signal that a repair is required and perhaps do it.

I hereby signal that a repair to the OWL spec is required. (Are we really pretending that everybody thought datatypes in OWL 1.0 were fine and dandy?)

>>> Doing what all my users want and expect in this case turns out to be way way easier than doing what a broken spec would require. Any working group who would produce such a spec would clearly be putting their own interests (ease of spec authoring and political considerations) above their duty to their intended users.

> I think your rhetoric flew ahead of reality here. It's not actually easier to spec this (as the ongoing battle has shown :)). As you well know, it's much easier to give in to Boris than not to :) I don't believe I'm particularly motivated by political considerations per se. I do think that departing from existing behavior (disjointness) and normal meaning (in computer science) needs to be done carefully.
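For readers who want to see the "number line which does not include 1/3" point concretely, here is a minimal sketch. It assumes Python's built-in `float` (an IEEE 754 binary double, so closer to `xsd:double` than to `xsd:float`) as a stand-in for the XML Schema value space; the helper name `in_double_value_space` is hypothetical and not taken from any spec.

```python
from fractions import Fraction

def in_double_value_space(q: Fraction) -> bool:
    """Return True if the rational q is exactly representable as an IEEE 754 double.

    Hypothetical helper for illustration only: round q to the nearest double,
    convert that double back to an exact rational, and compare.
    """
    try:
        return Fraction(float(q)) == q
    except OverflowError:
        # q is too large in magnitude to be rounded to a finite double.
        return False

one_third = Fraction(1, 3)
nearest = Fraction(1 / 3)  # exact rational value of the double closest to 1/3

print(in_double_value_space(one_third))       # False
print(in_double_value_space(Fraction(1, 2)))  # True
print(nearest != one_third)                   # True
```

Under a literal reading of the binary floating-point value space, a data value equal to 1/3 simply does not exist; under "any real number" (or even "any rational") semantics it does.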
Let me expand upon my rhetoric:

1. Users want a (dense) real number line.
2. Users expect a (dense) real number line when they write `xsd:float` in OWL 1.0 ontologies.
3. OWL 1.0 implementations reason as though the `xsd:float` value space is dense.
4. The OWL 1.0 specifications state that the `xsd:float` value space is nowhere-dense.

If you disagree about the first two points then it's certainly worth discussion: Alan's [investigation](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0103.html) seems to support my experience on point 1. I have yet to see a single counter-example to point 2---and I've asked many users what they meant when they wrote their datatype restrictions. I admit I haven't done a comprehensive survey on point 3, but it's a point of fact and not opinion so we should be able to gather evidence one way or the other.

The crux of my rhetoric is that points 1--3 (if you accept them) completely and utterly trump point 4. "Existing behavior" is *not* what the OWL 1.0 spec says. It's what OWL users (implementors and ontology authors) are doing.

> Given that some people have already asked for NaN support (of some form) and that one of the most championed use cases is managing scientific computation results, I don't think we can be too quick to alter things.

I agree that it's an issue, and as a member of the public I don't intend to get mightily bogged down in details of the solution to be chosen. I'd think that NaN occurs quite rarely, and that semantics such as "any real" would suffice, but I don't have strong opinions on the issue.

>>> (Note that in the course of the discussion I read on public-owl-wg the notions of "dense" and "continuous" seem to have become confused.
>>> I think the notion of density is probably the only one that makes a difference in terms of current OWL semantics, since number restrictions can cause inconsistencies in non-dense number lines, but continuity is really what users have in their heads.)

>>> The [XML Schema datatype spec](http://www.w3.org/TR/xmlschema-2/) is focused on representing particular values, not on classes of values. The notion of "value spaces" is used within the spec, but only in service of representation of values

> I'm not sure what you mean. It seems clear that the spec is all about classes of values (i.e., types) and their relations.

I mean that the problems that spec is designed to solve involve values, not sets of values. The most complex reasoning the XML Schema people have in mind is model checking, not satisfiability and consistency reasoning. Thus we can't necessarily expect their spec to have addressed all the issues which arise in our quite different context.

>>> I strongly encourage the working group to publish a spec which provides for the following types of semantic spaces:
>>>
>>> 1. A countably infinite, nowhere-dense datatype. I.e. the integers.
>>> 2. A countably infinite, dense datatype. I.e. strings.
>>> 3. An uncountably infinite, dense, continuous datatype. I.e. the reals.

> These are all on the agenda. The first two were in OWL1 and the third is being worked on as part of the n-ary data predicate proposal, but is separate from it (i.e., I believe it will be added regardless of the fate of n-ary).

> (Note that this will likely be the algebraic reals and only rational constants. So, no transcendentals. I'd be interested in your view on that. I can imagine adding the trans. but would prefer to defer it until a later iteration.)

This is getting ridiculous---so you're saying you think there is a substantial user base who need to be able to specify that a value is the solution to some algebraic equation?
I have absolutely no idea what perspective the working group is taking here---what implementor or user has expressed interest in anything other than the real number line??? Can you guys please just come up with a version of the [`numeric`](http://www.w3.org/TR/xmlschema-2/#rf-numeric) notion? Pretty please?

>>> I don't particularly care what each of these three is called; as long as OWL specifies the internal semantics of these three types of spaces, then it's straightforward to "implement" the datatypes users will actually want in terms of them. But, of course, the ability to use XML Schema Datatypes to encode specific values within each of these spaces would be quite convenient

> Do you mean the lexical spaces?

I mean the only time I explicitly want XML Schema is when my implementation is parsing specific values provided by the user. If you happen to re-use the XML Schema spec for other things that is for your own convenience, not mine.

>>> ---and would use the XML Schema specification for *exactly* what it's good at.

> The additional question is whether to require additional types that are not the above three. Among these are float and double. My belief is that if we are going to add such datatypes as required, and we are going to take them from xsd, then they should reflect the semantics of those types and our advice to users is to only use them if they specifically intend those semantics.

I'd guess that using xsd names for value spaces will just (continue to) confuse users. More importantly, and yet again, I have never ever encountered a user who would prefer to use the `float` or `double` value spaces if a `real` value space were available. If there are users who feel the other way, then please produce them---merely hypothesizing their theoretical existence does not seem useful. (I grant that the class is satisfiable. I contend that its size is vanishingly small in practice.)
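To make the practical difference between a `double` value space and a dense one concrete, the following sketch counts how many doubles lie strictly between two bounds. It is only an illustration under stated assumptions: Python's `float` stands in for `xsd:double`, exact rationals from `fractions.Fraction` stand in for a dense number line, and the helper names (`ordinal`, `from_ordinal`, `doubles_strictly_between`) are hypothetical.

```python
import struct
from fractions import Fraction

def ordinal(x: float) -> int:
    """Rank of a positive finite double: its IEEE 754 bit pattern read as an integer.

    For positive finite doubles this integer grows in the same order as the values.
    """
    if not (0.0 < x < float("inf")):
        raise ValueError("sketch handles positive finite doubles only")
    return struct.unpack("<q", struct.pack("<d", x))[0]

def from_ordinal(n: int) -> float:
    """Inverse of ordinal(): the double whose bit pattern is n."""
    return struct.unpack("<d", struct.pack("<q", n))[0]

def doubles_strictly_between(lo: float, hi: float) -> int:
    """Number of doubles v with lo < v < hi, for positive finite lo and hi."""
    return max(0, ordinal(hi) - ordinal(lo) - 1)

lo = 1.0
hi = from_ordinal(ordinal(lo) + 1)  # the next double above 1.0

# Double semantics: the open interval between adjacent doubles is empty,
# so a restriction "lo < x < hi" is unsatisfiable.
print(doubles_strictly_between(lo, hi))  # 0

# Dense semantics: the same interval always contains another value.
mid = (Fraction(lo) + Fraction(hi)) / 2
print(Fraction(lo) < mid < Fraction(hi))  # True

# Double semantics also caps cardinality: (1.0, 2.0) holds exactly 2**52 - 1 values.
print(doubles_strictly_between(1.0, 2.0) == 2**52 - 1)  # True
```

A reasoner that follows the literal value space has to do this kind of counting for every bounded range, whereas under real-number semantics any non-degenerate interval is simply infinite.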
> The n-ary predicate definition system will, at most, be over the core three types above (e.g., polynomial or linear inequations with rational coefficients over the reals). However, one can pretty easily imagine a predicate definition system that was focused on the floats and was sensitive to the various semantics. It wouldn't have to be direct floating point based equations, but an interval arithmetic system which was designed to help reason about measurements and computations (and their errors).

I care not a whit for n-ary datatypes. I might implement them if they're in the spec; I might not. But if the spec says you need to use n-ary datatypes to get real numbers, and leaves the issues raised with the `float` value space in place, I will ignore the spec and implement the real number line for unary datatypes. Just like I did for OWL 1.0. As a member of the public, that is my feedback to the working group.

> I grant entirely that that use case is quite speculative at the moment. But given that 1) we have alternatives for the "any number" type and 2) cardinality reasoning with the floats is not very much more difficult than with user-defined finite ranges over the integers (except for the fact that users have to do much more work to get there), I don't think we should muck with the semantics of floats.

I strongly disagree with 2. I don't want my implementation to care about the difference between `double` and `float`, and I consider any line of code I write involving the internals of float representation to be a wasted line of code, because my users really don't care. Much more importantly, it's my job to turn your spec into user-facing documentation and support, and there is not a chance in hell I'm going to explain this issue to my users. They don't care, and they don't want the semantics you are describing. Experience with OWL 1.0 has demonstrated this.

> Your feedback and insight are, as always, appreciated.
> I hope you see that my position doesn't *quite* fall into the error you are rightly concerned with. There's still the problem of educating people about float and double, but that is a problem of long standing :)

> I'll also admit up front that I *like* float and double as they are. I think that IEEE binary floating point is an amazingly clever thing. But then, I've always worked in programming languages that had bigints and fractions available, so I've been spoiled for choice :)

I'm a big fan of balanced ternary. But I don't intend to implement that, either.

-rob
Received on Saturday, 5 July 2008 09:45:55 UTC