
Re: XML Schema Datatype Questions

From: Daniel Potter <dpotter@mitre.org>
Date: Thu, 06 Jan 2000 10:12:24 -0500
Message-ID: <3874B0D8.2089A6D1@mitre.org>
To: Curt Arnold <carnold@houston.rr.com>, xml-dev@ic.ac.uk, www-xml-schema-comments@w3.org
My basic question was really how to include NaN in ranges (although
that's not quite how I phrased it, sorry).  Basically, when I'm writing
a class that implements a datatype, it contains all the facets with
"default" values.  It sets the min/max values to positive/negative
infinity appropriately (as they are represented internally by floats). 
Unfortunately, since all range operations (<, >, etc.) return false
when NaN is encountered, this means that I need another way of
indicating that NaN is legal.
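
A minimal sketch of the problem, in Python rather than the class described
above. FloatRange and its allow_nan flag are hypothetical names used only for
illustration; nothing like them appears in the draft. With the bounds defaulted
to negative/positive infinity every ordinary float passes the comparison, but
NaN fails it, so NaN has to be admitted by a separate check:

```python
import math
from dataclasses import dataclass


@dataclass
class FloatRange:
    """Hypothetical range-facet holder; not part of the XML Schema draft."""
    min_inclusive: float = -math.inf  # default: no lower bound
    max_inclusive: float = math.inf   # default: no upper bound
    allow_nan: bool = False           # assumed extra flag; comparisons cannot admit NaN

    def accepts(self, value: float) -> bool:
        if math.isnan(value):
            # Every ordered comparison with NaN is false, so decide explicitly.
            return self.allow_nan
        return self.min_inclusive <= value <= self.max_inclusive


nan = float("nan")
unbounded = FloatRange()
print(-math.inf <= nan <= math.inf)                    # the bare comparison rejects NaN
print(unbounded.accepts(1.5), unbounded.accepts(nan))  # ordinary float vs. NaN
print(FloatRange(allow_nan=True).accepts(nan))         # NaN admitted only by the flag
```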

The real problem and what I really meant to ask is how to indicate NaN
should or shouldn't be included (same as your last question).  I can
imagine times when a schema writer would wish to include NaN in a given
range.  A simple example would be using NaN to indicate a float value is
unknown; in a math oriented schema, it might be needed to indicate that
no representable answer was generated.  There are also times when NaN
should not be included, and this is not necessarily all times when there
is a given range to the values.

Curt Arnold wrote:

[snip]

> 
> With the exception of the enumeration facet, all the other facets appear to
> form one big AND.  I'd say the logical formulation is all of the
> non-enumeration facets must be true and one of the enumeration facets must
> be true and you must be acceptable to the base type.  (This distinct treatment
> of enumeration is undesirable in my eye.  I'd like something closer to the
> previous form or my earlier suggestion
> http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0022.html
> where the entire enumeration is one facet and the specific literals are
> child elements of it.  I'm planning on updating that suggestion for the new
> draft.).

I like that suggestion.  Something like:

<enumeration>
  <value>1</value>
  <value>2</value>
  <value>NaN</value>
</enumeration>

?  Seems like it would look nicer too.

> 
> For example,
> 
> <datatype name="oblivion" source="double">
>     <minInclusive value="3"/>
>     <maxInclusive value="4"/>
>     <enumeration value="NAN"/>
> </datatype>
> 
> should have an empty lexical space.  Only NaN passes the enumeration test
> and it is not in the range specified.  Your scenario requires some sort of
> "or" facet and "and" facet.
> 
> <datatype source="double">
>     <or>
>         <and>
>             <minInclusive value="3"/>
>             <maxInclusive value="4"/>
>         </and>
>         <enumeration value="NAN"/>
>     </or>
> </datatype>
> 
> Note: If we were to do that, then add a <not> element too.  That would make
> it simple to disallow a previous enumeration value: such as
> 
> <datatype name="Mon-Thurs" source="Mon-Friday">
>     <not>
>         <enumeration value="Friday"/>
>     </not>
> </datatype>
> 
> > > There is no constraint saying that enumeration and min/max
> > > values cannot be set together, which leads me to believe that
> > > they are combined to describe legal values.)
> 
> With the previous interpretation, you should be able to have any combination
> of facets but they are interpreted as a big AND.  With that explicit
> interpretation, we don't have any ambiguity if someone specifies both a
> minInclusive and a minExclusive and don't have to try to figure which one to
> keep, we can just evaluate both and they both better be true.  Just like
> Java or C++ would allow you to do
> 
>     if(x > 3 && x >= 4)
> 
> The behavior is defined, though the duplicate specification comes at slight
> performance penalty.

This is exactly where I was coming from: creating a datatype with no
legal values by enumerating only illegal values.  According to Ashok
Malhotra in an answer to basically this same question, this is illegal
because it is a conflict between facet values (similar to multiple
max(In|Ex)clusive statements).

I think that <and>, <or>, and <not> sections make sense.  There is no
other way to constrain a Mon-Friday type into a Mon-Thur type, although
conceivably Mon-Friday is an extension of Mon-Thur.  But if Mon-Friday
is the datatype from the base schema which someone is basing a new
schema on, then being able to restrict it even further would definitely
be useful.
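
To make the combinators concrete, here is a rough Python sketch that treats
each facet as a predicate. All function names (min_inclusive, enumeration,
all_of, any_of, negate) and the day-name literals are my own assumptions for
illustration; the draft defines none of them. It shows the flat AND reading
giving an empty "oblivion" type, the <or> form admitting both the range and
NaN, and <not> restricting Mon-Friday to Mon-Thurs:

```python
import math
from typing import Callable, Iterable

Facet = Callable[[object], bool]  # hypothetical: a facet is a predicate on a value


def min_inclusive(bound: float) -> Facet:
    return lambda v: v >= bound


def max_inclusive(bound: float) -> Facet:
    return lambda v: v <= bound


def enumeration(literals: Iterable[object]) -> Facet:
    allowed = list(literals)

    def check(v: object) -> bool:
        # NaN never equals itself, so match it by kind rather than by ==.
        if isinstance(v, float) and math.isnan(v):
            return any(isinstance(a, float) and math.isnan(a) for a in allowed)
        return v in allowed

    return check


def all_of(*facets: Facet) -> Facet:  # plays the role of <and>
    return lambda v: all(f(v) for f in facets)


def any_of(*facets: Facet) -> Facet:  # plays the role of <or>
    return lambda v: any(f(v) for f in facets)


def negate(facet: Facet) -> Facet:  # plays the role of <not>
    return lambda v: not facet(v)


nan = float("nan")

# Flat reading: range AND enumeration, so no value can satisfy both.
oblivion = all_of(min_inclusive(3), max_inclusive(4), enumeration([nan]))

# Explicit <or>: either inside the range 3..4, or NaN.
range_or_nan = any_of(all_of(min_inclusive(3), max_inclusive(4)),
                      enumeration([nan]))

# Restricting an enumerated base type with <not> (day names are assumed).
mon_friday = enumeration(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"])
mon_thurs = all_of(mon_friday, negate(enumeration(["Friday"])))

print([oblivion(v) for v in (nan, 3.5)])
print([range_or_nan(v) for v in (nan, 3.5, 5.0)])
print(mon_thurs("Thursday"), mon_thurs("Friday"))
```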

> > > Also relating to float, are the characters case
> > > sensitive?  Can I use
> > > "inf" as a value or does it need to be "INF"?  Is
> > > "6.22e22" legal?  Or
> > > does it need to be "6.22E22"?
> 
> Formulation [34] would indicate that both 6.22e22 and 6.22E22 are legal.
> Formulations [31] and [32] would indicate that the only legal representation
> of infinity is INF and of not a number is NaN (which is inconsistent with
> section 3.2.4.1 which states NAN)

I missed the formulation section when writing that question (oops). 
Thanks for pointing it out!

So which is correct, then, NaN or NAN?  I'm guessing NaN.
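
If that guess is right, an implementation cannot simply lean on its host
language's float parser. Python's float(), for instance, accepts "inf" and
"NAN" in any case. A small sketch of a stricter check follows; parse_strict
and the regular expression are my own assumptions and only approximate the
draft's productions, which I have not reproduced here:

```python
import math
import re

# Assumed grammar for ordinary numbers: optional sign, digits with an optional
# fraction, and an optional exponent introduced by either "e" or "E".
_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

# Assumed case-sensitive special literals (taking "NaN" as the correct form).
_SPECIALS = {"INF": math.inf, "-INF": -math.inf, "NaN": math.nan}


def parse_strict(literal: str) -> float:
    """Hypothetical helper: parse a literal, rejecting wrong-case specials."""
    if literal in _SPECIALS:
        return _SPECIALS[literal]
    if _NUMBER.fullmatch(literal):
        return float(literal)
    raise ValueError(f"not a legal literal: {literal!r}")


print(parse_strict("6.22e22") == parse_strict("6.22E22"))
for bad in ("inf", "NAN"):
    try:
        parse_strict(bad)
    except ValueError as err:
        print(err)
```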

 
> > > One last question:  For the binary datatype, what is
> > > the default
> > > encoding?
> > >
> 
> Someone should put a default in or make the encoding a required attribute.

I don't think that it can be made required:

<element name="bindata" type="binary"/>

is legal, and since type might be something other than binary, requiring
encoding would be useless.  Encoding would be specified by doing
something like:

<datatype name="base64binary" source="binary">
   <encoding value="base64"/>
</datatype>

<element name="bindata" type="base64binary"/>

There really ought to be a default encoding so that simply declaring an
element of type binary is meaningful.  I would suggest hex as the
default, since most binary data in XML is more than likely going to be
short and hex is closer to being human-readable than Base64.
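
For a sense of the trade-off, here is a quick Python comparison of the two
encodings on a few arbitrary sample bytes (the payload is made up for
illustration). Hex costs two characters per byte but each byte stays visible;
Base64 is shorter for anything but tiny inputs and much harder to read by eye:

```python
import base64
import binascii

payload = bytes([0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x01])  # arbitrary sample bytes

hex_text = binascii.hexlify(payload).decode("ascii")
b64_text = base64.b64encode(payload).decode("ascii")

print(hex_text, len(hex_text))  # two characters per byte
print(b64_text, len(b64_text))  # four characters per three bytes

# Both encodings round-trip back to the original bytes.
print(binascii.unhexlify(hex_text) == payload)
print(base64.b64decode(b64_text) == payload)
```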

> 
> Now for my question, what is the appropriate way to disallow NaN from a
> datatype without constraining the dataspace.  I'd try the following, but
> maybe there should be an explicit solution (or if this one is okay it should be
> added in the draft or some errata)
> 
> <datatype name="noNaN" source="double">
>     <minInclusive value="-INF"/>
> </datatype>

I don't really like the idea that specifying a range eliminates NaN
completely.  As stated above, NaN may be desirable even in a constrained
range.  I would suggest instead another facet which specifies whether or
not NaN can be included, or using the <and> <or> <not> sections which
you suggested above.

- Dan Potter
Received on Thursday, 6 January 2000 10:11:59 GMT
