Ambiguous values. from Kevin Yancey on 2001-07-18 (xmlschema-dev@w3.org from August 2001)

From: Kevin Yancey <kevinyancey@hotmail.com>
Date: Wed, 18 Jul 2001 12:12:28 -0400 (EDT)
To: <xmlschema-dev@w3.org>
Message-ID: <OE7IcVux1Zcn8TtEvzo00000279@hotmail.com>

I've been working on writing a schema parser and ran into a problem that doesn't seem to have been
addressed in the XML Schema specification documents. There are certain situations in parsing an XML
Schema (or any XML document for that matter) where the meaning of a lexical value in an XML Schema
would be ambiguous. To illustrate what I mean, I'll give an example:

Say that a given schema defines a simple type called "unionType" that is a union of the built-in types "decimal"
and "string" (the usefulness of such a type may seem slight, but nothing I see in the specification
prohibits it). Then, let's say that an attribute declared in the schema has unionType as its type and also has
a fixed value of "22". The value "22" could mean the string "22" or the number 22. The difference
between the two becomes relevant if this attribute declaration is applied to an attribute with a value of "+22".
"+22" is the same as "22" numerically, but not the same as a string. Therefore, the attribute could be valid or
invalid with respect to the schema, depending on how the schema's ambiguous value is interpreted.
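
To make the two readings concrete, here is a minimal Python sketch. It is not taken from the specification: the function name matches_fixed and the regular expression used for the decimal lexical form are assumptions made for illustration, and a real schema processor would be considerably more involved.

```python
import re
from decimal import Decimal

# Assumed approximation of the xs:decimal lexical form: optional sign,
# digits, optional fraction. No exponent, no INF/NaN.
DECIMAL_LEXICAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def matches_fixed(fixed: str, actual: str, interpret_as: str) -> bool:
    """Compare an instance value with a fixed value under one member type.

    interpret_as is "decimal" or "string" (hypothetical labels for the two
    members of the union in the example).
    """
    if interpret_as == "decimal":
        # Both literals must be decimal literals; then compare numerically.
        if not (DECIMAL_LEXICAL.fullmatch(fixed) and DECIMAL_LEXICAL.fullmatch(actual)):
            return False
        return Decimal(actual) == Decimal(fixed)
    # As strings, the values are equal only if the characters are identical.
    return actual == fixed


# The fixed value "22" against the instance value "+22", read both ways.
as_decimal = matches_fixed("22", "+22", "decimal")
as_string = matches_fixed("22", "+22", "string")
print("decimal reading:", as_decimal)
print("string reading:", as_string)
```

The same pair of literals is accepted under one reading and rejected under the other, which is exactly the ambiguity in question.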

One resolution to this problem might be to simply disallow unions whose members have overlapping lexical
spaces. This is easy when dealing with simple types such as string and decimal, but if the member types
were derived from them, it would be difficult or impossible to detect such union types with certainty, given
all the facets there are to take into account.

For validation purposes, the distinction between values in the document being validated becomes moot, since
"48" is still a valid decimal value, whether or not the writer of the document meant the string "48". The validator only
cares if the given lexical value is valid for the specified type. The distinction becomes critical, however, when
interpreting the schema itself, for reasons given in the example above. Similar problems arise with the use of
the enumeration facet as well.
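
As a rough sketch of why plain validity checking is unaffected, the following Python treats the union as a set of per-member lexical checks. The names MEMBER_CHECKS, accepting_members and is_valid are hypothetical, and the decimal pattern is only an assumed approximation of the xs:decimal lexical form.

```python
import re

# Assumed approximation of the xs:decimal lexical form.
DECIMAL_LEXICAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")

# One lexical check per member type of the example union.
MEMBER_CHECKS = {
    "decimal": lambda s: DECIMAL_LEXICAL.fullmatch(s) is not None,
    "string": lambda s: True,  # any character sequence is a valid string
}


def accepting_members(lexical: str) -> list:
    """Return the names of the member types whose lexical space contains the literal."""
    return [name for name, ok in MEMBER_CHECKS.items() if ok(lexical)]


def is_valid(lexical: str) -> bool:
    """A literal is valid for the union if at least one member accepts it."""
    return bool(accepting_members(lexical))


# Validity does not depend on which member was "meant"...
print(is_valid("48"), accepting_members("48"))
# ...but a literal accepted by more than one member has no single obvious value.
print(is_valid("abc"), accepting_members("abc"))
```

Validity only asks whether the list is non-empty. The fixed-value and enumeration cases instead need to know which single entry in that list the schema author intended.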

So, in a nutshell, my question is: how is a schema parser to tell which value is meant when it
encounters an ambiguous value, as in the example above?

Any comments are welcome,
Thanks,

Kevin P. Yancey
Balance Wheel Technologies Inc.

Received on Wednesday, 22 August 2001 05:41:58 UTC