A critique of constraining facets

Hello again,

Proceeding in document order, here is a critique of the constraining facets
as defined in the March 16th, 2001 version of XML Schema Part 2: Datatypes.
This time, only one off-topic suggestion is mixed in.


4.3 Constraining facets

A simple type is a set of literals, a set of values and a map from literals
to values. Not all facets are by nature constraints on the value space.
The pattern facet primarily acts on the lexical space; its link to the value
space is the map. The whiteSpace facet primarily acts on the map itself, and
thereby modifies the value space indirectly. The specification should state
this explicitly.


4.3.1 length

The length facet is shorthand for minLength and maxLength with the same
value. It should properly be regarded as an XML representation, not an
independent concept. This would simplify inheritance and shorten many
lengthy phrases throughout the standard.
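A minimal sketch of length as a pure shorthand, assuming facets are kept in a
plain dictionary (the layout and function names are hypothetical). Once length
is expanded, checking a restriction only has to compare two facets.

```python
import math


def normalize_length(facets: dict) -> dict:
    """Expand a hypothetical 'length' entry into the equivalent
    minLength/maxLength pair."""
    result = dict(facets)
    if "length" in result:
        n = result.pop("length")
        for key in ("minLength", "maxLength"):
            # Reject an explicit bound that disagrees with length.
            if key in result and result[key] != n:
                raise ValueError(f"length={n} conflicts with {key}={result[key]}")
            result[key] = n
    return result


def is_valid_restriction(base: dict, derived: dict) -> bool:
    """The derived range of lengths must lie within the base range."""
    b = normalize_length(base)
    eff = dict(b)
    eff.update(normalize_length(derived))  # facets not restated are inherited
    return (eff.get("minLength", 0) >= b.get("minLength", 0)
            and eff.get("maxLength", math.inf) <= b.get("maxLength", math.inf))
```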


4.3.4 pattern

As stated above, this facet primarily acts on the lexical space. To
illustrate the problem of arguing in value space, consider this: if pattern
acted on the value space, why would the interaction between patterns and
bounds never be checked for numbers?
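A small sketch of the interaction in question, using a hypothetical type
derived from decimal with pattern [5-9] and maxInclusive 3. Each facet is legal
on its own, but the pattern is tested on literals and the bound on values, and
together they admit nothing. The sample below happens to include all five
literals the pattern admits.

```python
import re
from decimal import Decimal

# Hypothetical derived type: base decimal, pattern "[5-9]", maxInclusive 3.
PATTERN = re.compile(r"[5-9]")
MAX_INCLUSIVE = Decimal(3)


def is_valid(literal: str) -> bool:
    # The pattern is checked against the literal (lexical space) ...
    if PATTERN.fullmatch(literal) is None:
        return False
    # ... while the bound is checked against the value the literal maps to.
    return Decimal(literal) <= MAX_INCLUSIVE


candidates = [str(n) for n in range(-20, 21)] + ["3.0", "05", "+4"]
valid = [lit for lit in candidates if is_valid(lit)]
```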

Does it make sense for pattern to operate on lexical space for strings when
whitespace transformations are in effect? Probably not, but patterns on
base64 or hex representations of binary aren't intuitive, either. A case
could be made for treating types derived from string differently: their
value space is always a subset of their lexical space.


4.3.6 whiteSpace

As stated above, this facet primarily acts on the map.

The set of legal literals remains invariant, but the value space shrinks as
one progresses from preserve to replace to collapse. This explains the
whiteSpace valid restriction constraint: since derivation by restriction
must not extend the value space, a derived type may move towards collapse
but never back towards preserve.
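A minimal sketch of the whiteSpace values as modifications of the map,
assuming the usual rules (replace turns tab, line feed and carriage return into
spaces; collapse additionally squeezes runs of spaces and trims both ends).
The sample literals are the same under every setting; only their images, and
hence the value space, shrink. Function names are hypothetical.

```python
import re

ORDER = ["preserve", "replace", "collapse"]


def normalize(literal: str, mode: str) -> str:
    """whiteSpace normalization, applied as part of the map."""
    if mode == "preserve":
        return literal
    # replace: each tab, line feed and carriage return becomes a space
    replaced = re.sub(r"[\t\n\r]", " ", literal)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # collapse: additionally squeeze runs of spaces and trim both ends
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {mode!r}")


def value_space(literals, mode):
    # Same literals, different map: the image shrinks towards collapse.
    return {normalize(s, mode) for s in literals}


def whitespace_restriction_ok(base_mode: str, derived_mode: str) -> bool:
    # A restriction may only move towards collapse, never back.
    return ORDER.index(derived_mode) >= ORDER.index(base_mode)


sample = ["a b", "a\tb", " a  b ", "a\nb"]
```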

The question of admitting map modifications is an interesting one in itself.
In former versions of the standard, an encoding facet used to control the
map for binary types. The current version displays a tendency to disallow
radical map modifications by providing separate hexBinary and base64Binary
primitives.

IMHO binary data should not be subjected to the constraint of human
readability. In the presence of more compact base64 encoding, hex encoding
should thus be pruned from the type hierarchy. There are too many primitive
types.
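The compactness argument in numbers, as a short Python sketch using only the
standard library (bytes.hex and base64.b64encode): hex needs two characters per
octet, base64 four characters per three octets, padded.

```python
import base64


def encoded_lengths(data: bytes) -> tuple[int, int]:
    """Characters needed to write `data` in hex and in base64."""
    hex_len = len(data.hex())              # two characters per octet
    b64_len = len(base64.b64encode(data))  # four characters per three octets, padded
    return hex_len, b64_len


for n in (1, 3, 30, 300):
    hex_len, b64_len = encoded_lengths(bytes(n))
    print(f"{n:4} octets: hex {hex_len:4}, base64 {b64_len:4}")
```

Apart from very short values (for a single octet, base64 is actually longer),
base64 needs about two thirds of the characters hex does.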


4.3.7 maxInclusive
4.3.8 maxExclusive

These should probably be regarded as XML representations, not independent
concepts. On the information set level, they can be collapsed into a single
maxBound facet with a boolean flag to indicate exclusion (for types of
finite precision, normalization to inclusive bounds is easy). This would
reduce the overall number of facets, simplify inheritance and shorten many
phrases in the standards document.

On the other hand, why not collapse them on the XML level? A single
max(Bound) element with an optional boolean attribute exclusive, defaulting
to false, would make the rules against using the inclusive and exclusive
variants of a bound simultaneously obsolete.
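A sketch of the single maxBound facet with an exclusion flag, assuming values
of finite precision are integer multiples of some step (1 for integers, 0.01
for two fraction digits). The class name and the step parameter are
hypothetical. Normalizing an exclusive bound means finding the largest multiple
of the step strictly below it.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_CEILING


@dataclass(frozen=True)
class MaxBound:
    """Hypothetical unified facet: one bound plus an exclusion flag."""
    value: Decimal
    exclusive: bool = False

    def admits(self, x: Decimal) -> bool:
        return x < self.value if self.exclusive else x <= self.value

    def to_inclusive(self, step: Decimal) -> "MaxBound":
        """Assumes every value of the type is an integer multiple of `step`."""
        if not self.exclusive:
            return self
        # Largest multiple of step strictly below the exclusive bound.
        k = (self.value / step).to_integral_value(rounding=ROUND_CEILING)
        return MaxBound(k * step - step, exclusive=False)
```

Rounding up before subtracting one step keeps the result correct even when the
exclusive bound is not itself a multiple of the step, e.g. 1.005 becomes 1.00.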


4.3.9 minInclusive
4.3.10 minExclusive

Same as above.


4.3.11 totalDigits
4.3.12 fractionDigits

Every value has countably infinitely many digits, all but finitely many of
which are zero. I see no provision for trailing zeros here.

If I recall correctly, this pair was once named precision and scale and
operated in lexical space. A case can be made that it still belongs there.
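A sketch contrasting the two readings of totalDigits and fractionDigits,
assuming plain decimal literals without exponents (helper names are
hypothetical). Counted on the literal, 1.50 and 1.5 differ; counted on the
value, trailing zeros vanish and they agree.

```python
from decimal import Decimal


def _digit_counts(literal: str, strip_trailing_zeros: bool) -> tuple[int, int]:
    # Assumes a plain decimal literal: optional sign, digits, optional fraction.
    body = literal.lstrip("+-")
    int_part, _, frac_part = body.partition(".")
    int_part = int_part.lstrip("0")          # leading zeros never count
    if strip_trailing_zeros:
        frac_part = frac_part.rstrip("0")    # value view: 1.50 is 1.5
    total = max(len(int_part) + len(frac_part), 1)  # zero still has one digit
    return total, len(frac_part)


def lexical_digits(literal: str) -> tuple[int, int]:
    """(totalDigits, fractionDigits) as written in the literal."""
    return _digit_counts(literal, strip_trailing_zeros=False)


def value_digits(literal: str) -> tuple[int, int]:
    """(totalDigits, fractionDigits) of the value the literal denotes."""
    return _digit_counts(literal, strip_trailing_zeros=True)


# One value, two literals, different counts in the lexical view.
same_value = Decimal("1.50") == Decimal("1.5")
```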

---
Markus L. Noga                 IPD Goos                Universität Karlsruhe
noga@ipd.info.uni-karlsruhe.de

Received on Wednesday, 21 March 2001 12:44:39 UTC