What are canonical lexical representations for?

_Part 2: Datatypes_ defines canonical lexical representations for
most of the built-in simple types, but their use is unclear.  I'd
like to see some amplification on this point in 1.1.

Trolling through the archives, I find a suggestion that
canonicalization is useful in the context of signed XML, when
intermediate parties in a transaction might replace one lexical
representation with a different but equivalent one, and it is
desired that this not invalidate the signature.  This is a
worthwhile goal, but it seems impossible to canonicalize a
document without special knowledge of every type in the document.
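
To make the dependence on type concrete, here is a minimal Python
sketch (mine, not anything from the spec).  The names CANONICALIZERS
and canonicalize are hypothetical, only three built-in types are
covered, and whitespace processing is assumed to have been done
already.  It shows the same lexical string canonicalizing differently
depending on the type it is taken to have:

```python
import re

def canonical_boolean(lexical):
    # xsd:boolean accepts true, false, 1 and 0; the canonical
    # representations are true and false.
    return {"true": "true", "1": "true",
            "false": "false", "0": "false"}[lexical]

def canonical_integer(lexical):
    # Canonical xsd:integer: no leading zeros, no leading '+'.
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError("not an integer literal: %r" % lexical)
    return str(int(lexical))

# Hypothetical lookup table: type name -> canonicalizer.
CANONICALIZERS = {
    "boolean": canonical_boolean,
    "integer": canonical_integer,
    "string": lambda lexical: lexical,  # strings are left untouched
}

def canonicalize(lexical, type_name):
    # Fails with KeyError for any type the processor does not know.
    return CANONICALIZERS[type_name](lexical)

as_boolean = canonicalize("1", "boolean")
as_integer = canonicalize("1", "integer")
padded_integer = canonicalize("+007", "integer")
padded_string = canonicalize("+007", "string")
```

A processor that does not know which entry applies to a given element
or attribute has nothing to go on.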

For a silly example, consider the type
	<simpleType name='onTheHour'>
	  <restriction base='dateTime'>
	    <pattern value='.*T..:00.*'/>
	  </restriction>
	</simpleType>
which requires the minute field of its values to be zero.
Canonicalizing values of this type is impossible without special
knowledge of the type: a general algorithm for canonicalizing
dateTimes cannot be used, since converting an onTheHour value to
UTC might change the minutes field (whenever the timezone offset
is not a whole number of hours) and so make the result invalid
for onTheHour.
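
A rough Python sketch of the problem, under some assumptions of my
own: canonical_datetime is a simplified stand-in for dateTime
canonicalization (it only normalizes to UTC and ignores fractional
seconds), the sample value with its +05:30 offset is made up, and
Python's re module is used in place of a real XML Schema regex engine
(fullmatch mimics the implicit anchoring of pattern facets).

```python
import re
from datetime import datetime, timezone

# The pattern facet from the onTheHour type above.
ON_THE_HOUR = re.compile(r".*T..:00.*")

def is_on_the_hour(lexical):
    # XML Schema patterns must match the whole literal.
    return ON_THE_HOUR.fullmatch(lexical) is not None

def canonical_datetime(lexical):
    # Simplified canonicalization: convert to UTC and write 'Z'.
    value = datetime.fromisoformat(lexical)
    if value.tzinfo is None:
        raise ValueError("sketch only handles values with a timezone")
    utc = value.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + "Z"

original = "2002-12-24T10:00:00+05:30"   # made-up sample value
canonical = canonical_datetime(original)

valid_before = is_on_the_hour(original)
valid_after = is_on_the_hour(canonical)
```

Here the original literal satisfies the pattern, but its UTC form has
a minutes field of 30 and no longer does.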

So, if canonical lexical representations cannot be used by a
generic processor to canonicalize a document, then what are they
for?  Only the processors with special knowledge?

While I'm at it, why isn't canonical form a facet of the type?

Incidentally, the above example, silly as it is, illustrates an
important respect in which values of a type derived by restriction
cannot be treated by a generic processor as values of the base
type.  It is a bit surprising that there are any such respects at
all (if, like me, you are coming from an object-oriented view of
"type"); I think this point deserves some commentary in 1.1.

-- 
Steven Taschuk           |  o- @
staschuk@telusplanet.net | 7O   )
                          |  "  (    Hummingbird 

Received on Tuesday, 24 December 2002 14:15:05 UTC