Re: XML Schema and the necessity for canonical representations

I brought this up at XML Syntax-WG. The XML Schema WG co-chair (Michael
Sperberg-McQueen) committed to pointing this out to the editors and
returning a response to you and the w3c-ietf-xmldsig@w3.org list.

At 05:13 PM 5/21/99 -0400, Joseph M. Reagle Jr. wrote:
 >
 >Not sure who is ultimately responsible for canonicalizing the schema bits,
but some thoughts to consider... (Perhaps a brief agenda item for next call?)
 >
 >Forwarded Text ----  
 >From: dee3@us.ibm.com  
 >To: www-xml-schema-comments@w3.org, 
 >"XML-DSig Workshop" <w3c-xml-sig-ws@w3.org>  
 >Date: Fri, 21 May 1999 16:50:46 -0400  
 >Subject: XML Schema and the necessity for canonical representations
 >
 > Having a canonical form of an entity is very important for comparison and
 digital signature purposes.
 >
 > XML is sufficiently rich that canonicalization needs to be considered at
several levels.  For example, the character set used in two XML documents
needs to be converted to a standard if they are to be usefully compared for
many purposes.  There are also canonicalization considerations related to
white space, namespace prefixes, etc., which are being considered by the XML
Syntax WG.  Similarly, I believe that canonicalization of datatype
representation must be considered and the schema WG seems like the place to
do it.
 >
 > I think the need for datatypes to have a designated canonical lexical
form should be fairly clear for comparison purposes.  It relieves the
comparator from the burden of having to be able to parse every form of
every datatype and convert it to a canonical form the comparator has
selected.
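
To illustrate the comparison point, here is a minimal sketch in Python,
assuming a decimal datatype that accepts several lexical spellings of one
value. The function name canonical_decimal and the rule it applies (no sign
on positive values, no leading zeros, at least one digit after the point, no
exponent) are assumptions made up for this example, not anything taken from
the Schema Datatypes draft.

```python
from decimal import Decimal


def canonical_decimal(lexical: str) -> str:
    """Map any accepted spelling of a decimal to one designated form.

    The rule here is hypothetical: fixed-point notation, no '+' sign,
    no leading zeros, trailing zeros trimmed, at least one fraction digit.
    """
    value = Decimal(lexical.strip())
    if value == 0:
        return "0.0"  # also folds "-0.0" and "0.000" into one spelling
    text = format(value, "f")  # fixed-point, never an exponent
    if "." not in text:
        return text + ".0"
    text = text.rstrip("0")
    if text.endswith("."):
        text += "0"
    return text


# A comparator now needs only string equality on the canonical forms.
same_value = canonical_decimal("1.50") == canonical_decimal("+01.5")
```

With one form designated by the datatype itself, two parties that never
coordinated can still agree that "1.50" and "+01.5" are the same value by
comparing strings.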
 >
 > The need may not be as immediately obvious in the digital signature
arena, depending on your mental picture of the "typical" digital signature
application.  If your picture is very document/object oriented, you might
wonder what all the fuss is about since any lump of bits can be signed and,
if faithfully transmitted, this signature can be verified later on the same
lump of bits.  On the other hand, if you have a transactional/protocol
point of view, where pieces of messages are being signed, data is processed
and forwarded by intermediate parties, and the signature verified by later
recipients, etc., canonicalization is essential.
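
The transactional case can be sketched as follows, in Python. A SHA-256
digest stands in for the signature input, and the canonicalize function and
its rules (LF line endings, no trailing white space on a line, UTF-8) are
assumptions chosen only for this illustration; a real signature scheme would
define its own rules. The sample message text is also invented.

```python
import hashlib


def canonicalize(text: str) -> bytes:
    """Hypothetical rules: LF line endings, no trailing white space, UTF-8."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).encode("utf-8")


def digest(data: bytes) -> str:
    """Hash the bytes that a signature would actually cover."""
    return hashlib.sha256(data).hexdigest()


# The signer's copy, and the same message after an intermediary
# rewrote the line endings and left trailing spaces behind.
original = "amount: 10\ncurrency: USD\n"
relayed = "amount: 10  \r\ncurrency: USD\r\n"

# Hashing the raw bytes breaks verification...
raw_match = digest(original.encode("utf-8")) == digest(relayed.encode("utf-8"))
# ...while hashing the canonical form survives the relay.
canonical_match = digest(canonicalize(original)) == digest(canonicalize(relayed))
```

Here raw_match is False and canonical_match is True: the relayed copy carries
the same information, but only the canonical form yields the same bytes to
verify against.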
 >
 > I have been involved with too many systems where people thought that all
they were doing was verifying signatures on unchanged data being sent
through multi-party but faithful transmission channels only to find that
there was some circumstance where a signed object had to be partly or fully
re-constituted or some transmission channel was not as faithful as they
thought.  As a result, some incredibly stupid thing like capitalization,
padding, line ending character sequences, etc., etc., at least temporarily
derailed their entire effort as, on a crash basis, they designed and
painfully retrofitted canonicalization into their system.  Also witness the
diddly little lack of canonicalization in the original ASN.1 time and date
format: As soon as there was substantial real world use of this, a new,
almost identical, fundamental data type had to be added to ASN.1, with
significant disruption and confusion, just to squeeze out the last case of
alternative representations of the same date and time.
 >
 > There is no problem with the Schema Datatypes document providing multiple
 lexical representations as long as exactly one form is designated as the
canonical form.
 >
 > I believe that the XML Schema Datatypes document should be changed to do
this and perhaps this should be added to the XML Schema requirements
document.
 >
 > Thanks,  Donald
 >
 >  End Forwarded Text ---- 
 >
_________________________________________________________
Joseph Reagle Jr.   
Policy Analyst      mailto:reagle@w3.org
XML-DSig Co-Chair   http://w3.org/People/Reagle/

Received on Wednesday, 26 May 1999 14:11:17 UTC