Re: XML Canonicalization and Syntax Constraint Considerations

Canonicalization is roughly synonymous with standardization or
normalization.  It means to put something into a canonical, standard, or
normal form.  In computer science, it means to express the internal
meaning (semantics) in a unique "canonical" external form.  For example,
you might choose the ANSI C octal format, with a sign always present, as
the canonical form for integers.  Then you would canonicalize 69, +69,
0x45, 0105, etc. into the canonical form +0105.
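A minimal Python sketch of the integer example, assuming the input follows ANSI C literal conventions (optional sign, "0x" prefix for hex, leading "0" for octal, otherwise decimal) and that the canonical form is an explicit sign followed by a C-style octal literal. The function names are illustrative only.

```python
def parse_c_integer(text: str) -> int:
    """Parse an integer written as an ANSI C style literal, with optional sign."""
    s = text.strip()
    sign = 1
    if s and s[0] in "+-":
        if s[0] == "-":
            sign = -1
        s = s[1:]
    if s.lower().startswith("0x"):
        value = int(s[2:], 16)      # hexadecimal, e.g. 0x45
    elif len(s) > 1 and s.startswith("0"):
        value = int(s[1:], 8)       # octal, e.g. 0105 (raises ValueError for 085)
    else:
        value = int(s, 10)          # decimal, e.g. 69
    return sign * value


def canonicalize_integer(text: str) -> str:
    """Map any accepted surface form to one canonical form: sign + C octal literal."""
    value = parse_c_integer(text)
    if value == 0:
        return "+0"
    sign = "-" if value < 0 else "+"
    return f"{sign}0{abs(value):o}"


for written in ["69", "+69", "0x45", "0105"]:
    print(written, "->", canonicalize_integer(written))
```

Every spelling of the same number yields identical output, which is exactly the property a signer and verifier need.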

Donald

From:  jean-luc.champion@central-europe.basf.org
Resent-Date:  Tue, 21 Dec 1999 06:25:15 -0500 (EST)
Resent-Message-Id:  <199912211125.GAA00596@www19.w3.org>
X-Lotus-FromDomain:  EUROPE
To:  w3c-ietf-xmldsig@w3.org
Message-ID:  <C125684E.003D46DF.00@europe-gw01.bcs.de>
Date:  Tue, 21 Dec 1999 12:08:36 +0100

>As I'm a European French speaker and because I didn't find any reference to
>this word in my Harrap's dictionary,
>could you be so kind as to clarify what "canonicalization" should mean in
>this area?
>
>Many thanks in advance,
>
>Jean-Luc Champion
>EDI Project Leader
>BASF Computer Services s.a.
>Belgium
>
>dee3@us.ibm.com on 20/12/99 22:57:15
>
>To:   w3c-ietf-xmldsig@w3.org
>cc:    (bcc: Jean-Luc Champion/CENTRAL-EUROPE/BASF)
>Subject:  XML Canonicalization and Syntax Constraint Considerations
>
>Since my presentation on canonicalization at the Washington IETF meeting was
>fairly well received, I thought I would write it up in more detail as
>a proposed section of the Syntax and Processing draft.  The material I've
>written is below.  My current feeling is that this should be a new top
>level section although there are other places it could go...
>
>Thanks,
>Donald
>
>Donald E. Eastlake, 3rd
>IBM, 17 Skyline Drive, Hawthorne, NY 10532 USA
>dee3@us.ibm.com   tel: 1-914-784-7913, fax: 1-914-784-3833
>
>home: 65 Shindegan Hill Road, RR#1, Carmel, NY 10512 USA
>dee3@torque.pothole.com   tel: 1-914-276-2668
>
>
>X.0  XML Canonicalization and Syntax Constraint Considerations
>
>Digital signatures only work if the verification calculations are performed
>on exactly the same bits as the signing calculations.  If the surface
>representation of the signed data can change between signing and
>verification, then some way to standardize the changeable aspect must be
>used before signing and verification.  For example, even with something as
>simple as ASCII text, there are at least three different line ending
>sequences in wide use.  If it is possible for signed text to be modified
>from one line ending convention to another between the time of signing and
>signature verification, then the line endings need to be canonicalized to a
>standard form before signing and verification or signatures will break.
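As a rough illustration of the line-ending case, here is a Python sketch that canonicalizes CR LF and lone CR to LF before digesting. SHA-256 and the helper names are assumptions chosen for the example; any digest algorithm shows the same effect.

```python
import hashlib


def canonicalize_line_endings(text: bytes) -> bytes:
    # CR LF (DOS/Windows) and lone CR (classic Mac OS) both become LF (Unix).
    # CR LF is replaced first so that it yields one LF rather than two.
    return text.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def digest(text: bytes) -> str:
    # Canonicalize first, then hash, on both the signing and verifying side.
    return hashlib.sha256(canonicalize_line_endings(text)).hexdigest()


unix = b"line one\nline two\n"
dos = b"line one\r\nline two\r\n"
mac = b"line one\rline two\r"

print(hashlib.sha256(unix).hexdigest() == hashlib.sha256(dos).hexdigest())
print(digest(unix) == digest(dos) == digest(mac))
```

Hashing the raw bytes gives different digests for the three conventions; hashing after canonicalization gives one digest.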
>
>XML is subject to surface representation changes, and in typical
>applications it is processed in ways that discard some surface
>information.  For this reason, XML digital signatures make provision for
>indicating canonicalization methods in the signature, so that a verifier
>can apply, before its verification calculations, the same canonicalization
>that the signer applied.
>
>It is useful to distinguish the Signature element from separate signed XML
>items.  It is possible for an isolated XML document to be treated as if it
>were binary data so that no changes can occur.  In that case, the digest of
>the document will not change and it need not be canonicalized if it is
>signed and verified as data.  On the other hand, XML which is read and
>processed using standard XML parsing and processing techniques is thereby
>changed so that some of its surface representation information is lost or
>modified.  In particular, this will occur in many cases for the Signature
>and enclosed SignedInfo elements since they, and possibly an encompassing
>XML document, will be processed as XML.
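The difference between signing XML "as data" and processing it as XML can be seen with Python's standard `xml.etree.ElementTree`. In this sketch the `SignedInfo` bytes are a made-up fragment and SHA-256 is an assumed digest: the raw bytes hash the same way every time, but a parse and re-serialize round trip changes the surface form (white space inside the start tag, quote style, empty-element syntax) and therefore the digest.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical signed fragment: extra white space in the start tag,
# single-quoted attribute, and an empty-element tag.
original = b"<SignedInfo  Id = 'x'\r\n><Reference/></SignedInfo>"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Treated as opaque binary data, the bytes and their digest never change.
binary_digest = sha256_hex(original)

# Parsed and re-serialized, the same logical content gets a new surface form.
reserialized = ET.tostring(ET.fromstring(original))
xml_digest = sha256_hex(reserialized)

print(reserialized)
print(binary_digest == xml_digest)
```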
>
>Similarly, these considerations apply to Manifest, Package, Object, and
>SignatureProperties elements if those elements have been digested, their
>DigestValue is to be checked, and they are being processed as XML.
>
>The kinds of changes in XML which may need to be canonicalized can be
>divided into three categories.  There are those related to the basic XML
>1.0 standard, as described in X.1 below.  There are those related to DOM,
>SAX, or similar processing, as described in X.2 below.  And, third, there
>is the possibility of character set conversion, such as between UTF-8 and
>UTF-16, both of which all standards-compliant XML processors are required
>to support. Any canonicalization algorithm should
>yield output in a specific fixed character set.  For both the minimal
>canonicalization defined in this document and the W3C standard XML
>canonicalization, that character set is UTF-8.
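A small Python sketch of the character set point: the same characters encoded as UTF-16 and UTF-8 are different byte sequences, so a canonicalizer has to emit one fixed encoding. The helper `to_utf8` is hypothetical, and it assumes the source encoding is already known; a real XML processor would detect it from the byte order mark or the encoding declaration, and this sketch does not rewrite any `encoding="..."` declaration.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    # Decode from whatever encoding the document arrived in, then emit UTF-8.
    return data.decode(source_encoding).encode("utf-8")


text = "<Object>Größe</Object>"
as_utf16 = text.encode("utf-16")   # byte order mark plus two bytes per character here
as_utf8 = text.encode("utf-8")

print(as_utf16 == as_utf8)
print(to_utf8(as_utf16, "utf-16") == to_utf8(as_utf8, "utf-8"))
```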
>
>X.1 XML 1.0, Syntax Constraints, and Canonicalization
>
>The XML 1.0 Standard defines an interface where a conformant application
>reading XML is given certain information from that XML and not other
>information.  In particular, (1) line endings are normalized to the single
>character #xA by dropping #xD characters if they are immediately followed
>by a #xA and replacing them with #xA in all other cases, (2) missing
>attributes declared to have default values are provided to the application
>as if present with the default value, (3) character references are replaced
>with the corresponding character, (4) entity references are replaced with
>the replacement text of the corresponding declared entity, (5) attribute
>values are normalized by
>(5A) replacing character and entity references as above, (5B) replacing
>occurrences of #x9, #xA, and #xD with #x20 (space) except that the sequence
>#xD#xA is replaced by a single space, and (5C) if the attribute is not
>declared to be CDATA, stripping all leading and trailing spaces and
>replacing all interior runs of spaces with a single space, and (6) for
>elements declared to have element content, white space that appears
>directly within their content, but not within the content of any enclosed
>element, is reported by a validating processor as insignificant and is
>typically discarded by the application.
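Steps (5B) and (5C) can be sketched in Python as below. This is an illustrative function, not taken from any XML library; it assumes the input contains no character or entity references (step (5A) is not modelled) and that the caller already knows from the DTD or Schema whether the attribute is declared CDATA.

```python
def normalize_attribute_value(raw: str, is_cdata: bool) -> str:
    """Sketch of XML 1.0 attribute-value normalization, steps (5B) and (5C)."""
    # (5B) The sequence #xD#xA becomes a single space; any other
    # #x9, #xA, or #xD becomes a space.
    value = raw.replace("\r\n", " ")
    value = value.translate({0x9: " ", 0xA: " ", 0xD: " "})
    if is_cdata:
        return value
    # (5C) For attributes not declared CDATA: strip leading and trailing
    # spaces and collapse interior runs of spaces to one space.
    return " ".join(part for part in value.split(" ") if part)


print(repr(normalize_attribute_value("  a\t\tb\r\nc ", is_cdata=True)))
print(repr(normalize_attribute_value("  a\t\tb\r\nc ", is_cdata=False)))
```

The second call only behaves correctly if the verifier knows the attribute's declared type, which is why (5C) appears in the list of declaration-dependent items.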
>
>Note that items (2), (4), (5C), and (6) depend on specific Schema, DTD, or
>similar declarations. In the general case, such declarations will not be
>available to or used by the signature verifier.  Thus, for
>interoperability, it is RECOMMENDED that the following syntax constraints
>be observed when generating any material to be signed and processed as XML,
>such as the SignedInfo element: (1) attributes having default values be
>explicitly present, (2) all entity references (except "amp", "lt", "gt",
>"apos", and "quot" which are pre-defined) be expanded, (3) attribute value
>white space be normalized, and (4) insignificant white space not be
>generated within elements having element content.
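A signer could check syntax constraint (2) before signing with something like the following Python sketch. The function name is hypothetical, and the regular expression is a deliberate simplification: it ignores the context of each match, so it would also flag text inside comments, CDATA sections, or processing instructions. Character references such as `&#169;` are not matched because they start with `#`.

```python
import re

PREDEFINED_ENTITIES = {"amp", "lt", "gt", "apos", "quot"}

# General entity references look like &name; character references (&#...;) do not match.
ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9_.:-]*);")


def unexpanded_entity_refs(xml_text: str) -> list[str]:
    """Return names of entity references other than the five predefined ones."""
    return [name for name in ENTITY_REF.findall(xml_text)
            if name not in PREDEFINED_ENTITIES]


print(unexpanded_entity_refs('<a x="&amp;">&copy; &#169; &lt; &myEnt;</a>'))
```

An empty result means the material satisfies constraint (2); any name returned would need to be expanded before signing.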
>
>X.2 DOM/SAX Processing and Canonicalization
>
>In addition to the canonicalization and syntax constraints discussed above,
>most XML applications use the DOM standard or SAX interface for XML input.
>DOM maps XML into a tree structure of nodes and typically assumes it will
>be used on an entire document with subsequent processing being done on this
>tree.  SAX converts XML into a series of events such as a start tag, text,
>etc.  In either case, many surface characteristics such as the ordering of
>attributes and insignificant white space within start/end tags are lost.  In
>addition, namespace declarations are mapped over the nodes to which they
>apply, losing the namespace prefixes in the source text and, in most cases,
>losing the information as to exactly where namespace declarations appeared
>in the original.
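Using Python's standard `xml.sax` module, the sketch below records the event stream an application sees. Two documents that differ only in white space inside the start tag and in attribute order produce the same events. The `EventRecorder` class is illustrative; it sorts attributes because attribute order is not treated as significant, so a consumer should not rely on it.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Collects SAX events as tuples so that two parses can be compared."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Sorted so the record does not depend on source attribute order.
        self.events.append(("start", name, tuple(sorted(attrs.items()))))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


def sax_events(xml_bytes: bytes) -> list:
    handler = EventRecorder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


print(sax_events(b'<a  y="2"\n   x="1" >t</a>') == sax_events(b'<a x="1" y="2">t</a>'))
```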
>
>If an XML digital signature is to be produced or verified on a system using
>the common DOM or SAX processing, the need is actually for a canonical
>method to serialize the relevant part of a DOM tree or relevant sequence of
>SAX events.  XML canonicalization specifications, such as the W3C standard,
>are based only on information which is preserved by DOM and SAX.  For an
>XML digital signature to be verifiable by an implementation using DOM or
>SAX, not only must the syntax constraints given in X.1 be followed but an
>appropriate XML canonicalization must be specified so that the verifier can
>re-serialize DOM/SAX mediated input into the same byte sequence that was
>signed.
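For a present-day illustration of a canonical serializer, Python 3.8 and later ship `xml.etree.ElementTree.canonicalize`, which implements W3C Canonical XML 2.0. That is a later specification than the one discussed here, so this is an analogy rather than the algorithm proposed above. Two surface forms of the same element serialize to identical text, and hence to an identical digest (SHA-256 is an assumption for the example).

```python
import hashlib
import xml.etree.ElementTree as ET

# Same logical content: different white space, attribute order, quotes,
# and empty-element syntax.
first = ET.canonicalize('<a  y="2" x="1" ><b/></a>')
second = ET.canonicalize("<a x='1' y='2'><b></b></a>")

print(first)
print(hashlib.sha256(first.encode("utf-8")).hexdigest()
      == hashlib.sha256(second.encode("utf-8")).hexdigest())
```

Because the canonical form depends only on information that survives parsing, a verifier using DOM or SAX style input can regenerate the bytes the signer digested.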
>

Received on Tuesday, 21 December 1999 08:10:04 UTC