- From: Donald E. Eastlake 3rd <dee3@torque.pothole.com>
- Date: Tue, 21 Dec 1999 08:08:36 -0500
- To: jean-luc.champion@central-europe.basf.org
- cc: w3c-ietf-xmldsig@w3.org, dee3@torque.pothole.com
Canonicalization is approximately equal to standardization or normalization.
It means to put into a canonical or standard or normal form. In computer
science, it means to express the internal semantics in a unique "canonical"
external form. For example, you might choose the ANSI C octal format with a
sign always present as the canonical form for integers. Then you would
canonicalize 69, 0x45, 0105, +69, etc. into the canonical form +0105.

Donald

From: jean-luc.champion@central-europe.basf.org
Resent-Date: Tue, 21 Dec 1999 06:25:15 -0500 (EST)
Resent-Message-Id: <199912211125.GAA00596@www19.w3.org>
X-Lotus-FromDomain: EUROPE
To: w3c-ietf-xmldsig@w3.org
Message-ID: <C125684E.003D46DF.00@europe-gw01.bcs.de>
Date: Tue, 21 Dec 1999 12:08:36 +0100

>As I'm a European French speaker and because I didn't find any reference to
>this word in my Harrap's dictionary,
>could you be so kind as to clarify what "canonicalization" should mean in this area?
>
>Many thanks in advance,
>
>Jean-Luc Champion
>EDI Project Leader
>BASF Computer Services s.a.
>Belgium
>
>dee3@us.ibm.com on 20/12/99 22:57:15
>
>To: w3c-ietf-xmldsig@w3.org
>cc: (bcc: Jean-Luc Champion/CENTRAL-EUROPE/BASF)
>Subject: XML Canonicalization and Syntax Constraint Considerations
>
>Since my presentation on canonicalization at the Washington IETF meeting was
>fairly well received, I thought I would write it up with some more detail as
>a proposed section of the Syntax and Processing draft. The material I've
>written is below. My current feeling is that this should be a new top
>level section although there are other places it could go...
>
>Thanks,
>Donald
>
>Donald E. Eastlake, 3rd
>IBM, 17 Skyline Drive, Hawthorne, NY 10532 USA
>dee3@us.ibm.com tel: 1-914-784-7913, fax: 1-914-784-3833
>
>home: 65 Shindegan Hill Road, RR#1, Carmel, NY 10512 USA
>dee3@torque.pothole.com tel: 1-914-276-2668
>
>
>X.0 XML Canonicalization and Syntax Constraint Considerations
>
>Digital signatures only work if the verification calculations are performed
>on exactly the same bits as the signing calculations. If the surface
>representation of the signed data can change between signing and
>verification, then some way to standardize the changeable aspect must be
>used before signing and verification. For example, even with something as
>simple as ASCII text, there are at least three different line ending
>sequences in wide use. If it is possible for signed text to be modified
>from one line ending convention to another between the time of signing and
>signature verification, then the line endings need to be canonicalized to a
>standard form before signing and verification or signatures will break.
>
>XML is subject to surface representation changes and to processing which
>discards some surface information in typical applications. For this
>reason, XML digital signatures have provision for indicating
>canonicalization methods in the signature so that a verifier can use the
>same canonicalization before its verification calculations as was used by
>the signer.
>
>It is useful to distinguish the Signature element from separate signed XML
>items. It is possible for an isolated XML document to be treated as if it
>were binary data so that no changes can occur. In that case, the digest of
>the document will not change and it need not be canonicalized if it is
>signed and verified as data. On the other hand, XML which is read and
>processed using standard XML parsing and processing techniques is thereby
>changed so that some of its surface representation information is lost or
>modified.
>In particular, this will occur in many cases for the Signature
>and enclosed SignedInfo elements since they, and possibly an encompassing
>XML document, will be processed as XML.
>
>Similarly, these considerations apply to Manifest, Package, Object, and
>SignatureProperties elements if those elements have been digested, their
>DigestValue is to be checked, and they are being processed as XML.
>
>The kinds of changes in XML which may need to be canonicalized can be
>divided into three categories. There are those related to the basic XML
>1.0 standard, as described in X.1 below. There are those related to DOM,
>SAX, or similar processing and the like as described in X.2 below. And,
>third, there is the possibility of character set conversion, such as
>between UTF-8 and UTF-16, both of which all XML standards compliant
>processors are required to support. Any canonicalization algorithm should
>yield output in a specific fixed character set. For both the minimal
>canonicalization defined in this document and the W3C standard XML
>canonicalization, that character set is UTF-8.
>
>X.1 XML 1.0, Syntax Constraints, and Canonicalization
>
>The XML 1.0 Standard defines an interface where a conformant application
>reading XML is given certain information from that XML and not other
>information.
>In particular, (1) line endings are normalized to the single
>character #xA by dropping #xD characters if they are immediately followed
>by a #xA and replacing them with #xA in all other cases, (2) missing
>attributes declared to have default values are provided to the application
>as if present with the default value, (3) character references are replaced
>with the corresponding character, (4) entity references are replaced with
>the corresponding declared entity, (5) attribute values are normalized by
>(5A) replacing character and entity references as above, (5B) replacing
>occurrences of #x9, #xA, and #xD with #x20 (space) except that the sequence
>#xD#xA is replaced by a single space, and (5C) if the attribute is not
>declared to be CDATA, stripping all leading and trailing spaces and
>replacing all interior runs of spaces with a single space, and (6) for
>elements declared to have element content, white space that appears within
>their content but not within the content of any enclosed element is
>eliminated.
>
>Note that items (2), (4), (5C), and (6) depend on specific Schema, DTD, or
>similar declarations. In the general case, such declarations will not be
>available to or used by the signature verifier. Thus, for
>interoperability, it is RECOMMENDED that the following syntax constraints
>be observed when generating any material to be signed and processed as XML,
>such as the SignedInfo element: (1) attributes having default values be
>explicitly present, (2) all entity references (except "amp", "lt", "gt",
>"apos", and "quot" which are pre-defined) be expanded, (3) attribute value
>white space be normalized, and (4) insignificant white space not be
>generated within elements having element content.
>
>X.2 DOM/SAX Processing and Canonicalization
>
>In addition to the canonicalization and syntax constraints discussed above,
>most XML applications use the DOM standard or SAX interface for XML input.
>DOM maps XML into a tree structure of nodes and typically assumes it will
>be used on an entire document with subsequent processing being done on this
>tree. SAX converts XML into a series of events such as a start tag, text,
>etc. In either case, many surface characteristics such as the ordering of
>attributes and insignificant white space within start/end tags are lost. In
>addition, namespace declarations are mapped over the nodes to which they
>apply, losing the namespace prefixes in the source text and, in most cases,
>losing the information as to exactly where namespace declarations appeared
>in the original.
>
>If an XML digital signature is to be produced or verified on a system using
>the common DOM or SAX processing, the need is actually for a canonical
>method to serialize the relevant part of a DOM tree or relevant sequence of
>SAX events. XML canonicalization specifications, such as the W3C standard,
>are based only on information which is preserved by DOM and SAX. For an
>XML digital signature to be verifiable by an implementation using DOM or
>SAX, not only must the syntax constraints given in X.1 be followed but an
>appropriate XML canonicalization must be specified so that the verifier can
>re-serialize DOM/SAX mediated input into the same byte sequence that was
>signed.
>
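To make the line-ending rule (item 1) and the attribute-value rules (items 5B and 5C) of X.1 concrete, here is a minimal Python sketch. It is only an illustration, not part of the proposed text: the function names are made up for this example, reference expansion (5A) is assumed to have happened already, and SHA-1 over UTF-8 is just a convenient digest choice for showing that texts differing only in line endings digest identically once canonicalized.

```python
import hashlib
import re


def normalize_line_endings(text: str) -> str:
    """X.1 item (1): drop #xD when followed by #xA, else turn #xD into #xA."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_attribute_value(value: str, is_cdata: bool = True) -> str:
    """X.1 items (5B) and (5C); (5A) reference expansion is assumed done."""
    # (5B) the sequence #xD#xA becomes a single space ...
    value = value.replace("\r\n", " ")
    # ... and each remaining #x9, #xA, or #xD becomes one space.
    value = re.sub(r"[\t\n\r]", " ", value)
    if not is_cdata:
        # (5C) strip leading/trailing spaces, collapse interior runs.
        value = re.sub(r" +", " ", value.strip(" "))
    return value


def digest(text: str) -> str:
    """Hypothetical helper: SHA-1 of the canonicalized text encoded as UTF-8."""
    canonical = normalize_line_endings(text)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```

With these helpers, digest("a\r\nb"), digest("a\rb"), and digest("a\nb") all agree, whereas hashing the raw bytes would give three different values, which is exactly the breakage described in X.0.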
Received on Tuesday, 21 December 1999 08:10:04 UTC