- From: <dee3@us.ibm.com>
- Date: Mon, 20 Dec 1999 16:57:15 -0500
- To: w3c-ietf-xmldsig@w3.org
Since my presentation on canonicalization at the Washington IETF meeting was fairly well received, I thought I would write it up with some more detail as a proposed section of the Syntax and Processing draft. The material I've written is below. My current feeling is that this should be a new top-level section, although there are other places it could go...

Thanks,
Donald

Donald E. Eastlake, 3rd
IBM, 17 Skyline Drive, Hawthorne, NY 10532 USA
dee3@us.ibm.com
tel: 1-914-784-7913, fax: 1-914-784-3833
home: 65 Shindegan Hill Road, RR#1, Carmel, NY 10512 USA
dee3@torque.pothole.com
tel: 1-914-276-2668

X.0 XML Canonicalization and Syntax Constraint Considerations

Digital signatures only work if the verification calculations are performed on exactly the same bits as the signing calculations. If the surface representation of the signed data can change between signing and verification, then some way to standardize the changeable aspect must be applied before both signing and verification. For example, even with something as simple as ASCII text, there are at least three different line ending sequences in wide use (CR LF, LF alone, and CR alone). If signed text can be converted from one line ending convention to another between the time of signing and the time of signature verification, then the line endings must be canonicalized to a standard form before signing and verification, or signatures will break.

XML is subject to surface representation changes and to processing which, in typical applications, discards some surface information. For this reason, XML digital signatures have provision for indicating canonicalization methods in the signature, so that a verifier can apply the same canonicalization before its verification calculations as was used by the signer.

It is useful to distinguish the Signature element from separate signed XML items. It is possible for an isolated XML document to be treated as if it were binary data, so that no changes can occur.
In that case, the digest of the document will not change, and the document need not be canonicalized if it is signed and verified as data. On the other hand, XML that is read and processed using standard XML parsing and processing techniques is thereby changed, so that some of its surface representation information is lost or modified. In particular, this will occur in many cases for the Signature element and its enclosed SignedInfo element, since they, and possibly an encompassing XML document, will be processed as XML. Similarly, these considerations apply to Manifest, Package, Object, and SignatureProperties elements if those elements have been digested, their DigestValue is to be checked, and they are being processed as XML.

The kinds of changes in XML which may need to be canonicalized can be divided into three categories. First, there are those related to the basic XML 1.0 standard, as described in X.1 below. Second, there are those related to DOM, SAX, or similar processing, as described in X.2 below. Third, there is the possibility of character set conversion, such as between UTF-8 and UTF-16, both of which every standards-compliant XML processor is required to support. Any canonicalization algorithm should yield output in a specific, fixed character set. For both the minimal canonicalization defined in this document and the W3C standard XML canonicalization, that character set is UTF-8.

X.1 XML 1.0, Syntax Constraints, and Canonicalization

The XML 1.0 Standard defines an interface where a conformant application reading XML is given certain information from that XML and not other information.
In particular:

(1) line endings are normalized to the single character #xA by dropping #xD characters that are immediately followed by #xA and replacing them with #xA in all other cases;
(2) missing attributes declared to have default values are provided to the application as if present with the default value;
(3) character references are replaced with the corresponding character;
(4) entity references are replaced with the replacement text of the corresponding declared entity;
(5) attribute values are normalized by (5A) replacing character and entity references as above, (5B) replacing occurrences of #x9, #xA, and #xD with #x20 (space), except that the sequence #xD#xA is replaced by a single space, and (5C) if the attribute is not declared to be CDATA, stripping all leading and trailing spaces and replacing all interior runs of spaces with a single space; and
(6) for elements declared to have element content, white space that appears directly within their content, but not within the content of any enclosed element, is eliminated.

Note that items (2), (4), (5C), and (6) depend on specific Schema, DTD, or similar declarations. In the general case, such declarations will not be available to, or used by, the signature verifier. Thus, for interoperability, it is RECOMMENDED that the following syntax constraints be observed when generating any material to be signed and processed as XML, such as the SignedInfo element:

(1) attributes having default values be explicitly present,
(2) all entity references (except "amp", "lt", "gt", "apos", and "quot", which are predefined) be expanded,
(3) attribute value white space be normalized, and
(4) insignificant white space not be generated within elements having element content.

X.2 DOM/SAX Processing and Canonicalization

In addition to the canonicalization and syntax constraints discussed above, most XML applications use the DOM standard or the SAX interface for XML input.
DOM maps XML into a tree structure of nodes and typically assumes it will be used on an entire document, with subsequent processing being done on this tree. SAX converts XML into a series of events such as start tags, character data, and end tags. In either case, many surface characteristics, such as the ordering of attributes and insignificant white space within start and end tags, are lost. In addition, namespace declarations are mapped over the nodes to which they apply, losing the namespace prefixes in the source text and, in most cases, losing the information as to exactly where namespace declarations appeared in the original. If an XML digital signature is to be produced or verified on a system using common DOM or SAX processing, what is actually needed is a canonical method to serialize the relevant part of a DOM tree or the relevant sequence of SAX events. XML canonicalization specifications, such as the W3C standard, are based only on information which is preserved by DOM and SAX. For an XML digital signature to be verifiable by an implementation using DOM or SAX, not only must the syntax constraints given in X.1 be followed, but an appropriate XML canonicalization must be specified so that the verifier can re-serialize DOM/SAX-mediated input into the same byte sequence that was signed.
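The idea of re-serializing DOM-mediated input into a fixed byte sequence can be sketched in Python using only the standard library's xml.dom.minidom. This is a minimal illustration under stated assumptions, not the W3C canonicalization and not the minimal canonicalization referred to above: the function name toy_canonicalize and its rules (attributes sorted by qualified name, namespace declarations treated as ordinary attributes, comments and processing instructions dropped, UTF-8 output) are hypothetical, and the two sample inputs are invented. They differ only in surface details the parser discards (encoding, attribute order and quoting, line ending, how "&" is written), so they re-serialize to identical bytes and hence identical digests.

```python
import hashlib
from xml.dom import minidom

# Hypothetical toy rules (NOT the W3C Canonical XML algorithm):
# attributes sorted by qualified name, namespace declarations treated as
# ordinary attributes, comments and processing instructions dropped,
# empty elements written as start/end tag pairs, output encoded as UTF-8.


def _escape_text(s: str) -> str:
    # Replace '&' first so the other replacements are not double-escaped.
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # A literal CR would be turned into LF by line-ending normalization on reparse.
    return s.replace("\r", "&#xD;")


def _escape_attr(s: str) -> str:
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
    # Escape white space that attribute-value normalization would otherwise
    # turn into spaces when the output is parsed again.
    return s.replace("\t", "&#x9;").replace("\n", "&#xA;").replace("\r", "&#xD;")


def _serialize(node, out: list) -> None:
    if node.nodeType == node.ELEMENT_NODE:
        out.append("<" + node.tagName)
        # DOM does not preserve source attribute order or quoting style,
        # so impose a fixed order and always use double quotes.
        for name, value in sorted(node.attributes.items()):
            out.append(f' {name}="{_escape_attr(value)}"')
        out.append(">")
        for child in node.childNodes:
            _serialize(child, out)
        out.append("</" + node.tagName + ">")
    elif node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        out.append(_escape_text(node.data))
    # Other node types (comments, processing instructions) are skipped.


def toy_canonicalize(xml_bytes: bytes) -> bytes:
    """Parse XML, then re-serialize its document element under the toy rules."""
    # The parser handles the encoding, line endings, and reference expansion.
    doc = minidom.parseString(xml_bytes)
    out: list = []
    _serialize(doc.documentElement, out)
    return "".join(out).encode("utf-8")


# Two hypothetical inputs that differ only in surface representation.
SAMPLE_A = (b'<SignedInfo Id="x"  xmlns="urn:example:sig">'
            b'<Reference URI="#o">a&amp;b\r\n</Reference></SignedInfo>')
SAMPLE_B = ('<?xml version="1.0" encoding="UTF-16"?>\n'
            "<SignedInfo xmlns=\"urn:example:sig\" Id='x'>"
            '<Reference URI="#o">a&#38;b\n</Reference></SignedInfo>').encode("utf-16")

if __name__ == "__main__":
    for sample in (SAMPLE_A, SAMPLE_B):
        c14n = toy_canonicalize(sample)
        print(hashlib.sha256(c14n).hexdigest(), c14n)
```

Sorting namespace declarations by name, as done here, is only a placeholder: a real canonicalization also has to fix where namespace declarations appear, which is precisely the information that DOM and SAX processing tend to lose.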
Received on Monday, 20 December 1999 17:02:23 UTC