4.3.1 The CanonicalizationMethod Element

CanonicalizationMethod is a required element that specifies the canonicalization algorithm applied to the SignedInfo element prior to performing signature calculations. This element uses the general structure for algorithms described in Algorithm Identifiers and Implementation Requirements (section 6.1). Implementations MUST support the REQUIRED canonicalization algorithms.

Alternatives to the REQUIRED canonicalization algorithms (section 6.5), such as Canonical XML with Comments (section 6.5.1) or a minimal canonicalization (such as CRLF and charset normalization), may be explicitly specified but are NOT REQUIRED. Consequently, their use may not interoperate with other applications that do not support the specified algorithm (see XML Canonicalization and Syntax Constraint Considerations, section 7). Security issues may also arise in the treatment of entity processing and comments if non-XML aware canonicalization algorithms are not properly constrained (see section 8.2: Only What is "Seen" Should be Signed).

The way in which the SignedInfo element is presented to the canonicalization method is dependent on that method. The following applies to algorithms which process XML as nodes or characters:

We recommend that resource-constrained applications that choose a text-based canonicalization instead of an XML-based canonicalization be implemented to generate canonicalized XML as their output serialization, so as to mitigate interoperability and security concerns. For instance, such an implementation SHOULD (at least) generate standalone XML instances [XML].

NOTE: The signature application must exercise great care in accepting and executing an arbitrary CanonicalizationMethod. For example, the canonicalization method could rewrite the URIs of the References being validated. Or, the method could massively transform SignedInfo so that validation would always succeed (i.e., converting it to a trivial signature with a known key over trivial data). Since CanonicalizationMethod is inside SignedInfo, in the resulting canonical form it could erase itself from SignedInfo or modify the SignedInfo element so that it appears that a different canonicalization function was used! Thus a Signature which appears to authenticate the desired data with the desired key, DigestMethod, and SignatureMethod, can be meaningless if a capricious CanonicalizationMethod is used.
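One cautious way to act on this note is to refuse any CanonicalizationMethod whose Algorithm has not been vetted in advance. The following is a minimal sketch under stated assumptions, not a normative procedure: the function name `check_canonicalization_method` and the contents of the allow-list are hypothetical, and the check is made on the parsed SignedInfo before any canonicalization code is run.

```python
import xml.etree.ElementTree as ET

# The XML Signature namespace.
DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"

# Assumption: the application has vetted only these two algorithms.
ALLOWED_C14N = {
    "http://www.w3.org/TR/2001/REC-xml-c14n-20010315",
    "http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments",
}


def check_canonicalization_method(signed_info_xml):
    """Return the Algorithm URI if it is allow-listed, else raise ValueError."""
    signed_info = ET.fromstring(signed_info_xml)
    method = signed_info.find("{%s}CanonicalizationMethod" % DSIG_NS)
    if method is None:
        raise ValueError("CanonicalizationMethod is a required element")
    algorithm = method.get("Algorithm")
    if algorithm not in ALLOWED_C14N:
        raise ValueError("unvetted canonicalization algorithm: %r" % algorithm)
    return algorithm
```

Rejecting unknown identifiers up front means that a capricious method is never executed, rather than being detected after the fact.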

   Schema Definition:

   <element name="CanonicalizationMethod" type="ds:CanonicalizationMethodType"/> 
   <complexType name="CanonicalizationMethodType" mixed="true">
     <sequence>
       <any namespace="##any" minOccurs="0" maxOccurs="unbounded"/>
       <!-- (0,unbounded) elements from (1,1) namespace -->
     </sequence>
     <attribute name="Algorithm" type="anyURI" use="required"/> 
   </complexType>

   DTD:

   <!ELEMENT CanonicalizationMethod (#PCDATA %Method.ANY;)* > 
   <!ATTLIST CanonicalizationMethod 
    Algorithm CDATA #REQUIRED >

4.3.3.2 The Reference Processing Model

Note: XPath is RECOMMENDED. Signature applications need not conform to the [XPath] specification in order to conform to this specification. However, the XPath data model, definitions (e.g., node-sets) and syntax are used within this document in order to describe functionality for those that want to process XML-as-XML (instead of octets) as part of signature generation. For those that want to use these features, a conformant [XPath] implementation is one way to implement them, but it is not required. Such applications could use a sufficiently functional replacement for a node-set and implement only those XPath expression behaviors REQUIRED by this specification. However, for simplicity we generally will use XPath terminology without including this qualification on every point. Requirements over "XPath node-sets" can include a node-set functional equivalent. Requirements over XPath processing can include application behaviors that are equivalent to the corresponding XPath behavior.

The data-type of the result of URI dereferencing or subsequent Transforms is either an octet stream or an XPath node-set.

The Transforms specified in this document are defined with respect to the input they require. The following is the default signature application behavior:

Users may specify alternative transforms that override these defaults in transitions between Transforms that expect different inputs. The final octet stream contains the data octets being secured. The digest algorithm specified by DigestMethod is then applied to these data octets, resulting in the DigestValue.
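The last step of this model can be sketched as follows. This is an illustration only: it assumes that DigestMethod names the SHA1 algorithm and that DigestValue carries the digest in base64, and the function name `digest_value` is hypothetical.

```python
import base64
import hashlib


def digest_value(data_octets):
    """Apply SHA-1 to the final octet stream and base64-encode the result."""
    digest = hashlib.sha1(data_octets).digest()
    return base64.b64encode(digest).decode("ascii")
```

The input must be the octet stream produced by the last Transform; digesting any other representation of the same data will generally yield a different value.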

Unless the URI-Reference is a 'same-document' reference as defined in [URI, Section 4.2], the result of dereferencing the URI-Reference MUST be an octet stream. In particular, an XML document identified by URI is not parsed by the signature application unless the URI is a same-document reference or unless a transform that requires XML parsing is applied (see Transforms, section 4.3.3.1).

When a fragment is preceded by an absolute or relative URI in the URI-Reference, the meaning of the fragment is defined by the resource's MIME type. Even for XML documents, URI dereferencing (including the fragment processing) might be done for the signature application by a proxy. Therefore, reference validation might fail if fragment processing is not performed in a standard way (as defined in the following section for same-document references). Consequently, we RECOMMEND that the URI attribute not include fragment identifiers and that such processing be specified as an additional XPath Transform.

When a fragment is not preceded by a URI in the URI-Reference, XML signature applications MUST support the null URI and barename XPointer. We RECOMMEND support for the same-document XPointers '#xpointer(/)' and '#xpointer(id('ID'))' if the application also intends to support any canonicalization that preserves comments. (Otherwise URI="#foo" will automatically remove comments before the canonicalization can even be invoked.) All other support for XPointers is OPTIONAL, especially all support for barename and other XPointers in external resources since the application may not have control over how the fragment is generated (leading to interoperability problems and validation failures).

The following examples demonstrate what the URI attribute identifies and how it is dereferenced:

URI="http://example.com/bar.xml"
Identifies the octets that represent the external resource 'http://example.com/bar.xml', which is probably an XML document given its file extension.
URI="http://example.com/bar.xml#chapter1"
Identifies the element with ID attribute value 'chapter1' of the external XML resource 'http://example.com/bar.xml', provided as an octet stream. Again, for the sake of interoperability, the element identified as 'chapter1' should be obtained using an XPath transform rather than a URI fragment (barename XPointer resolution in external resources is not REQUIRED in this specification).
URI=""
Identifies the node-set (minus any comment nodes) of the XML resource containing the signature.
URI="#chapter1"
Identifies a node-set containing the element with ID attribute value 'chapter1' of the XML resource containing the signature. XML Signature (and its applications) modify this node-set to include the element plus all descendants including namespaces and attributes -- but not comments.
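
The four cases above can be told apart by splitting the URI attribute value at its fragment identifier. The following sketch is illustrative only: the function name `classify_reference_uri` and the returned labels are hypothetical, and no dereferencing is attempted.

```python
from urllib.parse import urldefrag


def classify_reference_uri(uri):
    """Classify a Reference URI attribute value without dereferencing it."""
    base, fragment = urldefrag(uri)
    if base:
        # A URI precedes the fragment: the result is an octet stream, and
        # fragment support for external resources is OPTIONAL.
        return "external-with-fragment" if fragment else "external"
    if not fragment:
        # Null URI: the resource containing the signature, minus comments.
        return "same-document-root"
    if fragment.startswith("xpointer("):
        return "same-document-xpointer"
    # Barename XPointer such as "#chapter1".
    return "same-document-barename"
```

An application could use such a classification to decide whether the dereferenced result is to be treated as an octet stream or as a node-set.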

6.1 Algorithm Identifiers and Implementation Requirements

Algorithms are identified by URIs that appear as an attribute to the element that identifies the algorithms' role (DigestMethod, Transform, SignatureMethod, or CanonicalizationMethod). All algorithms used herein take parameters but in many cases the parameters are implicit. For example, a SignatureMethod is implicitly given two parameters: the keying info and the output of CanonicalizationMethod. Explicit additional parameters to an algorithm appear as content elements within the algorithm role element. Such parameter elements have a descriptive element name, which is frequently algorithm specific, and MUST be in the XML Signature namespace or an algorithm specific namespace.

This specification defines a set of algorithms, their URIs, and requirements for implementation. Requirements are specified over implementation, not over requirements for signature use. Furthermore, the mechanism is extensible; alternative algorithms may be used by signature applications.

Digest
  1. Required SHA1
    http://www.w3.org/2000/09/xmldsig#sha1
Encoding
  1. Required base64
    http://www.w3.org/2000/09/xmldsig#base64
MAC
  1. Required HMAC-SHA1
    http://www.w3.org/2000/09/xmldsig#hmac-sha1
Signature
  1. Required DSAwithSHA1 (DSS)
    http://www.w3.org/2000/09/xmldsig#dsa-sha1
  2. Recommended RSAwithSHA1
    http://www.w3.org/2000/09/xmldsig#rsa-sha1
Canonicalization
  1. Required Canonical XML (omits comments)
    http://www.w3.org/TR/2001/REC-xml-c14n-20010315
  2. Recommended Canonical XML with Comments
    http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments
  3. Required Exclusive Canonicalization (omits comments)
    http://www.w3.org/2000/09/xmldsig#excludeC14N
  4. Recommended Exclusive Canonicalization with Comments
    http://www.w3.org/2000/09/xmldsig#excludeC14NwithComments
Transform
  1. Optional XSLT
    http://www.w3.org/TR/1999/REC-xslt-19991116
  2. Recommended XPath
    http://www.w3.org/TR/1999/REC-xpath-19991116
  3. Required Enveloped Signature*
    http://www.w3.org/2000/09/xmldsig#enveloped-signature

* The Enveloped Signature transform removes the Signature element from the calculation of the signature when the signature is within the content that is being signed. This MAY be implemented via the RECOMMENDED XPath specification specified in 6.6.4: Enveloped Signature Transform; it MUST have the same effect as that specified by the XPath Transform.
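
An implementation may find it convenient to hold the list above as a lookup table keyed by identifier. The sketch below is one possible arrangement, not a prescribed data structure: the names `Algorithm`, `ALGORITHMS`, and `required_algorithms` are hypothetical, and the entries simply transcribe the list.

```python
from typing import NamedTuple


class Algorithm(NamedTuple):
    role: str   # heading under which the algorithm is listed
    level: str  # "Required", "Recommended", or "Optional"
    name: str


DSIG = "http://www.w3.org/2000/09/xmldsig#"
C14N = "http://www.w3.org/TR/2001/REC-xml-c14n-20010315"

ALGORITHMS = {
    DSIG + "sha1": Algorithm("Digest", "Required", "SHA1"),
    DSIG + "base64": Algorithm("Encoding", "Required", "base64"),
    DSIG + "hmac-sha1": Algorithm("MAC", "Required", "HMAC-SHA1"),
    DSIG + "dsa-sha1": Algorithm("Signature", "Required", "DSAwithSHA1"),
    DSIG + "rsa-sha1": Algorithm("Signature", "Recommended", "RSAwithSHA1"),
    C14N: Algorithm("Canonicalization", "Required", "Canonical XML"),
    C14N + "#WithComments": Algorithm(
        "Canonicalization", "Recommended", "Canonical XML with Comments"),
    DSIG + "excludeC14N": Algorithm(
        "Canonicalization", "Required", "Exclusive Canonicalization"),
    DSIG + "excludeC14NwithComments": Algorithm(
        "Canonicalization", "Recommended",
        "Exclusive Canonicalization with Comments"),
    "http://www.w3.org/TR/1999/REC-xslt-19991116": Algorithm(
        "Transform", "Optional", "XSLT"),
    "http://www.w3.org/TR/1999/REC-xpath-19991116": Algorithm(
        "Transform", "Recommended", "XPath"),
    DSIG + "enveloped-signature": Algorithm(
        "Transform", "Required", "Enveloped Signature"),
}


def required_algorithms(role=None):
    """Sorted identifiers an implementation must support, optionally by role."""
    return sorted(
        uri for uri, alg in ALGORITHMS.items()
        if alg.level == "Required" and (role is None or alg.role == role)
    )
```

Because the requirements are stated over implementations rather than over signature use, such a table describes what a verifier must be able to process, not what a signer is obliged to choose.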


6.5 Canonicalization Algorithms

If canonicalization is performed over octets, the canonicalization algorithms take two implicit parameters: the content and its charset. The charset is derived according to the rules of the transport protocols and media types (e.g., RFC2376 [XML-MT] defines the media types for XML). This information is necessary to correctly sign and verify documents and often requires careful server side configuration.

Various canonicalization algorithms require conversion to [UTF-8]. The four algorithms below understand at least [UTF-8] and [UTF-16] as input encodings. We RECOMMEND that externally specified algorithms do the same. Knowledge of other encodings is OPTIONAL.

Various canonicalization algorithms transcode from a non-Unicode encoding to Unicode. The four algorithms below perform text normalization during transcoding [NFC, NFC-Corrigendum]. We RECOMMEND that externally specified canonicalization algorithms do the same. (Note, there can be ambiguities in converting existing charsets to Unicode, for an example see the XML Japanese Profile [XML-Japanese] NOTE.)
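
The two implicit parameters and the transcoding behavior described above can be sketched as follows. This is an illustration under assumptions, not part of any canonicalization algorithm's definition: the function name `to_utf8` is hypothetical, and an encoding is treated as a Unicode encoding when its Python codec name begins with "utf-".

```python
import codecs
import unicodedata


def to_utf8(content, charset):
    """Transcode octets in the given charset to UTF-8 octets.

    Text decoded from a non-Unicode encoding is normalized to NFC during
    transcoding; text already in a Unicode encoding is left as it is.
    """
    codec_name = codecs.lookup(charset).name  # e.g. "iso8859-1", "utf-16"
    text = content.decode(charset)
    if not codec_name.startswith("utf-"):
        text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")
```

For example, `to_utf8(octets, "iso-8859-1")` yields the UTF-8 form of a Latin-1 document, whereas a wrongly configured charset yields different octets and therefore a signature that fails to validate.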

6.5.1 Canonical XML

Identifier for REQUIRED Canonical XML (omits comments):
http://www.w3.org/TR/2001/REC-xml-c14n-20010315
Identifier for Canonical XML with Comments:
http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments

An example of an XML canonicalization element is:

   <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>

The normative specification of Canonical XML is [XML-C14N]. The algorithm is capable of taking as input either an octet stream or an XPath node-set (or sufficiently functional alternative). The algorithm produces an octet stream as output. Canonical XML is easily parameterized (via an additional URI) to omit or retain comments.

6.5.2 Exclusive Canonicalization

Identifier for REQUIRED Exclusive Canonicalization (omits comments):
http://www.w3.org/2000/09/xmldsig#excludeC14N
Identifier for Exclusive Canonicalization with Comments:
http://www.w3.org/2000/09/xmldsig#excludeC14NwithComments

As explained in Section 7.3, for signatures which must be portable between different ancestor namespace declaration and xml namespace attribute contexts, it is necessary to use some method which excludes such context. These "exclusive" canonicalization algorithms exclude such context and are the same as the Canonical XML and Canonical XML with Comments algorithms with the following exceptions:

  1. No xml namespace attributes (such as xml:lang) are imported from ancestor nodes of the top element node in the node set being serialized. Note that if some such attribute is required by the XML being canonicalized, it must be appropriately declared within such XML, possibly at its apex, or the application must assure that it will always be appropriately declared in every context in which that XML might be interpreted.
  2. An additional test is made only at the top element node of the node set being serialized before outputting a namespace declaration as part of the serialization. This may be done after the tests currently enumerated in the Canonical XML standard. In particular, such serialized namespace declaration output is permitted only if the prefix being declared is actually in use in the start tag being serialized or in some child element in the node set being output without an intervening node at which that prefix is re-declared.

NOTE: For many applications that use the DOM Level Two data model, this effect can be achieved by first divorcing an element node from its ancestor context by using the DOM Level Two removeChild function and then applying either Canonical XML or Canonical XML with Comments as appropriate. The only difference would be in the treatment of apex node namespace declarations whose prefix is either never used in any descendant or which is always redeclared at an intervening level before being used in a descendant.
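
The additional test in item 2 depends on knowing which prefixes are in use at or below the apex element without an intervening re-declaration. The following sketch shows one reading of that test over a DOM tree. It is not the normative algorithm: the function names are hypothetical, the whole subtree is assumed to be in the node set, and the empty string stands for the default namespace.

```python
from xml.dom import minidom


def _prefix(qname):
    """Prefix of a qualified name, or "" when the name has no prefix."""
    return qname.split(":", 1)[0] if ":" in qname else ""


def prefixes_in_use(apex):
    """Prefixes used at apex or below it with no intervening re-declaration."""
    used = set()

    def visit(element, shadowed):
        # An unprefixed element name uses the default namespace.
        used_here = {_prefix(element.tagName)}
        declared_here = set()
        attrs = element.attributes
        for i in range(attrs.length):
            name = attrs.item(i).name
            if name == "xmlns":
                declared_here.add("")
            elif name.startswith("xmlns:"):
                declared_here.add(name[len("xmlns:"):])
            elif ":" in name:
                # Unprefixed attributes are in no namespace, so only
                # prefixed attribute names count as a use.
                used_here.add(_prefix(name))
        if element is not apex:
            # A re-declaration below the apex hides the apex declaration.
            shadowed = shadowed | declared_here
        used.update(used_here - shadowed - {"xml"})
        for child in element.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                visit(child, shadowed)

    visit(apex, frozenset())
    return used
```

Under this reading, a namespace declaration in scope at the apex would be output only if its prefix appears in the returned set.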


7.0 XML Canonicalization and Syntax Constraint Considerations

Digital signatures only work if the verification calculations are performed on exactly the same bits as the signing calculations. If the surface representation of the signed data can change between signing and verification, then some way to standardize the changeable aspect must be used before signing and verification. For example, even for simple ASCII text there are at least three widely used line ending sequences. If it is possible for signed text to be modified from one line ending convention to another between the time of signing and signature verification, then the line endings need to be canonicalized to a standard form before signing and verification or the signatures will break.
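
The line-ending example can be made concrete with a short sketch. It is illustrative only: the choice of LF as the standard form and the function name `normalize_line_endings` are assumptions made for this example.

```python
import hashlib


def normalize_line_endings(octets):
    """Rewrite CRLF and bare CR line endings to LF."""
    return octets.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def sha1_hex(octets):
    """Hex SHA-1 digest, used here to compare byte sequences."""
    return hashlib.sha1(octets).hexdigest()


unix_text = b"line one\nline two\n"
dos_text = b"line one\r\nline two\r\n"
old_mac_text = b"line one\rline two\r"
```

The three texts digest differently as given, but identically once each has been passed through `normalize_line_endings`, which is why signer and verifier must apply the same normalization.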

XML is subject to surface representation changes and to processing which discards some surface information. For this reason, XML digital signatures have a provision for indicating canonicalization methods in the signature so that a verifier can use the same canonicalization as the signer.

Throughout this specification we distinguish between the canonicalization of a Signature element and other signed XML data objects. It is possible for an isolated XML document to be treated as if it were binary data so that no changes can occur. In that case, the digest of the document will not change and it need not be canonicalized if it is signed and verified as such. However, XML that is read and processed using standard XML parsing and processing techniques is changed such that some of its surface representation information is lost or modified. In particular, this will occur in many cases for the Signature and enclosed SignedInfo elements since they, and possibly an encompassing XML document, will be processed as XML.

Similarly, these considerations apply to Manifest, Object, and SignatureProperties elements if those elements have been digested, their DigestValue is to be checked, and they are being processed as XML.

The kinds of changes in XML that may need to be canonicalized can be divided into four categories. There are those related to the basic [XML], as described in 7.1 below. There are those related to [DOM], [SAX], or similar processing as described in 7.2 below. Third, there is the possibility of coded character set conversion, such as between UTF-8 and UTF-16, both of which all [XML] compliant processors are required to support, which is described in the paragraph immediately below. And, fourth, there are changes that relate to namespace declaration and xml namespace attribute context as described in 7.3 below.

Any canonicalization algorithm should yield output in a specific fixed coded character set. All canonicalization algorithms specified in this document use UTF-8 (without a byte order mark (BOM)) and do not provide character normalization. We RECOMMEND that signature applications create XML content (Signature elements and their descendants/content) in Normalization Form C [NFC, NFC-Corrigendum] and check that any XML being consumed is in that form as well (if not, signatures may consequently fail to validate). Additionally, none of these algorithms provide data type normalization. Applications that normalize data types in varying formats (e.g., (true, false) or (1,0)) may not be able to validate each other's signatures.
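
The effect of unnormalized content on a digest can be shown with a small sketch. It is illustrative only: the helper names `is_nfc` and `utf8_digest` are hypothetical, and SHA-1 stands in for whatever digest algorithm is in use.

```python
import hashlib
import unicodedata


def is_nfc(text):
    """True if the text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def utf8_digest(text):
    """Hex SHA-1 digest of the UTF-8 octets of the text (no BOM is written)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


composed = "caf\u00e9"     # e-acute as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent
```

The two strings display identically, but only `composed` passes `is_nfc`, and their digests differ. Normalizing `decomposed` to NFC before signing gives the same digest as `composed`, which is the reason for the recommendation above.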


7.3 Context and Portable Signatures

In the [XPath] data model, and consequently the Canonical XML data model, an element has namespace nodes that correspond to those declarations within the element and its ancestors:

"Note: An element E has namespace nodes that represent its namespace declarations as well as any namespace declarations made by its ancestors that have not been overridden in E's declarations, the default namespace if it is non-empty, and the declaration of the prefix xml." [XML-C14N]

When serializing a Signature element or signed XML data that's the child of other elements using these data models, that Signature element and its children may contain namespace declarations from its ancestor context. In addition, the Canonical XML and Canonical XML with Comments algorithms import all xml namespace attributes (such as xml:lang) from the nearest ancestor in which they are declared to the apex node of canonicalized XML unless they are already declared at that node. This may frustrate the intent of the signer to create a signature in one context which remains valid in another. For example, given a signature which is a child of B and a grandchild of A:

   <A xmlns:n1="&foo;">
     <B xmlns:n2="&bar;">
       <Signature xmlns="&dsig;">   ...
         <Reference URI="#signme"/> ...
       </Signature>
       <C ID="signme" xmlns="&baz;"/>
     </B>
   </A>

when either the element B or the signed element C is moved into a [SOAP] envelope for transport:

   <SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
     ...
     <SOAP:Body>
       <B xmlns:n2="&bar;">
         <Signature xmlns="&dsig;">
           ...
         </Signature>
         <C ID="signme" xmlns="&baz;"/>
       </B>
     </SOAP:Body>
   </SOAP:Envelope>

The canonical form of the signature in this context will contain new namespace declarations from the SOAP:Envelope context, invalidating the signature. Also, the canonical form will lack namespace declarations it may have originally had from element A's context, also invalidating the signature.
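
The change of context can be observed by computing the namespace declarations in scope at the signed element in each document. The sketch below is illustrative only: the function name `in_scope_namespaces` is hypothetical, the namespace URIs passed to it in any usage are placeholders for the entity references in the examples above, and the empty-string key stands for the default namespace.

```python
import io
import xml.etree.ElementTree as ET


def in_scope_namespaces(xml_text, target_id):
    """Return {prefix: uri} in scope at the element whose ID equals target_id."""
    scopes = [{}]  # stack of namespace maps, one entry per open element
    pending = {}   # declarations reported since the previous start event
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            # Declarations are reported just before their element starts.
            prefix, uri = payload
            pending[prefix] = uri
        elif event == "start":
            scope = dict(scopes[-1])
            scope.update(pending)
            pending = {}
            scopes.append(scope)
            if payload.get("ID") == target_id:
                return scope
        else:
            scopes.pop()
    return None
```

Applied to the first document, the result for 'signme' includes the n1 prefix declared on A. Applied to the second, n1 is absent and the SOAP prefix is present, so a canonical form that serializes inherited namespace nodes differs between the two contexts.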

Applications that wish to create signatures that survive porting should either:

  1. Use a canonicalization method that "repels" instead of "attracts" ancestor context, such as the "exclusive" canonicalizations defined in section 6.5.2.
  2. For signed XML other than SignedInfo, use XPath or other transforms to assure that exactly the same set of desired namespace declarations and xml namespace attribute declarations are present at both signature generation and validation.


8.1.1 Only What is Signed is Secure

First, obviously, signatures over a transformed document do not secure any information discarded by transforms: only what is signed is secure.

Note that the use of any of the canonicalizations listed herein ensures that all internal entities are expanded within the content being signed. In addition, the two forms of Canonical XML generally import namespaces and xml namespace attributes from ancestors. All entities are replaced with their definitions and the Canonical XML forms explicitly represent the namespaces and xml namespace attributes that an element would otherwise inherit. Applications that do not canonicalize XML content (especially the SignedInfo element) SHOULD NOT use internal entities and SHOULD represent any namespace declarations and xml namespace attributes explicitly within the content being signed since they cannot rely upon canonicalization to do this for them. Also, users concerned with the integrity of the element type definitions associated with the XML instance being signed may wish to sign those definitions as well (i.e., the schema, DTD, or natural language description associated with the namespace/identifier).