The CanonicalizationMethod Element
CanonicalizationMethod is a required element that specifies the
canonicalization algorithm applied to the SignedInfo
element prior to performing signature calculations. This element uses
the general structure for algorithms described in Algorithm Identifiers and Implementation Requirements
(section 6.1). Implementations MUST support the REQUIRED canonicalization algorithms.
Alternatives to the REQUIRED canonicalization algorithms (section 6.5), such as Canonical XML with Comments (section 6.5.1) or a minimal canonicalization (such as CRLF and charset normalization), may be explicitly specified but are NOT REQUIRED. Consequently, their use may not interoperate with other applications that do not support the specified algorithm (see XML Canonicalization and Syntax Constraint Considerations, section 7). Security issues may also arise in the treatment of entity processing and comments if non-XML aware canonicalization algorithms are not properly constrained (see section 8.2: Only What is "Seen" Should be Signed).
The way in which the SignedInfo element is presented to the
canonicalization method is dependent on that method. The following
applies to algorithms which process XML as nodes or characters:
* XML based canonicalization implementations are provided with an
[XPath] node-set originally formed from the document containing the
SignedInfo and currently indicating the SignedInfo, its descendants,
and the attribute and namespace nodes of SignedInfo and its descendant
elements.
We recommend that resource-constrained applications that do not implement XML based canonicalization, and instead choose a text based canonicalization, be implemented to generate canonicalized XML as their output serialization so as to mitigate interoperability and security concerns. For instance, such an implementation SHOULD (at least) generate standalone XML instances [XML].
NOTE: The signature application must exercise great care in accepting
and executing an arbitrary CanonicalizationMethod. For example, the
canonicalization method could rewrite the URIs of the References
being validated. Or, the method could massively transform SignedInfo
so that validation would always succeed (i.e., converting it to a
trivial signature with a known key over trivial data). Since
CanonicalizationMethod is inside SignedInfo, in the resulting
canonical form it could erase itself from SignedInfo or modify the
SignedInfo element so that it appears that a different
canonicalization function was used! Thus a Signature which appears to
authenticate the desired data with the desired key, DigestMethod, and
SignatureMethod, can be meaningless if a capricious
CanonicalizationMethod is used.
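One defensive reading of the NOTE above is sketched below: the verifier reads the Algorithm URI from SignedInfo as received, before anything is canonicalized, and refuses any URI outside a fixed allow-list. The allow-list itself, and the names `ALLOWED_C14N`, `UntrustedCanonicalization` and `check_canonicalization_method`, are hypothetical local policy rather than anything this specification mandates. The four URIs are the identifiers for Canonical XML 1.0 and Exclusive XML Canonicalization 1.0, each with and without comments.

```python
import xml.etree.ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"

# Hypothetical local policy: the only canonicalization algorithms this verifier
# is willing to execute (Canonical XML 1.0 and Exclusive XML Canonicalization
# 1.0, each with and without comments).
ALLOWED_C14N = frozenset({
    "http://www.w3.org/TR/2001/REC-xml-c14n-20010315",
    "http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments",
    "http://www.w3.org/2001/10/xml-exc-c14n#",
    "http://www.w3.org/2001/10/xml-exc-c14n#WithComments",
})


class UntrustedCanonicalization(Exception):
    """Raised when SignedInfo names a canonicalization method outside the policy."""


def check_canonicalization_method(signed_info_xml: str) -> str:
    """Read CanonicalizationMethod from SignedInfo *as received* and vet it.

    The check runs before any canonicalization, because a capricious method
    could erase or rewrite itself in its own canonical output.
    """
    signed_info = ET.fromstring(signed_info_xml)
    method = signed_info.find(f"{{{DSIG_NS}}}CanonicalizationMethod")
    if method is None:
        raise UntrustedCanonicalization("SignedInfo has no CanonicalizationMethod")
    algorithm = method.get("Algorithm")
    if algorithm not in ALLOWED_C14N:
        raise UntrustedCanonicalization(f"refusing canonicalization algorithm {algorithm!r}")
    return algorithm
```

The allow-list is deliberately closed. A verifier that accepts any URI it can resolve to code would be exposed to exactly the self-rewriting method the NOTE describes.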
Schema Definition:
  <element name="CanonicalizationMethod" type="ds:CanonicalizationMethodType"/>
  <complexType name="CanonicalizationMethodType" mixed="true">
    <sequence>
      <any namespace="##any" minOccurs="0" maxOccurs="unbounded"/>
      <!-- (0,unbounded) elements from (1,1) namespace -->
    </sequence>
    <attribute name="Algorithm" type="anyURI" use="required"/>
  </complexType>
DTD:
  <!ELEMENT CanonicalizationMethod (#PCDATA %Method.ANY;)* >
  <!ATTLIST CanonicalizationMethod
    Algorithm CDATA #REQUIRED >
Note: XPath is RECOMMENDED. Signature applications need not conform to the [XPath] specification in order to conform to this specification. However, the XPath data model, definitions (e.g., node-sets) and syntax are used within this document in order to describe functionality for those that want to process XML-as-XML (instead of octets) as part of signature generation. For those that want to use these features, a conformant [XPath] implementation is one way to implement them, but it is not required. Such applications could use a sufficiently functional replacement for a node-set and implement only those XPath expression behaviors REQUIRED by this specification. However, for simplicity we generally use XPath terminology without including this qualification on every point. Requirements over "XPath node-sets" can include a node-set functional equivalent. Requirements over XPath processing can include application behaviors that are equivalent to the corresponding XPath behavior.
The data-type of the result of URI dereferencing or subsequent
Transforms is either an octet stream or an XPath node-set.
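The Transforms specified in this document are defined with respect
to the input they require. The following is the default signature
application behavior:
* If the data object is an octet stream and the next transform
requires a node-set, the signature application MUST attempt to parse
the octets yielding the required node-set via [XML] well-formed
processing.
* If the data object is a node-set and the next transform requires
octets, the signature application MUST attempt to convert the
node-set to an octet stream using Canonical XML [XML-C14N].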
Users may specify alternative transforms that override these
defaults in transitions between Transforms that expect different
inputs. The final octet stream contains the data octets being
secured. The digest algorithm specified by DigestMethod is then
applied to these data octets, resulting in the DigestValue.
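The last step of the processing described above, applying DigestMethod to the final octet stream, is sketched below. It assumes the octets have already passed through every Transform and default conversion. The node-set to octets step needs a real Canonical XML implementation and is not shown. DigestValue is carried as base64 text. The SHA-1 URI is the XML Signature identifier and the SHA-256 URI is the one defined by XML Encryption. The table is an illustrative subset, and `digest_value` is a hypothetical helper name.

```python
import base64
import hashlib

# Illustrative subset: DigestMethod Algorithm URIs mapped to hashlib constructors.
DIGEST_ALGORITHMS = {
    "http://www.w3.org/2000/09/xmldsig#sha1": hashlib.sha1,
    "http://www.w3.org/2001/04/xmlenc#sha256": hashlib.sha256,
}


def digest_value(octets: bytes, digest_method: str) -> str:
    """Apply DigestMethod to the final octet stream and return base64 DigestValue text."""
    try:
        hash_factory = DIGEST_ALGORITHMS[digest_method]
    except KeyError:
        raise ValueError(f"unsupported DigestMethod {digest_method!r}") from None
    return base64.b64encode(hash_factory(octets).digest()).decode("ascii")
```

For example, the SHA-1 DigestValue of an empty octet stream is "2jmj7l5rSw0yVb/vlWAYkK/YBwk=".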
Unless the URI-Reference is a 'same-document' reference as defined in [URI, Section 4.2], the result of dereferencing the URI-Reference MUST be an octet stream. In particular, an XML document identified by URI is not parsed by the signature application unless the URI is a same-document reference or unless a transform that requires XML parsing is applied (See Transforms (section 4.3.3.1).)
When a fragment is preceded by an absolute or relative URI in
the URI-Reference, the meaning of the fragment is defined by the
resource's MIME type. Even for XML documents, URI dereferencing
(including the fragment processing) might be done for the signature
application by a proxy. Therefore, reference validation might fail
if fragment processing is not performed in a standard way (as
defined in the following section for same-document references).
Consequently, we RECOMMEND that the URI
attribute not include fragment identifiers and that such processing
be specified as an additional XPath
Transform.
When a fragment is not preceded by a URI in the URI-Reference, XML signature applications MUST support the null URI and barename XPointer. We RECOMMEND support for the same-document XPointers '#xpointer(/)' and '#xpointer(id('ID'))' if the application also intends to support any canonicalization that preserves comments. (Otherwise URI="#foo" will automatically remove comments before the canonicalization can even be invoked.) All other support for XPointers is OPTIONAL, especially all support for barename and other XPointers in external resources since the application may not have control over how the fragment is generated (leading to interoperability problems and validation failures).
The following examples demonstrate what the URI attribute identifies and how it is dereferenced:
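* URI="http://example.com/bar.xml" identifies the octets that represent the external resource (probably an XML document, given its file extension).
* URI="http://example.com/bar.xml#chapter1" identifies the fragment 'chapter1' of that external resource. The meaning of the fragment is defined by the resource's MIME type, so for interoperability this selection is better expressed as an XPath Transform.
* URI="" identifies the node-set (minus any comment nodes) of the XML resource containing the signature.
* URI="#chapter1" identifies a node-set containing the element with ID attribute value 'chapter1' of the XML resource containing the signature, together with its descendants but without comment nodes.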
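The dereferencing rules and examples above can be summarized as a small classifier. It is a rough sketch: it inspects only the syntax of the URI attribute and does not dereference anything. The function name and the category strings are hypothetical, and a real application would also resolve relative URIs against a base.

```python
def describe_reference_uri(uri: str) -> str:
    """Classify a Reference URI attribute value by the rules in this section."""
    if uri == "":
        # Null URI: the document containing the signature, comments removed.
        return "same-document node-set (comments removed)"
    if uri.startswith("#xpointer("):
        # e.g. #xpointer(/) or #xpointer(id('ID')): comment nodes are retained.
        return "same-document XPointer node-set (comments retained)"
    if uri.startswith("#"):
        # Barename XPointer such as #chapter1: the element with that ID, no comments.
        return "same-document barename node-set (comments removed)"
    if "#" in uri:
        # Fragment meaning depends on the resource's MIME type; support is OPTIONAL.
        return "external resource with fragment (MIME-type dependent, OPTIONAL)"
    # Anything else is dereferenced to an octet stream and is not parsed here.
    return "external octet stream"
```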
Algorithms are identified by URIs that appear as an attribute to
the element that identifies the algorithms' role (DigestMethod,
Transform, SignatureMethod, or CanonicalizationMethod).
All algorithms used herein take parameters, but in many cases the
parameters are implicit. For example, a SignatureMethod is implicitly
given two parameters: the keying info and the output of
CanonicalizationMethod.
Explicit additional parameters to an algorithm appear as content
elements within the algorithm role element. Such parameter elements
have a descriptive element name, which is frequently algorithm
specific, and MUST be in the XML Signature namespace or an
algorithm specific namespace.
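The sketch below shows how an application might separate the Algorithm URI from explicit parameter elements. It uses HMACOutputLength, the XML Signature parameter for truncating HMAC output, as the example. The helper name `read_algorithm` is hypothetical. Parameters are keyed by ElementTree's "{namespace}localname" form, so the namespace requirement stated above stays checkable. Implicit parameters are not in the markup, so they do not appear in the result.

```python
import xml.etree.ElementTree as ET


def read_algorithm(element_xml: str):
    """Split an algorithm role element into (role, Algorithm URI, explicit parameters).

    Implicit parameters (for SignatureMethod: the keying info and the output of
    CanonicalizationMethod) are not in the markup and are therefore not returned.
    """
    elem = ET.fromstring(element_xml)
    role = elem.tag.split("}", 1)[-1]  # local name, e.g. "SignatureMethod"
    algorithm = elem.get("Algorithm")
    if algorithm is None:
        raise ValueError(f"{role} lacks the required Algorithm attribute")
    # Explicit parameters are child elements; keys keep the namespace visible.
    params = {child.tag: (child.text or "").strip() for child in elem}
    return role, algorithm, params


example = (
    '<SignatureMethod xmlns="http://www.w3.org/2000/09/xmldsig#" '
    'Algorithm="http://www.w3.org/2000/09/xmldsig#hmac-sha1">'
    "<HMACOutputLength>128</HMACOutputLength></SignatureMethod>"
)
print(read_algorithm(example))
```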
This specification defines a set of algorithms, their URIs, and requirements for implementation. Requirements are specified over implementation, not over requirements for signature use. Furthermore, the mechanism is extensible; alternative algorithms may be used by signature applications.
* The Enveloped Signature transform removes the Signature element
from the calculation of the signature when the signature is within
the content that is being signed. This MAY be implemented via the
RECOMMENDED XPath specification, using the expression given in 6.6.4:
Enveloped Signature Transform; it MUST have the same effect as that
specified by the XPath Transform.
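The effect of the Enveloped Signature transform can be pictured with a simplified tree-based sketch. The transform is defined over node-sets and removes only the Signature element that contains it. This version instead detaches one given Signature element from a parsed ElementTree tree and mutates that tree in place, so it should be run on a private copy of the document. The function name `remove_enveloped_signature` is hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_enveloped_signature(root: ET.Element, signature: ET.Element) -> None:
    """Detach `signature` (the Signature enclosing the transform) from `root`'s tree.

    ElementTree has no parent pointers, so the parent is found by a full walk.
    The tree is modified in place.
    """
    for parent in root.iter():
        for child in list(parent):
            if child is signature:
                parent.remove(child)
                return  # stop at once: the iterator must not continue after mutation
    raise ValueError("signature element is not a descendant of root")
```

After removal, the remaining tree would still have to be canonicalized and digested as usual. Other Signature elements in the document are left untouched, as the transform requires.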
If canonicalization is performed over octets, the canonicalization algorithms take two implicit parameters: the content and its charset. The charset is derived according to the rules of the transport protocols and media types (e.g., RFC 2376 [XML-MT] defines the media types for XML). This information is necessary to correctly sign and verify documents and often requires careful server-side configuration.
Various canonicalization algorithms require conversion to [UTF-8]. The four algorithms below understand at least [UTF-8] and [UTF-16] as input encodings. We RECOMMEND that externally specified algorithms do the same. Knowledge of other encodings is OPTIONAL.
Various canonicalization algorithms transcode from a non-Unicode encoding to Unicode. The four algorithms below perform text normalization during transcoding [NFC, NFC-Corrigendum]. We RECOMMEND that externally specified canonicalization algorithms do the same. (Note that there can be ambiguities in converting existing charsets to Unicode; for an example, see the XML Japanese Profile [XML-Japanese] NOTE.)
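The two paragraphs above describe an octet-level conversion. That conversion is sketched here: decode the content with its charset, apply Normalization Form C only when the source charset is not a Unicode encoding, and re-encode as UTF-8 without a byte order mark. This is a sketch under stated assumptions. The charset is assumed to have been obtained already from the transport or media type. The set of encodings treated as "Unicode" and the name `to_canonical_utf8` are hypothetical choices, not part of any canonicalization specification.

```python
import codecs
import unicodedata

# Assumption for this sketch: these codec names count as Unicode encodings;
# any other charset is treated as legacy and NFC is applied after decoding.
UNICODE_ENCODINGS = {"utf-8", "utf-16", "utf-16-le", "utf-16-be",
                     "utf-32", "utf-32-le", "utf-32-be"}


def to_canonical_utf8(content: bytes, charset: str) -> bytes:
    """Decode `content` using `charset` and re-encode it as UTF-8 without a BOM."""
    name = codecs.lookup(charset).name  # normalize aliases, e.g. "UTF8" -> "utf-8"
    text = content.decode(name)         # the utf-16/utf-32 codecs consume a BOM
    if text.startswith("\ufeff"):
        text = text[1:]                 # never carry a byte order mark into the output
    if name not in UNICODE_ENCODINGS:
        # Transcoding from a non-Unicode charset: normalize to NFC.
        text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")
```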
An example of an XML canonicalization element is:
  <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
The normative specification of Canonical XML is [XML-C14N]. The algorithm is capable of taking as input either an octet stream or an XPath node-set (or sufficiently functional alternative). The algorithm produces an octet stream as output. Canonical XML is easily parameterized (via an additional URI) to omit or retain comments.
As explained in Section 7.3, for signatures which must be portable
between different ancestor namespace declaration and xml namespace
attribute context, it is necessary to use some method which excludes
such context. These "exclusive" canonicalization algorithms exclude
such context and are the same as the Canonical XML and Canonical XML
with Comments algorithms with the following exceptions:
* No xml namespace attributes (such as xml:lang) are imported from
ancestor nodes of the top element node in the node set being
serialized. Note that if some such attribute is required by the XML
being canonicalized, it must be appropriately declared within such
XML, possibly at its apex, or the application must ensure that it will
always be appropriately declared in every context in which that XML
might be interpreted.
NOTE: For many applications that use the DOM Level Two data model, this effect can be
achieved by first divorcing an element node from its ancestor context
by using the DOM Level Two removeChild function and then applying
either Canonical XML or Canonical XML with Comments as appropriate.
The only difference would be in the treatment of apex node namespace
declarations whose prefix is either never used in any descendant or
always redeclared at an intervening level before being used in a
descendant.
Digital signatures only work if the verification calculations are performed on exactly the same bits as the signing calculations. If the surface representation of the signed data can change between signing and verification, then some way to standardize the changeable aspect must be used before signing and verification. For example, even for simple ASCII text there are at least three widely used line ending sequences. If it is possible for signed text to be modified from one line ending convention to another between the time of signing and signature verification, then the line endings need to be canonicalized to a standard form before signing and verification or the signatures will break.
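The line-ending example in the paragraph above can be made concrete. The three conventions (CRLF, LF, CR) give three different digests over raw text, and a single digest once all of them are mapped to one form. LF is used as the target here because XML processors already normalize line breaks to LF. SHA-256 and the helper names are illustrative choices.

```python
import hashlib


def normalize_line_endings(text: str) -> str:
    """Rewrite CRLF and lone CR as LF; CRLF must be handled first."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def text_digest(text: str) -> str:
    """SHA-256 hex digest of the LF-normalized text encoded as UTF-8."""
    return hashlib.sha256(normalize_line_endings(text).encode("utf-8")).hexdigest()


variants = [
    "line one\r\nline two\r\n",  # CRLF
    "line one\nline two\n",      # LF
    "line one\rline two\r",      # CR
]
raw = {hashlib.sha256(v.encode("utf-8")).hexdigest() for v in variants}
normalized = {text_digest(v) for v in variants}
print(len(raw), len(normalized))  # 3 distinct raw digests, 1 after normalization
```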
XML is subject to surface representation changes and to processing which discards some surface information. For this reason, XML digital signatures have a provision for indicating canonicalization methods in the signature so that a verifier can use the same canonicalization as the signer.
Throughout this specification we distinguish between the
canonicalization of a Signature element and other signed XML data
objects. It is possible for an isolated XML document to be treated as
if it were binary data so that no changes can occur. In that case,
the digest of the document will not change and it need not be
canonicalized if it is signed and verified as such. However, XML that
is read and processed using standard XML parsing and processing
techniques is changed such that some of its surface representation
information is lost or modified. In particular, this will occur in
many cases for the Signature and enclosed SignedInfo elements since
they, and possibly an encompassing XML document, will be processed as
XML.
Similarly, these considerations apply to Manifest, Object, and
SignatureProperties elements if those elements have been digested,
their DigestValue is to be checked, and they are being processed as
XML.
The kinds of changes in XML that may need to be canonicalized can be divided into four categories. There are those related to the basic [XML], as described in 7.1 below. There are those related to [DOM], [SAX], or similar processing as described in 7.2 below. Third, there is the possibility of coded character set conversion, such as between UTF-8 and UTF-16, both of which all [XML] compliant processors are required to support, which is described in the paragraph immediately below. And, fourth, there are changes related to namespace declaration and xml namespace attribute context as described in 7.3 below.
Any canonicalization algorithm should yield output in a specific
fixed coded character set. All canonicalization algorithms specified
in this document use UTF-8 (without a byte order mark (BOM)) and do
not provide character normalization. We RECOMMEND that signature
applications create XML content (Signature elements and their
descendants/content) in Normalization Form C [NFC, NFC-Corrigendum]
and check that any XML being consumed is in that form as well;
otherwise, signatures may fail to validate. Additionally, none of
these algorithms provide data type normalization. Applications that
normalize data types in varying formats (e.g., (true, false) or
(1,0)) may not be able to validate each other's signatures.
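The NFC recommendation above matters because canonically equivalent strings are still different octets after UTF-8 encoding, so they digest differently. The sketch shows one way a consumer might perform the recommended check. Rejecting non-NFC input is an illustrative policy, and `require_nfc` is a hypothetical helper. `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import hashlib
import unicodedata


def require_nfc(text: str) -> str:
    """Return `text` if it is already in Normalization Form C, else raise."""
    if not unicodedata.is_normalized("NFC", text):  # Python 3.8+
        raise ValueError("content is not in Normalization Form C")
    return text


composed = "caf\u00e9"     # 'é' as one code point (NFC)
decomposed = "cafe\u0301"  # 'e' + COMBINING ACUTE ACCENT (NFD)

# The two strings render identically but have different UTF-8 octets,
# hence different digests.
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
print(hashlib.sha256(composed.encode("utf-8")).hexdigest()
      == hashlib.sha256(decomposed.encode("utf-8")).hexdigest())  # False
```

The same reasoning covers data types: "true" and "1" may mean the same thing to an application, but no algorithm here maps one to the other.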
In [XPath] and consequently Canonical XML data models an element has namespace nodes that correspond to those declarations within the element and its ancestors:
"Note: An element E has namespace nodes that represent its namespace declarations as well as any namespace declarations made by its ancestors that have not been overridden in E's declarations, the default namespace if it is non-empty, and the declaration of the prefix xml." [XML-C14N]
When serializing a Signature element or signed XML data that's the
child of other elements using these data models, that Signature
element and its children may contain namespace declarations from its
ancestor context. In addition, the Canonical XML and Canonical XML
with Comments algorithms import all xml namespace attributes (such as
xml:lang) from the nearest ancestor in which they are declared to the
apex node of canonicalized XML unless they are already declared at
that node. This may frustrate the intent of the signer to create a
signature in one context which remains valid in another. For example,
given a signature which is a child of B and a grandchild of A:
  <A xmlns:n1="&foo;">
    <B xmlns:n2="&bar;">
      <Signature xmlns="&dsig;"> ...
        <Reference URI="#signme"/> ...
      </Signature>
      <C ID="signme" xmlns="&baz;"/>
    </B>
  </A>
when either the element B or the signed element C is moved into a
[SOAP] envelope for transport:
  <SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
    ...
    <SOAP:Body>
      <B xmlns:n2="&bar;">
        <Signature xmlns="&dsig;">
          ...
        </Signature>
        <C ID="signme" xmlns="&baz;"/>
      </B>
    </SOAP:Body>
  </SOAP:Envelope>
The canonical form of the signature in this context will contain new
namespace declarations from the SOAP:Envelope context, invalidating
the signature. Also, the canonical form will lack namespace
declarations it may have originally had from element A's context,
also invalidating the signature.
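Applications that wish to create signatures that survive porting
should either:
* use an "exclusive" canonicalization algorithm, which excludes
ancestor namespace declaration and xml namespace attribute context,
for SignedInfo and the signed data, or
* use XPath or other transforms to assure that exactly the same set
of desired namespace declarations and xml namespace attribute
declarations are present at both signature generation and validation.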
First, obviously, signatures over a transformed document do not secure any information discarded by transforms: only what is signed is secure.
Note that the use of any of the canonicalizations listed herein ensures that
all internal entities are expanded within the content being signed. In
addition, the two forms of Canonical XML generally import namespaces
and xml namespace attributes from ancestors. All entities are
replaced with their definitions and the Canonical XML forms explicitly
represent the namespaces and xml namespace attributes that an element
would otherwise inherit. Applications that do not canonicalize XML
content (especially the SignedInfo element) SHOULD NOT use internal
entities and SHOULD represent any namespace declarations and xml
namespace attributes explicitly within the content being signed,
since they cannot rely upon canonicalization to do this for them.
Also, users concerned with the integrity of the element type
definitions associated with the XML instance being signed may wish to
sign those definitions as well (i.e., the schema, DTD, or natural
language description associated with the namespace/identifier).