RDF XML Canonicalization Intro

> Intended for the RDF group, summarising C14N and XML subsets.
>

There are two XML Canonicalization specifications: Canonical XML (the
inclusive form)

http://www.w3.org/TR/xml-c14n

and Exclusive XML Canonicalization

http://www.w3.org/TR/xml-exc-c14n


These are produced by the XML Signature working group, which wishes to make
it possible to provide digital signatures for XML documents and XML document
subsets.

One solution would be to sign just the unprocessed document, but that would
mean that documents that are identical as XML would have different
signatures. XML Infoset defines a minimum set of things that may differ
between identical XML documents, including the character encoding, white
space in various places, attribute order, and empty-element tags vs. a start
tag followed by an end tag.

The C14N route is based around first turning any XML document or document
subset into its canonical form (which is the same for equivalent documents)
and then signing that.
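
As a rough illustration of the canonicalize-then-sign route, the sketch
below hashes the canonical form instead of the raw text. It assumes Python
3.8+ and uses xml.etree.ElementTree.canonicalize from the standard library,
which implements the later C14N 2.0 rather than either spec listed above. A
SHA-256 digest stands in for a real signature, and the sample documents and
the digest/raw_digest helpers are made up for this example.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Two documents that are the same as XML but differ as text:
# attribute order, quoting, white space inside the tag, and
# empty-element tag vs. start tag plus end tag.
doc_a = '<doc b="2" a="1"><empty/></doc>'
doc_b = "<doc a='1'   b='2'><empty></empty></doc>"


def digest(xml_text: str) -> str:
    """Hash the canonical form (a stand-in for signing it)."""
    canonical = canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def raw_digest(xml_text: str) -> str:
    """Hash the text as written, with no canonicalization."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


print(raw_digest(doc_a) == raw_digest(doc_b))  # compares the raw text
print(digest(doc_a) == digest(doc_b))          # compares canonical forms
```

Only the second comparison is expected to hold, which is the property a
signature over the canonical form relies on.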


Moreover, the XML C14N work decided that character and entity references
would also be subject to canonicalization, and that their use would not be
regarded as part of the document to be signed. In contrast, namespace
prefixes, which are also in the infoset, were determined to be part of the
document, and namespace prefix rewriting (i.e. the process of renaming
namespace prefixes) is *not* part of canonicalizing.

So two documents that:
- have the same infoset
- or differ only in their use of entity and character references

canonicalize the same.
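
The contrast between references and prefixes can be sketched as follows,
again assuming Python 3.8+ and the standard library's C14N 2.0
implementation (not the 1.0 specs discussed here), with prefix rewriting
left at its default of off. The sample documents and the example.org
namespace URI are invented for illustration.

```python
from xml.etree.ElementTree import canonicalize

# A character reference and the literal character it stands for.
with_ref = "<a>&#x41;BC</a>"
literal = "<a>ABC</a>"

# Same namespace URI, different prefixes (hypothetical example URI).
prefix_x = '<x:a xmlns:x="http://example.org/ns"/>'
prefix_y = '<y:a xmlns:y="http://example.org/ns"/>'

same_after_refs = canonicalize(with_ref) == canonicalize(literal)
same_after_prefixes = canonicalize(prefix_x) == canonicalize(prefix_y)

print(same_after_refs)      # character references are resolved
print(same_after_prefixes)  # prefixes are kept, so the forms differ
```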

Key aspects of canonicalization are given in the short section
http://www.w3.org/TR/xml-c14n#Terminology
e.g. "Empty elements are converted to start-end tag pairs"


C14N addresses document subsets both within the main spec at:
http://www.w3.org/TR/xml-c14n#DocSubsets

and within the exclusive spec (which only addresses document subsets).

It should be noted that, within C14N, document subsets are not assumed to be
contiguous.

The two C14N specs differ in the exact treatment given to namespaces.

C14N can be used to address the parseType="Literal" problem because:
- it allows us to clarify that two RDF/XML files that differ only in the
exact form of the XML in an XML literal, but whose infosets are the same,
are in fact the same.
- it does specify (in a number of alternative ways) how to address the
namespace issue.
- it does provide an answer to what an XML document subset is, in that it
tells us if two such subsets are the same.


However, it does leave a number of things to be decided.

1: The exact treatment of namespaces (whether to use exclusive or inclusive
canonicalization).
2: The treatment of XML comments
3: In precisely what way the RDF specs depend on the XML C14N specs.
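
On point 2, the practical question is whether two XML literals that differ
only in their comments should count as the same. A minimal sketch of the
two choices, assuming Python 3.8+ and the with_comments option of the
standard library's canonicalize function (a C14N 2.0 implementation, used
here only as a stand-in for the specs above); the sample literal is made up.

```python
from xml.etree.ElementTree import canonicalize

sample = "<a><!-- editorial note -->text</a>"

# Default: comments are dropped from the canonical form.
without_comments = canonicalize(sample)

# Opt-in: comments are kept and so affect equality and signatures.
with_comments = canonicalize(sample, with_comments=True)

print(without_comments)
print(with_comments)
```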

There are also some limitations with C14N identified at:
http://www.w3.org/TR/xml-c14n#Limitations

Moreover the exclusive form has additional limitations, to be described
later.

Jeremy

Received on Monday, 4 March 2002 08:46:48 UTC