- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Mon, 4 Mar 2002 13:46:30 -0000
- To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>, <w3c-ietf-xmldsig@w3.org>
> RDF XML Canonicalization Intro > ============================== > Intended for the RDF group, summarising C14N and XML subsets. > There are two XML Canonicalization specifications. http://www.w3.org/TR/xml-c14n and http://www.w3.org/TR/xml-exc-c14n These are produced by the XML Signature working group who wish to make it possible to provide digital signatures for XML documents and XML document subsets. One solution would be to sign just the preprocessed document, but that would mean that documents that are identical as XML would have different signatures. XML Infoset defines a minimum set of things that may differ between identical XML documents, including the character encoding, white space in various place, attribute order, empty element tags vs a start tag followed by an end tag etc. The C14N route is based around first turning any XML document or document subset into its canonical form, (which is the same for equivalent documents) and then signing that. Moreover, the XML C14N work, decided that character and entity references would also be subject to canonicalization, and their use not be regarded as part of the document to be signed. In contrast, namespace prefixes, which are also in infoset, were determined to be part of the document, and namespace prefix rewriting (i.e. the process of renaming namespace prefixes) is *not* part of canonicalizing. So two documents that: - have the same infoset - or differ only in their use of entity and character references canonicalize the same. Key aspects of canonicalization are given in the short section http://www.w3.org/TR/xml-c14n#Terminology e.g. "Empty elements are converted to start-end tag pairs" C14N addresses document subsets both within the main spec at: http://www.w3.org/TR/xml-c14n#DocSubsets and within the exclusive spec (which only addresses document subsets). It should be noted that within C14N document subsets are not assumed to be contiguous. The two C14N specs differ in the exact treatment given to namespaces. C14N can be used to address the parseType="Literal" problem because: - it allows us to clarify that two RDF/XML files that only differ in the exact nature of XML in an xml literal, but whose infosets are the same, are in fact the same. - it does specify (a number of alternative ways) how to address the namespace issue. - it does provide an answer to what is an XML document subset, in that it tells us if two such subsets are the same. However, it does leave a number of things to be decided. 1: The exact treatment of namespaces (whether to use exclusive or inclusive canonicalization). 2: The treatment of XML comments 3: In precisely what way we the RDF specs depend on the XML C14N specs. There are also some limitiations with C14N identified at: http://www.w3.org/TR/xml-c14n#Limitations Moreover the exclusive form has additional limitations, to be described later. Jeremy
Received on Monday, 4 March 2002 08:46:49 UTC