Schema Contributions to C14N (Was: Suggested additions to 3.0 Processing Rules section) from Joseph M. Reagle Jr. on 2001-07-12 (w3c-ietf-xmldsig@w3.org from July to September 2001)

From: Joseph M. Reagle Jr. <reagle@w3.org>
Date: Thu, 12 Jul 2001 19:43:24 -0400
To: merlin <merlin@baltimore.ie>, "Gregor Karlinger" <gregor.karlinger@iaik.at>
Cc: "Donald Eastlake" <lde008@dma.isg.mot.com>, w3c-ietf-xmldsig@w3.org
Message-Id: <4.3.2.7.2.20010712185459.02502ee8@localhost>
At 06:11 7/12/2001, merlin wrote:
>I think that this may be part of a bigger issue raised later
>in [2]. I agree with you that it is probably smart for us to
>derive from string.

I neglected to explicitly represent this in [1] (now remedied) because I was 
waiting for a response back from the Schema WG on base64 and some 
discussions at the XML Processing Workshop [2]. Michael Sperberg-McQueen 
(Schema Co-Chair) did tell me the WG decided to go with the base64 lexical 
space from the RFC (no funky characters permitted) and arbitrary white space 
which is ignored -- I think. But they hadn't decided on the schema 
normalized form, so he's supposed to send an email with the character 
constraints and a question about the normalized form to this WG.

[1] http://www.w3.org/Signature/20000228-last-call-issues.html#CandidateREC-2
[2] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2001AprJun/0224.html
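
To make the lexical-space point concrete, here is a minimal sketch, assuming 
the decision is as described above (RFC alphabet only, arbitrary white space 
ignored). The function name decode_lexical and the regular expression are 
illustrative assumptions, not text from the Schema WG.

```python
import base64
import re

# Assumed lexical space: the RFC base64 alphabet plus up to two '=' pad
# characters, checked after all white space has been removed.
RFC_ALPHABET = re.compile(r"^[A-Za-z0-9+/]*={0,2}$")

def decode_lexical(text):
    """Decode a base64 lexical form, ignoring arbitrary white space."""
    compact = "".join(text.split())  # drop spaces, tabs, and line breaks
    if len(compact) % 4 != 0 or not RFC_ALPHABET.match(compact):
        raise ValueError("not in the assumed base64 lexical space")
    return base64.b64decode(compact)

# Two lexical forms that differ only in white space map to the same octets.
print(decode_lexical("aGVsbG8=") == decode_lexical("aGVs\n   bG8="))
```

Which of those lexical forms counts as the schema normalized form is exactly 
the open question, so the sketch does not pick one.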

But Michael is also the Chair of the Processing Workshop, which I attended 
today (and will attend tomorrow), so I understand why he's been hard pressed 
to send that email. At this Workshop, we're trying to address whether there 
is a default XML processing model and, if so, what it is (for instance, 
XML1.0/DTD parse, then xml:base, then XLink expansion, then schema 
validation). I assumed (with some confusion and hesitation) that when we say 
an XML document is parsed, that stuff would happen automagically. However, 
that isn't the case, and I don't think it's likely that a Recommendation 
will be issued specifying how these things will always happen: people want 
flexibility. It seems most people think XML1.0+namespaces+xml:base is the 
first step in the XML "pipeline", but after that it's up to the apps. One 
could foresee a standard for describing the processing steps (as we did with 
Transforms), but that's an open question. However, the thing I take away is 
that we should identify XInclude and Schema contributions to the node set 
being serialized/signed as explicit transforms, since we can't assume it'll 
be done consistently by default. (Some want to schema validate, then do 
XIncludes; others want to do XIncludes, then schema validate.) For example, 
the following parses the XML (taking care of XML1.0 and its DTD, xml:base, 
and namespaces), expands the XIncludes, and schema validates it.

<Reference URI="foo.xml">
   <Transforms>
     <Transform Algorithm="http://www.w3.org/TR/2001/WD-xinclude-20010516/"/>
     <Transform
      Algorithm="http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/"/>
     <Transform Algorithm="&c14n;"/>
   </Transforms>
</Reference>
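
Why the order has to be explicit can be shown with a small sketch. 
Everything in it is an assumption for illustration: the two in-memory 
documents, the loader, and above all apply_schema_defaults, which is a toy 
stand-in for schema validation that only adds a default attribute. It uses 
Python's xml.etree, whose canonicalize() implements C14N 2.0 rather than the 
Canonical XML 1.0 referred to by &c14n;.

```python
import copy
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical documents; a real loader would dereference the URIs.
DOCS = {
    "foo.xml": '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
               '<xi:include href="part.xml"/></doc>',
    "part.xml": "<item/>",
}

def loader(href, parse, encoding=None):
    """Resolve an XInclude href against the in-memory DOCS table."""
    return ET.fromstring(DOCS[href])

def expand_xincludes(root):
    """Transform 1: replace xi:include elements with the included content."""
    root = copy.deepcopy(root)
    ElementInclude.include(root, loader=loader)
    return root

def apply_schema_defaults(root):
    """Transform 2 (toy stand-in for schema validation): give every <item>
    the assumed default attribute status="draft" if it has none."""
    root = copy.deepcopy(root)
    for item in root.iter("item"):
        item.attrib.setdefault("status", "draft")
    return root

def run(transforms):
    """Parse foo.xml, apply the transforms in order, then canonicalize."""
    node = ET.fromstring(DOCS["foo.xml"])
    for transform in transforms:
        node = transform(node)
    return ET.canonicalize(ET.tostring(node, encoding="unicode"))

include_then_validate = run([expand_xincludes, apply_schema_defaults])
validate_then_include = run([apply_schema_defaults, expand_xincludes])
print(include_then_validate)
print(validate_then_include)
```

In the second ordering the toy validator runs before the <item> element has 
been included, so the default attribute never appears and the two canonical 
forms differ. A signer and a verifier that silently pick different orders 
would therefore compute different digests.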

Now, to the question of what schema infoset contributions (they call this 
the "PSVI") will be manifested in the canonical form: since they won't be 
available to the XPath data model, most won't be serialized. However, if 
schema validation is specified as a transform, that *would* result in 
default attributes manifesting. When I spoke to Henry Thompson about this, he 
realized that the schema spec doesn't provide this info as a normalized 
attribute value, only as a PSVI normalized attribute value, which XPath 
wouldn't see -- this is most likely a bug meriting an erratum.
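
The effect of a manifested default attribute on a signature can be sketched 
as follows. This is only an illustration under stated assumptions: the 
<order> documents and the currency="USD" default are hypothetical, the 
"after validation" text is written by hand rather than produced by a schema 
processor, and Python's canonicalize() implements C14N 2.0, not the 
Canonical XML 1.0 under discussion here.

```python
import hashlib
import xml.etree.ElementTree as ET

def c14n_digest(xml_text):
    """Canonicalize the document and return the SHA-1 digest of the result."""
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

# The document as parsed, with no schema contributions.
as_parsed = '<order><line qty="1"/></order>'

# The same document written differently; canonicalization removes the
# difference between the empty-element tag and the start/end tag pair.
respelled = '<order><line   qty="1"></line></order>'

# Hypothetical result of a schema-validation transform that manifests an
# assumed default attribute currency="USD" on <line>.
after_validation = '<order><line qty="1" currency="USD"/></order>'

print(c14n_digest(as_parsed) == c14n_digest(respelled))
print(c14n_digest(as_parsed) == c14n_digest(after_validation))
```

Canonicalization absorbs the purely syntactic difference, but not the added 
attribute, which is why the validation step has to be visible as a transform 
to anyone recomputing the digest.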

I'm still thinking all this through, so it requires more conversation. 
(Better yet, since XSV (the schema validator) is written in Python, I'd like 
to see what kind of node set it returns and what an XPath selection over it 
returns.)




>It is not clear to me whether a schema-validated document is
>required to expose both the initial value (i.e., post-DTD)
>and the schema-normalized value, or whether it can expose just
>the schema-normalized value. But schema validation may
>introduce a set of normalization problems with signed docs.
>
>Merlin
>
>[2] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2001AprJun/0361.html


--
Joseph Reagle Jr.                 http://www.w3.org/People/Reagle/
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/Signature
W3C XML Encryption Chair          http://www.w3.org/Encryption/2001/
Received on Thursday, 12 July 2001 19:43:42 UTC