Re: Proposal for various Infosetisms from Marc Hadley on 2002-10-01 (xml-dist-app@w3.org from October 2002)

From: Marc Hadley <marc.hadley@sun.com>
Date: Tue, 1 Oct 2002 12:53:45 -0400
To: noah_mendelsohn@us.ibm.com
Cc: mgudgin@microsoft.com, Rich Salz <rsalz@datapower.com>, xml-dist-app@w3.org
Message-Id: <591DE94A-D55E-11D6-9636-0003937568DC@sun.com>

On Tuesday, Oct 1, 2002, at 12:10 US/Eastern, noah_mendelsohn@us.ibm.com wrote:
>
>>> Digests/checksums work on bits and bytes, not
>>> abstract infosets.
>
> Not necessarily.  I have built and deployed systems that checksum
> abstract interfaces.  I believe my earlier note gave the appropriate
> definition of such a signature over the infoset: "we merely need a
> checksum that is the same whenever the infoset is the same, and with
> very high probability is different when the infoset is different."
> Now, at some level most implementations of such signatures will indeed
> involve inventing little bits of infoset encoding, not necessarily
> serialized into one giant stream, that represent every piece of
> information in the infoset so it can be hashed together to form the
> signature.  I think that's what you mean by a canonicalization, but
> that's not the term I would use
That is exactly what I mean by C14N. Canonical XML [1] specifies, for 
example, attribute ordering: not something that you have to worry about 
in the XML infoset, but very important when calculating a digest.
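
To make the attribute-ordering point concrete, here is a minimal sketch. It assumes Python 3.8+ and uses the standard library's canonicalize(), which implements C14N 2.0 rather than the Canonical XML 1.0 document cited at [1]; the sample documents and helper names are invented for illustration.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # Python 3.8+, C14N 2.0

# Two serializations of the same infoset: only the attribute order differs.
# (Illustrative documents, not taken from any SOAP message.)
doc_a = '<item id="1" role="next"/>'
doc_b = '<item role="next" id="1"/>'


def raw_digest(xml_text: str) -> str:
    """Digest over the bytes exactly as serialized."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def c14n_digest(xml_text: str) -> str:
    """Digest over the canonical form, which fixes the attribute order."""
    canonical = canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The byte-level digests disagree; the canonical digests agree.
print(raw_digest(doc_a) == raw_digest(doc_b))
print(c14n_digest(doc_a) == c14n_digest(doc_b))
```

The digest over raw bytes depends on how the attributes happened to be written, while the digest over the canonical form does not.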

> :  a canonicalization is a many to one mapping.  In this case, we more
> likely have a 1-to-1 mapping of the information in an infoset into a
> code that can be checksummed.  There are no two infosets in this model
> that "canonicalize" to the same representation or that get the same
> signature.  For this reason, it's just an implementation detail how you
> actually build up the signature.
And I would argue that canonicalization is one of those implementation 
details ;-). I think we agree in spirit if not in terminology.
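
As a rough illustration of hashing the information items directly, without first producing one serialized stream, the following sketch feeds each piece into the hash as it walks the tree. It is an assumption-laden toy: the helper names and the one-byte tags are invented here, and ElementTree exposes only part of the infoset (no comments, processing instructions, or namespace prefixes).

```python
import hashlib
import xml.etree.ElementTree as ET


def _feed(h, kind: bytes, text: str) -> None:
    # Length-prefix every item so adjacent items cannot run together.
    data = text.encode("utf-8")
    h.update(kind + len(data).to_bytes(8, "big") + data)


def _walk(h, elem) -> None:
    _feed(h, b"E", elem.tag)              # element name ({namespace}local)
    for name in sorted(elem.attrib):      # attributes are unordered in the infoset
        _feed(h, b"A", name)
        _feed(h, b"V", elem.attrib[name])
    _feed(h, b"T", elem.text or "")       # character data before the first child
    for child in elem:
        _walk(h, child)
        _feed(h, b"T", child.tail or "")  # character data after each child
    h.update(b"X")                        # end-of-element marker


def infoset_digest(xml_text: str) -> str:
    """Hypothetical helper: digest over the items ElementTree exposes."""
    h = hashlib.sha256()
    _walk(h, ET.fromstring(xml_text))
    return h.hexdigest()


# Attribute order and empty-element syntax do not affect the result.
print(infoset_digest('<a x="1" y="2"/>') == infoset_digest('<a y="2" x="1"></a>'))
```

Sorting the attribute names inside the walk does the same job that attribute ordering does in a canonical serialization, which is perhaps why the two descriptions end up so close.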

>
> So, I think there's a really simple rule that facilitates doing this
> sort of thing:  intermediaries should not make gratuitous changes to
> the Envelope infoset.  For this reason I'm against the rewriting of mU
> attributes, and against the removal and insertion of empty <Header>
> elements.  As I said in my note to Gudge, my fallback position would be
> to go completely the other way:  to enable both removal and insertion
> of such equivalent forms.  Allowing only removal seems to me to have
> some of the disadvantages of both approaches.  I can concur with a WG
> decision that goes either of these two ways, but I think the "don't
> mess with it" approach is stronger architecturally.  Thanks.
>
I need to think about this some more, but I think my preference at the 
moment is for your fallback position, mainly because preserving the 
exact infoset could be quite onerous.
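
For what the fallback position might imply for anyone computing a digest, here is a hedged sketch: if an envelope with an empty Header and one with no Header are to be treated as equivalent, the digest has to normalise them to one form first. The namespace URI and function name are placeholders invented for this example, and canonicalize() (Python 3.8+) implements C14N 2.0 rather than the spec cited at [1].

```python
import hashlib
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only, not the real SOAP namespace.
ENV = "http://example.org/envelope"


def normalized_digest(xml_text: str) -> str:
    """Digest that treats an empty Header and an absent Header as equal."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{{{ENV}}}Header")
    is_empty = (
        header is not None
        and len(header) == 0                   # no child elements
        and not header.attrib                  # no attributes
        and not (header.text or "").strip()    # no character data
    )
    if is_empty:
        root.remove(header)
    canonical = ET.canonicalize(ET.tostring(root, encoding="unicode"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


with_header = f'<e:Envelope xmlns:e="{ENV}"><e:Header/><e:Body/></e:Envelope>'
without_header = f'<e:Envelope xmlns:e="{ENV}"><e:Body/></e:Envelope>'
print(normalized_digest(with_header) == normalized_digest(without_header))
```

Under the "don't mess with it" rule this normalisation step would be unnecessary, since the intermediary would forward the Header exactly as it received it.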

Regards,
Marc.

[1] http://www.w3.org/TR/xml-c14n

--
Marc Hadley <marc.hadley@sun.com>
XML Technology Center, Sun Microsystems.

Received on Tuesday, 1 October 2002 12:54:14 UTC