Re: Proposal for various Infosetisms from noah_mendelsohn@us.ibm.com on 2002-10-01 (xml-dist-app@w3.org from October 2002)

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 1 Oct 2002 12:10:43 -0400
To: Marc Hadley <marc.hadley@sun.com>
Cc: mgudgin@microsoft.com, Rich Salz <rsalz@datapower.com>, xml-dist-app@w3.org
Message-ID: <OF593CAF91.BE8F0830-ON85256C45.0057B5E0@lotus.com>
Marc Hadley writes:

>> Should signatures that include such 
>> header blocks break when an intermediary 
>> removes env:mustUnderstand="false" ? 

I think either answer is coherent and potentially useful, but the one I
had in mind was a signature over just the envelope infoset, which such a
removal would break.  If you let me toggle the physical presence of mU
attributes, then I have an (admittedly very clumsy) covert channel
available:  you and I can agree to signal information by the attribute's
coming and going.  The rule "you either have the same infoset or you
don't...if you do, the signature matches" is also easier to reason about
and to explain, so I think it's a good one to consider.  Note too that we
are not inventing the signature here; we are just talking about getting a
bit of variability out of the processing model, on the grounds that doing
so might allow others to define the signature.

>> Digests/checksums work on bits and bytes, not 
>> abstract infosets. 

Not necessarily.  I have built and deployed systems that checksum abstract
interfaces.  I believe my earlier note gave the appropriate definition of
such a signature over the infoset: "we merely need a checksum that is the
same whenever the infoset is the same, and with very high probability is
different when the infoset is different."  Now, at some level most
implementations of such signatures will indeed involve inventing little
bits of infoset encoding, not necessarily serialized into one giant
stream, that represent every piece of information in the infoset so it can
be hashed together to form the signature.  I think that's what you mean by
a canonicalization, but that's not the term I would use:  a
canonicalization is a many-to-one mapping.  In this case, we more likely
have a 1-to-1 mapping of the information in an infoset into a code that
can be checksummed.  No two infosets in this model "canonicalize" to the
same representation, and (barring a hash collision) no two get the same
signature.  For this reason, how you actually build up the signature is
just an implementation detail.  I claim it is manifestly possible and
practical to invent codes with the characteristic described (i.e. same
infoset ==> same code, different infoset ==(high probability)==>
different code), and how you do it is not what's important here.  I
furthermore suggest that such checksums implement a semantic that users
will find very easy to understand and use:  "the same envelope is OK, any
change is an error".  End of story.
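
A minimal sketch of the kind of infoset checksum described above, in
Python.  It assumes a simplified infoset model, namely what
xml.etree.ElementTree exposes (expanded element and attribute names,
attribute values, character content); it ignores namespace prefixes,
comments and processing instructions, so it is coarser than a real Infoset.
Each item is written as a type tag plus a length-prefixed UTF-8 string,
which keeps the mapping 1-to-1 on the modeled infoset; attributes are
sorted because the Infoset treats them as unordered.  SHA-256, the
function names and the urn:example:h header block are illustrative
choices, not part of any specification.

```python
import hashlib
import xml.etree.ElementTree as ET


def _put(out: list, kind: bytes, text: str) -> None:
    # One item = 1-byte kind tag + 8-byte length + UTF-8 bytes.  The tag and
    # length prefix make the byte stream unambiguous, so the mapping from the
    # modeled infoset to bytes is 1-to-1 ("ab"+"c" never looks like "a"+"bc").
    data = text.encode("utf-8")
    out.append(kind + len(data).to_bytes(8, "big") + data)


def _encode(elem: ET.Element, out: list) -> None:
    _put(out, b"E", elem.tag)  # expanded name, e.g. "{namespace-uri}local"
    # Attributes are unordered in the Infoset, so encode them sorted by name.
    attrs = sorted(elem.attrib.items())
    _put(out, b"N", str(len(attrs)))
    for name, value in attrs:
        _put(out, b"A", name)
        _put(out, b"V", value)
    _put(out, b"T", elem.text or "")  # character content before first child
    children = list(elem)
    _put(out, b"C", str(len(children)))
    for child in children:
        _encode(child, out)
        _put(out, b"L", child.tail or "")  # character content after the child


def infoset_digest(xml_text: str) -> str:
    """SHA-256 over the 1-to-1 encoding of the (simplified) infoset."""
    out: list = []
    _encode(ET.fromstring(xml_text), out)
    return hashlib.sha256(b"".join(out)).hexdigest()


ENV = "http://www.w3.org/2003/05/soap-envelope"  # final SOAP 1.2 namespace

signed = (
    f'<env:Envelope xmlns:env="{ENV}"><env:Header>'
    '<h:b xmlns:h="urn:example:h" h:x="1" env:mustUnderstand="false">hi</h:b>'
    "</env:Header><env:Body/></env:Envelope>"
)
# Same infoset, different bytes: attribute order, quotes, empty-element form.
reserialized = (
    f"<env:Envelope xmlns:env='{ENV}'><env:Header>"
    "<h:b env:mustUnderstand='false' h:x='1' xmlns:h='urn:example:h'>hi</h:b>"
    "</env:Header><env:Body></env:Body></env:Envelope>"
)
# An intermediary drops mU="false" (which SOAP 1.2 treats as equivalent to
# omitting the attribute), changing the infoset.
mu_removed = signed.replace(' env:mustUnderstand="false"', "")
# An intermediary inserts an empty Header, also changing the infoset.
no_header = f'<env:Envelope xmlns:env="{ENV}"><env:Body/></env:Envelope>'
empty_header = (
    f'<env:Envelope xmlns:env="{ENV}"><env:Header/><env:Body/></env:Envelope>'
)

print(infoset_digest(signed) == infoset_digest(reserialized))  # True
print(infoset_digest(signed) == infoset_digest(mu_removed))  # False
print(infoset_digest(no_header) == infoset_digest(empty_header))  # False
```

Reserializing the envelope leaves the digest unchanged, while removing
env:mustUnderstand="false" or inserting an empty <env:Header/> changes it:
"the same envelope is OK, any change is an error".  A production version
would also have to encode the parts of the infoset ElementTree drops, such
as prefixes and in-scope namespaces.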

As I say, I have built and deployed systems that use essentially this
approach to checksum similarly abstract information (as it happens, a set
of declarations in the Pascal programming language).  We used exactly the
technique described above, and it worked well for users.

So, I think there's a really simple rule that makes this sort of thing
easy:  intermediaries should not make gratuitous changes to the Envelope
infoset.  For this reason I'm against the rewriting of mU attributes, and
against the removal and insertion of empty <Header> elements.  As I said
in my note to Gudge, my fallback position would be to go completely the
other way:  to allow both removal and insertion of such equivalent forms.
Allowing only removal seems to me to combine some of the disadvantages of
both approaches.  I can concur with a WG decision that goes either of
these two ways, but I think the "don't mess with it" approach is stronger
architecturally.  Thanks.

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Tuesday, 1 October 2002 12:14:00 UTC