RE: Proposed Infoset Addendum to SOAP Messages with Attachments

 

> 
> -----Original Message-----
> From: Amelia A. Lewis [mailto:alewis@tibco.com] 
> Sent: 26 March 2003 09:20
> To: Marc Hadley
> Cc: Martin Gudgin; xml-dist-app@w3.org
> 
> On Wed, 26 Mar 2003 11:18:33 -0500
> Marc Hadley <Marc.Hadley@sun.com> wrote:
> > I'm generally in favor of this approach but I think there are a couple
> > of problems it doesn't adequately address:
> > 
> > (i) Attachments are an optimization to avoid having to base64 encode
> > binary data. Section 8 of the proposal requires that 'signatures over
> > elements with xbinc:Include children MUST be signatures over the
> > base64 data'. If you buy the premise that security in the form of
> > DSIGs etc is going to be widely used then this requirement basically
> > nullifies the advantage of using attachments since you'll have to run
> > the base64 encoding to compute and verify signatures.
> > 
> > I think a better approach would be a xbinc:Include aware XML DSIG C14N
> > algorithm that just streams the binary data in the case of attachments
> > (hence preserving the optimization) and does base64 decoding in the
> > case of embedded data.
> 
> Interesting.  I hadn't thought of this case, only of the case 
> of trying to set content-length (if set to the length of the 
> base64 encoded representation, there are too many variables).
> 
> In fact, you probably cannot do a DSIG over the base64 
> encoding, unless the spec better specifies how the 
> transformation to base64 is to take place.

Perhaps mandating the canonical representation defined by XML Schema[1] would help.

> 
> If we were talking about the MIME specification of base64, 
> there would be less confusion.  The MIME specification says 
> that each line is 76 characters long, plus CRLF, except for 
> the final line.  It uses ASCII for text.  You can therefore 
> calculate exactly how many bytes a given input will produce.

Or this one...
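
As a rough sketch of the arithmetic described above for MIME-style base64: the function name and parameters below are hypothetical, and I'm assuming a CRLF after every line including the last (which is what Python's base64.encodebytes produces, modulo LF vs CRLF).

```python
import base64

def mime_base64_length(n_bytes, line_len=76, eol_len=2):
    """Octets produced by MIME-style base64 for n_bytes of input.

    Assumes ASCII text, lines of at most line_len characters, and an
    end-of-line sequence of eol_len octets after every line (CRLF = 2).
    """
    if n_bytes == 0:
        return 0
    chars = 4 * ((n_bytes + 2) // 3)            # 3 octets -> 4 characters, padded
    lines = (chars + line_len - 1) // line_len  # last line may be short
    return chars + lines * eol_len

# Cross-check against the standard library, swapping its LF for CRLF.
def stdlib_length(n_bytes):
    return len(base64.encodebytes(bytes(n_bytes)).replace(b"\n", b"\r\n"))
```

With those assumptions the length depends only on the input size, which is exactly what the lexical space of the schema type does not give us.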

> 
> The XML Schema definition of base64 is more "lenient".  I 
> guess that's the term.  It does not require that there be 
> line breaks.  Line breaks, if they appear, are defined to be 
> LF only.  And, of course, the base64 encoding is *on top* of 
> whatever text encoding the document carries: if you're using 
> UTF16, then you've got not two bits of overhead for each six 
> bits of information, but *ten*.  The octet stream (on which 
> the DSIG algorithm operates) is going to be extremely 
> sensitive to any variations in permitted line length, line 
> separator characters, and underlying encoding (although all 
> of the encodings that use ASCII as the bottom 7 bits are 
> pretty safe, here).

The current C14N algorithms for xmldsig all assume a UTF-8 encoding
(AFAIR), so some of the above concerns are mitigated, I think.
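
To make the remaining sensitivity concrete, here is a small sketch (the helper names and the test payload are mine, and SHA-1 is just a stand-in for whatever digest the signature uses). Even with the text encoding pinned to UTF-8, the same binary value wrapped at different line lengths hashes differently. It also shows the UTF-16 overhead mentioned above.

```python
import base64
import hashlib

payload = bytes(range(256)) * 2  # 512 octets of arbitrary binary data

def wrap(text, width):
    """Insert an LF after every `width` characters (hypothetical helper)."""
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))

def digest(text, encoding="utf-8"):
    """SHA-1 over the octets of the encoded text, as a signature would see them."""
    return hashlib.sha1(text.encode(encoding)).hexdigest()

# Three lexical forms of the same value.
unbroken = base64.b64encode(payload).decode("ascii")
wrapped_76 = wrap(unbroken, 76)
wrapped_64 = wrap(unbroken, 64)

digests = {digest(unbroken), digest(wrapped_76), digest(wrapped_64)}

# Text-encoding overhead on top of base64 itself.
utf8_len = len(unbroken.encode("utf-8"))
utf16_len = len(unbroken.encode("utf-16-be"))  # no byte order mark
```

All three forms decode to the same octets, yet `digests` holds three distinct values, and the UTF-16 serialization is twice the size of the UTF-8 one.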

> 
> In short, there's not enough there to specify the bit pattern 
> of the *encoded* stream.  So all operations defined on the 
> infoset interpretation probably *ought* to be over the 
> *decoded* stream.

Agreed. We need to be more specific.
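
For illustration only, a minimal sketch of what "over the decoded stream" could look like. The function name is hypothetical, SHA-1 is an assumed digest, and this is not what the proposal currently specifies.

```python
import base64
import hashlib

def digest_decoded(b64_text):
    """SHA-1 over the decoded octets; whitespace in the lexical form is ignored."""
    compact = "".join(b64_text.split())                # drop LF, CRLF, spaces, tabs
    octets = base64.b64decode(compact, validate=True)  # reject non-base64 characters
    return hashlib.sha1(octets).hexdigest()

payload = b"\x00\x01\x02binary attachment data\xff" * 20

unbroken = base64.b64encode(payload).decode("ascii")
lf_wrapped = base64.encodebytes(payload).decode("ascii")
crlf_wrapped = lf_wrapped.replace("\n", "\r\n")
```

Every lexical form gives the same digest, and it equals a digest taken directly over the raw octets, so in the attachment case the binary data could be streamed into the digest with no base64 step at all.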

Gudge

[1] http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2001Oct/0034.html

Received on Wednesday, 26 March 2003 12:36:53 UTC