Re: Proposed Infoset Addendum to SOAP Messages with Attachments from Amelia A. Lewis on 2003-03-26 (xml-dist-app@w3.org from March 2003)

From: Amelia A. Lewis <alewis@tibco.com>
Date: Wed, 26 Mar 2003 12:19:42 -0500
To: Marc Hadley <Marc.Hadley@sun.com>
Cc: mgudgin@microsoft.com, xml-dist-app@w3.org
Message-Id: <20030326121942.25c6c0c4.alewis@tibco.com>

On Wed, 26 Mar 2003 11:18:33 -0500
Marc Hadley <Marc.Hadley@sun.com> wrote:
> I'm generally in favor of this approach but I think there are a couple 
> of problems it doesn't adequately address:
> 
> (i) Attachments are an optimization to avoid having to base64 encode 
> binary data. Section 8 of the proposal requires that 'signatures over 
> elements with xbinc:Include children MUST be signatures over the base64 
> data'. If you buy the premise that security in the form of DSIGs etc is 
> going to be widely used then this requirement basically nullifies the 
> advantage of using attachments since you'll have to run the base64 
> encoding to compute and verify signatures.
> 
> I think a better approach would be a xbinc:Include aware XML DSIG C14N 
> algorithm that just streams the binary data in the case of attachments 
> (hence preserving the optimization) and does base64 decoding in the 
> case of embedded data.

Interesting.  I hadn't thought of this case, only of the problem of setting Content-Length: if it is set to the length of the base64-encoded representation, there are too many variables.

In fact, you probably cannot reliably compute a DSIG over the base64 encoding unless the spec says more precisely how the transformation to base64 is to take place.

If we were talking about the MIME specification of base64, there would be less confusion.  The MIME specification (RFC 2045) limits each line to 76 characters, plus CRLF, and encoders conventionally fill every line but the last to that limit.  It uses ASCII for text.  Given that convention, you can calculate exactly how many bytes a given input will produce.
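To illustrate the "calculate exactly" point, here is a minimal sketch assuming that every line except the last is filled to the 76-character limit, lines are joined by CRLF, and there is no trailing CRLF. The function names `mime_base64` and `mime_base64_length` are hypothetical, not from any spec or library; the second predicts the encoded size from the input size alone.

```python
import base64

LINE_LEN = 76  # RFC 2045 maximum; assumed here that encoders fill every line but the last


def mime_base64(data: bytes) -> bytes:
    """Encode as MIME-style base64: 76-char ASCII lines joined by CRLF, no trailing CRLF."""
    raw = base64.b64encode(data)
    lines = [raw[i:i + LINE_LEN] for i in range(0, len(raw), LINE_LEN)]
    return b"\r\n".join(lines)


def mime_base64_length(n: int) -> int:
    """Predict the encoded size in octets for n input octets without encoding anything."""
    chars = 4 * ((n + 2) // 3)                 # 4 ASCII chars per 3-octet group, padded
    breaks = max(0, (chars - 1) // LINE_LEN)   # one CRLF between each pair of lines
    return chars + 2 * breaks


for n in (0, 57, 58, 115):
    print(n, mime_base64_length(n), len(mime_base64(bytes(n))))
```

Under the XML Schema lexical rules discussed next, no such formula exists, because line length and separators are not fixed.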

The XML Schema definition of base64 is more "lenient".  I guess that's the term.  It does not require that there be line breaks.  Line breaks, if they appear, are defined to be LF only.  And, of course, the base64 encoding is *on top* of whatever text encoding the document carries: if you're using UTF-16, each base64 character still carries six bits of information but occupies sixteen bits, so you've got not two bits of overhead for each six bits of information, but *ten*.  The octet stream (on which the DSIG algorithm operates) is going to be extremely sensitive to any variation in line length, line separator characters, and underlying encoding (although all of the encodings that use ASCII as the bottom 7 bits are pretty safe here).
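A small sketch of that sensitivity, assuming three serializations a sender might plausibly choose for the same value (no line breaks in UTF-8, LF every 76 characters in UTF-8, and no line breaks in UTF-16 big-endian, used to avoid a byte-order mark). The helper names are hypothetical and SHA-256 simply stands in for whatever digest a signature would use. All three decode to the same bytes, yet their octet streams and digests differ; the last function reproduces the two-versus-ten bits of overhead per six-bit character.

```python
import base64
import hashlib


def variants(data: bytes) -> dict[str, bytes]:
    """Three hypothetical serializations of one base64Binary value."""
    b64 = base64.b64encode(data).decode("ascii")
    wrapped = "\n".join(b64[i:i + 76] for i in range(0, len(b64), 76))
    return {
        "utf-8, no line breaks": b64.encode("utf-8"),
        "utf-8, LF every 76 chars": wrapped.encode("utf-8"),
        "utf-16-be, no line breaks": b64.encode("utf-16-be"),
    }


def decode(name: str, octets: bytes) -> bytes:
    """Recover the binary value; b64decode discards non-alphabet characters such as LF."""
    text = octets.decode("utf-16-be" if name.startswith("utf-16") else "utf-8")
    return base64.b64decode(text)


def overhead_bits_per_sextet(encoding: str) -> int:
    """Bits spent per base64 character minus the 6 bits of payload it carries."""
    return 8 * len("A".encode(encoding)) - 6


data = bytes(range(256)) * 2
for name, octets in variants(data).items():
    digest = hashlib.sha256(octets).hexdigest()[:12]
    print(f"{name:27} {len(octets):5} octets  sha256 {digest}  same value: {decode(name, octets) == data}")
```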

In short, there's not enough there to specify the bit pattern of the *encoded* stream.  So all operations defined on the infoset interpretation probably *ought* to be over the *decoded* stream.
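One way to picture operating over the decoded stream, in the spirit of Marc's suggestion of streaming attachment bytes and decoding embedded data, is sketched below. This is not an XML DSIG canonicalization algorithm; the function names are hypothetical and SHA-256 stands in for the signature's digest. Raw attachment octets are hashed as they arrive; embedded base64 text has all whitespace dropped and is decoded in 4-character quanta before hashing, so line length, separators, and chunk boundaries cannot affect the result.

```python
import base64
import hashlib
from typing import Iterable


def digest_attachment(chunks: Iterable[bytes]) -> str:
    """Raw binary (e.g. a MIME attachment part): hash the octets as they stream in."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def digest_embedded(chunks: Iterable[str]) -> str:
    """Embedded base64 text: drop whitespace, decode whole 4-char quanta, hash decoded octets."""
    h = hashlib.sha256()
    pending = ""
    for chunk in chunks:
        pending += "".join(chunk.split())        # removes CR, LF, spaces wherever they fall
        usable = len(pending) - len(pending) % 4
        h.update(base64.b64decode(pending[:usable]))
        pending = pending[usable:]               # carry a partial quantum to the next chunk
    if pending:
        raise ValueError("base64 text is not a multiple of 4 characters")
    return h.hexdigest()


data = bytes(range(256)) * 3
b64 = base64.b64encode(data).decode("ascii")
crlf_wrapped = "\r\n".join(b64[i:i + 76] for i in range(0, len(b64), 76))
chunks = [crlf_wrapped[i:i + 7] for i in range(0, len(crlf_wrapped), 7)]
print(digest_embedded(chunks) == digest_attachment([data[i:i + 100] for i in range(0, len(data), 100)]))
```

The attachment path never pays for base64 at all, which is the optimization Marc wants to preserve.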

Amy!
-- 
Amelia A. Lewis
Architect, TIBCO/Extensibility, Inc.
alewis@tibco.com

Received on Wednesday, 26 March 2003 12:19:23 UTC