RE: Proposed Infoset Addendum to SOAP Messages with Attachments from Martin Gudgin on 2003-03-26 (xml-dist-app@w3.org from March 2003)

From: Martin Gudgin <mgudgin@microsoft.com>
Date: Wed, 26 Mar 2003 10:54:38 -0800
To: "Amelia A. Lewis" <alewis@tibco.com>
Cc: <Marc.Hadley@sun.com>, <xml-dist-app@w3.org>
Message-ID: <7C083876C492EB4BAAF6B3AE0732970E0AEF0269@red-msg-08.redmond.corp.microsoft.com>

 

> 
> -----Original Message-----
> From: Amelia A. Lewis [mailto:alewis@tibco.com] 
> Sent: 26 March 2003 10:24
> To: Martin Gudgin
> Cc: Marc.Hadley@sun.com; xml-dist-app@w3.org
> 
> On Wed, 26 Mar 2003 09:36:47 -0800
> "Martin Gudgin" <mgudgin@microsoft.com> wrote:
> > Perhaps mandating the canonical rep defined by XML Schema[1] would help.
> 
> Has it been approved?  The supplied reference requires a 
> login (which I could do, but is there a version approved by 
> the schema WG?).

I understood that it was going to be part of the 2nd edition. 

> 
> The supplied errata has at least three problems that I can see.
> 
> 1) it states that processors cannot enforce a line-length 
> limit, and then gives productions that require enforcement of 
> line-length limits.
> 
> 2) it states that whitespace is permitted, but only LF is 
> included in the productions; therefore, a processor 
> implementor could assume that no other whitespace is permitted.
> 
> 3) it is not clear what the length calculation at the bottom 
> is for, why one would perform it, or who cares (is it 
> facet-related?  Why would the length facet care about the 
> length of the decoded stream, which this algorithm seems to require?).
> 
> oh, and 4) there's commentary, at the end, that mentions that 
> RFC2045 explicitly calls out ASCII as the encoding, but the 
> RFC explicitly states that any encoding that includes the 65 
> letters/symbols in its dictionary, plus space, CR, and LF, 
> can use base64 encoding (notably including EBCDIC).  The 
> statement that "decoding of base64binary data in an XML 
> entity is to be performed on the [US-ASCII-compatible] 
> Unicode characters obtained after character encoding 
> processing as specified by XML 1.0." is wrong-headed, at best 
> (it requires the transcoding step, which may be highly 
> inappropriate; a number of processors will then "inflate" the 
> information into characters defined as sixteen bit entities, 
> and will then proceed to throw away 5/8 of the memory 
> allocated, not 1/4).

Should we send this input to the Schema WG?
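
For reference, the 1/4 and 5/8 figures in point 4 follow from the 4:3
ratio of base64. A minimal Python sketch of that arithmetic, assuming
each base64 character is held in either one octet or one 16-bit unit
before decoding. The helper name wasted_fraction and the sample payload
are illustrative, not taken from the erratum or RFC 2045:

```python
import base64
from fractions import Fraction


def wasted_fraction(payload: bytes, octets_per_char: int) -> Fraction:
    """Fraction of the character buffer that is discarded after decoding.

    octets_per_char is an assumption about the processor: 1 for an
    8-bit representation, 2 for characters inflated to 16 bits.
    """
    encoded = base64.b64encode(payload)           # 4 characters per 3 octets
    buffer_size = len(encoded) * octets_per_char  # memory held as characters
    decoded_size = len(base64.b64decode(encoded))
    return Fraction(buffer_size - decoded_size, buffer_size)


# Hypothetical 30-octet payload; a multiple of 3, so no '=' padding.
payload = bytes(range(30))
print(wasted_fraction(payload, 1))  # 8-bit characters  -> 1/4
print(wasted_fraction(payload, 2))  # 16-bit characters -> 5/8
```

When the encoded form ends in '=' padding the fractions come out
slightly larger, since the padding characters carry no data.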

> 
> > The current C14N algorithms for xmldsig all assume a UTF-8 encoding
> > ( AFAIR ) so some of the above concerns are mitigated, I think.
> 
> Assume, or require?

They all operate on the UTF-8 form of the XML. So if you want to compute
a dsig of a UTF-16 doc, the signature still needs to be computed over
the UTF-8 form.
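
A minimal Python sketch of that point, assuming the only difference
between the two serializations is the character encoding. It does not
implement Canonical XML, which also normalizes attributes, namespaces,
and whitespace. The function name digest_over_utf8, the sample document,
and the choice of SHA-256 are illustrative assumptions:

```python
import hashlib


def digest_over_utf8(serialized: bytes, encoding: str) -> str:
    """Transcode a serialized document to UTF-8, then hash those octets."""
    text = serialized.decode(encoding)  # character encoding processing
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


doc = "<doc>caf\u00e9</doc>"     # hypothetical document, no XML declaration
as_utf16 = doc.encode("utf-16")  # includes a byte order mark
as_utf8 = doc.encode("utf-8")

# Hashing the raw octets gives different digests for the two encodings...
raw_match = hashlib.sha256(as_utf16).digest() == hashlib.sha256(as_utf8).digest()
# ...while hashing the UTF-8 form gives the same digest for both.
utf8_match = digest_over_utf8(as_utf16, "utf-16") == digest_over_utf8(as_utf8, "utf-8")
print(raw_match, utf8_match)
```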

> 
> > Agreed. We need to be more specific.
> 
> Please.

Is this a plea to update the document? Or just that we add this to a
list of issues and resolve it? ( I don't really mind which, just
wondering ).

Cheers

Gudge

Received on Wednesday, 26 March 2003 13:54:49 UTC