Re: Proposed Infoset Addendum to SOAP Messages with Attachments from Amelia A. Lewis on 2003-03-26 (xml-dist-app@w3.org from March 2003)

From: Amelia A. Lewis <alewis@tibco.com>
Date: Wed, 26 Mar 2003 13:24:06 -0500
To: "Martin Gudgin" <mgudgin@microsoft.com>
Cc: Marc.Hadley@sun.com, xml-dist-app@w3.org
Message-Id: <20030326132406.05faa7de.alewis@tibco.com>

On Wed, 26 Mar 2003 09:36:47 -0800
"Martin Gudgin" <mgudgin@microsoft.com> wrote:
> Perhaps mandating the canonical rep defined by XML Schema[1] would help.

Has it been approved?  The supplied reference requires a login (which I could do, but is there a version approved by the schema WG?).

The supplied erratum has at least three problems that I can see.

1) it states that processors cannot enforce a line-length limit, and then gives productions that require enforcing one.

2) it states that whitespace is permitted, but only LF is included in the productions; therefore, a processor implementor could assume that no other whitespace is permitted.

3) it is not clear what the length calculation at the bottom is for, why one would perform it, or who cares (is it facet-related?  Why would the length facet care about the length of the decoded stream, which this algorithm seems to require?).

oh, and 4) there's commentary, at the end, that mentions that RFC2045 explicitly calls out ASCII as the encoding, but the RFC explicitly states that any encoding that includes the 65 letters/symbols in its dictionary, plus space, CR, and LF, can use base64 encoding (notably including EBCDIC).  The statement that "decoding of base64binary data in an XML entity is to be performed on the [US-ASCII-compatible] Unicode characters obtained after character encoding processing as specified by XML 1.0." is wrong-headed, at best.  It requires the transcoding step, which may be highly inappropriate: a number of processors will then "inflate" the information into characters defined as sixteen-bit entities, and will then proceed to throw away 5/8 of the memory allocated, not 1/4 (each base64 character carries only six bits of data, so ten bits of every sixteen are wasted, rather than two of every eight).
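
To make the 5/8-versus-1/4 arithmetic in 4) concrete, here is a rough sketch in Python.  The function name, the 48-octet sample payload, and the choice of UTF-16 as the stand-in for "sixteen bit entities" are all assumptions made for illustration; nothing here comes from the erratum or from any particular processor.

```python
import base64
from fractions import Fraction


def wasted_fraction(bits_per_char: int) -> Fraction:
    """Fraction of storage that carries no data when each base64
    character (six bits of payload) occupies bits_per_char bits."""
    return Fraction(bits_per_char - 6, bits_per_char)


# Hypothetical sample payload: 48 octets, so the encoding needs no padding.
payload = bytes(range(48))
text = base64.b64encode(payload).decode("ascii")

# Storage used when each base64 character is kept as one octet...
as_octets = len(text.encode("ascii"))
# ...and when each character is inflated to a sixteen-bit code unit.
as_utf16 = len(text.encode("utf-16-le"))

print(wasted_fraction(8), 1 - Fraction(len(payload), as_octets))
print(wasted_fraction(16), 1 - Fraction(len(payload), as_utf16))
```

Both print statements show the computed fraction next to the fraction measured on the sample, first for one octet per character and then for sixteen bits per character.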

> The current C14N algorithms for xmldsig all assume a UTF-8 encoding (
> AFAIR ) so some of the above concerns are mitigated, I think.

Assume, or require?

> Agreed. We need to be more specific.

Please.

Amy!
-- 
Amelia A. Lewis
Architect, TIBCO/Extensibility, Inc.
alewis@tibco.com

Received on Wednesday, 26 March 2003 13:23:48 UTC