Re: Proposed Infoset Addendum to SOAP Messages with Attachments from Amelia A. Lewis on 2003-03-26 (xml-dist-app@w3.org from March 2003)

From: Amelia A. Lewis <alewis@tibco.com>
Date: Wed, 26 Mar 2003 14:19:14 -0500
To: "Martin Gudgin" <mgudgin@microsoft.com>
Cc: Marc.Hadley@sun.com, xml-dist-app@w3.org
Message-Id: <20030326141914.3a548e59.alewis@tibco.com>

Heylas,

On Wed, 26 Mar 2003 10:54:38 -0800
"Martin Gudgin" <mgudgin@microsoft.com> wrote:
> > -----Original Message-----
> > From: Amelia A. Lewis [mailto:alewis@tibco.com] 
> > Sent: 26 March 2003 10:24
> > To: Martin Gudgin
> > Cc: Marc.Hadley@sun.com; xml-dist-app@w3.org
> > 
> > On Wed, 26 Mar 2003 09:36:47 -0800
> > "Martin Gudgin" <mgudgin@microsoft.com> wrote:
[snip]
> 
> Should we send this input to the Schema WG?

I've passed it on to the TIBCO rep there; he may edit/chop/change (or drop) it.

> > > The current C14N algorithms for xmldsig all assume a UTF-8 encoding
> > > ( AFAIR ) so some of the above concerns are mitigated, I think.
> > 
> > Assume, or require?
> 
> They all work on UTF-8 WRT XML. So if you want to compute a dsig of a
> UTF-16 doc, the sig still needs to be over the UTF-8 form.

Oh, interesting.  This requires a transformation?  If there's a BOM, does it get transformed into that horrid monstrosity, the UTF-8 BOM?  Anyway, never mind.  An auto-transformation still requires handling of line length and line separator characteristics, as well as reduction of the ambiguity in white space handling.  A canonical transform is certainly possible, but note that the XML Schema definition of base64Binary does not supply enough information to be regarded as one.
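
To make the BOM question concrete, here is a minimal Python sketch of transcoding a UTF-16 document to UTF-8.  The helper name to_utf8 and the sample document are mine, purely for illustration.  It only shows that both outcomes (with and without a UTF-8 BOM) are possible in a transcoding step; which one a given canonicalization mandates has to be checked against that spec.

```python
import codecs

def to_utf8(raw_utf16: bytes, keep_bom: bool = False) -> bytes:
    """Transcode UTF-16 bytes to UTF-8 (hypothetical helper, not from any spec).

    Decoding with the generic "utf-16" codec consumes a leading BOM and uses
    it to pick the byte order, so the BOM is not part of the decoded text.
    """
    text = raw_utf16.decode("utf-16")
    # "utf-8-sig" prepends EF BB BF; plain "utf-8" emits no BOM at all.
    return text.encode("utf-8-sig" if keep_bom else "utf-8")

# Sample document (an assumption for illustration only).
doc = "<doc>caf\u00e9</doc>"
utf16 = doc.encode("utf-16")        # Python prepends a BOM here

without_bom = to_utf8(utf16)
with_bom = to_utf8(utf16, keep_bom=True)
```

The two results differ by exactly three leading bytes, so a signature computed over one will not verify against the other.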

> > > Agreed. We need to be more specific.
> > 
> > Please.
> 
> Is this a plea to update the document? Or just that we add this to a
> list of issues and resolve it? ( I don't really mind which, just
> wondering ).

That's a "right now or later?" kind of question, isn't it?  The one part of the document that I think ought to be fixed immediately is the broken regex for the Accept header in the schema.  The rest probably needs to be explored as an issue.  To raise these as specific issues:

1) Should the element types derived from base64Binary enforce stricter lexical constraints?  Specifically, should the derivation remove all ambiguities about whitespace and line length?

  1a) Should the PASWA spec take care to note that, unlike MIME/RFC2045, the base64Binary type in XML Schema uses bare LF as the line separator?

  1b) Should the PASWA spec try to remove existing ambiguities in base64 encoding?

  1c) Should the PASWA spec allow "real MIME", "HTTP pseudo-MIME", or XML Schema base64 encodings?  Some combination?  Some flag to indicate the difference?

2) Should various transformations and operations on the included/referenced byte stream operate directly on the decoded byte stream (the "value space") rather than on the encoded XML representation (the "lexical space")?  This includes length counting and signatures, among other possibilities.

3) Suppose I have a document that contains <ns:original swa:MediaType="text/plain; charset=KOI-8" />.  Perfectly reasonable.  I have a document in Russian (the original), and want to send the translation along, plus the original.  So I base64 encode it and indicate the charset, using the charset parameter of the MIME content-type header.  Umm.  How is a processor going to cope with this?  Or, more broadly, with similar sorts of situations?  The inclusion of the "text" media type in the content model for swa:MediaType leads to the assumption that enough guidance will be provided to handle this problem (somehow).
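
A rough Python sketch of the three issues together, under stated assumptions: the helper names (mime_base64, lf_base64, value_of, charset_of) are hypothetical, the 76-character line width is the RFC 2045 limit, and I use the registered charset name KOI8-R so that the codec lookup succeeds.  It shows two lexical forms of the same bytes, a digest taken over the value space, and the extra charset step needed to get from bytes back to text.

```python
import base64
import hashlib
from email.message import Message

def mime_base64(data: bytes, width: int = 76) -> str:
    """RFC 2045 style lexical form: lines of at most `width` chars, CRLF separated."""
    flat = base64.b64encode(data).decode("ascii")
    return "\r\n".join(flat[i:i + width] for i in range(0, len(flat), width))

def lf_base64(data: bytes, width: int = 76) -> str:
    """Same line length, but bare LF as the separator."""
    flat = base64.b64encode(data).decode("ascii")
    return "\n".join(flat[i:i + width] for i in range(0, len(flat), width))

def value_of(lexical: str) -> bytes:
    """Map a lexical form to the value space: drop whitespace, then decode."""
    return base64.b64decode("".join(lexical.split()), validate=True)

def charset_of(media_type: str, default: str = "us-ascii") -> str:
    """Pull the charset parameter out of a MIME media type string."""
    msg = Message()
    msg["Content-Type"] = media_type
    return msg.get_content_charset(default)

# Illustrative data only: Russian text stored as KOI8-R bytes.
media_type = 'text/plain; charset="KOI8-R"'
original_text = "Привет, мир. " * 10
payload = original_text.encode("koi8-r")

mime_form = mime_base64(payload)
lf_form = lf_base64(payload)

# Issues 1 and 2: lengths and digests over the lexical space disagree,
# while the same operations over the decoded bytes agree.
lexical_digests = {hashlib.sha256(f.encode("ascii")).hexdigest()
                   for f in (mime_form, lf_form)}
value_digests = {hashlib.sha256(value_of(f)).hexdigest()
                 for f in (mime_form, lf_form)}

# Issue 3: the decoded bytes are not text until the charset is applied.
recovered_text = value_of(mime_form).decode(charset_of(media_type))
```

The point of the sketch is only that a processor working in the value space never sees the line separator question, but it still needs the charset from the media type before it can treat the content as characters.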

Amy!
-- 
Amelia A. Lewis
Architect, TIBCO/Extensibility, Inc.
alewis@tibco.com

Received on Wednesday, 26 March 2003 14:18:56 UTC