Re: Proposed new issue: variability of encoding in Miffy

From: Anish Karmarkar <Anish.Karmarkar@oracle.com>
Date: Wed, 14 Jan 2004 01:55:14 -0800
Message-ID: <40051202.4040503@oracle.com>
To: noah_mendelsohn@us.ibm.com
Cc: Amelia A Lewis <alewis@tibco.com>, Mark Nottingham <mark.nottingham@bea.com>, Martin Gudgin <mgudgin@microsoft.com>, "Xml-Dist-App@W3. Org" <xml-dist-app@w3.org>

Noah,

Thanks for summarizing the two proposals that are being discussed. That 
was helpful.
Comments below.

-Anish
--

noah_mendelsohn@us.ibm.com wrote:

> Reviewing this thread, and remembering our discussion on the phone last 
> week, I think I see two proposals being discussed:
> 
> * [Mark Nottingham et al.] The original data is binary, represented in 
> the Infoset as xsd:base64Binary lexical.  The most optimized 
> representation in the Miffy multipart is encoded as "binary", but some 
> users of Miffy may not be capable of handling this.  Furthermore, there is 
> asserted a need to convert from one encoding to another based only on the 
> Miffy representation, i.e. with no knowledge of the XML or the Infoset. 
> Accordingly, the proposal is to allow >> those encodings suitable for the 
> representation of binary data<<, and to state that Miffys that have the 
> same data differing only in the encoding are semantically identical.
> 
> * [Anish Karmarkar, possibly et al.]  Some binary data is known to be, 
> for example, 7 bit clean at the source.   Consider, for example, a user 
> that created an XML element from a text file known to be 7bit clean.  If 
> one is willing to claim at the xsd typing level that the element is in 
> fact base64binary, then the octet stream comprising the 7 bit text is 
> represented in the Infoset in the 30% larger base64binary lexical form, 
> but per the Miffy spec is promptly converted back to its original octet 
> stream for transmission in Miffy.  I believe the suggestion is that in 
> such situations where you know that the "binary" is in fact 7 bit text, 
> that you should be able to set the encoding accordingly.    I suppose 
> there is also a question of whether all this should require you to claim 
> in the data model that xsd:base64Binary is being used, or whether some 
> other type should be allowed.
> 

There are really only two actual encodings/decodings among the 
content-transfer-encoding values: base64 and quoted-printable. 7bit, 
8bit and binary do not involve any encoding. Each is just a declaration 
of the kind of data that is present or, as RFC 2045 section 6.2 puts it:

    The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
    mean that the identity (i.e. NO) encoding transformation has been
    performed.  As such, they serve simply as indicators of the domain of
    the body data, and provide useful information about the sort of
    encoding that might be needed for transmission in a given transport
    system.

Irrespective of the value of the content-transfer-encoding, at the 
infoset level they are all xsd:base64Binary. I don't think we need any 
other (than xsd:base64Binary) type at the infoset level. As I see it, 
there is no difference at the infoset level.
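
To make the point concrete, here is a rough sketch (mine, not anything 
from the Miffy or MTOM drafts) of how a receiver could map a part body 
back to the xsd:base64Binary lexical form. The function names 
decode_part and infoset_lexical are made up for illustration, and the 
sample payload is an assumption chosen to be 7-bit clean.

```python
import base64
import quopri


def decode_part(body: bytes, cte: str) -> bytes:
    """Recover the original octets of a MIME part body (hypothetical helper)."""
    cte = cte.lower()
    if cte in ("7bit", "8bit", "binary"):
        # Identity: the label only describes the domain of the data.
        return body
    if cte == "base64":
        return base64.b64decode(body)
    if cte == "quoted-printable":
        return quopri.decodestring(body)
    raise ValueError(f"unsupported Content-Transfer-Encoding: {cte}")


def infoset_lexical(body: bytes, cte: str) -> str:
    """Return the xsd:base64Binary lexical form of the part's octets."""
    return base64.b64encode(decode_part(body, cte)).decode("ascii")


# Assumed sample payload: 7-bit clean text, no line breaks.
octets = b"Hello, Miffy! a=b"

as_7bit = infoset_lexical(octets, "7bit")
as_binary = infoset_lexical(octets, "binary")
as_base64 = infoset_lexical(base64.b64encode(octets), "base64")
as_qp = infoset_lexical(quopri.encodestring(octets), "quoted-printable")
```

Under these assumptions all four results are the same string, which is 
the sense in which the content-transfer-encoding is invisible at the 
infoset level. Only the base64 and quoted-printable branches do any work.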

> My main point here is to suggest that both of these have been discussed, 
> that they are different, and that if so we need to keep straight which 
> proposal we are discussing at any point in time.  Also, each of these has 
> a variation that says:  "Variability allowed by Miffy, but not by the new 
> HTTP binding."
> 

I am not sure there is any advantage to restricting the 
content-transfer-encoding to binary even for the HTTP binding. Consider 
the case where a SOAP message is traversing two hops, the first one over 
HTTP and the second one over SMTP. Now if the SOAP message contains an 
"attachment" that has the content-type text/plain, and we restrict 
content-transfer-encoding to binary only, one has to encode the 
"attachment" data to base64 before using the SMTP hop. This encoding is 
in fact completely unnecessary if we allow the 7bit content-transfer-encoding.
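
As an illustration of the two-hop case, the sketch below picks the 
narrowest identity label that honestly describes a payload, following 
my reading of the RFC 2045 definitions (no NUL octets, CR and LF only as 
CRLF pairs, lines of at most 998 octets, and for 7bit no octets above 
127). The function names are hypothetical, and the SMTP check assumes a 
plain 7-bit SMTP hop without the 8BITMIME or BINARYMIME extensions.

```python
def narrowest_identity_cte(data: bytes) -> str:
    """Return "7bit", "8bit" or "binary" for the given octets (sketch)."""
    lines = data.split(b"\r\n")
    line_oriented = (
        b"\x00" not in data
        and all(len(line) <= 998 for line in lines)
        # After splitting on CRLF, any leftover CR or LF is a bare one.
        and all(b"\r" not in line and b"\n" not in line for line in lines)
    )
    if not line_oriented:
        return "binary"
    return "7bit" if all(octet < 128 for octet in data) else "8bit"


def needs_reencoding_for_smtp(data: bytes) -> bool:
    """Assumption: the SMTP hop carries only 7bit data unmodified."""
    return narrowest_identity_cte(data) != "7bit"


text_attachment = b"plain text attachment\r\nsecond line\r\n"
image_like = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
```

With a label of 7bit the text attachment can cross the SMTP hop as is, 
whereas a sender limited to binary has no way to say so and the 
intermediary must fall back to base64.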

> I think both of these go somewhat beyond the original mandate of MTOM.   I 
> can more easily see the rationale for Mark's proposal, but have a fairly 
> strong opinion that if we go there it be for Miffy but not for our HTTP 
> binding.  I really want to maximize interop of the HTTP bindings, and 
> minimize the code that's required to achieve such interop.  Since HTTP is 
> binary-clean in any case, allowing variability there would just require 
> extra code in conforming implementations, more interop testing, etc.  In 
> practice, I would expect interop to be reduced.  So, I would either 
> disallow variability everywhere, or allow choice of binary, not text, 
> representations in Miffy and (a) state in Miffy that each application of 
> Miffy must specify the allowed encodings and (b) allow only binary 
> in the http binding.
> 

Maybe I am missing something here (and I apologize if I am not getting 
it), but I don't understand what the interop problem is. Extra code 
in a conforming implementation is required only if we allow base64 and 
quoted-printable, not for 7bit, 8bit and binary.

> I guess I don't quite have a comfort level with Anish's proposal, but I 
> may be missing something.  It seems to me that Miffy and MTOM are mostly 
> about binary data, and 7Bit is about text.
> 

At the Infoset level, it is all binary.

-Anish
--
Received on Wednesday, 14 January 2004 04:55:58 UTC