RE: Allow EXI as characterization for XML in the JMS body ? from Amy Lewis on 2010-11-11 (public-soap-jms@w3.org from November 2010)

From: Amy Lewis <alewis@tibco.com>
Date: Wed, 10 Nov 2010 21:07:31 -0800
To: "Jean-Baptiste Bugeaud" <bugeaud@gmail.com>
Cc: <public-soap-jms@w3.org>
Message-ID: <C04A623342E01549B395A84F4D61FDBA01E16C64@NA-PA-VBE03.na.tibco.com>
Okay, just one note.

The "Accept-*" headers are used for conneg (content negotiation).  They're complex, and the kindest characterization is that they are an advanced topic and an advanced implementation technique.  A less kind characterization ... well, it has no place here.  "Negotiation" here doesn't mean back-and-forth.  They are too complex, and we do not want that level of complexity at this stage of the specification.  No.

It's sufficient, for us, to fail predictably if someone uses an encoding that isn't supported.

There are clear-enough standards for mime type; we specify them.  We can state that we use HTTP pseudo-mime Content-Encoding and Transfer-Encoding rather than MIME Content-Transfer-Encoding, without saying anything further on that subject.  We do *not* want to redefine this stuff.  At most, we point at someone else's definition.

For reference: I'm going to attempt to veto any proposal that includes Accept-* headers.  They are complex, and poorly implemented in the world which invented them.  They're for conneg.  Just, *no*.

I'm in favor of mentioning the Content-Encoding header.  I'm not sure about the Transfer-Encoding header.  We should probably clarify that, like HTTP, we are defining something that is *specifically* not MIME-compliant (as HTTP does in the first sentence of RFC 2616 section 19.4.1, and for the same reason: 8-bit clean transport), and so Content-Transfer-Encoding is prohibited.  Whether we need to say something like HTTP's "existence of a MIME-Version header indicates that this was created by and converted from a MIME-compliant protocol" isn't clear to me, so I'd say that we shouldn't.

Sorry.  I have strong opinions on some of these things, because I spent too long in the trenches (almost twenty years ago, now) watching it develop.  For SOAP over JMS (which is an API, not an internet message format driven protocol), some things are going to prove much more trouble than they are worth.

Further note: we now support both BytesMessage and TextMessage.  We should clarify that Content-Encoding (and Transfer-Encoding, if supported or mentioned) should *not* appear for TextMessage (TextMessage is implicitly readable as a java.lang.String; doing any byte-oriented funkiness in UTF-16 is ... simply egregious).  EXI and gzip are rational for BytesMessage body content; they are *not* rational for TextMessage (and no other message subtype is supported, at present).
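
A minimal sketch of that rule, written in plain Java without the JMS API so that it stays self-contained. The enum BodyKind and the method isAllowed are illustrative names assumed for this example; they come from neither the specification nor JMS, and the header is modeled as a nullable string.

```java
/** Sketch: a content encoding only makes sense on a BytesMessage body. */
final class EncodingRule {

    /** Hypothetical stand-in for the two supported JMS message subtypes. */
    enum BodyKind { BYTES_MESSAGE, TEXT_MESSAGE }

    /**
     * Returns true when the (possibly absent) content encoding may appear
     * on a message of the given kind. An absent value is always acceptable.
     */
    static boolean isAllowed(BodyKind kind, String contentEncoding) {
        if (contentEncoding == null) {
            return true; // nothing declared, nothing to object to
        }
        // A TextMessage is read as a java.lang.String, so a byte-level
        // coding such as gzip or EXI has nothing sensible to apply to.
        return kind == BodyKind.BYTES_MESSAGE;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(BodyKind.BYTES_MESSAGE, "gzip"));
        System.out.println(isAllowed(BodyKind.TEXT_MESSAGE, "gzip"));
        System.out.println(isAllowed(BodyKind.TEXT_MESSAGE, null));
    }
}
```

Whether a rejected TextMessage should produce a fault or simply be treated as malformed is left open here, as it is in the note above.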

Amy!
-- 
Amelia A. Lewis
Senior Architect, TIBCO/Extensibility, Inc.
alewis@tibco.com



-----Original Message-----
From: Jean-Baptiste Bugeaud [mailto:bugeaud@gmail.com]
Sent: Wed 10-Nov-10 7:03 PM
To: Amy Lewis
Cc: public-soap-jms@w3.org
Subject: Re: Allow EXI as characterization for XML in the JMS body ?
 
Hello Amy, George, Yves & al,

Thanks to Yves for having answered the questions from Eric on EXI.

> This appears to have quite a bit of HTTP content negotiation built into
> it.

Not so much. Although the proposal is based on some HTTP-defined
principles (namely content coding), it is not based on any negotiation
mechanism but on a more straightforward declarative mechanism that is
better suited to JMS implementations.

In HTTP, the client makes a first request to indicate which encodings
it accepts, and the server will use this to optimize the entity
encoding of the response.
In the accept-encoding proposal the scenario differs slightly: the
WSDL of the service that includes the accept list can either already
exist on the client side (bundled as part of the application) or be
fetched with a technique outside the scope of the proposal. In both
cases, when performing the "call", the client can ignore this
information or use it to perform some smart task. It is up to the
client implementer to choose.

>> At this time, there are things that prevent a content encoding (such
>> as EXI) from being used :
>>  - no way to store the exact content encoding used
>
> In fact, JMS headers are a clear extensibility point, as are MIME
> headers in the content of the message.

Sure, but there is no standard for mimetype or content encoding. Thus
any specification using JMS has to specify it.

>
>>  - mandatory alignment of octet and content type
>
> Not too clear on what that means.

Typically, §2.4 of the current spec says: "The bytes or characters of
the JMS Message payload correspond to the MIME format as indicated by
the definition of the contentType property". If you use the contentType
property to store the real MIME type of the content, independent of any
encoding (say text/xml even when using EXI), this sentence will forbid
it. If you store the content encoding in contentType instead, then you
will lose one piece of information, the real MIME type, and will be
missing one property.

>> ===========================
>>
>> Addendum to section 2.2.1 :
>>
>>
>> [Definition: soapjms:acceptEncoding] (list of xsd:string)
>>       * Identifies the list of accepted values for content encoding that
>> can be set using soapjms:contentEncoding.
>>       * [Definition: Each value indicated as accept content encoding MUST
>> be supported by the target destination implementation.?]
>>       * [Definition: A caller SHOULD only use values indicated as
>> accept content encoding.?]
>
> I think all of this is in the realm of content negotiation, and not
> necessary.

I also don't think a real negotiation with back-and-forth is required.

This is only an indication for a caller; the caller is free to do
whatever he wants with that data. But if it is given, the caller is
told that the callee will support these encodings. It is useful for
tracing and for checking alignment between the service specification
(the service contract) and the effective runtime (application
implementation, SOAPJMS implementation, etc).
Implementers will be free to add extra checks or custom properties,
for instance on the client side (downgrade strategy, preference
strategy, etc).
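
To illustrate what such a client-side preference or downgrade strategy could look like, here is a minimal sketch in plain Java. The class AcceptHint and the method choose are hypothetical names invented for this example, and falling back to "identity" is an assumption about what a reasonable downgrade would be; nothing here is defined by the specification.

```java
import java.util.List;

/** Sketch: a caller treating the advertised accept list as a hint only. */
final class AcceptHint {

    /**
     * Picks the first encoding the client prefers that the service also
     * advertises. Falls back to "identity" when nothing matches or when
     * no accept list was published at all.
     */
    static String choose(List<String> clientPreference, List<String> advertisedAccept) {
        if (advertisedAccept == null || advertisedAccept.isEmpty()) {
            return "identity"; // no hint available: send the body untransformed
        }
        for (String candidate : clientPreference) {
            if (advertisedAccept.contains(candidate)) {
                return candidate;
            }
        }
        return "identity"; // downgrade: none of our preferences is advertised
    }

    public static void main(String[] args) {
        List<String> preference = List.of("exi", "gzip");
        System.out.println(choose(preference, List.of("gzip", "identity")));
        System.out.println(choose(preference, List.of("deflate")));
        System.out.println(choose(preference, null));
    }
}
```

A client that ignores the list entirely remains valid under the proposal; this only shows one way of using it.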

> Supply ContentEncodingNotSupported, and if someone sends the wrong
> thing, you shrug, send an error, and you're done.  Supported content
> encodings might be indicated in the WSDL; if there's not already a
> pattern for that, then other folks don't consider it necessary either.
>
>> Addendum to section 2.2.3 :
>>
>> [Definition: soapjms:contentEncoding] (xsd:string)
>>       * Identifies the transformation that has been applied to the message
>> payload body.
>
> No.  At least, I don't think so.
>
> As I understand EXI, it's part of support for MTOM and such.  It's the
> attachments that are being encoded.  Yes/No?  If yes, then there will
> never be a JMS-layer header indicating encoding.
>
> Now, if EXI can be used for a non-composite SOAP message itself (that
> is, a SOAP message that is not part of a MIME multipart message), then
> perhaps we need this.

If the content encoding is GZip, which can be good practice in some
scenarios, and you don't have a contentEncoding property, you will lose
either the real MIME type (XML or MTOM, say) or the fact that you have
compressed it with GZip. You definitely need another property to
indicate the content encoding. This is the reason the HTTP 1.1 editors
did it this way.
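
As a rough illustration of keeping the two pieces of information apart, the sketch below gzips a small XML body with the JDK's java.util.zip classes and records the media type and the content encoding as two separate entries. The map keys "contentType" and "contentEncoding" are stand-ins for whatever JMS properties the specification ends up defining, and the class GzipBody is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Sketch: gzip the body bytes, keep media type and encoding separate. */
final class GzipBody {

    /** Compresses the raw bytes with gzip. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // stream is closed, so the trailer is written
    }

    /** Reverses gzip(), returning the original bytes. */
    static byte[] gunzip(byte[] packed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<Envelope/>".getBytes(StandardCharsets.UTF_8);
        byte[] body = gzip(xml); // what would go into a BytesMessage

        // Two independent properties: neither one overwrites the other.
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("contentType", "text/xml");  // the real media type
        properties.put("contentEncoding", "gzip");  // the transformation applied
        System.out.println(properties);

        System.out.println(new String(gunzip(body), StandardCharsets.UTF_8));
    }
}
```

With a single property, the receiver would see either "text/xml" and try to parse compressed bytes, or a gzip type and not know what lies underneath.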

Actually, MTOM is not directly tied to EXI. EXI is tied to XML.
MTOM addresses the situation where you have reasonably sized XML
(outside its binary elements) and at least one piece of binary content
that is large.
EXI addresses the performance issue when you have large XML, whether
or not it contains a large embedded binary item.

From this perspective, EXI or another content encoding can bring new
solutions that will definitely be helpful.

>>         * [Definition: If the content encoding is specified, it is
>> checked to ensure that it matches the content encoding values
>> supported. A fault MUST be generated with subcode
>> contentEncodingNotSupported if the encoding values do not match.?]
>
> Errrr.  If the Content Encoding contains an unrecognized or unsupported
> value, the client or server should generate a fault with subcode
> ContentEncodingNotSupported.

Yes :)

>>       * [Definition: If no content encoding property is set or no value is
>> set, the property MUST be assumed as "identity".?]
>
> Identity is the only thing that our specification is likely to define.

Yes, we should clearly indicate that "identity" means no transformation
(content encoding) has been performed on the message content (body)
whatsoever.
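
The two points agreed on so far (an absent value means "identity", and an unsupported value produces a fault) could be sketched as follows. This is plain Java under stated assumptions: the class ContentEncodingCheck, the exception type standing in for a SOAP fault, and the supported set {"identity", "gzip"} are all invented for the example. Matching is exact because the draft text says content-coding values are case-sensitive.

```java
import java.util.Set;

/** Sketch: default to "identity", fault on anything unsupported. */
final class ContentEncodingCheck {

    /** Hypothetical set of encodings this receiver supports. */
    static final Set<String> SUPPORTED = Set.of("identity", "gzip");

    /** Unchecked exception standing in for a SOAP fault with a subcode. */
    static final class FaultException extends RuntimeException {
        final String subcode;

        FaultException(String subcode, String detail) {
            super(subcode + ": " + detail);
            this.subcode = subcode;
        }
    }

    /**
     * Returns the effective content encoding for a declared value.
     * A missing or empty value is treated as "identity".
     */
    static String resolve(String declared) {
        String encoding = (declared == null || declared.isEmpty()) ? "identity" : declared;
        if (!SUPPORTED.contains(encoding)) {
            throw new FaultException("contentEncodingNotSupported", encoding);
        }
        return encoding;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));
        System.out.println(resolve("gzip"));
        try {
            resolve("exi");
        } catch (FaultException fault) {
            System.out.println(fault.getMessage());
        }
    }
}
```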

>
>>       * [Definition: If soapjms:acceptEncoding was set, the contentEncoding
>> value SHOULD be set to one of those values.?]
>
> Well, no.  It MUST be set to a value corresponding to the encoding
> used, about which we are not going to say anything.  I hope.
>

No, such a matching has to be done on the message and the
contentEncoding. The acceptEncoding is only there as a helper for the
caller to know the supported list of values.

I was thinking "SHOULD" was suited, but maybe it is a bit strong ...

>> Update to section 2.4 :
>>
>> change
>> "The bytes or characters of the JMS Message payload correspond to the
>> MIME format as indicated by the definition of the contentType
>> property"
>> with
>> "The bytes or characters of the JMS Message payload correspond to the
>> MIME format as indicated by the definition of the contentType property
>> and the contentEncoding property".
>
> and if defined, the contentEncoding property.
>

Correct.

>> Alter of 2.4.1 :
>>  a new point in the list of consideration for TextMessage :
>>       - Messages using the SOAP JMS content encoding will need to use
>> Content-Transfer-Encoding for attachment parts.
>
> I hope this is a typographic error?  You mean Content-Encoding, not
> Content-Transfer-Encoding, correct?  They're very different things;
<...>
> Content-Transfer-Encoding in any HTTP-consistent pseudo-MIME supporting
> protocol (it's forbidden to use in HTTP), but can use Content-Encoding.
>
> </network-protocol-geek>
>

This is definitely a bad & ugly typo. Oops ... Glad you did not
fall into its trap.
It was obviously contentEncoding ;-)

>> Addendum to section 2.8 :
>>
>>  Add of :
>>  - contentEncodingNotSupported
>>  - contentEncodingMismatch
>
> Mmmm.  With minimal explanation I think.  Vendors MUST support the
> identity encoding; others will go without mention (my preference).
>
>> Addendum to section 3.4 :
>>  Add the element acceptEncoding in the list.
>
> Let's not go there.  We don't need content negotiation; in the first
> iteration of SOAP/JMS, a clear error pattern is perfectly adequate.

Again, it is simply an indication to detect mismatch. If provided, you
might use the indication, or not. It is up to implementers to go
further with such an indication and perform a negotiation on the client
side based on whatever rules. Whatever is implemented (ignoring it,
using it as-is, applying negotiation-like rules), it will stay
interoperable.

I really think this accept mechanism is worth it because it eases the
job of integration and production teams.

>> Add of a new section  :
>>  X.X Content Encoding
>>  Content coding values indicate an encoding transformation that has
>> been or can be applied to the JMS message body content.
>>  Content codings are primarily used to allow a message body to be
>> compressed or otherwise usefully transformed without losing the
>> identity of its underlying media type and without loss of information.
>>
>>  All content-coding values are case-sensitive.
>>
>>  The Internet Assigned Numbers Authority (IANA) acts as a registry for
>> content encoding value tokens. Initially the list of valid values is
>> taken from the HTTP 1.1 Content Coding values (see
>> http://www.iana.org/assignments/http-parameters/http-parameters.xml#http-parameters-1
>> ).
>>
>>  New content-coding value tokens SHOULD be registered to allow
>> interoperability between clients and servers, specifications of the
>> content coding algorithms needed to implement a new value SHOULD be
>> publicly available and adequate for independent implementation, and
>> conform to the purpose of content coding defined in this section.
>>
>>  An implementation SHOULD support gzip or (and ?) exi content encoding.
>
> I think all of this is superfluous--and tending to lead to a need to
> add more and more references.  Maybe just the pointer to the IANA
> registry, and the requirement to support identity?

Well, we can link to the HTTP header entry at IANA, but doing so means
that any entry there will be valid for SOAPJMS as well.

I do not see a reason why the two should diverge, so this point really
needs to be discussed by all the editors.

> Note: throughout the above, I speak for myself (and for my employer as
> its representative on this working group), not for the working group as
> a whole.
>
> Summarizing and rephrasing my responses: Thank you for the detailed
> suggestions; that was very helpful.  I believe that they go too far in
> some directions; we should require no more than identity support, and I
> do not believe that we need content negotiation.  I'm not clear whether
> we need a JMS Header defined, or if we need worry about
> Content-Encoding only for the components of a multi-part message.
>

Thanks, Amy, for this interesting feedback.

Regards,
JB
Received on Thursday, 11 November 2010 05:08:08 UTC