Re: Update to MIFFY specification

At 22:40 28/11/03 -0800, you wrote:
>Please find attached an update to the MIFFY specification, as requested. 
>Specifically, the following sections have been adapted from the MTOM 
>specification: 1.3 A A.1 A.2.

Hi Mark,

I took a quick look at this.  It looks pretty good to me.  Some passing 
comments.

[[
This specification uses a number of namespace prefixes throughout; they are 
listed below. Note that the choice of any namespace prefix is arbitrary and 
not semantically significant (see XML Infoset [XML InfoSet]).

     * xbinc - [TBD]
     * mime - [TBD]
]]

I note that we are close to having the means to register a urn:ietf: 
sub-namespace for MIME header fields ... but you know that, of course!

[[
Unlike the Infoset, the XQuery 1.0 and XPath 2.0 Data Model ([XML Query 
Data Model] ... hereinafter referred to as the "data model") provides a 
model that carries type and value space information for each element and 
attribute. Accordingly, MIFFY is expressed in terms of that data model. A 
precondition for use of this format is therefore availability of a data 
model for the structure to be serialized. Details of the correspondence 
between Infosets and data models are provided in A. Mapping between 
Infosets and Data Models. The data model introduces accessors such as 
dm:string-value, dm:type and dm:typed-value, which are used in this 
specification.
]]

I found this was difficult to follow, appearing as it does early in the 
document.  I think you're saying that it's important to know exactly what 
data is contained in the XML infoset that is to be optimized?

It might be helpful to include a *simple* example up-front, to illustrate 
the parts that are subsequently described.

[[
2.1 MIME Multipart packaging

MIFFY Documents MUST be valid MIME Multipart/Related documents, as 
specified by [rfc2387]. Ordering of MIME parts MUST NOT be considered 
significant to MIFFY processing or to the construction of the Target Infoset.
]]

How do you determine the root element?  IIRC, the multipart/related 
convention is that the *first* sub-part is the root (though I think there 
may also be a mechanism -- I forget what -- for identifying some other part).

[Later:  it's the multipart/related start parameter -- do you require its use?]
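
To make the question concrete, here is a rough sketch of how a receiver 
might locate the root part, assuming the usual multipart/related rule: use 
the part whose Content-ID matches the start parameter, else fall back to 
the first part.  The function name, sample message and Content-ID values 
are invented for illustration; nothing here comes from the MIFFY draft.

```python
from email import message_from_string

# Hypothetical package: the root XML part is deliberately *not* first.
SAMPLE = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/related; boundary="b"; type="application/xml"; start="<root@example.org>"',
    "",
    "--b",
    "Content-Type: application/octet-stream",
    "Content-ID: <blob@example.org>",
    "",
    "binary",
    "--b",
    "Content-Type: application/xml",
    "Content-ID: <root@example.org>",
    "",
    "<doc/>",
    "--b--",
    "",
])


def find_root_part(msg):
    """Return the root body part of a multipart/related message."""
    parts = msg.get_payload()        # list of sub-parts for a multipart
    start = msg.get_param("start")   # None if the parameter is absent
    if start is None:
        return parts[0]              # convention: the first part is the root
    for part in parts:
        if part.get("Content-ID", "").strip() == start.strip():
            return part
    raise ValueError("no part matches start parameter %r" % start)


root = find_root_part(message_from_string(SAMPLE))
print(root.get_content_type())
```

If part ordering is not significant to MIFFY processing, then only the 
start-parameter branch is independent of ordering, which is why I ask 
whether its use is required.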

[[
# Transform the replaced characters into binary data by processing them as 
base64-encoded data.
]]
-- (sect 3.1)

This is unclear to me.  Is it the case that any optimization target is 
base64 encoded, and that this step is to remove that encoding?
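
If that reading is right, the step amounts to something like the following 
sketch.  This is my interpretation, not text from the draft; the function 
name is made up, and stripping whitespace first is an assumption based on 
the whitespace that base64 character content in XML commonly carries.

```python
import base64


def characters_to_binary(chars):
    """Decode an element's base64 character content into raw octets."""
    # Assumption: whitespace inside the character content is not significant.
    compact = "".join(chars.split())
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(compact, validate=True)


# Round trip: encode some octets, add a line break, and decode again.
encoded = base64.b64encode(b"Hello, MIFFY").decode("ascii")
folded = encoded[:8] + "\n  " + encoded[8:]
print(characters_to_binary(folded))
```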

[[
# Otherwise, the MIME part's Content-Location header field MUST have a 
field-value identical to the URI in the value of the href attribute 
information item.
]]
-- (sect 3.1)

I'm slightly uneasy about this use of Content-location.  Maybe it's OK, but 
I'd suggest checking.  My concern is that the referencing location in the 
root infoset may have no way to distinguish between inclusion of the body 
part contained within the multipart/related and the entity that is obtained 
by dereferencing the URI on the web.
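
A small sketch of the lookup I have in mind may make the concern clearer.  
It assumes a receiver resolves href by exact string match against each 
part's Content-Location; the helper names and URIs are hypothetical.

```python
from email.message import Message


def make_part(location, payload):
    """Build a body part labelled with a Content-Location (illustrative)."""
    part = Message()
    part["Content-Type"] = "application/octet-stream"
    part["Content-Location"] = location
    part.set_payload(payload)
    return part


def resolve_href(href, parts):
    """Prefer a packaged part whose Content-Location equals href."""
    for part in parts:
        if part.get("Content-Location") == href:
            return "packaged", part
    # No packaged part matches: all that is left is the web resource.
    return "external", href


parts = [make_part("http://example.org/img.png", "octets")]
print(resolve_href("http://example.org/img.png", parts)[0])
print(resolve_href("http://example.org/other.png", parts)[0])
```

The href value is the same string in both cases, so nothing in the root 
infoset itself says which of the two the sender intended.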

[[
4. Selecting Optimization Candidates

Optimization in MIFFY is limited to the content of those element 
information items which contain characters that can be interpreted as 
base64-encoded data. Attributes and non-base64-compatible character data 
cannot be successfully optimized by MIFFY.
]]

Nit:  "successfully" here is redundant.  I suggest dropping it.

(I think this answers an earlier question;  I suggest mentioning this 
restriction sooner.)
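
As an illustration of the restriction, a purely lexical candidate check 
might look like the sketch below.  The function names and sample document 
are invented, and the draft presumably relies on type information rather 
than a lexical test like this one, since plenty of ordinary words are also 
valid base64.

```python
import base64
import binascii
import xml.etree.ElementTree as ET


def is_base64_text(text):
    """True if text is non-empty and parses as base64 (lexical test only)."""
    compact = "".join((text or "").split())
    if not compact:
        return False
    try:
        base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def candidate_elements(root):
    """Tags of leaf elements whose character content looks like base64."""
    # Attributes are never inspected: they cannot be optimized.
    return [el.tag for el in root.iter()
            if len(el) == 0 and is_base64_text(el.text)]


doc = ET.fromstring(
    '<doc id="SGVsbG8="><photo>SGVsbG8=</photo><note>not base64!</note></doc>'
)
print(candidate_elements(doc))
```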

[[
5. Identifying MIFFY Documents

[ TBD, depending on media type feedback ]
]]

You asked me about this.  Now that I see the context, here's a suggestion:

Define a new media type that is applied to the multipart/related root 
element type, say application/miffy+xml, with a parameter that specifies 
the original root element content type.  The default is application/xml.

Thus:

    Content-type: application/miffy+xml;orig-type=application/xml ...

being equivalent to:

    Content-type: application/miffy+xml ...

Possibly, the multipart/related start-info parameter might be used to 
convey the original content type:

    Content-type: multipart/related
                  ;type=application/miffy+xml
                  ;start="<foo@bar>"
                  ;start-info="orig-type=application/xml"
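
A receiver could apply the suggested default along these lines.  This is 
only a sketch: application/miffy+xml and orig-type are the hypothetical 
names proposed above, not registered values, and the function name is 
invented.

```python
from email.message import Message


def original_type(content_type_value):
    """Return the orig-type parameter, defaulting to application/xml."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_param("orig-type", failobj="application/xml")


# The parameter value is quoted here because "/" is a tspecial in RFC 2045;
# Python's parser also accepts the unquoted form.
print(original_type('application/miffy+xml; orig-type="application/soap+xml"'))
print(original_type("application/miffy+xml"))
```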


[[
A.2 Deserialization Infoset Mapping

... the goal of this feature, which is to use type information as a means 
of optimization, without affecting application semantics.
]]

I think it would be good to mention this up-front.

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact

Received on Monday, 1 December 2003 05:08:18 UTC