Re: RFC 2557 & SML-IF from John Arwe on 2008-04-30 (public-sml@w3.org from April 2008)

From: John Arwe <johnarwe@us.ibm.com>
Date: Wed, 30 Apr 2008 17:36:31 -0400
To: "public-sml@w3.org" <public-sml@w3.org>
Message-ID: <OF6584A10D.50D07241-ON8525743B.0065D655-8525743B.0076B39D@us.ibm.com>
While I didn't have an action item assigned to me for this (that I am 
aware of, at least :-) ), I was able to fit a thorough reading of 2557 
into my travel last week.  It certainly led me to better understand 
Henry's comments about the parallel in function provided, and to ask some 
questions whose answers might lead me to form an opinion with some 
conviction.  Since my opinion, in my role as chair, matters not a whit, I 
simply offer them up here for the wider group.  Today I was able to skim 
(not thoroughly read) the 5 MIME RFCs 2557 refers to as well, so it is 
possible some of my questions are answered in them.
Root must have a media type of text/html
Citations:
[Abstract] This document a) defines the use of a MIME multipart/related 
structure to aggregate a text/html root resource and the subsidiary 
resources it references
[1. Introduction] there is no requirement that implementations claiming 
conformance to this standard be able to handle any URI linked document 
representations other than those whose root is HTML.
(I assume, based on what I read in 3023, that text/html+xml would be 
allowed although 2557 does not explicitly state this).  This implies that 
in order to have any hope of using MIME to package a set of related SML 
documents (i.e. a model), at least one of them would have to be (X)HTML. 
In theory one could artificially construct a root HTML document as part of 
the packaging process, but this seems to stretch the process rather far 
(much as we found when considering how one might build an HTML ref scheme 
for <img>, which involved manufacturing XML proxies for non-XML media 
types).
If one were to artificially construct a root xhtml document as part of the 
packaging process, and then stipulate that it would be destroyed as part 
of receive processing so that this root was not construed as being part of 
the interchange model, that seems to me like it would require a new media 
type to be registered.  Feels a bit heavyweight, although I have not 
studied the registration process.
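For concreteness, here is a minimal sketch of that artificial-root packaging, using the multipart/related support in Python's standard email package.  The document URIs are made up, and this only illustrates the idea under discussion; it is not how any SML-IF implementation works.  Note that nothing in the resulting message tells a receiver that the root is disposable, which is the part that seems to call for a new media type.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def package_model(documents):
    """Package SML model documents as multipart/related with a manufactured root.

    documents: dict mapping a (hypothetical) Content-Location URI to XML text.
    """
    # RFC 2387 requires a "type" parameter naming the root's media type.
    related = MIMEMultipart("related", type="text/html")

    # The manufactured root: an HTML page whose only job is to link to every
    # model document, so that each one becomes a subsidiary resource of the root.
    links = "".join(f'<a href="{uri}">{uri}</a>\n' for uri in documents)
    root = MIMEText(f"<html><body>\n{links}</body></html>\n", "html")
    root["Content-Location"] = "http://example.org/model/root.html"
    related.attach(root)  # with no "start" parameter, the first part is the root

    for uri, xml_text in documents.items():
        part = MIMEText(xml_text, "xml")  # media type text/xml
        part["Content-Location"] = uri
        related.attach(part)
    return related


msg = package_model({
    "http://example.org/model/host.xml": "<host/>",
    "http://example.org/model/app.xml": "<app/>",
})
print(msg.as_string())
```

A receiver would still have to know, out of band, to discard root.html so that it is not treated as part of the interchange model.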
Regardless of the approach taken, support for XHTML will require the known 
implementations to change substantially.
Since RFC 2557 requires text/html but not text/html+xml, a conservative 
reader would have to assume that HTML (sic, not XHTML) support is 
required.  The inability of implementations to rely on off-the-shelf XML 
components is likely to significantly impact the known implementations.
When Henry raised the XHTML ref scheme issue, I read into that a decision 
on his part to specifically say XHTML rather than HTML.  It is possible he 
assumed MIME allowed one to support XHTML rather than HTML (thus a smaller 
impact, since all the docs are still XML), but that is not how I'm reading 
2557.
2557 scoped to email (?)
Citations:
[Abstract] In order to transfer a complete HTML multimedia document in a 
single e-mail message, it is necessary to:
[1. Introduction] 
There is an obvious need to be able to send such multi-resource documents 
in e-mail [SMTP], [RFC822] messages. 
The standard defined in this document specifies how to aggregate such 
multi-resource documents in MIME-formatted [MIME1 to MIME5] messages for 
precisely this purpose.
I'm honestly not sure how narrowly to read this.  While I doubt the SML 
working group would say this use case is wholly unreasonable, I have never 
heard it to be anyone's focus.
Read narrowly, it would appear to say that 2557 only applies for the 
purpose of email.  That reading seems a bit narrow to me.  I note that I 
can find no later normative statements that exclude non-HTML, and the 
introduction seems to go out of its way to encourage re-use for other 
documents.  It does however appear to qualify that encouragement with "for 
email" more often than a generous (re-use oriented) reader might expect. 
Later sections appear to studiously refer to HTML as an exemplar, 
weakening the case that the intended usage is limited to email.  Overall 
it's difficult to find persuasive support for one reading over the other.
I did check with some IBM web services folks, and they say that MIME as an 
underlying format is allowed, e.g. by SOAP over HTTP.  More on that later, 
however.
Existence of a single Root 
Citations:
[Abstract] This document a) defines the use of a MIME multipart/related 
structure to aggregate a text/html root resource and the subsidiary 
resources it references
[1. Introduction] there is no requirement that implementations claiming 
conformance to this standard be able to handle any URI linked document 
representations other than those whose root is HTML.
Unless one artificially constructs a root to reference all documents in 
the interchange model, this assumption in the MIME RFC is not always met 
in SML.  It would be common enough for schema documents to have no 
explicit URI references to them, i.e. they are referenced implicitly 
through namespace URI matching, yet they are part of the model.  Similar 
situations exist for unbound rule documents and instance documents without 
any references to them in the interchange set.  In other words, an SML 
model may have 0..n root documents if one defines "root" as having 
explicit outbound URI-based references.
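A small sketch of the point, assuming a model is represented as a map from each document's URI to the set of URIs it references explicitly (the URIs and data are made up).  Any document that no other document references explicitly would have to be a root itself, or be linked from a manufactured one; there can be several such documents, or none at all if every document sits on a reference cycle.

```python
def unreferenced_documents(model):
    """Return the documents that no document in the model references explicitly.

    model: dict mapping a document URI to the set of URIs it references
    through explicit (URI-based) SML references.
    """
    # Union of all explicit reference targets (empty set for an empty model).
    referenced = set().union(*model.values())
    return sorted(uri for uri in model if uri not in referenced)


model = {
    "http://example.org/app.xml": {"http://example.org/host.xml"},
    "http://example.org/host.xml": set(),
    # Bound to instance documents by namespace matching, so no explicit reference.
    "http://example.org/app.xsd": set(),
    # An unbound rule document: nothing points at it either.
    "http://example.org/rules.sch": set(),
}
print(unreferenced_documents(model))
```

Here three documents come back, so no single 2557-style root covers the model without manufacturing one.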
Limited number of URIs per model document (equivalent to SMLIF alias)
Citations:
[4.2 The Content-Location Header] 
A single Content-Location header field is allowed in any message or 
content heading, in addition to a Content-ID header
A Content-Location header can thus be used to label a resource which is 
not retrievable by some or all recipients of a message. For example a 
Content-Location header may label an object which is only retrievable 
using this URI in a restricted domain, such as within a company-internal 
web space. A Content-Location header can even contain a fictitious URI. 
Such an URI need not be globally unique. 
Multiple Content-Location header fields in the same message heading are 
not allowed.
[RFC 2045] Content-ID values must be generated to be world-unique.
Aside from the limitation on number (0..2 in MIME, counting one 
Content-Location plus one Content-ID, versus 0..n in SMLIF) I see no 
functional difference.  Note that I used a generous reading here to say 
0..2 rather than 0..1.  There might be other dragons hiding in Content-ID 
that I did not find yet.
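To make the mismatch concrete, here is a sketch that tries to carry one document's SML-IF-style aliases on a single MIME body part under the generous 0..2 reading: the first alias goes in Content-Location, and a freshly generated Content-ID supplies the second name (RFC 2045 requires it to be world-unique, so it cannot simply reuse an existing alias).  The function name, domain and example aliases are all hypothetical.

```python
import uuid


def alias_headers(aliases, domain="example.org"):
    """Map a document's aliases onto the headers RFC 2557 allows on one body part.

    Returns (headers, leftover): headers holds at most one Content-Location and
    one Content-ID; leftover lists the aliases that have nowhere to go.
    """
    headers = {}
    if aliases:
        # Only one Content-Location header is allowed per body part.
        headers["Content-Location"] = aliases[0]
    # Content-ID must be world-unique, so generate one rather than reuse an alias.
    headers["Content-ID"] = f"<{uuid.uuid4().hex}@{domain}>"
    return headers, list(aliases[1:])


headers, leftover = alias_headers([
    "http://repo1.example.org/hosts/42",
    "http://repo2.example.org/cmdb/srv-0042",
    "urn:example:asset:0042",
])
print(headers)
print("not representable:", leftover)
```

With three aliases, one lands in Content-Location and the other two cannot be expressed at all.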
The difference in number is troubling, however.  In the domain of IT 
resource management, it is not unusual for a single enterprise to have 
multiple data repositories, each of which assigns a local name for a 
resource.  In order to address problems of the sort that the 
CMDB-Federation work now in DMTF states its intent to address, while 
preserving digital signatures (i.e. without forcing updates to the 
component subsets of data to all use a single URI to refer to the named 
resource), it must be possible to associate resource instances with more 
than one URI.  Since SML was originally targeted at solving problems in 
this domain, it seems likely that this would be an issue for potential 
adopters.
Note that this same consideration, to avoid rewriting of legacy source 
text, influenced 2557 itself.
[1. Introduction] The reason why this standard does not only recommend the 
use of Content-ID-s is that it should be possible to forward existing web 
pages via e-mail without having to rewrite the source text of the web 
pages. Such rewriting has several disadvantages, one of them that security 
checksums will probably be invalidated.

URI reference scope limitations
Citations:
[7. Use of the Content-Type "multipart/related"] 
If a message contains one or more MIME body parts containing URIs and also 
contains as separate body parts, resources, to which these URIs (as 
defined, for example, in HTML 2.0 [HTML2]) refer, then this whole set of 
body parts (referring body parts and referred-to body parts) SHOULD be 
sent within a multipart/related structure as defined in [REL].
Even though headers can occur in a message that lacks an associated 
multipart/related structure, this standard only covers their use for 
resolution of URIs between body parts inside a multipart/related 
structure. This standard does cover ... [Arwe paraphrasing now, for 
brevity] ... URIs referring to other resources
To me, the first cited text suggests (SHOULD) that an SML-IF document 
would act as the root (note: this does conflict with 2557's requirement 
that the root have media type text/html), i.e. the entire interchange 
model would correspond to a single multipart/related container.  Given 
Henry's initial response to our original response that using 2557 would 
make certain cross-document references impossible, I suspect he was 
reading it similarly.
The issue then becomes how to impose the constraints on references that 
SMLIF imposes today (we think for good reason) but that the model of 
"1 SMLIF doc == 1 multipart/related container" does not, as Kumar pointed 
out.
CID URL Scheme support
Citations:
[8.3 Use of the Content-ID header and CID URLs] 
When URIs employing a CID (Content-ID) scheme as defined in [URL] and 
[MIDCID] are used to reference other body parts in an MHTML 
multipart/related structure, they MUST only be matched against Content-ID 
header values, and not against content-Location header with CID: values.
I have not tried to track down all the dependencies, but it seems likely 
to me that somewhere we would end up being required to support the CID URL 
scheme during URI resolution.  That is another potential impact on the 
known implementations.
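For reference, the 8.3 matching rule is simple enough to sketch.  This assumes body parts are represented as plain dicts of header values, and it simplifies Content-Location matching to an exact string comparison (2557 also has rules for absolutizing relative Content-Location values, omitted here).  The cid: to Content-ID conversion follows RFC 2392: drop the "cid:" prefix, undo the %-encoding, and compare against the Content-ID value without its angle brackets.

```python
from urllib.parse import unquote


def resolve_body_part(parts, uri):
    """Find the body part a URI refers to inside one multipart/related structure.

    parts: list of dicts mapping header names to values, one per body part.
    A cid: URL is matched only against Content-ID, never against a
    Content-Location that happens to hold a cid: value; any other URI is
    compared (simplistically, as an exact string) with Content-Location.
    """
    if uri.lower().startswith("cid:"):
        wanted = unquote(uri[4:])  # RFC 2392: strip prefix, undo %-encoding
        for part in parts:
            cid = part.get("Content-ID", "").strip()
            if cid.startswith("<") and cid.endswith(">") and cid[1:-1] == wanted:
                return part
        return None
    for part in parts:
        if part.get("Content-Location") == uri:
            return part
    return None


parts = [
    {"Content-ID": "<part1@example.org>", "Content-Location": "http://example.org/a.xml"},
    {"Content-Location": "cid:part2@example.org"},  # must NOT match cid:part2@...
]
print(resolve_body_part(parts, "cid:part1@example.org"))
```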
SML-IF in web services
As mentioned earlier, I looked into the applicability of MIME for web 
services exchanges.  I was told that it is allowed, provided that both the 
service provider and the Web service client support it.  I was also told 
that none of the protocol bindings currently in wide use require it, so in 
practice MIME is not a common format.
I think it likely that one or more of the anticipated uses of SML involves 
the transfer of models as part of Web service message exchanges.  In that 
context, if we revert to MIME then we would appear to be doing one or both 
of two things:
1. Implicitly adding a requirement for MIME support in the Web services 
stack (when one wishes to transfer SML models) and in Web service clients 
(...same).
This seems unpalatable for the stack providers, and destined for failure 
on the client end, which is much more distributed.  At best it will slow 
SML adoption in WS as the subset of providers (of both stack and client 
code) roll out upgrades/updates through all the existing deployed base 
components.
2. Encouraging the WS users to define their own ME-specific SML model 
encapsulation syntax when MIME is not considered a practical option.
I'm pretty sure the context specificity makes this a bad idea.  It works 
_against_ wide interoperability, if anything.  I know of one case already 
where a standard now in DMTF did not use SMLIF due to timing and a desire 
to not have dependencies on a spec "far away [organizationally]", and a 
second in OASIS with SMLIF currently on their agenda to assess for solving 
existing problems.  In both cases, reasonably complete technical 
assessments have not identified anything they needed beyond what SMLIF 
does.
That's the list of issues that caught my eye reading through 2557.  I also 
saw a number of places where we could probably benefit from re-using some 
text.  In addition, I tried a mental comparison of the set of issues each 
specification addresses, looking for gaps.
Moving on to the set of issues Kumar raised:
I suspect we will have a more useful and congenial discourse with Henry 
(who is not, remember, an expert in our spec content) if we supply at 
least one example for each case where we say "2557 cannot do X".  "We 
cannot see how to get behavior X using 2557's facilities; did you have a 
solution in mind for this that you can share?" might be a more cordial way 
to phrase those.  This motif seems to apply to (1) the limitations on the 
scope of reference targets (in our parlance) vs the multipart/related 
structure assumed, (2) >1 (or >2) aliases per document, and (3) how refs 
targeting schema documents resolve differently than "normal" refs.  People 
sometimes infer an agenda behind phrases like "but it has problems of its 
own" if they are not resolved, whether an agenda actually exists or not.
I don't see the base URI issue Kumar alluded to.  I looked at it this way: 
SMLIF requires a base URI whenever one would be needed (to absolutize a 
relative ref), and it seems that any use of MIME for SMLIF's function 
would impose the same requirement; as a consequence, one would never 
appeal to the implicit "thismessage:/" base URI.  Admittedly I did not 
wrestle with this very much, so there might well be a subtlety that gave 
me the slip (failing that, I blame whatever virus has me home all this 
week so far).
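A sketch of that reasoning, assuming (as the paragraph above does) that every relative reference must come with an explicit base URI.  Under that rule the implicit "thismessage:/" base is never consulted; a missing base is simply an error.  The function name and URIs are hypothetical, and urljoin from Python's standard library does the actual resolution.

```python
from urllib.parse import urljoin, urlsplit


def absolutize(ref, base_uri=None):
    """Return an absolute URI for an SML reference.

    Assumes the SMLIF-style rule that a relative reference is only legal when
    an explicit base URI is available, so there is no "thismessage:/" fallback.
    """
    if urlsplit(ref).scheme:
        return ref  # already absolute; no base needed
    if base_uri is None:
        raise ValueError(f"relative reference {ref!r} has no base URI")
    return urljoin(base_uri, ref)


print(absolutize("host.xml", "http://example.org/model/app.xml"))
```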
The requirements for XML Schema could easily be met with a "so what?" 
response, potentially at two levels. 
(1) While it is true that a "MIME not SMLIF" format document cannot be 
schema-assessed, "so what?  you can still assess the model documents..."
(2) While it is true that a "MIME not SMLIF" format extension cannot be 
schema-assessed, "so what?  you can still parse it..."
To a skeptic, the "cannot be assessed against a Schema" argument on its 
own is unlikely to persuade.  This might have a relatively simple scoping 
answer.  2557 concerned itself first and foremost with email transfer of 
sets of documents; other use cases were free to pile on if they happened 
to work, but were not otherwise considered.  We might likewise either 
become scoped that way (if we are not already, and I'd be skeptical to 
hear we missed this scoping in the charter) or else point to (existing 
text) stating that some constrained set of use cases, e.g. programmatic 
exchange of SML-based models, is our first priority, with other use cases 
free to pile on if they happen to work but not otherwise considered.  We 
might even explicitly say something like "for other use cases, e.g. the 
exchange of SML models via email, other existing specifications like 
RFC 2557 may be used as appropriate".  It might be perfectly appropriate 
in a context like email exchange to make _different_ decisions than we 
made in SMLIF (e.g. one could rewrite all sml refs to use a single alias, 
perhaps by using CID URLs). 
One could also raise the (appropriate, I think) topic of implementation 
cost - certainly a big part of the reason we all chose to sit SML squarely 
on an XML base is the huge amount of existing componentry we can all 
leverage.
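A rough sketch of the single-alias rewrite mentioned above, assuming each document in the email package has already been given a Content-ID and that a table mapping every known alias to that Content-ID is available (both hypothetical).  Each matching reference becomes a cid: URL built per RFC 2392.  This is exactly the kind of source-text rewriting that invalidates digital signatures, which may be tolerable for email exchange but not for the multi-repository scenarios discussed earlier.

```python
from urllib.parse import quote


def rewrite_refs_to_cid(refs, alias_to_cid):
    """Rewrite SML reference URIs to cid: URLs so every target has a single name.

    refs: reference URIs found in one document.
    alias_to_cid: maps every alias of a packaged document to that document's
    Content-ID (without angle brackets). References not in the map are kept.
    """
    rewritten = []
    for ref in refs:
        if ref in alias_to_cid:
            # RFC 2392: a cid: URL is the %-encoded addr-spec of the Content-ID.
            rewritten.append("cid:" + quote(alias_to_cid[ref], safe="@"))
        else:
            rewritten.append(ref)
    return rewritten


alias_to_cid = {
    "http://repo1.example.org/hosts/42": "host42@example.org",
    "http://repo2.example.org/cmdb/srv-0042": "host42@example.org",
}
print(rewrite_refs_to_cid(
    ["http://repo1.example.org/hosts/42",
     "http://repo2.example.org/cmdb/srv-0042",
     "http://elsewhere.example.org/x"],
    alias_to_cid,
))
```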

Best Regards, John

Street address: 2455 South Road, P328 Poughkeepsie, NY USA 12601
Voice: 1+845-435-9470      Fax: 1+845-432-9787
Received on Wednesday, 30 April 2008 21:37:17 UTC