[Bug 5546] Reconcile SML-IF with RFC 2557

http://www.w3.org/Bugs/Public/show_bug.cgi?id=5546


Kumar Pandit <kumarp@microsoft.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kumarp@microsoft.com




--- Comment #6 from Kumar Pandit <kumarp@microsoft.com>  2008-06-13 00:05:48 ---
John's email on the same topic:

-------------------------
From: public-sml-request@w3.org [mailto:public-sml-request@w3.org] On Behalf Of
John Arwe
Sent: Wednesday, April 30, 2008 2:37 PM
To: public-sml@w3.org
Subject: Re: RFC 2557 & SML-IF


While I didn't have an action item assigned to me for this (that I am aware of,
at least :-) ), I was able to fit a thorough reading of 2557 into my travel
last week.  It certainly led me to better understand Henry's comments about
the parallel in function provided, and to ask some questions whose answers
might lead me to form an opinion with some conviction.  Since my opinion, in my
role as chair, matters not a whit, I simply offer them up here for the wider
group.  Today I was able to skim (not thoroughly read) the 5 MIME RFCs 2557
refers to as well, so it is possible some of my questions are answered in them.

Root must have a media type of text/html:
Citations: 
[Abstract] This document a) defines the use of a MIME multipart/related
structure to aggregate a text/html root resource and the subsidiary resources
it references 
[1. Introduction] there is no requirement that implementations claiming
conformance to this standard be able to handle any URI linked document
representations other than those whose root is HTML. 
(I assume, based on what I read in 3023, that text/html+xml would be allowed
although 2557 does not explicitly state this).  This implies that in order to
have any hope of using MIME to package a set of related SML documents (i.e. a
model), at least one of them would have to be (X)HTML.  In theory one could
artificially construct a root html document as part of the packaging process,
but this does seem to stretch the process a bit far (in the same way that we
encountered while considering how one might build an HTML ref scheme for <img>
that involved the manufacture of XML proxies for non-XML media types). 
If one were to artificially construct a root xhtml document as part of the
packaging process, and then stipulate that it would be destroyed as part of
receive processing so that this root was not construed as being part of the
interchange model, that seems to me like it would require a new media type to
be registered.  Feels a bit heavyweight, although I have not studied the
registration process. 
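To make the "artificially constructed root" idea concrete, here is a minimal sketch using Python's standard email package. It assumes each model document is identified by a single URI carried in Content-Location; the function name package_model and the example.org URIs are hypothetical, and nothing here is taken from SML-IF or prescribed by RFC 2557 beyond the text/html root and the multipart/related container.

```python
import html
from email.message import EmailMessage


def package_model(documents):
    """Wrap XML model documents in multipart/related behind a synthetic HTML root.

    `documents` maps a URI (used as Content-Location) to the XML text.
    """
    # Synthetic root: an HTML page whose only job is to link to every document.
    links = "\n".join(
        f'<a href="{html.escape(uri)}">{html.escape(uri)}</a>' for uri in documents
    )
    root_html = f"<html><body>\n{links}\n</body></html>"

    msg = EmailMessage()
    msg.set_content(root_html, subtype="html")  # the text/html root part
    for uri, xml_text in documents.items():
        # The first add_related() call converts msg to multipart/related and
        # moves the HTML root into the first body part.
        msg.add_related(
            xml_text.encode("utf-8"),
            maintype="application",
            subtype="xml",
            headers=[f"Content-Location: {uri}"],
        )
    # multipart/related names the media type of its root in the "type" parameter.
    msg.set_param("type", "text/html")
    return msg


model = {
    "http://example.org/model/doc1.xml": "<doc1/>",
    "http://example.org/model/doc2.xml": "<doc2/>",
}
packaged = package_model(model)
```

A receiver following the approach described above would have to recognize and discard the first body part, which is exactly the extra convention that seems to call for a new media type.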
Regardless of the approach taken, support for XHTML will require the known
implementations to change substantially. 
Since RFC 2557 requires text/html but not text/html+xml, a conservative reader
would have to assume that HTML (sic, not XHTML) support is required.  The
inability of implementations to rely on off the shelf XML components is likely
to significantly impact the known implementations. 
When Henry raised the XHTML ref scheme issue, I read into that a decision on
his part to specifically say Xhtml rather than HTML.  It is possible he assumed
MIME allowed one to support XHTML not HTML (thus a smaller impact, since all
the docs are still XML), but that is not how I'm reading 2557.

2557 scoped to email (?)
Citations: 
[Abstract] In order to transfer a complete HTML multimedia document in a single
e-mail message, it is necessary to: 
[1. Introduction] 
There is an obvious need to be able to send such multi-resource documents in
e-mail [SMTP], [RFC822] messages. 
The standard defined in this document specifies how to aggregate such
multi-resource documents in MIME-formatted [MIME1 to MIME5] messages for
precisely this purpose. 
I'm honestly not sure how narrowly to read this.  While I doubt the SML working
group would say this use case is wholly unreasonable, I have never heard it to
be anyone's focus. 
Read narrowly, it would appear to say that 2557 only applies for the purpose of
email.  That reading seems a bit narrow for me.  I note that I can find no
later normative statements that exclude non-HTML, and the introduction seems to
go out of its way to encourage re-use for other documents.  It does however
appear to qualify that encouragement with "for email" more often than a
generous (re-use oriented) reader might expect.  Later sections appear to
studiously refer to HTML as an exemplar, weakening the case that the intended
usage is limited to email.  Overall it's difficult to find persuasive support
for one reading over the other. 
I did check with some IBM web services folks, and they say that MIME as an
underlying format is allowed, e.g. by SOAP over HTTP.  More on that later,
however.

Existence of a single Root
Citations: 
[Abstract] This document a) defines the use of a MIME multipart/related
structure to aggregate a text/html root resource and the subsidiary resources
it references 
[1. Introduction] there is no requirement that implementations claiming
conformance to this standard be able to handle any URI linked document
representations other than those whose root is HTML. 
Unless one artificially constructs a root to reference all documents in the
interchange model, this assumption in the MIME RFC is not always met in SML. 
It would be common enough for schema documents to have no explicit URI
references to them, i.e. they are referenced implicitly through namespace URI
matching, yet they are part of the model.  Similar situations exist for unbound
rule documents and instance documents without any references to them in the
interchange set.  In other words, an SML model may have 0..n root documents if
one defines "root" as having explicit outbound URI-based references. 
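The point about missing roots can be illustrated with a small sketch. It assumes a model is described as a mapping from each document to the set of documents it explicitly references by URI; the file names and the function name unreferenced are hypothetical. Documents that nothing references explicitly (schemas matched by namespace, unbound rule documents, stand-alone instances) cannot be reached from any single root.

```python
def unreferenced(refs):
    """Return the documents that no other document references explicitly.

    `refs` maps each document URI to the set of URIs it explicitly references.
    """
    targets = set()
    for source, outbound in refs.items():
        # Self-references do not make a document reachable from elsewhere.
        targets |= {target for target in outbound if target != source}
    return sorted(uri for uri in refs if uri not in targets)


# Hypothetical model: one instance document refers to another; the schema and
# the rule document are part of the model but have no explicit inbound refs.
model_refs = {
    "inst1.xml": {"inst2.xml"},
    "inst2.xml": set(),
    "schema.xsd": set(),
    "rules.sch": set(),
}
orphans = unreferenced(model_refs)
```

With more than one entry in the result, no existing document can serve as the single root that 2557 assumes without a manufactured document linking to all of them.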
Limited number of URIs per model document (equivalent to SMLIF alias) 
Citations: 
[4.2 The Content-Location Header] 
A single Content-Location header field is allowed in any message or content
heading, in addition to a Content-ID header 
A Content-Location header can thus be used to label a resource which is not
retrievable by some or all recipients of a message. For example a
Content-Location header may label an object which is only retrievable using
this URI in a restricted domain, such as within a company-internal web space. A
Content-Location header can even contain a fictitious URI. Such an URI need not
be globally unique. 
Multiple Content-Location header fields in the same message heading are not
allowed. 
[RFC 2045] Content-ID values must be generated to be world-unique. 
Aside from the limitation on number (0..2 in MIME, 0..n in SMLIF) I see no
functional difference.  Note, I used a generous reading here to say 0..2 rather
than 0..1.  There might be other dragons hiding in Content-ID that I did not
find yet. 
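A rough sketch of the mismatch in number, again with Python's standard email package. It assumes the first alias is placed in Content-Location and that Content-ID is freshly generated (it has to be a world-unique msg-id, so an existing alias URI cannot simply be copied into it); label_part and the example URIs are hypothetical.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def label_part(xml_text, aliases):
    """Build one MIME body part and report which aliases it could not carry."""
    part = EmailMessage()
    part.set_content(xml_text.encode("utf-8"), maintype="application", subtype="xml")
    # Content-ID is generated, not taken from the aliases.
    part["Content-ID"] = make_msgid(domain="example.org")
    carried, dropped = aliases[:1], aliases[1:]
    if carried:
        # Only a single Content-Location header is allowed per heading.
        part["Content-Location"] = carried[0]
    return part, dropped


aliases = [
    "http://repo-a.example/cfg/42",
    "http://repo-b.example/items/x",
    "urn:example:42",
]
part, dropped = label_part("<doc/>", aliases)
```

Under these assumptions every alias after the first ends up in dropped, so references that use those aliases would have to be rewritten.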
The difference in number is troubling, however.  In the domain of IT resource
management, it is not unusual for a single enterprise to have multiple data
repositories, each of which assigns a local name for a resource.  In order to
address problems of the sort that the CMDB-Federation work now in DMTF states
its intent to address, while preserving digital signatures (i.e. without
forcing updates to the component subsets of data to all use a single URI to
refer to the named resource), it must be possible to associate resource
instances with more than one URI.  Since SML was originally targeted at solving
problems in this domain, it seems likely that this would be an issue for
potential adopters. 
Note that this same consideration, to avoid rewriting of legacy source text,
influenced 2557 itself. 
[1. Introduction] The reason why this standard does not only recommend the use
of Content-ID-s is that it should be possible to forward existing web pages via
e-mail without having to rewrite the source text of the web pages. Such
rewriting has several disadvantages, one of them that security checksums will
probably be invalidated. 

URI reference scope limitations 
Citations: 
[7. Use of the Content-Type "multipart/related"] 
If a message contains one or more MIME body parts containing URIs and also
contains as separate body parts, resources, to which these URIs (as defined,
for example, in HTML 2.0 [HTML2]) refer, then this whole set of body parts
(referring body parts and referred-to body parts) SHOULD be sent within a
multipart/related structure as defined in [REL]. 
Even though headers can occur in a message that lacks an associated
multipart/related structure, this standard only covers their use for resolution
of URIs between body parts inside a multipart/related structure. This standard
does cover ... [Arwe paraphrasing now, for brevity] ... URIs referring to other
resources 
To me, the first cited text suggests (SHOULD) that an SML-IF document would act
as the root (note: this does conflict with 2557's requirement that the root
have media type text/html), i.e. the entire interchange model would correspond
to a single multipart/related container.  Given Henry's initial response to our
original response that using 2557 would make certain cross-document references
impossible, I suspect he was reading it similarly. 
The issue then becomes how to impose the requirements on references required by
SMLIF today (we think for good reason), but not imposed by the model of "1
SMLIF doc == 1 multipart/related container", as Kumar pointed out.

CID URL Scheme support
Citations: 
[8.3 Use of the Content-ID header and CID URLs] 
When URIs employing a CID (Content-ID) scheme as defined in [URL] and [MIDCID]
are used to reference other body parts in an MHTML multipart/related structure,
they MUST only be matched against Content-ID header values, and not against
Content-Location header with CID: values.
I have not tried to track down all the dependencies, but it seems likely to me
that somewhere we would end up being required to support the CID URL scheme
during URI resolution.  Another potential impact to known implementations. 
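For a sense of what supporting the CID scheme would mean during resolution, here is a minimal sketch of the matching rule quoted above. It assumes the usual correspondence between a cid: URL and a Content-ID value (the URL carries the percent-encoded id without the angle brackets); resolve_cid and the example ids are hypothetical.

```python
from email.message import EmailMessage
from urllib.parse import unquote


def resolve_cid(url, parts):
    """Return the body part whose Content-ID matches a cid: URL, else None."""
    scheme, _, rest = url.partition(":")
    if scheme.lower() != "cid":
        raise ValueError("not a cid: URL: " + url)
    wanted = "<" + unquote(rest) + ">"
    for part in parts:
        # Only Content-ID is consulted. A Content-Location header that happens
        # to hold a cid: value is deliberately ignored.
        if str(part.get("Content-ID", "")).strip() == wanted:
            return part
    return None


doc1 = EmailMessage()
doc1.set_content("<doc1/>")
doc1["Content-ID"] = "<doc1@example.org>"

doc2 = EmailMessage()
doc2.set_content("<doc2/>")
doc2["Content-Location"] = "cid:doc2@example.org"  # must not match

found = resolve_cid("cid:doc1@example.org", [doc1, doc2])
missing = resolve_cid("cid:doc2@example.org", [doc1, doc2])
```

Here found is doc1 and missing is None, because doc2 carries its cid: value only in Content-Location.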
SML-IF in web services 
As mentioned earlier, I looked into the applicability of MIME for web services
exchanges.  I was told that it is allowed, provided that both the service
provider and the Web service client support it.  I was also told that none of
the protocol bindings currently in wide use require it, so in practice MIME is
not a common format. 
I think it likely that one or more of the anticipated uses of SML involves the
transfer of models as part of Web service message exchanges.  In that context,
if we revert to MIME then we would appear to be doing one or both of two
things: 
1. Implicitly adding a requirement for MIME support in the Web services stack
(when one wishes to transfer SML models) and in Web service clients (...same). 
This seems unpalatable for the stack providers, and destined for failure on the
client end which is much more distributed.  At best it will slow SML adoption
in WS as the subset of providers (of both stack and client code) roll out
upgrades/updates through all the existing deployed base components. 
2. Encouraging the WS users to define their own ME-specific SML model
encapsulation syntax when MIME is not considered a practical option. 
I'm pretty sure the context specificity makes this a bad idea.  It works
_against_ wide interoperability, if anything.  I know of one case already where
a standard now in DMTF did not use SMLIF due to timing and a desire to not have
dependencies on a spec "far away [organizationally]", and a second in OASIS
with SMLIF currently on their agenda to assess for solving existing problems. 
In both cases, reasonably complete technical assessments have not identified
anything they needed beyond what SMLIF does. 
That's the list of issues that caught my eye reading through 2557.  I also saw
a number of places where we could probably benefit from re-using some text.  I
tried to do a mental comparison for the set of issues each was addressing to
look for gaps. 
Moving on to the set of issues Kumar raised: 
I suspect we will have a more useful and congenial discourse with Henry (who is
not, remember, an expert in our spec content) if we supply at least one example
for each case where we say "2557 cannot do X".  "We cannot see how to get
behavior X using 2557's facilities, did you have a solution in mind for this
you can share?" might be a more cordial way to phrase those.  This motif seems
to apply to (1) the limitations on scope of reference targets (in our parlance)
vs the multipart/related structure assumed (2) >1 (or >2) aliases per document
(3) how to handle refs targeting schema docs, which resolve differently than "normal"
refs.  People sometimes infer an agenda behind phrases like "but it has
problems of its own" if they are not resolved, whether an agenda actually
exists or not. 
The base URI issue Kumar alluded to doesn't come to mind.  I looked at it as:
SMLIF requires one whenever it would be needed (to absolutize a relative ref),
it seems like any usage of MIME for SMLIF's function would impose the same
requirement; as a consequence, one would never appeal to the implicit
"thismessage:/" base URI.  Admittedly I did not wrestle with this very much, so
there might well be a subtlety that gave me the slip (failing that, I blame
whatever virus has me home all this week so far). 
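The way I looked at the base URI question can be sketched as follows: a relative reference is only ever resolved against an explicitly supplied base, so there is never a fallback to an implicit one. The function name absolutize and the example URIs are hypothetical, and this is an illustration of the reasoning, not text from SML-IF or 2557.

```python
from urllib.parse import urljoin, urlsplit


def absolutize(ref, base=None):
    """Resolve `ref` to an absolute URI, insisting on an explicit base."""
    if urlsplit(ref).scheme:
        return ref  # already absolute, no base needed
    if base is None:
        # No implicit fallback such as "thismessage:/" is ever applied.
        raise ValueError("relative reference requires an explicit base URI")
    return urljoin(base, ref)


resolved = absolutize("schemas/a.xsd", "http://example.org/model/")
unchanged = absolutize("urn:example:42")
```

If a subtlety exists, it would have to be in a case where the base is genuinely absent and the relative reference is still expected to resolve.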
The requirements for XML Schema could easily be met with a "so what?" response,
potentially at two levels.   
(1) While it is true that a "MIME not SMLIF" format document cannot be
schema-assessed, "so what?  you can still assess the model documents..." 
(2) While it is true that a "MIME not SMLIF" format extension cannot be
schema-assessed, "so what?  you can still parse it..." 
To a skeptic, using the "cannot be assessed against a Schema" argument on its
own is unlikely to persuade.  This might be a relatively simple scoping answer.
Just as 2557 concerned itself first and foremost with email transfer of sets
of documents, other use cases free to pile on if they happen to work but not
otherwise considered, we might likewise either become scoped (if not already,
and I'd be skeptical to hear we missed this scoping in the charter) or else
point to (existing text) stating that some constrained set of use cases, e.g.
programmatic exchange of SML-based models, is our first priority ... other use
cases free to pile on if they happen to work but not otherwise considered.  We
might even explicitly say something like "for other use cases, e.g. the
exchange of SML models via email, other existing specifications like RFC 2557
may be used as appropriate".  It might be perfectly appropriate in a context
like email exchange to make _different_ decisions than we made in SMLIF (eg one
could rewrite all sml refs to use a single alias, perhaps by using CID URLs). 
One could also raise the (appropriate, I think) topic of implementation cost -
certainly a big part of the reason we all chose to sit SML squarely on an XML
base is the huge amount of existing componentry we can all leverage. 

Best Regards, John


Received on Friday, 13 June 2008 00:06:22 UTC