Re: Subset Data Model

On Mon, Aug 13, 2012 at 6:53 PM, James Clark <jjc@jclark.com> wrote:

> On Tue, Aug 14, 2012 at 6:34 AM, David Carlisle <davidc@nag.co.uk> wrote:
>
>> On 14/08/2012 00:08, Uche Ogbuji wrote:
>>
>>> David, I don't think reductio ad absurdum works here, as John
>>> indicated before.
>>>
>>> To follow the example you suggest, XML 1.0 did not even specify the
>>> order in which elements should be reported by a parser.
>>>
>>
>> Yes and that was a bad thing. The spec is short but it is only really
>> understandable (or you might say consistently implementable) if you
>> assume rather a lot of inherited SGML folk law about how things are
>> supposed to work. That is not a good precedent to follow.
>
>
> I strongly agree with David on this. (I remember many, many years ago when
> I first tried to understand SGML by reading the SGML Handbook, it all made
> very little sense to me until I came across the description of ESIS.)  I
> see the ultra simple data model as absolutely central to the point of
> MicroXML.
>
> It's also critical that the spec describes not just the syntax and the
> data model but how the parser is supposed to construct the latter from the
> former.
>

Yes, I agree with the above. My point in bringing up XML 1.0's lack of a data
model was not to advocate a lack of data model in MicroXML.  Here's how I
would summarize (probably over-simplifying) the winding road that led here:

* David Lee advocated that MicroXML should not only be syntactically
backward compatible with XML, but that the MicroXML data model should be
backward compatible with XDM.

* I agreed on syntactic backward compat, but not on a *prior constraint* of
backward compat with XDM.  I said that MicroXML should come with a data
model that is sensible and practical, but not strictly derived from any other.

* David Carlisle was (I think) saying that's bad because you can then do
absurd things.

* John Cowan simply said "We won't do absurd things."

* I possibly complicated the issue because it sounded to me as if there was
some sort of argument that if we did not decide on backward compat with XDM
that we'd be led into doing absurd things.

* I pointed out that XPath 1.0 in theory could have done absurd things as
well, and chose not to.

So the bottom line is that I think we all agree that MicroXML should have a
data model, and not leave a hole the size of the one XML 1.0 left.  After
all, I have a sneaking suspicion that if XML 1.0 had not left such a hole
there would have been less maniacal activity to come up with a baker's dozen
of XML models (we haven't even yet mentioned SOAP or DOM).

I'm just arguing that there should be no prior constraint of backward
compat between MicroXML and XDM, or any other such data model.

What I do think we should do is start with something like XPath 1.0 and
simplify in some ways and touch up in others (e.g. re: HTML5 & JSON).  That
is the path I've seen reflected in all draft specs to date, which is good,
from my perspective.



>> Given the loose way in which various data models are tied to the XML
>> syntax I actually suspect it would be rather hard to formally specify
>> how any micro-xml data model relates to XML.
>>
>
> Yes, but I think we should try even if it's too ugly and complex to put in
> the spec.
>
> I think we start off by identifying a profile of the XML Infoset ie
> identifying the information items and properties that we care about.  This
> (combined with the various XML specs) gives us a mapping, which I will call
> S_X, from strings that conform to XML 1.0 + XML Namespaces into a data
> structure consisting of just those information items and properties. Then
> we define a two-way mapping between that data structure and the MicroXML
> data model.
>

Yes, and I agree. John also advocated something like this, but IIRC David
Lee argued that a mapping is insufficient to address his concerns.

I would rather start with the XPath data model than the Infoset, though
since you say a profile of the Infoset, that might be the same thing in
practice.



> Let's call the mapping from the infoset to the MicroXML data model X_U and
> the other way round U_X.  Also let's call the mapping that will be defined
> by the MicroXML spec from MicroXML documents to the data model S_U.
>
> Then I believe the goal should be that for any string s that is both
> well-formed MicroXML and well-formed XML 1.0 + XML Namespaces,
>
> X_U(S_X(s)) = S_U(s)  and
> S_X(s) = U_X(S_U(s))
>
> I believe the one exception to this that is in the draft at the moment is
> the handling of literal newlines in attributes.  I think this exception is
> justifiable.
>
> I think it would be sufficient if the spec (probably in an Annex) gave an
> informal description of the X_U map.
>

I do think it should be in a non-normative Annex and not in the main spec.
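
To make the two identities above concrete, here is a minimal Python sketch
of the round-trip check. Everything in it is an assumption for illustration:
the dict used for the Infoset profile and the (name, attributes, children)
tuple used for the MicroXML data model are hypothetical shapes, not taken
from any draft, and ElementTree stands in for both parsers because I know of
no MicroXML parser in the standard library. Namespaces, comments and
processing instructions are ignored.

```python
import xml.etree.ElementTree as ET


def _children(elem, convert):
    """Flatten ElementTree's text/tail layout into one ordered child list."""
    out = []
    if elem.text:
        out.append(elem.text)
    for child in elem:
        out.append(convert(child))
        if child.tail:
            out.append(child.tail)
    return out


def _to_infoset(elem):
    # Hypothetical Infoset profile: only the properties we "care about".
    return {
        "local_name": elem.tag,
        "attributes": sorted(elem.attrib.items()),
        "children": _children(elem, _to_infoset),
    }


def _to_micro(elem):
    # Hypothetical MicroXML model: (name, attribute map, child list).
    return (elem.tag, dict(elem.attrib), _children(elem, _to_micro))


def S_X(s):
    """String -> Infoset profile (stand-in for an XML 1.0 + Namespaces parser)."""
    return _to_infoset(ET.fromstring(s))


def S_U(s):
    """String -> MicroXML data model (stand-in, NOT a real MicroXML parser)."""
    return _to_micro(ET.fromstring(s))


def X_U(item):
    """Infoset profile -> MicroXML data model."""
    if isinstance(item, str):
        return item
    return (
        item["local_name"],
        dict(item["attributes"]),
        [X_U(c) for c in item["children"]],
    )


def U_X(node):
    """MicroXML data model -> Infoset profile."""
    if isinstance(node, str):
        return node
    name, attrs, children = node
    return {
        "local_name": name,
        "attributes": sorted(attrs.items()),
        "children": [U_X(c) for c in children],
    }


def round_trips(s):
    """Check X_U(S_X(s)) == S_U(s) and S_X(s) == U_X(S_U(s))."""
    return X_U(S_X(s)) == S_U(s) and S_X(s) == U_X(S_U(s))


doc = '<a x="1">hi<b/>there</a>'
print(S_U(doc))
print(round_trips(doc))
```

Since both stand-in parsers sit on the same ElementTree parse, this only
exercises X_U and U_X as inverses of each other; a real test would need an
independent MicroXML parser for S_U, which is where a case like literal
newlines in attributes would show up.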


-- 
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji

Received on Tuesday, 14 August 2012 01:24:19 UTC