Re: Subset Data Model

On Tue, Aug 14, 2012 at 6:34 AM, David Carlisle <davidc@nag.co.uk> wrote:

> On 14/08/2012 00:08, Uche Ogbuji wrote:
>
>> David, I don't think reductio ad absurdum works here, as John
>> indicated before.
>>
>> To follow the example you suggest, XML 1.0 did not even specify the
>> order in which elements should be reported by a parser.
>>
>
> Yes and that was a bad thing. The spec is short but it is only really
> understandable (or you might say consistently implementable) if you
> assume rather a lot of inherited SGML folklore about how things are
> supposed to work. That is not a good precedent to follow.


I strongly agree with David on this. (I remember many, many years ago when
I first tried to understand SGML by reading the SGML Handbook, it all made
very little sense to me until I came across the description of ESIS.)  I
see the ultra simple data model as absolutely central to the point of
MicroXML.

It's also critical that the spec describes not just the syntax and the data
model but how the parser is supposed to construct the latter from the
former.

> Given the loose way in which various data models are tied to the XML
> syntax I actually suspect it would be rather hard to formally specify
> how any micro-xml data model relates to XML.
>

Yes, but I think we should try even if it's too ugly and complex to put in
the spec.

I think we start off by identifying a profile of the XML Infoset, i.e.,
identifying the information items and properties that we care about.  This
(combined with the various XML specs) gives us a mapping, which I will call
S_X, from strings that conform to XML 1.0 + XML Namespaces into a data
structure consisting of just those information items and properties. Then
we define a two-way mapping between that data structure and the MicroXML
data model. Let's call the mapping from the infoset to the MicroXML data
model X_U and the mapping the other way round U_X.  Also, let's call the
mapping that the MicroXML spec will define from MicroXML documents to the
data model S_U.

Then I believe the goal should be that for any string s that is both
well-formed MicroXML and well-formed XML 1.0 + XML Namespaces,

X_U(S_X(s)) = S_U(s)  and
S_X(s) = U_X(S_U(s))
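
To make the two equations concrete, here is a rough sketch of how they could
be checked mechanically. Everything in it is an assumption made for
illustration: the infoset profile is reduced to nested dicts, the MicroXML
data model is taken to be a (name, attribute map, children) triple, and
Python's ElementTree stands in for both parsers, so s_u below is not a real
MicroXML parser. The function names simply mirror the mapping names above.

```python
import xml.etree.ElementTree as ET


def s_x(s):
    """Stand-in for S_X: XML string -> toy infoset profile (nested dicts)."""
    def item(e):
        children = [e.text] if e.text else []
        for c in e:
            children.append(item(c))
            if c.tail:
                children.append(c.tail)
        return {"local_name": e.tag,
                "attributes": sorted(e.attrib.items()),
                "children": children}
    return item(ET.fromstring(s))


def s_u(s):
    """Stand-in for S_U: string -> (name, attribute map, children).

    Hypothetical: built directly on ElementTree, not on a MicroXML parser.
    """
    def node(e):
        children = [e.text] if e.text else []
        for c in e:
            children.append(node(c))
            if c.tail:
                children.append(c.tail)
        return (e.tag, dict(e.attrib), children)
    return node(ET.fromstring(s))


def x_u(info):
    """X_U: toy infoset profile -> toy MicroXML data model."""
    if isinstance(info, str):
        return info
    return (info["local_name"],
            dict(info["attributes"]),
            [x_u(c) for c in info["children"]])


def u_x(node):
    """U_X: toy MicroXML data model -> toy infoset profile."""
    if isinstance(node, str):
        return node
    name, attrs, children = node
    return {"local_name": name,
            "attributes": sorted(attrs.items()),
            "children": [u_x(c) for c in children]}


def goal_holds(s):
    """Check both equations for one input string s."""
    return x_u(s_x(s)) == s_u(s) and s_x(s) == u_x(s_u(s))
```

For example, goal_holds('<doc a="1">hi<b/>there</doc>') evaluates both
equations on a small document without namespaces. A real check would need
an actual MicroXML parser for S_U and a fuller infoset profile for S_X.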

I believe the one exception to this that is in the draft at the moment is
the handling of literal newlines in attributes.  I think this exception is
justifiable.
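
For reference, the XML side of that case can be observed directly. This is
only a sketch: it uses Python's ElementTree as a stand-in for S_X, the helper
name attr_value is made up for the example, and it says nothing about what
the MicroXML draft does with the same input. XML 1.0 attribute-value
normalization replaces a literal newline with a space, while a newline
written as a character reference survives.

```python
import xml.etree.ElementTree as ET


def attr_value(s):
    """Return the value of attribute 'a' on the root element of s."""
    return ET.fromstring(s).get("a")


# Literal newline inside the attribute value: normalized to a space.
literal = attr_value('<e a="one\ntwo"/>')

# Newline written as a character reference: kept as a newline.
referenced = attr_value('<e a="one&#10;two"/>')

print(repr(literal))     # 'one two'
print(repr(referenced))  # 'one\ntwo'
```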

I think it would be sufficient if the spec (probably in an Annex) gave an
informal description of the X_U map.

James

Received on Tuesday, 14 August 2012 00:54:01 UTC