Re: data model from Stephen D Green on 2012-10-01 (public-microxml@w3.org from October 2012)

From: Stephen D Green <stephengreenubl@gmail.com>
Date: Mon, 1 Oct 2012 11:39:31 +0100
To: James Clark <jjc@jclark.com>
Cc: John Cowan <cowan@mercury.ccil.org>, David Lee <David.Lee@marklogic.com>, Maik Stührenberg <maik.stuehrenberg@uni-bielefeld.de>, "public-microxml@w3.org" <public-microxml@w3.org>
Message-ID: <CAA0AChX4CrkXJYzYKpdus8Cc5NL6kFHahwfYWKVYKQdkvHbQ7g@mail.gmail.com>
Many thanks, James. I'll accept that assurance.
----
Stephen D Green



On 1 October 2012 09:52, James Clark <jjc@jclark.com> wrote:

> You are quite wrong about conformance.  MicroXML's data model is a big
> help for conformance testing: it enables conformance testing to be much more
> thorough and concrete.
>
> I also remain confident that conformance to the data model does not impose
> any unnecessary burden on implementations.
>
> James
>
>
> On Mon, Oct 1, 2012 at 2:46 PM, Stephen D Green <stephengreenubl@gmail.com> wrote:
>
>> Say I want to have a more specialised parser: Perhaps all
>> I want is a parser to evaluate a particular XPath or small set
>> of XPath expressions and I want a minimally sized compiled
>> library to do my parsing just for that purpose and no more.
>> It would be nice to have a microXPath expression language
>> which has as few as possible ways to represent a single XPath
>> expression. Given such a beast, I'd like to be able to create
>> ad hoc parsers specialised to a given microXPath expression
>> evaluation - highly optimised for performance and compiled
>> size. I'd like microXML to allow such a parser to be conformant
>> without it having to include unnecessary code. I'd actually
>> prefer that conformance not even care about what the abstract
>> data model is. Even if I want a conformant parser which can
>> evaluate any or all possible microXPath expressions on any
>> microXML document, I'd like that parser not to have to conform
>> to a particular data model because that might increase the
>> parser's cost of development, size and complexity and reduce its
>> performance.
>>
>> Another possible reason to couple the abstract data model more
>> loosely to the microXML spec (which I regard as most useful
>> in its specification of a syntax for microXML documents) is in
>> the matter of conformance testing. I'm not convinced (yet) that
>> you can test conformance of the parser's abstract data model,
>> since this is likely to be invisible, internal, private rather than
>> visible, external, public (I admit my comparative naivety here).
>>
>> I'd like to see so-called 'test assertions' for microXML for
>> conformance and interoperability testing and in producing
>> these, I suspect, it might be found that aspects of the present
>> spec's conformance clauses for parsers cannot be expressed
>> as testable test assertions (or that such assertions might rely
>> on human reading of the code base of a parser and so make
>> fully automated testing of conformance based on such test
>> assertions too expensive or impracticable).
>>
>> One suggestion I could make is to call the present spec, if
>> the above doesn't get acceptance as enough reason to
>> change it to any greater degree, something like microXML
>> - Xyz Data Model specification (where Xyz might be 'Tree'
>> or 'Hierarchical' or even 'Compound' or just something
>> like 'Level 1'). This would 1) indicate that there might follow
>> some specs for other data models - and leave room for such
>> and 2) mean that a conforming parser need only claim
>> conformance to this particular data model. Better, I think,
>> might be to add to the conformance section either a
>> placeholder note or a conformance clause to cater for more
>> specialised microXML parsers (such as my description above
>> of a parser optimised for evaluating general or specific XPath
>> expressions on a microXML document).
>> ----
>> Stephen D Green
>>
>>
>>
>> On 28 September 2012 16:49, John Cowan <cowan@mercury.ccil.org> wrote:
>>
>>> Stephen D Green scripsit:
>>>
>>> > Haven't there already been several different abstract data models
>>> > put forward for XML?
>>>
>>> Yes, but XML is a complex standard and there are lots of things which might
>>> be of interest.  The XML Infoset is an attempt to give standard names to
>>> some of those things, though there are plenty more which are left out.
>>> The PSVI could be used to report DTD information, but nobody does.
>>>
>>> MicroXML is so trivial that it's not very interesting to provide alternative
>>> data models.  You could, for example, leave out attributes, but it's simpler
>>> just to ignore them if you don't care about them.  Similarly, you could report
>>> on lexical minutiae, but there are only a few: single vs. double quotes
>>> and whether character references are used are the only ones I can think of.
>>>
>>> > Can't we have parsers for MicroXML which support a variety of data
>>> > models?
>>>
>>> In principle, I suppose, but to what purpose?  MicroLark supports push
>>> parsing (SAX-style), pull parsing (StAX-style), and tree building, but
>>> only one data model, namely that there is one element object for each
>>> element in the document, and it contains a name (a string), an attribute
>>> map from names to strings, and a sequence of children which are either
>>> strings or element objects, all of which must be reported.
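The data model described above (an element with a name, an attribute map from names to strings, and a sequence of children that are either strings or elements) could be sketched roughly as follows. This is only an illustration in Python, not MicroLark's actual API; the names `Element` and `text_of` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Element:
    """One object per element in the document (hypothetical sketch)."""
    name: str                                                   # element name
    attributes: dict[str, str] = field(default_factory=dict)    # names -> strings
    children: list[Union[str, "Element"]] = field(default_factory=list)


def text_of(node: Union[str, Element]) -> str:
    """Concatenate all character data below a node, in document order."""
    if isinstance(node, str):
        return node
    return "".join(text_of(child) for child in node.children)


# Roughly the tree for: <p class="note">Hello, <em>world</em>!</p>
doc = Element("p", {"class": "note"}, ["Hello, ", Element("em", {}, ["world"]), "!"])
```

A consumer that does not care about attributes can simply ignore the `attributes` field, which is the point made earlier about not needing an alternative data model.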
>>>
>>> > I also came across mention of 'compounds' as an alternative
>>> > abstract data model for XML - may a parser not implement such if
>>> > it wants to claim to be conformant?
>>>
>>> The MicroXML data model is a simple subset of the compound model.
>>> To represent MicroXML in the obvious way, you'd have two kinds of
>>> compounds, element compounds and textual compounds.  An element compound
>>> has a STRING representing the element name, a TAG marking it as meta,
>>> a DIRECTORY mapping attribute names (textual compounds) to attribute
>>> values (also textual compounds), a KEY SET containing all the keys in the
>>> DIRECTORY, and a LIST consisting of the children.  A textual compound
>>> has a STRING representing the text, a TAG marking it as a text string,
>>> and an empty DIRECTORY, KEY SET, and LIST.  So a parser reporting these
>>> compounds would fully instantiate the MicroXML data model.
>>>
>>> <http://www.cl.cam.ac.uk/research/security/dendros/compounds-poster.pdf>
>>> gives a brief explanation of these terms.
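As a rough illustration of the mapping just described, the two kinds of compound could be modelled as below. This is a sketch under assumptions, not code from the compounds work: the class `Compound`, the helpers `text` and `element`, and the tag strings `"meta"` and `"text"` are hypothetical, the DIRECTORY is assumed to map attribute names to attribute values, and its keys are simplified to plain strings rather than textual compounds so that they can serve as dictionary keys.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    string: str                                      # STRING: element name or text
    tag: str                                         # TAG: "meta" or "text" (assumed labels)
    directory: dict = field(default_factory=dict)    # DIRECTORY: attribute name -> value compound
    children: list = field(default_factory=list)     # LIST: child compounds

    @property
    def key_set(self) -> frozenset:
        """KEY SET: all the keys in the DIRECTORY."""
        return frozenset(self.directory)


def text(value: str) -> Compound:
    """Textual compound: empty DIRECTORY, KEY SET, and LIST."""
    return Compound(value, "text")


def element(name: str, attrs: dict, children: list) -> Compound:
    """Element compound; plain strings among the children become textual compounds."""
    return Compound(
        name,
        "meta",
        {key: text(val) for key, val in attrs.items()},
        [text(c) if isinstance(c, str) else c for c in children],
    )


link = element("a", {"href": "http://example.org/"}, ["home"])
```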
>>>
>>> --
>>> John Cowan  cowan@ccil.org   http://www.ccil.org/~cowan
>>> Dievas dave dantis; Dievas duos duonos          --Lithuanian proverb
>>> Deus dedit dentes; deus dabit panem             --Latin version thereof
>>> Deity donated dentition;
>>>   deity'll donate doughnuts                     --English version by Muke Tever
>>> God gave gums; God'll give granary              --Version by Mat McVeagh
>>>
>>
>>
>
Received on Monday, 1 October 2012 10:40:19 UTC