- From: John Cowan <cowan@mercury.ccil.org>
- Date: Fri, 28 Sep 2012 11:49:09 -0400
- To: Stephen D Green <stephengreenubl@gmail.com>
- Cc: David Lee <David.Lee@marklogic.com>, Maik Stührenberg <maik.stuehrenberg@uni-bielefeld.de>, "public-microxml@w3.org" <public-microxml@w3.org>
Stephen D Green scripsit:

> Haven't there already been several different abstract data models
> put forward for XML?

Yes, but XML is a complex standard and there are lots of things which might be of interest. The XML Infoset is an attempt to give standard names to some of those things, though there are plenty more which are left out. The PSVI could be used to report DTD information, but nobody does.

MicroXML is so trivial that it's not very interesting to provide alternative data models. You could, for example, leave out attributes, but it's simpler just to ignore them if you don't care about them. Similarly, you could report on lexical minutiae, but there are only a few: single vs. double quotes and whether character references are used are the only ones I can think of.

> Can't we have parsers for MicroXML which support a variety of data
> models?

In principle, I suppose, but to what purpose? MicroLark supports push parsing (SAX-style), pull parsing (StAX-style), and tree building, but only one data model, namely that there is one element object for each element in the document, and it contains a name (a string), an attribute map from names to strings, and a sequence of children which are either strings or element objects, all of which must be reported.

> I also came across mention of 'compounds' as an alternative
> abstract data model for XML - may a parser not implement such if
> it wants to claim to be conformant?

The MicroXML data model is a simple subset of the compound model. To represent MicroXML in the obvious way, you'd have two kinds of compounds, element compounds and textual compounds. An element compound has a STRING representing the element name, a TAG marking it as an element, a DIRECTORY mapping attribute names (textual compounds) to attribute values (also textual compounds), a KEY SET containing all the keys in the DIRECTORY, and a LIST consisting of the children.
A textual compound has a STRING representing the text, a TAG marking it as a text string, and an empty DIRECTORY, KEY SET, and LIST. So a parser reporting these compounds would fully instantiate the MicroXML data model.

<http://www.cl.cam.ac.uk/research/security/dendros/compounds-poster.pdf> gives a brief explanation of these terms.

--
John Cowan  cowan@ccil.org  http://www.ccil.org/~cowan
Dievas dave dantis; Dievas duos duonos  --Lithuanian proverb
Deus dedit dentes; deus dabit panem  --Latin version thereof
Deity donated dentition; deity'll donate doughnuts  --English version by Muke Tever
God gave gums; God'll give granary  --Version by Mat McVeagh
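
The data model described in the message (one element object holding a name, an attribute map, and a sequence of string-or-element children) and its mapping onto compounds can be sketched roughly as follows. This is only an illustration under assumptions: the class names `Element` and `Compound`, the helper functions, and the tag values "element" and "text" are hypothetical and are not taken from MicroLark or from the compounds poster.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Element:
    """One object per element: a name, an attribute map, ordered children."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)
    children: list[Union[str, "Element"]] = field(default_factory=list)


@dataclass(frozen=True)
class Compound:
    """Hypothetical compound with the five parts named in the message.

    Immutable containers are used so that a compound is hashable and can
    itself serve as a DIRECTORY key or a KEY SET member.
    """
    string: str                                       # STRING
    tag: str                                          # TAG (assumed values)
    directory: tuple[tuple[Compound, Compound], ...]  # DIRECTORY as pairs
    key_set: frozenset[Compound]                      # KEY SET
    items: tuple[Compound, ...]                       # LIST


def text_compound(text: str) -> Compound:
    """Textual compound: STRING plus TAG, everything else empty."""
    return Compound(text, "text", (), frozenset(), ())


def element_compound(elem: Element) -> Compound:
    """Element compound: attribute names and values become textual compounds."""
    directory = tuple(
        (text_compound(k), text_compound(v)) for k, v in elem.attributes.items()
    )
    children = tuple(
        text_compound(c) if isinstance(c, str) else element_compound(c)
        for c in elem.children
    )
    keys = frozenset(k for k, _ in directory)
    return Compound(elem.name, "element", directory, keys, children)


# Usage: <p class="x">hi <b>there</b></p>
doc = Element("p", {"class": "x"}, ["hi ", Element("b", {}, ["there"])])
root = element_compound(doc)
```

Every part of the element object ends up somewhere in the compound, which is the sense in which a parser reporting such compounds would carry the whole MicroXML data model.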
Received on Friday, 28 September 2012 15:49:33 UTC