
RE: data model

From: David Lee <David.Lee@marklogic.com>
Date: Mon, 1 Oct 2012 06:00:58 -0700
To: James Clark <jjc@jclark.com>, "stephengreenubl@gmail.com" <stephengreenubl@gmail.com>
CC: John Cowan <cowan@mercury.ccil.org>, Maik Stührenberg <maik.stuehrenberg@uni-bielefeld.de>, "public-microxml@w3.org" <public-microxml@w3.org>
Message-ID: <EB42045A1F00224E93B82E949EC6675E16B0EC834F@EXCHG-BE.marklogic.com>
I am a bit confused.
How exactly does one test conformance to an abstract data model?
I do believe that having one is critical for design, but how does one test it?
Take Stephen's example ... and simplify it further: say I write a "uxgrep" tool that takes

Input: List of files containing text-serialized MicroXML
Output: Plain-text names of all files whose content matches the expressions.

(FYI, this would be similar to the XML command "xgrep".)

Now ... how does one validate that the correct abstract data model is in fact used?
It need never be concretely realized in the program. (It's *abstract*!)
That doesn't mean the model isn't there ... it's in the aether of the program design.
This is similar to how protocol layers can sometimes be merged in implementations ... that doesn't mean they don't exist.

So I don't see how one actually tests conformance with an abstract data model ...
In this case one could only test whether the whole black box worked 'as if' the model were accurately represented. There might be parts of the data model one chooses to ignore, say if you had a uxpath which didn't handle attributes. That's a perfectly valid tool, but it is impossible, and unnecessary, to test whether attributes were correctly preserved in the abstract model.
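
To make the 'as if' idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `uxgrep` function, its one-element-name "expression", and the three sample files are hypothetical, and the scanner ignores comments, so it is a toy rather than a conforming tool. The point is only that the tool never builds a tree, yet its answers can be checked against what the data model says they should be.

```python
import re
from typing import Dict, List


def uxgrep(element_name: str, files: Dict[str, str]) -> List[str]:
    """Toy stand-in: names of files containing an element called element_name.

    It scans start-tags with a regular expression and never builds a tree,
    so the data model exists only in the design, not in the code.
    """
    start_tag = re.compile(r"<" + re.escape(element_name) + r"(?=[\s/>])")
    return sorted(name for name, text in files.items() if start_tag.search(text))


# Hypothetical corpus; the expected answers are worked out by hand from the model.
FILES = {
    "a.xml": "<doc><item id='1'>x</item></doc>",
    "b.xml": '<doc><item id="1">x</item></doc>',  # same model as a.xml, different quotes
    "c.xml": "<doc><items/></doc>",               # "items" is a different element name
}


def check_as_if() -> bool:
    """Black-box check: outputs must match what the data model predicts."""
    return (
        uxgrep("item", FILES) == ["a.xml", "b.xml"]
        and uxgrep("doc", FILES) == ["a.xml", "b.xml", "c.xml"]
    )
```

Note that no test of this kind can say whether the `id` attributes were "preserved": the tool never reports them, so that part of the model is simply outside what the black box exposes.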

I personally think that is sufficient ... but it does imply that the wording for conformance testing needs to be particularly vague. And it doesn't imply that the abstract data model has no value or is too constraining. It simply may not always be 100% testable.

David Lee
Lead Engineer
MarkLogic Corporation
Phone: +1 812-482-5224
Cell:  +1 812-630-7622

From: James Clark [mailto:jjc@jclark.com]
Sent: Monday, October 01, 2012 4:52 AM
To: stephengreenubl@gmail.com
Cc: John Cowan; David Lee; Maik Stührenberg; public-microxml@w3.org
Subject: Re: data model

You are quite wrong about conformance.  MicroXML's data model is a big help for conformance testing: it enables conformance testing to be much more thorough and concrete.

I also remain confident that conformance to the data model does not impose any unnecessary burden on implementations.

On Mon, Oct 1, 2012 at 2:46 PM, Stephen D Green <stephengreenubl@gmail.com> wrote:
Say I want to have a more specialised parser: Perhaps all
I want is a parser to evaluate a particular XPath or small set
of XPath expressions and I want a minimally sized compiled
library to do my parsing just for that purpose and no more.
It would be nice to have a microXPath expression language
which has as few as possible ways to represent a single XPath
expression. Given such a beast, I'd like to be able to create
ad hoc parsers specialised to a given microXPath expression
evaluation - highly optimised for performance and compiled
size. I'd like microXML to allow such a parser to be conformant
without it having to include unnecessary code. I'd actually
prefer that conformance not even care about what the abstract
data model is. Even if I want a conformant parser which can
evaluate any or all possible microXPath expressions on any
microXML document, I'd like that parser not to have to conform
to a particular data model because that might increase the
parser's cost of development, size and complexity and reduce its

Another possible reason to more loosely couple the abstract
data model to the microXML spec (which I regard as most useful
in its specification of a syntax for microXML documents) is in
the matter of conformance testing. I'm not convinced (yet) that
you can test conformance of the parser's abstract data model
since this is likely to be invisible, internal, private rather than
visible, external, public (to my comparative naivety, I admit).

I'd like to see so-called 'test assertions' for microXML for
conformance and interoperability testing and in producing
these, I suspect, it might be found that aspects of the present
spec's conformance clauses for parsers cannot be expressed
as testable test assertions (or that such assertions might rely
on human reading of the code base of a parser and so make
fully automated testing of conformance based on such test
assertions too expensive or impracticable).

One suggestion I could make is to call the present spec, if
the above doesn't get acceptance as enough reason to
change it to any greater degree, something like microXML
- Xyz Data Model specification (where Xyz might be 'Tree'
or 'Hierarchical' or even 'Compound' or just something
like 'Level 1'). This would 1) indicate that there might follow
some specs for other data models - and leave room for such
and 2) mean that a conforming parser need only claim
conformance to this particular data model. Better, I think,
might be to add to the conformance section either a
placeholder note or a conformance clause to cater for more
specialised microXML parsers (such as my description above
of a parser optimised for evaluating general or specific XPath
expressions on a microXML document).
Stephen D Green

On 28 September 2012 16:49, John Cowan <cowan@mercury.ccil.org> wrote:
Stephen D Green scripsit:

> Haven't there already been several different abstract data models
> put forward for XML?
Yes, but XML is a complex standard and there are lots of things which might
be of interest.  The XML Infoset is an attempt to give standard names to
some of those things, though there are plenty more which are left out.
The PSVI could be used to report DTD information, but nobody does.

MicroXML is so trivial that it's not very interesting to provide alternative
data models.  You could, for example, leave out attributes, but it's simpler
just to ignore them if you don't care about them.  Similarly, you could report
on lexical minutiae, but there are only a few: single vs. double quotes
and whether character references are used are the only ones I can think of.

> Can't we have parsers for MicroXML which support a variety of data
> models?
In principle, I suppose, but to what purpose?  MicroLark supports push
parsing (SAX-style), pull parsing (StAX-style), and tree building, but
only one data model, namely that there is one element object for each
element in the document, and it contains a name (a string), an attribute
map from names to strings, and a sequence of children which are either
strings or element objects, all of which must be reported.
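
As a rough illustration of that single data model, the following Python sketch writes it down as one class. This is not MicroLark's actual API; the names `Element`, `text_content`, and the sample document are hypothetical and chosen only to mirror the description above (a name, an attribute map from names to strings, and a sequence of children that are strings or elements).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """One element object per element in the document."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union[str, "Element"]] = field(default_factory=list)


def text_content(node: Union[str, Element]) -> str:
    """Concatenate all text below a node, ignoring attributes entirely."""
    if isinstance(node, str):
        return node
    return "".join(text_content(child) for child in node.children)


# Hypothetical document: <p class="note">Hello <b>world</b>!</p>
doc = Element("p", {"class": "note"}, ["Hello ", Element("b", {}, ["world"]), "!"])
```

A consumer that does not care about attributes, such as `text_content` here, just never looks at the `attributes` field; no second data model is needed for that.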

> I also came across mention of 'compounds' as an alternative
> abstract data model for XML - may a parser not implement such if
> it wants to claim to be conformant?
The MicroXML data model is a simple subset of the compound model.
To represent MicroXML in the obvious way, you'd have two kinds of
compounds, element compounds and textual compounds.  An element compound
has a STRING representing the element name, a TAG marking it as meta,
a DIRECTORY mapping attribute names (textual compounds) to attribute
values (also textual compounds), a KEY SET containing all the keys in the
DIRECTORY, and a LIST consisting of the children.  A textual compound
has a STRING representing the text, a TAG marking it as a text string,
and an empty DIRECTORY, KEY SET, and LIST.  So a parser reporting these
compounds would fully instantiate the MicroXML data model.
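
The mapping just described might be sketched in Python roughly as follows. This is an assumption-laden illustration, not a definition of the compound model: the class name `Compound`, the helper functions, and the tag spellings "meta" and "text" are all hypothetical, and as a simplification the DIRECTORY here is keyed by plain attribute-name strings rather than by textual compounds.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Compound:
    string: str                                   # STRING: element name or text
    tag: str                                      # TAG: "meta" or "text" (assumed spellings)
    directory: Dict[str, "Compound"] = field(default_factory=dict)  # DIRECTORY
    children: List["Compound"] = field(default_factory=list)        # LIST

    @property
    def key_set(self) -> FrozenSet[str]:
        """KEY SET: all the keys currently in the DIRECTORY."""
        return frozenset(self.directory)


def text(value: str) -> Compound:
    """Textual compound: empty DIRECTORY, KEY SET, and LIST."""
    return Compound(string=value, tag="text")


def element(name: str, attrs: Dict[str, str], children: List[Compound]) -> Compound:
    """Element compound: attribute values are stored as textual compounds."""
    return Compound(
        string=name,
        tag="meta",
        directory={key: text(val) for key, val in attrs.items()},
        children=list(children),
    )


# Hypothetical document: <a href="x">hi</a>
link = element("a", {"href": "x"}, [text("hi")])
```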

gives a brief explanation of these terms.

John Cowan  cowan@ccil.org  http://www.ccil.org/~cowan
Dievas dave dantis; Dievas duos duonos          --Lithuanian proverb
Deus dedit dentes; deus dabit panem             --Latin version thereof
Deity donated dentition;
  deity'll donate doughnuts                     --English version by Muke Tever
God gave gums; God'll give granary              --Version by Mat McVeagh
Received on Monday, 1 October 2012 13:01:49 UTC
