Re: Intent of ER-XML from David Carlisle on 2012-02-27 (public-xml-er@w3.org from February 2012)

From: David Carlisle <davidc@nag.co.uk>
Date: Mon, 27 Feb 2012 00:00:28 +0000
To: David Lee <David.Lee@marklogic.com>
CC: "public-xml-er@w3.org" <public-xml-er@w3.org>
Message-ID: <4F4AC79C.2070201@nag.co.uk>

On 26/02/2012 23:26, David Lee wrote:
> Great discussion ! To correct David C, actually what I am asking for
> is debate and clarity, not (yet) proposing an answer.

True, noted. What I meant, I think, was that this is what I understood
your second option to be discussing (although you further
clarify/correct that below).

> In particular, I am not suggesting a mapping at the serialization
> level. I think we have more than just a terminology clash, but an
> "idea" or "philosophical" clash.
>
> Good point about there not being an actual spec of what an "XML
> Processor" really is. So if we don't know what an "XML Processor"
> is, how can we describe a "Drop In" replacement for one? (Without
> first trying to spec out this mysterious "XML Processor" then adding
> on the new XML-ER stuff).
>
> Let me attack this from the other side. What do we NOT want to
> define.
>
> I do not want to define an API.   I think that has bitten the
> standards world in the past. It limits the creativity of
> implementers.  It also makes things vastly less useful in this
> particular case because we would limit an implementation to have to
> use *or be a replacement for* a particular API (and language?).
>
> E.g.   Suppose my app uses the Java StAX API.   I want to add in
> "XML-ER" capability to it. If XML-ER is defined as an API that isn't
> StAX then I couldn't ever use it.

Yes I certainly would want this to be usable with stax or sax or
anything else. It may be that using the DOM as the output tree
description gives the impression that applications have to use the DOM
API and that would be unfortunate. Either the document needs to say
clearly somewhere that it's only specifying an abstract result tree and
that the tree may not be ever built and a streaming SAX interface or any
other interface may be used, or the DOM terminology could be changed to
a more obviously abstract tree description such as infoset.
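
A minimal sketch of that point, assuming Python's standard xml.sax and
xml.etree.ElementTree modules stand in for "a streaming interface" and
"a tree API". The EventRecorder class, the two events_from_* helpers and
the tuple format are hypothetical names made up for this illustration;
nothing here comes from the XML-ER draft. The same abstract result is
observed once through streaming callbacks, with no tree ever built, and
once by building a tree and walking it:

```python
import xml.sax
import xml.etree.ElementTree as ET


class EventRecorder(xml.sax.ContentHandler):
    """Records SAX callbacks as plain tuples; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # SAX may deliver text in several chunks, so merge adjacent ones.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def events_from_sax(data: bytes):
    """Streaming view: events are recorded as the parser reports them."""
    recorder = EventRecorder()
    xml.sax.parseString(data, recorder)
    return recorder.events


def events_from_tree(data: bytes):
    """Tree view: build the tree first, then walk it in document order."""
    events = []

    def walk(elem):
        events.append(("start", elem.tag, dict(elem.attrib)))
        if elem.text:
            events.append(("text", elem.text))
        for child in elem:
            walk(child)
            if child.tail:
                events.append(("text", child.tail))
        events.append(("end", elem.tag))

    walk(ET.fromstring(data))
    return events


DOC = b'<doc id="1">Hello <b>world</b>!</doc>'
```

Both helpers yield the same event list for DOC, which is the sense in
which a spec can describe one abstract result without requiring any
particular API to deliver it.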


> And I couldn't ask the vendor to add it either because it wouldn't be
> StAX. I could cite probably a dozen other examples.  Any API we
> define will only be a 'drop in replacement' for people using that
> API.    At almost every XML conference I've been to the #1 or #2
> complaint about XML is that DOM was defined as an API.   (The other
> complaint being namespaces :) So for a real 'drop in' replacement we
> need to define a spec for ALL current (and future?) APIs to avoid
> that trap ? Doesn't sound fun for me.

agreed.

>
> But how do we solve this magical goal of a 'drop in replacement' for
> an XML parser ? I suggest it is impractical to do this by defining a
> "Processor". But rather by defining the abstract rules (in whatever
> meta-language or model you like). This still doesn't give us a 'drop
> in' replacement for an XML Processor, but what it does give us is

Not sure I understand this point. The XML spec is pretty vague about
what a processor is (and certainly almost silent about how it reports
things). If the xml-er is similarly vague then it can be a drop-in
replacement for an xml parser in the same way as one xml parser may (or
may not, depending) be a drop in for another.

>
> A) A consistent statement of how an XML Processor can support
> "XML-ER"

Not sure I understand that. Presumably an XML processor as defined in
xml 1.x can't support xml-er rules?

> B) The ability for an implementer to implement such
> rules however they want,  either by retrofitting their existing
> parser or by writing a new one but still have well defined
> semantics.

agreed


> C) The ability to write a pre-processor which feeds into existing
> processors.   This is unlikely to be ideal for performance, but it's
> extremely useful especially for new specs.  E.g. this is how C++ was
> originally created.  It was done as a preprocessor for C.  It wasn't
> particularly efficient but it allowed the language to get into the
> hands of developers to play with which then encouraged vendors to
> start writing 'native'  C++ processors. It also allowed the early
> implementations to be vastly simpler as they only had to do the
> "C++" stuff, and could hand off to existing mature implementations
> parts that were not C++ specific, like linkers, assemblers, assembly
> code generation, optimizers etc.

But the thing it did need is a new parser. The fact that code generation
etc could be re-used is more akin to the fact that (if we spec it right)
the existing tool chain such as validators and xpath etc should all be
able to work with an xml-er parser.
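
To make the pre-processor idea concrete, here is a rough sketch in
Python. The single recovery rule it applies (escaping a bare "&" that
does not start a reference) is a toy rule invented for this example and
is NOT taken from the XML-ER draft; the names BARE_AMPERSAND,
preprocess and parse_with_recovery are likewise hypothetical. The
repaired text is handed to an unmodified standard parser:

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" that does not begin a named, decimal or hex reference.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def preprocess(text: str) -> str:
    """Toy recovery step: rewrite bare ampersands as &amp;."""
    return BARE_AMPERSAND.sub("&amp;", text)


def parse_with_recovery(text: str) -> ET.Element:
    """Feed the repaired text to an existing, unmodified XML parser."""
    return ET.fromstring(preprocess(text))
```

Even for this one rule the pre-processor has to recognise enough
reference syntax to leave valid references alone, so it is already a
small parser of its own. Everything downstream of ET.fromstring is
reused as-is.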

>
> This train of logic leads me to be inclined to not wanting to define
> a "Processor" either. I suggest it's vastly more work and less useful
> and less likely to be adopted than if we define a more abstract set
> of rules of how to map a set of input to the well-formed output - in
> the form of abstract data types. ( no requirement for a
> serialization format for this 'output' ... rather specifically
> defined so that a common, but by no means *only*, use case would be
> an XML Processor/parser would just implement these additional rules
> within its own framework.).

I'm not sure I understand well enough how you intend this to work to be
able to agree or disagree. If you view the DOM references in the current
draft as a description of an abstract tree type rather than tied to a
particular API, does the current spec meet this description, or do you
mean something else?
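
For the sake of discussion, one way to read "abstract data types" is
sketched below in Python. The Element and Text classes and the replay
and serialize functions are hypothetical names made up for this
example, not anything from the draft. The result tree is plain data
with no API attached, and a separate adapter replays it into whatever
interface the application already uses (here, any SAX-style handler
from the standard library):

```python
from dataclasses import dataclass, field
from io import StringIO
from typing import Dict, List, Union
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


@dataclass(frozen=True)
class Text:
    """A run of character data in the abstract result."""
    value: str


@dataclass(frozen=True)
class Element:
    """An element in the abstract result: name, attributes, children."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Element", Text]] = field(default_factory=list)


def replay(node, handler):
    """Replay the abstract tree into any SAX-style ContentHandler."""
    if isinstance(node, Text):
        handler.characters(node.value)
        return
    handler.startElement(node.name, AttributesImpl(node.attributes))
    for child in node.children:
        replay(child, handler)
    handler.endElement(node.name)


def serialize(root: Element) -> str:
    """One possible consumer: a stock SAX handler that writes XML text."""
    out = StringIO()
    replay(root, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()
```

Under this reading the rules would only have to say which Element and
Text values a given input maps to; serialization, DOM construction or
streaming would each be just another consumer of replay.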

>
> That doesn't mean that one could not write a "Processor" that
> implements these rules.  In fact I suspect initial implementations
> would - but what benefit is there in chaining ourselves to that requirement
> ?
>


David
Received on Monday, 27 February 2012 00:00:49 UTC