RE: Intent of ER-XML from David Lee on 2012-02-26 (public-xml-er@w3.org from February 2012)

From: David Lee <David.Lee@marklogic.com>
Date: Sun, 26 Feb 2012 15:26:32 -0800
To: David Carlisle <davidc@nag.co.uk>, "public-xml-er@w3.org" <public-xml-er@w3.org>
Message-ID: <EB42045A1F00224E93B82E949EC6675E16ADCEEED2@EXCHG-BE.marklogic.com>
Great discussion!
To correct David C: what I am asking for is debate and clarity; I am not (yet) proposing an answer.
In particular, I am not suggesting a mapping at the serialization level.  
I think we have more than just a terminology clash; we have an "idea" or "philosophical" clash.

Good point about there not being an actual spec of what an "XML Processor" really is.
So if we don't know what an "XML Processor" is, how can we describe a "Drop In" replacement for one?
(Without first trying to spec out this mysterious "XML Processor" then adding on the new XML-ER stuff).

Let me attack this from the other side.
What do we NOT want to define?

I do not want to define an API. I think that has bitten the standards world in the past.
It limits the creativity of implementers. It also makes things vastly less useful in this particular case, because we would limit an implementation to having to use *or be a replacement for* a particular API (and language?).

E.g.   Suppose my app uses the Java StAX API.   I want to add in "XML-ER" capability to it.
If XML-ER is defined as an API that isn't StAX then I couldn't ever use it. And I couldn't ask the vendor to add it either, because it wouldn't be StAX.
I could probably cite a dozen other examples. Any API we define will only be a 'drop in replacement' for people using that API. At almost every XML conference I've been to, the #1 or #2 complaint about XML is that DOM was defined as an API. (The other complaint being namespaces :)
So for a real 'drop in' replacement we would need to define a spec for ALL current (and future?) APIs to avoid that trap?
Doesn't sound fun to me.

But how do we achieve this magical goal of a 'drop in replacement' for an XML parser?
I suggest it is impractical to do this by defining a "Processor"; it is better done by defining the abstract rules (in whatever meta-language or model you like).
This still doesn't give us a 'drop in' replacement for an XML Processor, but what it does give us is:

A) A consistent statement of how an XML Processor can support "XML-ER" rules
B) The ability for an implementer to implement such rules however they want, either by
retrofitting their existing parser or by writing a new one, while still having well-defined semantics.
C) The ability to write a pre-processor which feeds into existing processors. This is unlikely to be ideal for performance, but it's extremely useful, especially for new specs. E.g. this is how C++ was originally created: it was done as a preprocessor for C. It wasn't particularly efficient, but it got the language into the hands of developers to play with, which then encouraged vendors to start writing 'native' C++ processors.
It also allowed the early implementations to be vastly simpler, as they only had to do the "C++" stuff and could hand off the parts that were not C++ specific (linkers, assemblers, assembly code generation, optimizers, etc.) to existing mature implementations.
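
To make (C) concrete, here is a minimal sketch in Python of a pre-processor that repairs its input and then hands the result to an existing, unmodified XML parser. The repair rule (escaping bare ampersands) and the names preprocess and parse_with_recovery are purely illustrative assumptions; they are not taken from any XML-ER draft.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical repair rule (an assumption for illustration, NOT from any
# XML-ER draft): an "&" that does not begin a character reference or a
# named entity reference is rewritten as "&amp;".
_BARE_AMP = re.compile(
    r"&(?!(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_][A-Za-z0-9._-]*);)"
)


def preprocess(text: str) -> str:
    """Apply the repair rule and return text for a stock XML parser."""
    return _BARE_AMP.sub("&amp;", text)


def parse_with_recovery(text: str) -> ET.Element:
    """Pre-process, then hand off to the existing parser unchanged."""
    return ET.fromstring(preprocess(text))


if __name__ == "__main__":
    root = parse_with_recovery("<a>fish & chips &lt; tea</a>")
    print(root.text)
```

The existing parser does all the real work here; the pre-processor only has to know the new rules, which is the same division of labour as in the C++ example.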

This train of logic leads me to be inclined not to define a "Processor" either.
I suggest it's vastly more work, less useful, and less likely to be adopted than if we define a more abstract set of rules for how to map a set of input to the well-formed output, in the form of abstract data types. (There would be no requirement for a serialization format for this 'output'. Rather, the rules would be specifically defined so that a common, but by no means the *only*, use case would be that an XML Processor/parser just implements these additional rules within its own framework.)
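
As a rough sketch of what "rules over abstract data types" could look like, the Python below models input as a stream of abstract events and expresses one recovery rule as a function from events to events, with no serialization format and no parser API involved. The event types (Start, End, Text), the function name balance, and the rule itself (auto-closing open elements and dropping stray end tags) are all hypothetical choices for illustration, not anything proposed in the drafts.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Union


# Hypothetical abstract data types for the rule's input and output.
@dataclass(frozen=True)
class Start:
    name: str


@dataclass(frozen=True)
class End:
    name: str


@dataclass(frozen=True)
class Text:
    data: str


Event = Union[Start, End, Text]


def balance(events: Iterable[Event]) -> Iterator[Event]:
    """Illustrative rule: emit a properly nested event stream.

    - An end event with no matching open element is dropped.
    - An end event that skips over open elements closes them first.
    - Elements still open at end of input are closed.
    """
    open_names: List[str] = []
    for ev in events:
        if isinstance(ev, Start):
            open_names.append(ev.name)
            yield ev
        elif isinstance(ev, End):
            if ev.name not in open_names:
                continue  # stray end tag: drop it
            while open_names:
                name = open_names.pop()
                yield End(name)
                if name == ev.name:
                    break
        else:
            yield ev
    while open_names:
        yield End(open_names.pop())
```

Because the rule is stated only in terms of abstract events, a StAX-style parser, a DOM builder, or a standalone pre-processor could each apply it within its own framework.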

That doesn't mean that one could not write a "Processor" that implements these rules. In fact I suspect initial implementations would. But what is the benefit of chaining ourselves to that requirement?



-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com


> -----Original Message-----
> From: David Carlisle [mailto:davidc@nag.co.uk]
> Sent: Sunday, February 26, 2012 5:34 PM
> To: public-xml-er@w3.org
> Subject: Re: Intent of ER-XML
> 
> On 26/02/2012 22:21, Noah Mendelsohn wrote:
> > but fwiw my intuition is that the layering of the specifications
> > would be better if we first documented the mapping from input to
> > output, without describing in detail any particular piece of
> > software that might implement such a mapping.
> 
> Maybe there is a terminology clash somewhere, as I would say that the
> current draft meets that description. (If you view DOM references with a
> sufficiently abstract way). It basically defines a mapping from an input
> string of unicode characters to an abstract tree representation. It
> doesn't (or need not) define any API to interact with that tree
> (although if the tree is described using the DOM there is an obvious
> mapping to the DOM API).
> 
> What David Lee was (I think) asking for is something more, a mapping
> defined from an input string to the string representation of a document
> matching the productions in the XML 1.x spec that should work somehow
> without needing a full parser being specified (and, presumably run) on
> the input stream. I don't have any philosophical objection to such a
> system (I often edit xml without putting it through a full xml parse
> with Emacs lisp or perl or whatever) but in this case I can't imagine
> how it would work as any way I can imagine fixing up the result tree
> involves finding out what was wrong with the input tree by parsing it.
> 
> David
>
Received on Sunday, 26 February 2012 23:27:06 UTC