Re: Intent of ER-XML from Robin Berjon on 2012-02-27 (public-xml-er@w3.org from February 2012)

From: Robin Berjon <robin@berjon.com>
Date: Mon, 27 Feb 2012 16:50:38 +0100
To: Noah Mendelsohn <nrm@arcanedomain.com>
Cc: public-xml-er@w3.org
Message-Id: <3B4A36A6-31B5-4969-B6F5-EF31C36F74B7@berjon.com>
On Feb 27, 2012, at 16:25 , Noah Mendelsohn wrote:
> On 2/27/2012 9:56 AM, Robin Berjon wrote:
>> It's very much possible that I'm being dumb and missing an important
>> distinction here but I'm having a hard time figuring out how we could
>> define a mapping from an input document to an output tree that would be
>> all of interoperable, usable, and not a processor. Can someone please
>> illuminate me?
> 
> As one who's been advocating the "output tree" approach, I think what you're missing is the proposed layering of the specifications. The XML-ER specification would be a shared building block on which multiple different sorts of processors could be specified, as is the case with the XML Recommendation today. An interoperable specification of a processor seems to require documentation of the input and output API, since that's the level at which software typically "interoperates".

Okay, I think I get what you're aiming at. I'm not convinced that this is the best approach though.

First, it seems potentially hard to test. With XML there are clearly defined WF/non-WF distinctions so that you can feed parsers a large set of WF documents and a large set of non-WF documents and those that pass do the right thing for both. In this case we're looking at something that presumably would parse everything. So in order to usefully test it, you need to specify the interpretation. And as the ancients used to say: test together, spec together.
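
To make the testing point concrete, here is a minimal sketch of the accept/reject style of conformance test that works for XML today. It uses Python's standard xml.etree.ElementTree parser; the four sample documents and the helper names (is_well_formed, run_suite, CASES) are my own illustrative assumptions, not part of any real test suite.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the stdlib XML parser accepts the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical miniature suite: (document, expected well-formedness).
CASES = [
    ("<a><b/></a>", True),
    ("<a>&amp;</a>", True),
    ("<a><b></a>", False),  # mismatched tags
    ("<a x=1/>", False),    # unquoted attribute value
]


def run_suite(cases):
    """Return the documents for which the parser's verdict was wrong."""
    return [doc for doc, expected in cases if is_well_formed(doc) != expected]


failures = run_suite(CASES)
```

A parser that recovers from errors would presumably accept all four inputs, so a suite of this shape would tell us nothing about it. The only thing left to compare is what the parser produced for each input, which is why the interpretation has to be specified.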

Second, I don't think that XML got it right here. For all that it's supposed to be only about syntax, I can't think of a single specification that usefully addresses it at that layer. A possible exception could be C14N but even that is really about serialising a data model.

Then there's the problem that we've had in the past with compatibility issues at the data model level stemming from slightly different interpretations, e.g. adjacent text nodes. The lack of a DM was also a key issue in producing a useful XML Fragment specification. It was also a problem for EXI, which helped lock a bunch of people out of the XML community. I wish we'd learn from these mistakes :)
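
As a small illustration of the adjacent text nodes issue, the sketch below uses Python's standard xml.dom.minidom. The element name and the strings are arbitrary assumptions picked for the example, and other DOM implementations may behave differently, which is rather the point.

```python
from xml.dom import minidom

# Build <p> with two adjacent text nodes through the DOM API.
doc = minidom.parseString("<p/>")
p = doc.documentElement
p.appendChild(doc.createTextNode("foo"))
p.appendChild(doc.createTextNode("bar"))

before = len(p.childNodes)   # the tree as it was built in memory

# The serialisation carries no trace of the text node boundary.
serialized = p.toxml()

# Parsing that serialisation back yields a differently shaped tree.
reparsed = len(minidom.parseString(serialized).documentElement.childNodes)

# Node.normalize() merges adjacent text nodes in place.
p.normalize()
after = len(p.childNodes)
```

Two trees that serialise to the same characters can still differ in shape, so code that counts or indexes child nodes sees different things depending on how the tree was produced. The syntax alone does not settle which shape is the right one.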

Finally, I don't think that writing a processor specification precludes other APIs. It has the huge advantage of being concrete-in/concrete-out but it doesn't mean that once you have a processor well defined you can't go ahead and write another API for it (note that I didn't say "on top of it"). But when you do you'll have a clearly defined concrete and testable interpretation from which to work.

> BTW: though I've advocated an abstract output tree in this thread, I did mention early on the attractions of defining a mapping from non-well-formed input character streams to well-formed output character streams. As I said before, this would allow us to directly leverage all the existing specifications and code that operate on well-formed XML. It's also clear how conformance testing could be achieved in an interoperable way. Of course, I don't assume that typical implementations would necessarily reserialize the well-formed XML, but it might be the right level at which to write an XML-ER specification.

I'd suggest that we can layer that as console.log(document.innerHTML). 1/2;-)

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Coming up soon: I'm teaching a W3C online course on Mobile Web Apps
http://www.w3devcampus.com/writing-great-web-applications-for-mobile/
Received on Monday, 27 February 2012 15:51:07 UTC