Re: tag name state from Noah Mendelsohn on 2012-03-03 (public-xml-er@w3.org from March 2012)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Fri, 02 Mar 2012 22:04:22 -0500
To: David Lee <David.Lee@marklogic.com>
CC: David Carlisle <davidc@nag.co.uk>, "public-xml-er@w3.org" <public-xml-er@w3.org>
Message-ID: <4F518A36.9020608@arcanedomain.com>

On 3/2/2012 9:36 AM, David Lee wrote:
> Suggestion: we say that an XML-ER parser produces an abstract data model
> ... then it's up to which one ... INFOSET ? XDM ? ... Probably INFOSET as
> XDM drops several artifacts of XML that might be useful to upstream
> parser. But then that does exclude cases such as supporting invalid XML
> unicode codepoints.
>
> But then I haven't read the INFOSET specs in years so I am going on old
> man braincells here ...

I still think it may be better to define the equivalence at the level of 
text. I am >not< necessarily saying that any particular processor must 
produce or go through an intermediate state that involves fixed up text.

What I am suggesting we consider is a model that builds on the XML 
Recommendation, since that's what we're trying to "fix up to". The XML 
Recommendation defines XML as text. Therefore, if we can show in the XML-ER 
specification what the equivalent well-formed XML text is that corresponds 
to (the fixup of) each non-wellformed input, then all the struggles about 
abstract data models and choosing one just go away. Let's use the 
shorthand EWXML to refer to that equivalent text.

Any particular processor can produce a DOM, (an API over) an XML DM, or 
even the serialized XML. To prove you'd done your job right, you'd have to 
show that the DOM or DM or whatever is the same as the one you'd get by 
parsing the EWXML.
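A minimal sketch of that check in Python, assuming xml.dom.minidom as the DOM and treating only element names, attributes and text as significant. The helpers `canonical` and `matches_ewxml` are hypothetical names for illustration; this is a simplification, not a full Infoset or XDM comparison.

```python
from xml.dom import Node, minidom


def canonical(node):
    """Reduce a DOM node to nested tuples (name, sorted attributes, children).

    Only elements and text nodes are kept; comments, PIs, etc. are ignored.
    """
    if node.nodeType == Node.DOCUMENT_NODE:
        return canonical(node.documentElement)
    if node.nodeType == Node.TEXT_NODE:
        return ("#text", node.data)
    attrs = tuple(sorted(node.attributes.items()))  # (name, value) pairs
    children = tuple(
        canonical(c)
        for c in node.childNodes
        if c.nodeType in (Node.ELEMENT_NODE, Node.TEXT_NODE)
    )
    return (node.tagName, attrs, children)


def matches_ewxml(processor_dom, ewxml_text):
    """True if a processor's DOM equals the DOM obtained by parsing the EWXML."""
    return canonical(processor_dom) == canonical(minidom.parseString(ewxml_text))


# Stand-in for a processor that builds a DOM directly, never emitting text.
impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "e", None)
doc.documentElement.setAttribute("a", "3")
print(matches_ewxml(doc, '<e a="3" />'))  # True
```

The point of the design is that the EWXML text is the reference: any output form is acceptable as long as it agrees with what a conforming XML parser yields for that text.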

As a reasonably trivial example, for non-well-formed input:

 <e a=3 />

the EWXML might be specified to be:

 <e a="3" />
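For illustration only, here is a toy Python fixup that produces that EWXML. It assumes a hypothetical rule: inside a start tag, an unquoted attribute value runs up to the next whitespace or ">" and is wrapped in double quotes. The names `to_ewxml` and `quote_attributes` are made up; the sketch deliberately leaves open cases such as `<e a=3/>` and values containing "&", which real XML-ER rules would have to settle.

```python
import re
from xml.dom import minidom

# Start tags only (not end tags, comments, or PIs).
START_TAG = re.compile(r"<[A-Za-z_:][^<>]*>")
# Hypothetical rule: name=value where value is unquoted and ends at space or '>'.
UNQUOTED_ATTR = re.compile(r"(\s[A-Za-z_:][\w:.-]*)=([^\s\"'>]+)")


def quote_attributes(tag):
    """Wrap every unquoted attribute value in one start tag in double quotes."""
    return UNQUOTED_ATTR.sub(lambda m: f'{m.group(1)}="{m.group(2)}"', tag)


def to_ewxml(text):
    """Toy fixup: quote unquoted attribute values in start tags only."""
    return START_TAG.sub(lambda m: quote_attributes(m.group(0)), text)


ewxml = to_ewxml("<e a=3 />")
print(ewxml)                # <e a="3" />
minidom.parseString(ewxml)  # parses without error, so the result is well-formed
```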

This seems to me conceptually simpler and also more robust than picking a 
favorite among DOM, Infoset and XML DM. The XML Recommendation doesn't 
directly deal in any of these.

Noah

Received on Saturday, 3 March 2012 03:04:48 UTC