Re: tag name state from Noah Mendelsohn on 2012-03-04 (public-xml-er@w3.org from March 2012)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Sun, 04 Mar 2012 12:48:34 -0500
To: David Lee <David.Lee@marklogic.com>
CC: David Carlisle <davidc@nag.co.uk>, "public-xml-er@w3.org" <public-xml-er@w3.org>
Message-ID: <4F53AAF2.30603@arcanedomain.com>

On 3/3/2012 4:19 PM, David Lee wrote:
> I certainly don't get it.
>   If you parse a WF XML document and then reserialize it, you rarely get byte for byte what you started with.
> Take this simple example
>
>
> <root       a=   'b'     />
>
>
> What's the result?
>
> I do not think it unreasonable at all that an XML-ER processor produce, say,
>
> <root a="b"></root>

I think you misunderstood the goal I'm proposing. In your example, the input is

 <root       a=   'b'     />

and it's well formed. If you ran a regular DOM-oriented XML processor it 
would produce some DOM. As you imply, that DOM loses track of a variety of 
details from the original input, e.g. whether there was any space following 
the "=".

Now imagine you run instead an XML-ER processor to produce a DOM. My 
proposed goal is: because the input is well formed, the DOM produced by 
that XML-ER processor must be the same as the one produced above. It too 
will not record whether there are spaces following the =.

I'm suggesting that we set a goal that XML-ER be transparent, in that 
sense, when presented with well formed input. I am not suggesting that 
XML-ER cause us to retain information, such as spaces after the =, that 
would not have been kept by equivalent XML tooling.
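
To make the transparency goal concrete, here is a minimal sketch in Python. It assumes the standard library's ElementTree as a stand-in for "a regular DOM-oriented XML processor", and xml_er_parse is a purely hypothetical name for an XML-ER entry point; only the well-formed path is shown, since error recovery itself is not the point here.

```python
import xml.etree.ElementTree as ET


def tree_signature(elem):
    """Reduce an element to what the parsed tree keeps: name, attributes, text, children."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        elem.text or "",
        elem.tail or "",
        [tree_signature(child) for child in elem],
    )


def xml_er_parse(text):
    """Hypothetical XML-ER entry point (assumption, not a real API).

    For well-formed input it simply defers to the ordinary parser, which is
    one trivial way to satisfy the transparency goal.
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        raise NotImplementedError("error recovery is out of scope for this sketch")


original = "<root       a=   'b'     />"

reference = ET.fromstring(original)   # tree from the ordinary XML parser
recovered = xml_er_parse(original)    # tree from the (hypothetical) XML-ER parser

# Transparency: same tree for well-formed input.
transparent = tree_signature(reference) == tree_signature(recovered)

# Neither tree records the spaces after "=", the quote style, or the
# empty-element syntax, so a lexically different spelling parses the same.
lexical_variant_same = (
    tree_signature(ET.fromstring('<root a="b"></root>')) == tree_signature(reference)
)
```

Both comparisons are about the parsed tree only; nothing here requires an XML-ER processor to remember lexical details that an ordinary parser would discard.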

I do think it makes sense to >allow for< XML-ER tooling that produces text 
output, as I think there are use cases where people will want clean XML to 
save into files or to import into programs that require it. In that case, I 
would suggest that the input be passed through, byte-for-byte, or at least 
character-for-character (I'm not sure we need to preclude recoding, e.g. from 
UTF-8 to UTF-16, should someone really be so inclined).
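
A rough sketch of that text-output mode, again in Python and again under assumptions: xml_er_to_text is a hypothetical name, ElementTree is used only as a well-formedness check, and the recovery branch is deliberately left unimplemented.

```python
import xml.etree.ElementTree as ET


def xml_er_to_text(text):
    """Hypothetical text-output mode of an XML-ER tool (assumption, not a real API).

    Well-formed input is passed through character for character; it is not
    re-serialized from a tree, so spacing and quote style survive.
    """
    try:
        ET.fromstring(text)  # used purely as a well-formedness check
    except ET.ParseError:
        raise NotImplementedError("error recovery is out of scope for this sketch")
    return text


source = "<root       a=   'b'     />"
output = xml_er_to_text(source)
```

With this shape, the output for the example above is the input itself, extra spaces and single quotes included, rather than a normalized form such as <root a="b"></root>.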

Noah

Received on Sunday, 4 March 2012 17:49:00 UTC