- From: Noah Mendelsohn <nrm@arcanedomain.com>
- Date: Sun, 04 Mar 2012 12:48:34 -0500
- To: David Lee <David.Lee@marklogic.com>
- CC: David Carlisle <davidc@nag.co.uk>, "public-xml-er@w3.org" <public-xml-er@w3.org>
On 3/3/2012 4:19 PM, David Lee wrote:
> I certainly don't get it.
> If you parse a WF XML and then deserialize it, you rarely get byte for byte what you started with.
> Take this simple example
>
> <root a= 'b' />
>
> What's the result?
>
> I do not think it unreasonable at all that XML-ER processor produce say
>
> <root a="b"></root>

I think you misunderstood the goal I'm proposing. In your example, the input is

<root a= 'b' />

and it's well formed. If you ran a regular DOM-oriented XML processor, it would produce some DOM. As you imply, that DOM loses track of a variety of detail from the original input, e.g. whether there was any space following the "=".

Now imagine you instead run an XML-ER processor to produce a DOM. My proposed goal is: because the input is well formed, the DOM produced by that XML-ER processor must be the same as the one produced above. It too will not record whether there are spaces following the "=".

I'm suggesting that we set a goal that XML-ER be transparent, in that sense, when presented with well formed input. I am not suggesting that XML-ER cause us to retain information, such as spaces after the "=", that would not have been kept by equivalent XML tooling.

I do think it makes sense to >allow for< XML-ER tooling that produces text output, as I think there are use cases where people will want clean XML to save into files or to import into programs that require it. In that case, I would suggest that well formed input be passed through byte-for-byte, or at least character-for-character. (I'm not sure we need to preclude recoding from UTF-8 to UTF-16, for example, should someone really be so inclined.)

Noah
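P.S. To make the transparency goal concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: `er_parse` is a hypothetical stand-in for an XML-ER parser (no such API exists yet), and comparing reserialized output from Python's standard `xml.dom.minidom` is just a convenient proxy for "the same DOM", not a proposed definition.

```python
import xml.dom.minidom as minidom
from xml.parsers.expat import ExpatError


def canonical(text: str) -> str:
    """Parse with a conforming XML parser and reserialize the root element.

    Lexical detail that the DOM does not keep (quote style, whitespace
    around "=", empty-element syntax) disappears in the round trip.
    """
    return minidom.parseString(text).documentElement.toxml()


def is_transparent(er_parse, text: str) -> bool:
    """Check the proposed goal for one input.

    er_parse is a HYPOTHETICAL XML-ER parser: a callable taking a string
    and returning a minidom Document. If `text` is well formed, its result
    must match what the regular XML parser produces. If `text` is not well
    formed, the goal says nothing, so the check passes trivially.
    """
    try:
        expected = minidom.parseString(text)
    except ExpatError:
        return True  # not well formed: transparency goal does not apply
    actual = er_parse(text)
    return actual.documentElement.toxml() == expected.documentElement.toxml()


# Both spellings from the example yield the same DOM content.
print(canonical("<root a= 'b' />"))      # <root a="b"/>
print(canonical('<root a="b"></root>'))  # <root a="b"/>

# A regular XML parser is trivially transparent by this definition.
print(is_transparent(minidom.parseString, "<root a= 'b' />"))  # True
```

An XML-ER processor that emits text rather than a DOM would need a different check, namely that well formed input comes back character-for-character.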
Received on Sunday, 4 March 2012 17:49:00 UTC