RE: tag name state from David Lee on 2012-03-03 (public-xml-er@w3.org from March 2012)

From: David Lee <David.Lee@marklogic.com>
Date: Sat, 3 Mar 2012 13:19:30 -0800
To: Noah Mendelsohn <nrm@arcanedomain.com>, David Carlisle <davidc@nag.co.uk>
CC: "public-xml-er@w3.org" <public-xml-er@w3.org>
Message-ID: <EB42045A1F00224E93B82E949EC6675E16ADE903EB@EXCHG-BE.marklogic.com>

I certainly don't get it.
If you parse a well-formed XML document and then reserialize it, you rarely get back byte for byte what you started with.
Take this simple example:


<root       a=   'b'     />


What's the result?

I do not think it unreasonable at all for an XML-ER processor to produce, say:

<root a="b"></root>
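The same effect shows up with an ordinary, non-XML-ER parser. Here is a minimal sketch, assuming Python's standard xml.etree.ElementTree as a stand-in for "any conforming parser" (that choice is mine, not something the group has discussed). It parses the example above, writes it back out, and checks that the text differs while the element name and attributes survive:

```python
import xml.etree.ElementTree as ET

# The well-formed input from the example above, odd spacing and single quotes included.
source = "<root       a=   'b'     />"

# Parse with an ordinary (non-error-recovering) XML parser.
root = ET.fromstring(source)

# Serialize again. short_empty_elements=False asks for an explicit end tag
# instead of the empty-element form.
reserialized = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(reserialized)

# The bytes differ from the input...
print(reserialized == source)

# ...but reparsing the output yields the same element name and attributes.
reparsed = ET.fromstring(reserialized)
print(reparsed.tag == root.tag and reparsed.attrib == root.attrib)
```

The quoting style and the whitespace inside the tag are gone once the tree is built, so no serializer working from that tree could reproduce them.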



       


-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com



> -----Original Message-----
> From: Noah Mendelsohn [mailto:nrm@arcanedomain.com]
> Sent: Saturday, March 03, 2012 2:49 PM
> To: David Carlisle
> Cc: public-xml-er@w3.org
> Subject: Re: tag name state
> 
> 
> 
> On 3/3/2012 1:20 PM, David Carlisle wrote:
> > It would preclude any implementation of xml-er that used any kind of
> > parsing. It restricts you to essentially the kind of fixup that you can
> > do with regular expressions, just keeping the textual document but never
> > parsing it.
> 
> Hmm. There's something a bit circular about this.
> 
> Certainly your input can't be in the form of something like a DOM, or else
> how could it represent things like poorly nested tags, which are exactly
> the sorts of things we are trying to fix up? So, I assume
> it's OK to assume that the input is a string of text that may or may not
> prove to be well formed?
> 
> OK, so you're definitely going to run some sort of parse on it, do some
> error checking while you go, and prepare the output as you go. I infer that
> you're interested in the case where your preferred output is, say, a DOM,
> and you're building it as you go. You're losing track of, e.g. whether an
> attribute was single or double quoted.
> 
> No problem. Please look again at the requirement I proposed. It was not
> that you be capable of reserializing the original document. Rather it was:
> 
> "...when XML-ER is used on well formed input to produce (take your pick of
> {DOM, XML-DM, Infoset, text file}), the results should be the same as if a
> (non-XML-ER) tool was used. "
> 
> ..and that's almost surely what you're going to do: you're going to build
> the same DOM you would have if you weren't prepared to do error recovery.
> The fact that, if you tried to re-serialize the DOM you wouldn't remember
> what quoting is used is no problem. That's why I stated the requirement
> that way.
> 
> I think you can do exactly what you want with the requirement as I phrased
> it. Am I missing something?
> 
> Thanks.
> 
> Noah

Received on Saturday, 3 March 2012 21:20:02 UTC