Re: tag name state

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Sat, 03 Mar 2012 14:48:33 -0500
Message-ID: <4F527591.1010103@arcanedomain.com>
To: David Carlisle <davidc@nag.co.uk>
CC: public-xml-er@w3.org


On 3/3/2012 1:20 PM, David Carlisle wrote:
> It would preclude any implementation of xml-er that used any kind of
> parsing. It restricts you to essentially the kind of fixup that you can
> do with regular expressions, just keeping the textual document but never
> parsing it.

Hmm. There's something a bit circular about this.

Certainly your input can't be in the form of something like a DOM, or else
how could it represent things like poorly nested tags, which are exactly
the sorts of things we are trying to fix up? So, I assume it's OK to treat
the input as a string of text that may or may not prove to be well formed?

OK, so you're definitely going to run some sort of parse on it, do some 
error checking while you go, and prepare the output as you go. I infer that 
you're interested in the case where your preferred output is, say, a DOM, 
and you're building it as you go. Along the way you lose track of, e.g.,
whether an attribute was single or double quoted.

No problem. Please look again at the requirement I proposed. It was not
that you be capable of reserializing the original document. Rather it was:

"...when XML-ER is used on well formed input to produce (take your pick of 
{DOM, XML-DM, Infoset, text file}), the results should be the same as if a 
(non-XML-ER) tool was used. "

...and that's almost surely what you're going to do: you're going to build
the same DOM you would have if you weren't prepared to do error recovery. 
The fact that, if you tried to re-serialize the DOM you wouldn't remember 
what quoting was used is no problem. That's why I stated the requirement
that way.
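
To make the requirement concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: ElementTree stands in for the
DOM, and recovering_parse and tree_shape are hypothetical names, not part of
any XML-ER draft. The recovery branch is deliberately left unimplemented,
since the requirement only constrains behaviour on well formed input.

```python
import xml.etree.ElementTree as ET


def tree_shape(elem):
    """Reduce a parsed element to plain tuples so two trees can be compared."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        elem.text,
        elem.tail,
        [tree_shape(child) for child in elem],
    )


def strict_parse(text):
    """An ordinary (non-XML-ER) parse: raises ParseError on ill-formed input."""
    return ET.fromstring(text)


def recovering_parse(text):
    """Hypothetical stand-in for an XML-ER processor.

    On well formed input it builds exactly what the strict parser builds.
    What it does on ill-formed input is outside the scope of this sketch.
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        raise NotImplementedError("error recovery is not sketched here")


def same_result_on_well_formed(text):
    """The proposed requirement, stated as a check on one well formed input."""
    return tree_shape(recovering_parse(text)) == tree_shape(strict_parse(text))


single_quoted = "<doc a='1'><p>text</p></doc>"
double_quoted = '<doc a="1"><p>text</p></doc>'

# The requirement holds for both inputs.
assert same_result_on_well_formed(single_quoted)
assert same_result_on_well_formed(double_quoted)

# The quoting style never reaches the tree, so forgetting it costs nothing.
assert tree_shape(strict_parse(single_quoted)) == tree_shape(strict_parse(double_quoted))
```

The comparison is made on the parsed tree, not on a reserialized string, so
nothing in the check depends on remembering how attributes were quoted.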

I think you can do exactly what you want with the requirement as I phrased 
it. Am I missing something?

Thanks.

Noah
Received on Saturday, 3 March 2012 19:48:59 GMT
