Re: tag name state from Noah Mendelsohn on 2012-03-03 (public-xml-er@w3.org from March 2012)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Sat, 03 Mar 2012 13:06:34 -0500
To: David Carlisle <davidc@nag.co.uk>
CC: public-xml-er@w3.org
Message-ID: <4F525DAA.8080608@arcanedomain.com>

On 3/3/2012 12:53 PM, David Carlisle wrote:
> I'm not sure exactly what you mean by 1-to-1 here as opposed to c14n.
> I think that there is inevitably a certain amount of canonicalisation
> implied when comparing two xml documents. encoding weirdness and
> attribute order at least mean we can't insist that you get byte-for-byte
> identical output given well formed document as input.

No, in this case I mean it. We're specifying a process that takes an input
and produces an output. I'm suggesting that if the input is well-formed XML,
the conforming XML-ER result be byte-for-byte the same. Why not?

That means that if you apply XML-ER to a well formed document, the result 
is no change. Any processing you do on the output will match what would 
happen with the input, including things like checksums.  Any DOM you 
produce would be guaranteed identical to what would have happened if you 
hadn't run XML-ER.

Regardless of whether we document at the text level or the abstract tree
level, this seems like a very desirable property, so let me propose it as
a goal:

Goal: XML-ER should not change well formed input. Specifically, when XML-ER 
is used on well formed input to produce (take your pick of {DOM, XML-DM, 
Infoset, text file}), the results should be the same as if a (non-XML-ER) 
tool was used.

I think this is a really desirable property in practice. It means that it's
>always< safe to run XML-ER on input that's known to be well-formed.
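
To make the pass-through goal concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: xml_er is a hypothetical stand-in, not an actual XML-ER processor, its recovery branch is a placeholder rather than any proposed algorithm, and the well-formedness check only approximates the XML spec by using the standard-library parser.

```python
import hashlib
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(data: bytes) -> bool:
    """Approximate well-formedness check using the standard-library parser."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def xml_er(data: bytes) -> bytes:
    """Hypothetical XML-ER stand-in: well-formed input passes through untouched."""
    if is_well_formed(data):
        return data  # the very same bytes, no re-serialization
    # Placeholder recovery only (NOT the XML-ER algorithm): escape the whole
    # input and wrap it in a single element so the result is well formed.
    text = data.decode("utf-8", errors="replace")
    return b"<recovered>" + escape(text).encode("utf-8") + b"</recovered>"


# Attribute order and the XML declaration survive because nothing is rewritten.
doc = b'<?xml version="1.0" encoding="UTF-8"?>\n<doc b="2" a="1">text</doc>\n'
out = xml_er(doc)
assert out == doc
assert hashlib.sha256(out).digest() == hashlib.sha256(doc).digest()
```

The point of returning the input bytes directly, rather than parsing and re-serializing, is that checksums and any downstream DOM come out the same as if XML-ER had never run.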

BTW: I think there should be another goal, though it's a subset of the one 
above. So, if you buy the one above, this one falls out:

Goal: XML-ER should be idempotent.

Informally, if the result of an XML-ER run is fed back into XML-ER again, 
the result should be the same as the first time (in whatever form: DOM, 
XML-DM, text, etc.)
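
The idempotence goal can be checked mechanically. The sketch below is only an illustration under stated assumptions: toy_recover is a hypothetical placeholder for an XML-ER processor, and is_idempotent is a helper name invented here. It also shows why idempotence falls out of the first goal, provided the output of a run is itself always well formed: the second run then sees well-formed input and leaves it alone.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable
from xml.sax.saxutils import escape


def toy_recover(data: bytes) -> bytes:
    """Hypothetical stand-in for XML-ER; the repair step is a placeholder."""
    try:
        ET.fromstring(data)
        return data  # well-formed input is returned unchanged
    except ET.ParseError:
        # Placeholder repair: escape the input and wrap it in one element.
        text = data.decode("utf-8", errors="replace")
        return b"<recovered>" + escape(text).encode("utf-8") + b"</recovered>"


def is_idempotent(process: Callable[[bytes], bytes],
                  samples: Iterable[bytes]) -> bool:
    """True if running process twice gives the same result as running it once."""
    return all(process(process(s)) == process(s) for s in samples)


samples = [b"<a>ok</a>", b"<a><b></a>", b"no markup at all", b"<a attr=1>"]
print(is_idempotent(toy_recover, samples))
```

A check like this only samples a few inputs, so it can expose a violation but cannot prove the property in general.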

I suspect most of the designs we're considering are doing this implicitly, 
but I think it's a useful property and worth stating as a goal for our work.

Noah

Received on Saturday, 3 March 2012 18:06:59 UTC