Re: tag name state

From: David Carlisle <davidc@nag.co.uk>
Date: Sat, 03 Mar 2012 18:20:19 +0000
Message-ID: <4F5260E3.8040900@nag.co.uk>
To: Noah Mendelsohn <nrm@arcanedomain.com>
CC: public-xml-er@w3.org
On 03/03/2012 18:06, Noah Mendelsohn wrote:
>
>
> On 3/3/2012 12:53 PM, David Carlisle wrote:
>> I'm not sure exactly what you mean by 1-to-1 here as opposed to
>> c14n. I think that there is inevitably a certain amount of
>> canonicalisation implied when comparing two XML documents. Encoding
>> weirdness and attribute order, at least, mean we can't insist that
>> you get byte-for-byte identical output given a well-formed document
>> as input.
>
> No, in this case I mean it. We're specifying a process that takes an
>  input and produces an output. I'm suggesting that if the input is
> XML, the conforming XML-ER result be byte-for-byte the same. Why
> not?

It would preclude any implementation of XML-ER that used any kind of
parsing. It restricts you to essentially the kind of fixup that you can
do with regular expressions, keeping the textual document but never
parsing it.
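A minimal sketch of the point, using Python's standard-library `xml.etree.ElementTree` as a stand-in for a parsing-based XML-ER implementation (an assumption for illustration; the draft does not prescribe any particular parser). The helper `same_tree` is hypothetical. The tree survives a parse/re-serialize round trip, but the bytes do not:

```python
import xml.etree.ElementTree as ET

def same_tree(x: ET.Element, y: ET.Element) -> bool:
    """True if two elements have the same tag, attributes (order ignored),
    text, tail and children, compared recursively."""
    return (x.tag == y.tag
            and x.attrib == y.attrib
            and (x.text or "") == (y.text or "")
            and (x.tail or "") == (y.tail or "")
            and len(x) == len(y)
            and all(same_tree(a, b) for a, b in zip(x, y)))

# Well-formed input: single quotes, extra whitespace between attributes,
# a character reference and an explicit start/end pair for an empty element.
original = "<doc  b='2' a='1'><p>&#65;BC</p><e></e></doc>"

# Parse and re-serialize without modifying the tree (a hypothetical stand-in
# for any parsing-based fixup; not the XML-ER draft algorithm).
roundtrip = ET.tostring(ET.fromstring(original), encoding="unicode")

print(roundtrip)              # <doc b="2" a="1"><p>ABC</p><e /></doc>  (Python 3.8+)
print(roundtrip == original)  # False: the bytes changed
print(same_tree(ET.fromstring(original), ET.fromstring(roundtrip)))  # True
```

Any approach that builds a tree and writes it back out behaves like this; only a purely textual fixup could promise byte-for-byte identity.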

>
> That means that if you apply XML-ER to a well formed document, the
> result is no change. Any processing you do on the output will match
> what would happen with the input, including things like checksums.
> Any DOM you produce would be guaranteed identical to what would have
>  happened if you hadn't run XML-ER.

I think it's unreasonable to expect a higher level of fidelity to the
input than can be achieved by (say) an XSLT identity transform.
>
> Regardless of whether we document at the text level or the abstract
> tree level, this seems like a very desirable property, so let me
> propose it as a goal:
>
> Goal: XML-ER should not change well formed input. Specifically, when
>  XML-ER is used on well formed input to produce (take your pick of
> {DOM, XML-DM, Infoset, text file}), the results should be the same as
> if a (non-XML-ER) tool was used.
>
> I think this is a really desirable property in practice. It means
> that it's >always< safe to run XML-ER on input that's known to be
> well-formed.

But you are requiring far more than that if you insist on preserving
attribute order, text encoding, etc., or if you insist that entities
are not expanded (while presumably still expanding them internally to
check that the document is well formed).

>
> BTW: I think there should be another goal, though it's a subset of
> the one above. So, if you buy the one above, this one falls out:
>
> Goal: XML-ER should be idempotent.
>
> Informally, if the result of an XML-ER run is fed back into XML-ER
> again, the result should be the same as the first time (in whatever
> form: DOM, XML-DM, text, etc.)
>
> I suspect most of the designs we're considering are doing this
> implicitly, but I think it's a useful property and worth stating as a
> goal for our work.

The current draft has this property (I think), but as far as I can see
no algorithm remotely like it could have the first property.
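A small sketch of the distinction between the two goals. The function `normalize` is hypothetical: it parses with Python's `xml.etree.ElementTree` and re-serializes, standing in for a parsing-based repair pass (it is not the draft's algorithm). Such a pass is idempotent on its own output, yet it is not the identity on well-formed input:

```python
import xml.etree.ElementTree as ET

def normalize(text: str) -> str:
    """Hypothetical parse-and-reserialize pass standing in for a
    parsing-based XML-ER run over well-formed input."""
    return ET.tostring(ET.fromstring(text), encoding="unicode")

doc = "<r a='1'><x></x>&#65;</r>"

once = normalize(doc)    # <r a="1"><x />A</r>
twice = normalize(once)  # same as once

print(once == doc)    # False: the first goal (no change) fails
print(twice == once)  # True: the second goal (idempotence) holds here
```

This checks idempotence on a single input only; in practice it would be a property to test over many inputs, not something one example proves.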
>
> Noah

David
Received on Saturday, 3 March 2012 18:20:42 GMT