Re: How is ambiguity defined? from Tom Hillman on 2022-01-06 (public-ixml@w3.org from January 2022)

From: Tom Hillman <tom@expertml.com>
Date: Thu, 6 Jan 2022 10:45:14 +0000
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Norm Tovey-Walsh <norm@saxonica.com>
Cc: ixml <public-ixml@w3.org>
Message-ID: <3c58964c-c0f2-4009-a1b0-7578b66c8171@Spark>

> Constructing 54 trees and doing deep-equal on them (for some definition
> of “deep” and “equal”) seems like it might be expensive. Then I wondered,
> if you just walked over each tree constructing a cryptographic hash from
> the names of start tags and the character output, I think identical
> hashes would be indicative of identical XML output and would probably be
> cheaper than materializing them all and comparing them.

This reminds me of Joe Armstrong's suggestion of replacing all IDs with hashes:

https://youtu.be/lKXe3HUG2l4
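A minimal Python sketch of the hashing idea quoted above. Everything in it is hypothetical: the `Node` class, its `hidden` flag (loosely modelled on ixml's `-` mark, which suppresses a nonterminal's element but keeps its children), and the functions `xml_digest` and `distinct_xml`. Attributes, namespaces and other serialization details are ignored. One deliberate departure from hashing only start-tag names and character output: the sketch also hashes end tags, because start tags alone don't capture nesting (`<a><b/></a>` and `<a/><b/>` have the same start tags and the same, empty, character output).

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterable, List, Union


@dataclass
class Node:
    """Hypothetical raw parse-tree node.

    `hidden` loosely models an ixml nonterminal marked with `-`:
    it emits no element of its own, only its children.
    """
    name: str
    children: List[Union["Node", str]] = field(default_factory=list)
    hidden: bool = False


def xml_digest(root: Node) -> str:
    """Hash the XML a parse tree would serialize to, without building it."""
    h = hashlib.sha256()
    pending_text: List[str] = []

    def emit(kind: bytes, payload: str) -> None:
        data = payload.encode("utf-8")
        # Tag each event with its type and length-prefix the payload so
        # that different event sequences never yield the same byte stream.
        h.update(kind + len(data).to_bytes(8, "big") + data)

    def flush_text() -> None:
        # Adjacent text serializes as one run of characters, so coalesce it
        # before hashing; empty runs produce no output at all.
        text = "".join(pending_text)
        pending_text.clear()
        if text:
            emit(b"T", text)

    def walk(node: Union[Node, str]) -> None:
        if isinstance(node, str):
            pending_text.append(node)
            return
        if not node.hidden:
            flush_text()
            emit(b"S", node.name)  # start tag
        for child in node.children:
            walk(child)
        if not node.hidden:
            flush_text()
            emit(b"E", node.name)  # end tag, needed to capture nesting

    walk(root)
    flush_text()
    return h.hexdigest()


def distinct_xml(parses: Iterable[Node]) -> List[Node]:
    """Keep one representative parse per distinct XML digest."""
    seen = set()
    result = []
    for tree in parses:
        digest = xml_digest(tree)
        if digest not in seen:
            seen.add(digest)
            result.append(tree)
    return result


if __name__ == "__main__":
    # Two raw parses that both serialize as <date>2022-01</date> ...
    a = Node("date", [Node("digits", ["2022"], hidden=True), "-01"])
    b = Node("date", ["20", Node("rest", ["22-01"], hidden=True)])
    # ... and one that serializes as <date><year>2022</year>-01</date>.
    c = Node("date", [Node("year", ["2022"]), "-01"])
    print(len(distinct_xml([a, b, c])))  # 2
```

Because adjacent text is coalesced before hashing, parses that split the same characters differently across hidden nonterminals get the same digest, and each parse is walked once without building its XML. Equal digests mean identical output only up to the negligible chance of a SHA-256 collision.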

_________________
Tomos Hillman
eXpertML Ltd
+44 7793 242058
On 6 Jan 2022, 10:05 +0000, Norm Tovey-Walsh <norm@saxonica.com>, wrote:
> > since multiple raw parse trees may turn into the same XML. And
> > since it’s not easy or cheap, detecting ambiguity maybe needs to be
> > downgraded to a SHOULD or MAY.
>
> Assuming that we took the position “if it has multiple parses, it’s
> ambiguous”, I was thinking about the problem of parses that produce the
> same XML. I was imagining an option on my implementation for “show me
> all the parses” vs an option for “show me all the different XML parses”.
>
> I got 54 different parses out of one of my first test cases (perhaps
> erroneously given my continuing frustrations with the attempt to use
> someone else’s parser), but I’m reasonably sure that there’s only one
> XML result.
>
> Constructing 54 trees and doing deep-equal on them (for some definition
> of “deep” and “equal”) seems like it might be expensive. Then I wondered,
> if you just walked over each tree constructing a cryptographic hash from
> the names of start tags and the character output, I think identical
> hashes would be indicative of identical XML output and would probably be
> cheaper than materializing them all and comparing them.
>
> But I was washing dishes at the time, so it doesn’t count as careful
> consideration, just an idea.
>
> Be seeing you,
> norm
>
> --
> Norm Tovey-Walsh
> Saxonica

Received on Thursday, 6 January 2022 10:45:37 UTC