Re: How is ambiguity defined? from Norm Tovey-Walsh on 2022-01-06 (public-ixml@w3.org from January 2022)

From: Norm Tovey-Walsh <norm@saxonica.com>
Date: Thu, 06 Jan 2022 09:58:38 +0000
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: ixml <public-ixml@w3.org>
Message-ID: <m27dbd18om.fsf@saxonica.com>

> since multiple raw parse trees may turn into the same XML.  And
> since it’s not easy or cheap, detecting ambiguity maybe needs to be
> downgraded to a SHOULD or MAY.

Assuming that we took the position “if it has multiple parses, it’s
ambiguous”, I was thinking about the problem of parses that produce the
same XML. I was imagining an option on my implementation for “show me
all the parses” vs an option for “show me all the different XML parses”.

I got 54 different parses out of one of my first test cases (perhaps
erroneously, given my continuing frustrations with the attempt to use
someone else’s parser), but I’m reasonably sure that there’s only one
XML result.

Constructing 54 trees and doing deep-equal on them (for some definition
of “deep” and “equal”) seems like it might be expensive. Then I wondered:
if you just walked over each tree, constructing a cryptographic hash from
the names of start tags and the character output, I think identical
hashes would be indicative of identical XML output, and computing them
would probably be cheaper than materializing all the trees and comparing
them.
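A minimal sketch of the hashing idea, in Python, assuming a hypothetical parse-tree model (the `Node` class, `xml_hash`, and `distinct_xml` are illustrative names, not any real ixml implementation's API) in which each nonterminal carries an ixml-style mark (`^` serialized, `-` hidden) and terminals are plain strings. Two departures from the idea as stated, both assumptions of this sketch: end-tag events are hashed too, because start-tag names plus character output alone would not distinguish `<a><b/>x</a>` from `<a><b>x</b></a>`; and adjacent text is merged before hashing, so different splits of the same characters give the same hash. Attributes are left out for brevity, and equal SHA-256 digests are simply taken to mean equal XML.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical parse-tree model: a Node is a nonterminal, a str child is text.
# mark "^" serializes as an element; "-" hides the element but keeps children.
@dataclass
class Node:
    name: str
    children: List[Union["Node", str]] = field(default_factory=list)
    mark: str = "^"

def xml_hash(tree: Node) -> str:
    """Hash the serialization event stream without building the XML."""
    h = hashlib.sha256()
    pending: List[str] = []  # buffered text, merged as it would be in the XML

    def token(kind: bytes, payload: str = "") -> None:
        data = payload.encode("utf-8")
        # Length-prefix each token so different streams cannot concatenate alike.
        h.update(kind + len(data).to_bytes(8, "big") + data)

    def flush() -> None:
        text = "".join(pending)
        pending.clear()
        if text:  # empty text contributes nothing to the XML
            token(b"T", text)

    def walk(node: Union[Node, str]) -> None:
        if isinstance(node, str):
            pending.append(node)
            return
        if node.mark == "-":  # hidden nonterminal: only its children appear
            for child in node.children:
                walk(child)
            return
        flush()
        token(b"S", node.name)  # start tag
        for child in node.children:
            walk(child)
        flush()
        token(b"E")  # end tag: records where the element closes (nesting)

    walk(tree)
    flush()
    return h.hexdigest()

def distinct_xml(parses: List[Node]) -> List[Node]:
    """Keep one representative parse per distinct XML result."""
    seen = {}
    for tree in parses:
        seen.setdefault(xml_hash(tree), tree)
    return list(seen.values())

# Usage: t1 and t2 differ only in where a hidden nonterminal sits,
# so both serialize as <a>12</a>; t3 keeps h visible: <a><h>12</h></a>.
t1 = Node("a", [Node("h", ["1", "2"], mark="-")])
t2 = Node("a", ["1", Node("h", ["2"], mark="-")])
t3 = Node("a", [Node("h", ["1", "2"])])
# Same start tags and text, different nesting.
u1 = Node("a", [Node("b"), "x"])
u2 = Node("a", [Node("b", ["x"])])
```

Hashing the event stream means each tree is walked once and only a digest is kept, which is where the hoped-for saving over materializing and deep-comparing every tree would come from.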

But I was washing dishes at the time, so it doesn’t count as careful
consideration, just an idea.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Thursday, 6 January 2022 10:05:15 UTC