Re: Action Item QT4CG-004-02: DN to make a proposal for deep-equal-safe for future discussion ( Re: Draft minutes for QT4CG meeting 004, 2022-09-27) from Dimitre Novatchev on 2022-09-28 (public-xslt-40@w3.org from September 2022)

From: Dimitre Novatchev <dnovatchev@gmail.com>
Date: Wed, 28 Sep 2022 08:09:01 -0700
To: Michael Kay <mike@saxonica.com>
Cc: Norm Tovey-Walsh <norm@saxonica.com>, public-xslt-40@w3.org
Message-ID: <CAK4KnZcyeN-AGXMDdPTUqVFd1vdY9wopew=BeVJ0vRZnQSmpeA@mail.gmail.com>
On Wed, Sep 28, 2022 at 7:22 AM Michael Kay <mike@saxonica.com> wrote:

> Comparing two document trees by content is a fairly rare requirement, and
> nearly all the use cases I know of are (a) to compare expected test
> results, or (b) to see if anything has changed. For those requirements, the
> fact that deep-equal isn't transitive and can fail isn't a big deal.
> Anything that requires repeated comparisons (such as sorting, searching for
> duplicates, or building a map) is likely to be very inefficient and the
> user would be better off meeting the requirement with document signatures
> or similar. I'd like to see the use case, and to form a view as to whether
> a document-signature() function would be a more useful way to meet the
> requirement...
>
>
Considering efficiency is the next step. Obviously, it would be
significantly more efficient if any comparison of document tree content
started by checking the signatures (hashes) of the two arguments and
called the comparer function only if these are equal. Further
parametrization could even include a boolean that says whether to call the
comparer when the hashes are equal, or just to assume that the comparison
is true().
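
A minimal sketch of this idea, written in Python rather than XPath, with
serialized strings standing in for document trees. The names signature and
equal_with_signatures, and the choice of SHA-256, are illustrative
assumptions only and not part of any proposal. The shortcut is sound only if
the signature is consistent with the comparer, that is, any two values the
comparer treats as equal must have the same signature.

```python
import hashlib
from typing import Callable


def signature(doc: str) -> str:
    """Hypothetical document signature: SHA-256 of the serialized content."""
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()


def equal_with_signatures(
    a: str,
    b: str,
    comparer: Callable[[str, str], bool],
    confirm: bool = True,
) -> bool:
    """Compare two documents, checking their signatures first.

    The comparer is called only when the signatures match. With
    confirm=False, matching signatures are taken as equality and the
    comparer is never called.
    """
    if signature(a) != signature(b):
        return False  # different signatures: no need to call the comparer
    if not confirm:
        return True   # assumption: a signature match is accepted as equality
    return comparer(a, b)
```

With confirm=False the result depends on the hash not colliding, so that
setting trades a very small risk of a false true() for avoiding the full
comparison.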

Thanks,
Dimitre



> Michael Kay
> Saxonica
>
> On 28 Sep 2022, at 15:04, Dimitre Novatchev <dnovatchev@gmail.com> wrote:
>
>
>
> On Tue, Sep 27, 2022 at 11:41 PM Michael Kay <mike@saxonica.com> wrote:
>
>> A couple of comments:
>>
>> (a) I have proposed (somewhere) changing the semantics of numeric
>> comparisons using eq to convert both values to decimal rather than to
>> double (as op:same-key does). But of course NaN=NaN would remain false.
>> This function could take advantage of this.
>>
>
> Yes, certainly. Where?
>
>>
>> (b) The function inherits most of the weaknesses of fn:deep-equal. For
>> example it's clearly a design mistake that comments and processing
>> instructions are ignored without merging their adjacent sibling text nodes;
>> the rules for comparing typed and untyped content are also pretty unusable,
>> as is the treatment of whitespace. It's also unfortunate that fn:deep-equal
>> gives a different result from serializing into XML canonical form and
>> comparing the serializations. It's hard to know whether this matters
>> without considering use cases for this new function; but if we're defining
>> a new function, then we ought to fix the known faults in the old one.
>>
>
> The purpose of this function is not to correct design mistakes in
> *fn:deep-equal*, but just to provide a function that can be used for
> comparisons (such as of keys of maps, even if maps could have any sequence
> as a key), that returns *false()* instead of raising errors, and that
> is context-free and transitive. Another use case is to use it as a
> possible default value for the *comparer* argument of any function
> that needs or benefits from having such an argument.
>
>
>>
>> In practice I don't think it's possible to define a set of rules for
>> comparing node trees that satisfies a wide range of use cases without
>> parameterising it. Even the rules for canonical XML are parameterized
>> (IIRC) in regards to their handling of namespaces and whitespace.
>>
>
> Yes, and we can choose a set of default values for these parameters to be
> used in a *default-comparer-function*. These two ideas are complementary
> to each other, not in conflict with each other.
>
>
>>
>> So it boils down to: what are the use cases that this new function is
>> designed for?
>>
>
> As mentioned above. And any use-case for the separately/independently
> proposed, and included by you in the PDF checklist document, *deep-equal
> with options*, is also a use case for providing this function as a
> *default-comparer*.
>
>
>>
>> Michael Kay
>>
>
>
> Thanks,
> Dimitre
>
>
>>
>>
>> On 28 Sep 2022, at 04:17, Dimitre Novatchev <dnovatchev@gmail.com> wrote:
>>
>>
>>
>> On Tue, Sep 27, 2022 at 9:42 AM Norm Tovey-Walsh <norm@saxonica.com>
>> wrote:
>>
>>>
>>>
>>> Draft Minutes
>>>
>>> Summary of new and continuing actions [0/7]
>>>
>>>      * [ ] QT4CG-002-01: NW to incorporate email feedback and produce new
>>>        versions of the process documents.
>>>      * [ ] QT4CG-003-03: NW to tweak the CSS for function signatures to
>>>        avoid line breaks on - characters.
>>>      * [ ] QT4CG-002-10: BTW to coordinate some ideas about improving
>>>        diversity in the group
>>>      * [ ] QT4CG-004-01: MK (with DN and RD) to draft a new proposal for
>>>        variadic functions
>>>      * [ ] QT4CG-004-02: DN to make a proposal for deep-equal-safe for
>>>        future discussion
>>>      * [ ] QT4CG-004-03: MK to draft a pull request implementing
>>>        fn:intersperse
>>>      * [ ] QT4CG-004-04: DN to open an issue for the inverse of
>>>        fn:intersperse
>>>
>>>
>> The description of the function *fn:deep-equal-safe*() is in a pdf file
>> that can be found here:
>>
>>
>> https://github.com/dnovatchev/FXSL-XSLT2/blob/master/fn-deep-equal-safe.pdf
>> Note: this document is essentially a compilation from the FO 3.1 of:
>>       *op:same-key* (https://www.w3.org/TR/xpath-functions-31/#func-same-key), and
>>       *fn:deep-equal* (https://www.w3.org/TR/xpath-functions-31/#func-deep-equal)
>>
>> Special care was taken to replace the fn:deep-equal semantics that
>> result either in raising an error or in possible intransitivity or
>> context-dependency. All such behavior has been replaced with the
>> corresponding behavior from *op:same-key*, which the 3.1 Spec describes
>> as "*deterministic, context-independent, and focus-independent*", etc.
>>
>> More specifically, to achieve this:
>>
>>
>>    1. No errors are raised; instead *false()* is returned.
>>
>>    2. Strings are compared without any dependency on collations
>>    (*fn:codepoint-equal* is used in such comparisons).
>>
>>    3. Instead of using *eq*, every instance of *xs:double*, *xs:float*
>>    and *xs:decimal* is represented exactly as a decimal number, provided
>>    enough digits are available both before and after the decimal point.
>>    Unlike the *eq* relation, which converts both operands to *xs:double*
>>    values, possibly losing precision in the process, this comparison is
>>    transitive.
>>
>>    4. *fn:deep-equal* is used in comparing values having a variety of
>>    date, time, year, month, day types so that, unlike when using *eq*, no
>>    error is raised when comparing values of different types, but just
>>    *false()* is returned. Also, unlike when using the *eq* operator,
>>    this comparison has no dependency on implicit time-zone, meaning no
>>    dependency on this aspect of the dynamic context.
>>
>>
>> The goal of this function description is to serve as a starting point for
>> discussion about possible options for fn:deep-equal.
>>
>> Any comments will be appreciated.
>>
>> Thanks,
>> Dimitre
>>
>>
>>
Received on Wednesday, 28 September 2022 15:09:26 UTC