[Bug 5459] [FS] Static type analysis for the fn:abs, fn:ceiling, fn:floor, fn:round, and fn:round-half-to-even functions from bugzilla@wiggum.w3.org on 2008-07-21 (public-qt-comments@w3.org from July 2008)

From: <bugzilla@wiggum.w3.org>
Date: Mon, 21 Jul 2008 08:58:22 +0000
To: public-qt-comments@w3.org
Message-Id: <E1KKrDq-0003en-Em@wiggum.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5459





--- Comment #16 from Michael Dyck <jmdyck@ibiblio.org>  2008-07-21 08:58:20 ---
Okay, I finally have a proposal for min/max/sum/avg. That's FS 7.2.10,
corresponding to F+O 15.4 "Aggregate Functions" (minus fn:count). In the
above discussion, Comments #1, 3-6, 11-12 deal with these functions.

In addition to the unsoundness pointed out in comment #3, there are a few
other mistakes in the current rules:

    General:

        In STA 2 / rules 1, 3, 5, 7, we say (roughly)
            prime(Type) = empty
        but this can never be true. The proper judgment is simply
            Type = empty

        In STA 2 / rules 2, 4, 6, 8, we say (roughly)
            prime(Type) = FormalItemType1, ..., FormaItemTypeN
        but this can never be true for N>1.
        Instead, each ',' should be a '|'.

    min + max:

        In STA 2 / rules 2, 4, the set of target types should
        include xs:anyURI and xs:boolean.

    sum:

        STA 1 / rule 1 refers to expressions when it should be talking
        about types.

        STA 2 / rule 7 should deal with passing an empty seq to the
        *two-arg* version of sum.

        In STA 2 / rule 8, the set of target types should include
        xs:dayTimeDuration. (See Bug #5460).

        The prose says "The result type is the union of the target type and
        the type of the second argument." but the STA 2 / rule 8 doesn't do
        that.  Instead, it requires that the type of the second arg be a
        subtype of the target type. This has the effect that simple cases
        such as the first example in XQuery 15.4.5.1 fail to type-check.

        In STA 2 / rule 8, the use of aggregate_quantifier in the
        conclusion is incorrect. (Unlike min, max, and avg, passing an
        empty sequence to sum doesn't imply that the result will be empty.)

In addition, there are a few matters that, while not incorrect, could be
improved:

    In all rules in STA 2, the call to convert_untypedAtomic is redundant,
    because normalization has wrapped each argument in a call to
    fs:convert-simple-operand().

    Since we know we're only working with atomic types, occurrences of
    FormalItemType could be narrowed to AtomicTypeName.

    The rules for min and max are the same except for the function name.



The following proposal handles all of the above.

I believe that my proposal is more or less in agreement with that of
Comment #11 (although it looks fairly different). However, note that
rule 2 in Comment #11 has a problem: since xs:integer is a subtype of
xs:decimal, any Type2 satisfying T = xs:integer will also satisfy
T = xs:decimal, leading to two possible conclusions re the result type,
which is not what we want. In comment #15, I got around the integer/decimal
problem by introducing the 'has base numeric type' judgment, which I've
generalized into a 'has base atomic type' judgment:

    If Type1 is (or is a subtype of) any primitive atomic type other than
    xs:decimal, then that is its base atomic type.

        Type1 <: Type2
        Type2 is a primitive atomic type
        not(Type2 = xs:decimal)
        --------------
        Type1 has base atomic type Type2

    Similarly for xs:integer (which isn't primitive).

        Type1 <: xs:integer
        --------------
        Type1 has base atomic type xs:integer

    Finally, the special case for decimal:

        Type1 <: xs:decimal
        not( Type1 <: xs:integer )
        --------------
        Type1 has base atomic type xs:decimal

-------------------------------
Given that, here's my proposal:

Replace 7.2.10 / STA 1 with:

        statEnv |- (FN-URI,"sum")(Type1, xs:integer) : Type2
        ----------------------------------------
        statEnv |- (FN-URI,"sum")(Type1) : Type2

Replace 7.2.10 / STA 2 with:

    Now we can define the static typing rules for the aggregate functions.
    Note that the normalization rules of 4.1.5 will have wrapped each
    argument in calls to fn:data() and fs:convert-simple-operand() (with a
    'prototypical value' of type xs:double).  Thus, static analysis of the
    call to an aggregate function is guaranteed that any argument type is
    a subtype of xs:anyAtomicType*, with no occurrences of
    xs:untypedAtomic.

    The general approach for fn:min, fn:max, and fn:sum is as follows.
    First, we check that the input type(s) are acceptable for the function.
    Then we construct the (first) argument's prime type, a union of
    AtomicTypeNames. For each of the latter, we find the 'base atomic type'
    (a primitive atomic type or xs:integer). The union of these base atomic
    types is the basis for the result type, which may finally be adjusted
    for cardinality (fn:min and fn:max) or for the effect of the second
    argument (fn:sum). In addition, we provide a rule for the special case
    when the (first) argument has type 'empty'.

    For fn:min and fn:max, the permitted input types are all those for
    which ge(T,T) and le(T,T) are defined.  An empty input sequence yields
    an empty result.

        LocalPart in {"min", "max"}
        Type = empty
        --------------------------------------------
        statEnv |- (FN-URI,LocalPart)(Type) : empty


        LocalPart in {"min", "max"}
        Type1 <: Type3*
        Type3 in {
            fs:numeric,
            xs:anyURI|xs:string,
            xs:yearMonthDuration, xs:dayTimeDuration
            xs:date, xs:time, xs:dateTime, xs:boolean }
        prime(Type1) = AtomicTypeName1 | ... | AtomicTypeNameN
        AtomicTypeName1 has base atomic type AtomicTypeName1'
        ...
        AtomicTypeNameN has base atomic type AtomicTypeNameN'
        AtomicTypeName1' | ...  | AtomicTypeNameN' = Type4
        ---------------------
        statEnv |- (FN-URI,LocalPart)(Type1) :
            Type4 . aggregate_quantifier(quantifier(Type1))


    For fn:sum, the permitted input types (for the first argument) are all
    those for which plus(T,T) is defined.  If you pass an empty sequence as
    the first argument, the function returns the value of the second
    argument. 

        Type1 = empty
        Type2 <: xs:anyAtomicType?
        --------------------------------------------
        statEnv |- (FN-URI,"sum")(Type1,Type2) : Type2


        Type1 <: Type3*
        Type3 in {fs:numeric, xs:yearMonthDuration, xs:dayTimeDuration}
        Type2 <: xs:anyAtomicType?
        prime(Type1) = AtomicTypeName1 | ... | AtomicTypeNameN
        AtomicTypeName1 has base atomic type AtomicTypeName1'
        ...
        AtomicTypeNameN has base atomic type AtomicTypeNameN'
        AtomicTypeName1' | ...  | AtomicTypeNamen' = Type4
        second arg contribution for sum of Type1 and Type2 is Type2'
        ---------------------
        statEnv |- (FN-URI,'sum')(Type1,Type2) : Type4 | Type2'

    The second argument's contribution (if any) to the above result type
    is determined as follows.  If the first argument could be the empty
    sequence, we add the type of the second argument to the result type.
    Otherwise, the type of the second argument is ignored.

        empty <: Type1
        --------------
        second arg contribution for sum of Type1 and Type2 is Type2

        not (empty <: Type1)
        --------------
        second arg contribution for sum of Type1 and Type2 is none


    For the purposes of static type analysis, fn:avg($arg) is equivalent to
        fs:div( fn:sum($arg,()), fn:count($arg) )
    Thus, we have the rule:

        statEnv |- (FN-URI,'sum')(Type1, empty) : Type2
        statEnv |- (FS-URI,'div')(Type2, xs:integer) : Type3
        ------------------------------------
        statEnv |- (FN-URI,'avg')(Type1) : Type3

--------
(end of proposal)

Note that the rule for min & max isn't as tight as some might like.

The way that the main rules are written is actually "optimized" based on
various properties of type-promotion as it is currently defined. If the
"can be promoted to" relation is altered/augmented at some point in the
future, we might have to change these rules. I spent some time thinking
about a more future-proof approach, but it got fairly complicated.


-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
Received on Monday, 21 July 2008 08:58:57 UTC