- From: <bugzilla@wiggum.w3.org>
- Date: Mon, 21 Jul 2008 08:58:22 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5459 --- Comment #16 from Michael Dyck <jmdyck@ibiblio.org> 2008-07-21 08:58:20 --- Okay, I finally have a proposal for min/max/sum/avg. That's FS 7.2.10, corresponding to F+O 15.4 "Aggregate Functions" (minus fn:count). In the above discussion, Comments #1, 3-6, 11-12 deal with these functions. In addition to the unsoundness pointed out in comment #3, there are a few other mistakes in the current rules: General: In STA 2 / rules 1, 3, 5, 7, we say (roughly) prime(Type) = empty but this can never be true. The proper judgment is simply Type = empty In STA 2 / rules 2, 4, 6, 8, we say (roughly) prime(Type) = FormalItemType1, ..., FormaItemTypeN but this can never be true for N>1. Instead, each ',' should be a '|'. min + max: In STA 2 / rules 2, 4, the set of target types should include xs:anyURI and xs:boolean. sum: STA 1 / rule 1 refers to expressions when it should be talking about types. STA 2 / rule 7 should deal with passing an empty seq to the *two-arg* version of sum. In STA 2 / rule 8, the set of target types should include xs:dayTimeDuration. (See Bug #5460). The prose says "The result type is the union of the target type and the type of the second argument." but the STA 2 / rule 8 doesn't do that. Instead, it requires that the type of the second arg be a subtype of the target type. This has the effect that simple cases such as the first example in XQuery 15.4.5.1 fail to type-check. In STA 2 / rule 8, the use of aggregate_quantifier in the conclusion is incorrect. (Unlike min, max, and avg, passing an empty sequence to sum doesn't imply that the result will be empty.) In addition, there are a few matters that, while not incorrect, could be improved: In all rules in STA 2, the call to convert_untypedAtomic is redundant, because normalization has wrapped each argument in a call to fs:convert-simple-operand(). Since we know we're only working with atomic types, occurrences of FormalItemType could be narrowed to AtomicTypeName. The rules for min and max are the same except for the function name. The following proposal handles all of the above. I believe that my proposal is more or less in agreement with that of Comment #11 (although it looks fairly different). However, note that rule 2 in Comment #11 has a problem: since xs:integer is a subtype of xs:decimal, any Type2 satisfying T = xs:integer will also satisfy T = xs:decimal, leading to two possible conclusions re the result type, which is not what we want. In comment #15, I got around the integer/decimal problem by introducing the 'has base numeric type' judgment, which I've generalized into a 'has base atomic type' judgment: If Type1 is (or is a subtype of) any primitive atomic type other than xs:decimal, then that is its base atomic type. Type1 <: Type2 Type2 is a primitive atomic type not(Type2 = xs:decimal) -------------- Type1 has base atomic type Type2 Similarly for xs:integer (which isn't primitive). Type1 <: xs:integer -------------- Type1 has base atomic type xs:integer Finally, the special case for decimal: Type1 <: xs:decimal not( Type1 <: xs:integer ) -------------- Type1 has base atomic type xs:decimal ------------------------------- Given that, here's my proposal: Replace 7.2.10 / STA 1 with: statEnv |- (FN-URI,"sum")(Type1, xs:integer) : Type2 ---------------------------------------- statEnv |- (FN-URI,"sum")(Type1) : Type2 Replace 7.2.10 / STA 2 with: Now we can define the static typing rules for the aggregate functions. Note that the normalization rules of 4.1.5 will have wrapped each argument in calls to fn:data() and fs:convert-simple-operand() (with a 'prototypical value' of type xs:double). Thus, static analysis of the call to an aggregate function is guaranteed that any argument type is a subtype of xs:anyAtomicType*, with no occurrences of xs:untypedAtomic. The general approach for fn:min, fn:max, and fn:sum is as follows. First, we check that the input type(s) are acceptable for the function. Then we construct the (first) argument's prime type, a union of AtomicTypeNames. For each of the latter, we find the 'base atomic type' (a primitive atomic type or xs:integer). The union of these base atomic types is the basis for the result type, which may finally be adjusted for cardinality (fn:min and fn:max) or for the effect of the second argument (fn:sum). In addition, we provide a rule for the special case when the (first) argument has type 'empty'. For fn:min and fn:max, the permitted input types are all those for which ge(T,T) and le(T,T) are defined. An empty input sequence yields an empty result. LocalPart in {"min", "max"} Type = empty -------------------------------------------- statEnv |- (FN-URI,LocalPart)(Type) : empty LocalPart in {"min", "max"} Type1 <: Type3* Type3 in { fs:numeric, xs:anyURI|xs:string, xs:yearMonthDuration, xs:dayTimeDuration xs:date, xs:time, xs:dateTime, xs:boolean } prime(Type1) = AtomicTypeName1 | ... | AtomicTypeNameN AtomicTypeName1 has base atomic type AtomicTypeName1' ... AtomicTypeNameN has base atomic type AtomicTypeNameN' AtomicTypeName1' | ... | AtomicTypeNameN' = Type4 --------------------- statEnv |- (FN-URI,LocalPart)(Type1) : Type4 . aggregate_quantifier(quantifier(Type1)) For fn:sum, the permitted input types (for the first argument) are all those for which plus(T,T) is defined. If you pass an empty sequence as the first argument, the function returns the value of the second argument. Type1 = empty Type2 <: xs:anyAtomicType? -------------------------------------------- statEnv |- (FN-URI,"sum")(Type1,Type2) : Type2 Type1 <: Type3* Type3 in {fs:numeric, xs:yearMonthDuration, xs:dayTimeDuration} Type2 <: xs:anyAtomicType? prime(Type1) = AtomicTypeName1 | ... | AtomicTypeNameN AtomicTypeName1 has base atomic type AtomicTypeName1' ... AtomicTypeNameN has base atomic type AtomicTypeNameN' AtomicTypeName1' | ... | AtomicTypeNamen' = Type4 second arg contribution for sum of Type1 and Type2 is Type2' --------------------- statEnv |- (FN-URI,'sum')(Type1,Type2) : Type4 | Type2' The second argument's contribution (if any) to the above result type is determined as follows. If the first argument could be the empty sequence, we add the type of the second argument to the result type. Otherwise, the type of the second argument is ignored. empty <: Type1 -------------- second arg contribution for sum of Type1 and Type2 is Type2 not (empty <: Type1) -------------- second arg contribution for sum of Type1 and Type2 is none For the purposes of static type analysis, fn:avg($arg) is equivalent to fs:div( fn:sum($arg,()), fn:count($arg) ) Thus, we have the rule: statEnv |- (FN-URI,'sum')(Type1, empty) : Type2 statEnv |- (FS-URI,'div')(Type2, xs:integer) : Type3 ------------------------------------ statEnv |- (FN-URI,'avg')(Type1) : Type3 -------- (end of proposal) Note that the rule for min & max isn't as tight as some might like. The way that the main rules are written is actually "optimized" based on various properties of type-promotion as it is currently defined. If the "can be promoted to" relation is altered/augmented at some point in the future, we might have to change these rules. I spent some time thinking about a more future-proof approach, but it got fairly complicated. -- Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
Received on Monday, 21 July 2008 08:58:57 UTC