- From: <bugzilla@wiggum.w3.org>
- Date: Mon, 21 Jul 2008 08:58:22 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5459
--- Comment #16 from Michael Dyck <jmdyck@ibiblio.org> 2008-07-21 08:58:20 ---
Okay, I finally have a proposal for min/max/sum/avg. That's FS 7.2.10,
corresponding to F+O 15.4 "Aggregate Functions" (minus fn:count). In the
above discussion, Comments #1, 3-6, 11-12 deal with these functions.
In addition to the unsoundness pointed out in comment #3, there are a few
other mistakes in the current rules:
General:
In STA 2 / rules 1, 3, 5, 7, we say (roughly)
prime(Type) = empty
but this can never be true. The proper judgment is simply
Type = empty
In STA 2 / rules 2, 4, 6, 8, we say (roughly)
prime(Type) = FormalItemType1, ..., FormaItemTypeN
but this can never be true for N>1.
Instead, each ',' should be a '|'.
min + max:
In STA 2 / rules 2, 4, the set of target types should
include xs:anyURI and xs:boolean.
sum:
STA 1 / rule 1 refers to expressions when it should be talking
about types.
STA 2 / rule 7 should deal with passing an empty seq to the
*two-arg* version of sum.
In STA 2 / rule 8, the set of target types should include
xs:dayTimeDuration. (See Bug #5460).
The prose says "The result type is the union of the target type and
the type of the second argument." but the STA 2 / rule 8 doesn't do
that. Instead, it requires that the type of the second arg be a
subtype of the target type. This has the effect that simple cases
such as the first example in XQuery 15.4.5.1 fail to type-check.
In STA 2 / rule 8, the use of aggregate_quantifier in the
conclusion is incorrect. (Unlike min, max, and avg, passing an
empty sequence to sum doesn't imply that the result will be empty.)
In addition, there are a few matters that, while not incorrect, could be
improved:
In all rules in STA 2, the call to convert_untypedAtomic is redundant,
because normalization has wrapped each argument in a call to
fs:convert-simple-operand().
Since we know we're only working with atomic types, occurrences of
FormalItemType could be narrowed to AtomicTypeName.
The rules for min and max are the same except for the function name.
The following proposal handles all of the above.
I believe that my proposal is more or less in agreement with that of
Comment #11 (although it looks fairly different). However, note that
rule 2 in Comment #11 has a problem: since xs:integer is a subtype of
xs:decimal, any Type2 satisfying T = xs:integer will also satisfy
T = xs:decimal, leading to two possible conclusions re the result type,
which is not what we want. In comment #15, I got around the integer/decimal
problem by introducing the 'has base numeric type' judgment, which I've
generalized into a 'has base atomic type' judgment:
If Type1 is (or is a subtype of) any primitive atomic type other than
xs:decimal, then that is its base atomic type.
Type1 <: Type2
Type2 is a primitive atomic type
not(Type2 = xs:decimal)
--------------
Type1 has base atomic type Type2
Similarly for xs:integer (which isn't primitive).
Type1 <: xs:integer
--------------
Type1 has base atomic type xs:integer
Finally, the special case for decimal:
Type1 <: xs:decimal
not( Type1 <: xs:integer )
--------------
Type1 has base atomic type xs:decimal
-------------------------------
Given that, here's my proposal:
Replace 7.2.10 / STA 1 with:
statEnv |- (FN-URI,"sum")(Type1, xs:integer) : Type2
----------------------------------------
statEnv |- (FN-URI,"sum")(Type1) : Type2
Replace 7.2.10 / STA 2 with:
Now we can define the static typing rules for the aggregate functions.
Note that the normalization rules of 4.1.5 will have wrapped each
argument in calls to fn:data() and fs:convert-simple-operand() (with a
'prototypical value' of type xs:double). Thus, static analysis of the
call to an aggregate function is guaranteed that any argument type is
a subtype of xs:anyAtomicType*, with no occurrences of
xs:untypedAtomic.
The general approach for fn:min, fn:max, and fn:sum is as follows.
First, we check that the input type(s) are acceptable for the function.
Then we construct the (first) argument's prime type, a union of
AtomicTypeNames. For each of the latter, we find the 'base atomic type'
(a primitive atomic type or xs:integer). The union of these base atomic
types is the basis for the result type, which may finally be adjusted
for cardinality (fn:min and fn:max) or for the effect of the second
argument (fn:sum). In addition, we provide a rule for the special case
when the (first) argument has type 'empty'.
For fn:min and fn:max, the permitted input types are all those for
which ge(T,T) and le(T,T) are defined. An empty input sequence yields
an empty result.
LocalPart in {"min", "max"}
Type = empty
--------------------------------------------
statEnv |- (FN-URI,LocalPart)(Type) : empty
LocalPart in {"min", "max"}
Type1 <: Type3*
Type3 in {
fs:numeric,
xs:anyURI|xs:string,
xs:yearMonthDuration, xs:dayTimeDuration
xs:date, xs:time, xs:dateTime, xs:boolean }
prime(Type1) = AtomicTypeName1 | ... | AtomicTypeNameN
AtomicTypeName1 has base atomic type AtomicTypeName1'
...
AtomicTypeNameN has base atomic type AtomicTypeNameN'
AtomicTypeName1' | ... | AtomicTypeNameN' = Type4
---------------------
statEnv |- (FN-URI,LocalPart)(Type1) :
Type4 . aggregate_quantifier(quantifier(Type1))
For fn:sum, the permitted input types (for the first argument) are all
those for which plus(T,T) is defined. If you pass an empty sequence as
the first argument, the function returns the value of the second
argument.
Type1 = empty
Type2 <: xs:anyAtomicType?
--------------------------------------------
statEnv |- (FN-URI,"sum")(Type1,Type2) : Type2
Type1 <: Type3*
Type3 in {fs:numeric, xs:yearMonthDuration, xs:dayTimeDuration}
Type2 <: xs:anyAtomicType?
prime(Type1) = AtomicTypeName1 | ... | AtomicTypeNameN
AtomicTypeName1 has base atomic type AtomicTypeName1'
...
AtomicTypeNameN has base atomic type AtomicTypeNameN'
AtomicTypeName1' | ... | AtomicTypeNamen' = Type4
second arg contribution for sum of Type1 and Type2 is Type2'
---------------------
statEnv |- (FN-URI,'sum')(Type1,Type2) : Type4 | Type2'
The second argument's contribution (if any) to the above result type
is determined as follows. If the first argument could be the empty
sequence, we add the type of the second argument to the result type.
Otherwise, the type of the second argument is ignored.
empty <: Type1
--------------
second arg contribution for sum of Type1 and Type2 is Type2
not (empty <: Type1)
--------------
second arg contribution for sum of Type1 and Type2 is none
For the purposes of static type analysis, fn:avg($arg) is equivalent to
fs:div( fn:sum($arg,()), fn:count($arg) )
Thus, we have the rule:
statEnv |- (FN-URI,'sum')(Type1, empty) : Type2
statEnv |- (FS-URI,'div')(Type2, xs:integer) : Type3
------------------------------------
statEnv |- (FN-URI,'avg')(Type1) : Type3
--------
(end of proposal)
Note that the rule for min & max isn't as tight as some might like.
The way that the main rules are written is actually "optimized" based on
various properties of type-promotion as it is currently defined. If the
"can be promoted to" relation is altered/augmented at some point in the
future, we might have to change these rules. I spent some time thinking
about a more future-proof approach, but it got fairly complicated.
--
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
Received on Monday, 21 July 2008 08:58:57 UTC