Re: [QB] ISSUE-31 (Aggregation hierarchies) Discussion and proposal from Richard Cyganiak on 2013-03-04 (public-gld-wg@w3.org from March 2013)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 4 Mar 2013 19:45:22 +0000
To: Dave Reynolds <dave.e.reynolds@gmail.com>
Cc: Government Linked Data Working Group <public-gld-wg@w3.org>
Message-Id: <F1D53862-69CD-4EDA-AB8A-D1E640AB5AEC@cyganiak.de>
On 3 Mar 2013, at 21:52, Dave Reynolds wrote:
>>> qb:AggregatableHierarchy (sub class of: qb:Hierarchy)
>>>    Indicates a hierarchy in which each parent concept is a disjoint union of its child concepts. So that measures such as simple counts *may* be aggregated up the hierarchy.
>> 
>> I don't quite see how this can work. If I know that each parent is the disjoint union of its children, I still don't know how to aggregate values. If the observations measure life expectancy, I need to average. If the observations measure population count, I need to sum. In other cases, I may have to take the minimum or maximum. It seems to me that this class only addresses one particular use case and is not a general solution to the problem of aggregating up the hierarchy.
> 
> The class can only define properties of the hierarchy, that's what it's about. You indeed need additional information about measures to know that a measure itself can be aggregated. The full solution would require our postponed ISSUE-30.
> 
> However, this half of the solution is already useful on its own and there are cases (e.g. when the unit of measure is "count") where aggregation is possible without additional knowledge.

My point here is that in order to implement useful functionality (such as the upward aggregation of numbers), the application needs to have two bits of information:

1. Is the hierarchy aggregatable?
2. What kind of aggregation operator is appropriate for the measure?

Your proposal is to extend QB to allow stating 1. in the DSD, but not 2 (which is postponed).

This means that applications will require either an extension vocabulary, or out-of-band or hardcoded knowledge, to address 2.

If an extension vocabulary or out-of-band knowledge is acceptable for addressing 2. in a given use case, then I have trouble seeing why that same approach wouldn't be acceptable for addressing 1.

In other words, I don't see the use case that is satisfied by 1 alone, without also addressing 2.
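
To make the dependency concrete, here is a minimal sketch in Python, not part of any QB draft. The names `AGGREGATORS`, `aggregate`, `hierarchy_is_aggregatable` and the measure keys are hypothetical, chosen only for illustration. It assumes the hierarchy is given as a parent-to-children mapping and that observations exist only for leaf members.

```python
# Hypothetical per-measure operators: this is "bit 2", the information
# that QB would not provide if only the hierarchy flag were standardised.
AGGREGATORS = {
    "populationCount": sum,
    "minTemperature": min,
    "maxTemperature": max,
}


def aggregate(node, measure, observed, children, hierarchy_is_aggregatable):
    """Return the value for `node`, aggregating up from its children if needed.

    observed: dict mapping hierarchy members to observed values
    children: dict mapping a parent member to a list of its child members
    hierarchy_is_aggregatable: "bit 1", a statement about the hierarchy only
    """
    if node in observed:
        return observed[node]
    if not hierarchy_is_aggregatable:
        raise ValueError("hierarchy is not declared aggregatable")
    if measure not in AGGREGATORS:
        # Knowing the hierarchy is a disjoint union is not enough here.
        raise ValueError(f"no aggregation operator known for {measure!r}")
    kids = children.get(node, [])
    if not kids:
        raise ValueError(f"no observation and no children for {node!r}")
    operator = AGGREGATORS[measure]
    return operator(
        [aggregate(k, measure, observed, children, hierarchy_is_aggregatable)
         for k in kids]
    )
```

With `children = {"UK": ["England", "Wales"]}` and made-up leaf values, a call such as `aggregate("UK", "populationCount", ...)` only succeeds because the operator table is present. Setting the hierarchy flag alone does not get the application any further.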

And given that this is a fairly complex issue, with multiple different proposals already on the table (qb:AggregatableHierarchy, xkos:coversExhaustively/coversMutuallyExclusive, the QB4OLAP approach), I feel that we shouldn't cast in concrete a design that offers only half a solution.

>> I propose to drop qb:AggregatableHierarchy. It can be easily defined in a use case specific extension.
> 
> I would certainly prefer to be able to state this property of a hierarchy and it doesn't seem like its presence should be problematic.
> 
> However, given the current timescale if this would hold up approval to move to Last Call then I'll withdraw it, with regret.

I'd prefer this course of action.

>>> qb:hierarchyRoot (domain: qb:Hierarchy, range: skos:Concept)
>>>   Specifies a root of the hierarchy. A hierarchy may have multiple roots but must have at least one.[7]
>> 
>> Fine. (Is there a general assumption that the members of a hierarchy still must be skos:Concepts? I think we don't need to make that assumption. Not making the assumption may avoid some confusion and may be less controversial. In that case, the range would simply be rdfs:Resource I think.)
> 
> OK, my current draft indeed has range skos:Concept because anything can be a skos:Concept. Happy to drop that and make the range rdfs:Resource.

I agree that anything can be a concept, but that doesn't mean it's necessarily useful to explicitly declare everything a concept :-) So, yes, I prefer rdfs:Resource as range.

(I certainly myself have asserted that anything can be a skos:Concept, but only when discussing the question whether some other class is disjoint from skos:Concept, e.g., geographical regions. I think there is no philosophical basis for asserting any such disjointness. And in practical terms, there are tradeoffs between assuming such a disjointness and not doing so, and if a data publisher *can* leave both options open, then they should do so.)

>>> qb:narrowingProperty (domain: qb:Hierarchy, range: rdf:Property)
>>>   Specifies a property which relates a parent concept in the hierarchy to a child concept. One of qb:narrowingProperty or qb:broadeningProperty must be given but it is not necessary to have both. Note that a child may have more than one parent.
>>> 
>>> qb:broadeningProperty (domain: qb:Hierarchy, range: rdf:Property)
>>>   Specifies a property which relates a child concept in the hierarchy to a parent concept. One of qb:narrowingProperty or qb:broadeningProperty must be given but it is not necessary to have both. Note that a child may have more than one parent.
> 
> 
>> One of these is redundant. I'd do away with broadeningProperty, rename narrowingProperty to something like parentChildProperty (seems more intuitive to me), and point out the design pattern of asserting
>> 
>>  ex:myHierarchy qb:parentChildProperty [ owl:inverseOf ex:parent ].
> 
> Hmm. That is certainly possible, not sure I'm convinced that's clearer. There are certainly cases (e.g. OS data) where both are available and I liked the ability to specify both of them.
> 
> However, I guess I'm prepared to go along with this.

You can specify both properties:

  ex:myHierarchy qb:parentChildProperty ex:child.
  ex:child owl:inverseOf ex:parent.

I fear that having both qb:narrowingProperty and qb:broadeningProperty compels us to specify explicitly whether the presence of both implies an owl:inverseOf relationship. It would seem strange to me if the answer was "no"; and if the answer was "yes", then why provide a different way of saying what can already be said in OWL, and may in fact already be declared in the vocabulary's definition?

Best,
Richard



> 
>> I presume the well-formedness rules for hierarchies are something like:
>> 
>> 1. If a dimension has a hierarchy as code list, then values of that dimension in observations must be reachable from one of the roots in zero or more hops along the parentChildProperty, except if the parentChildProperty is a blank node.
>> 
>> 2. If the parentChildProperty has an inverse property P, then any dimension value must be reachable from one of the roots in zero or more inverse hops along P.
> 
> Seems reasonable.
> 
> Dave
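
The two well-formedness rules quoted above could be checked roughly as follows. This is only a sketch under stated assumptions: triples are plain `(subject, predicate, object)` tuples of strings rather than a real RDF graph, and the function names and the `prop` / `inverse_prop` parameters are hypothetical. Passing `prop=None` stands in for the case where the parentChildProperty is a blank node and only its inverse P is usable.

```python
from collections import deque


def parent_child_edges(triples, prop=None, inverse_prop=None):
    """Collect (parent, child) pairs from the triples.

    prop: the named parent-to-child property, or None if unavailable
    inverse_prop: a child-to-parent property P, or None if there is none
    """
    edges = set()
    for s, p, o in triples:
        if prop is not None and p == prop:
            edges.add((s, o))   # rule 1: forward hop, parent -> child
        if inverse_prop is not None and p == inverse_prop:
            edges.add((o, s))   # rule 2: inverse hop along P
    return edges


def reachable_from_roots(roots, edges):
    """Breadth-first search; the roots themselves count (zero hops)."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def ill_formed_values(dimension_values, roots, triples,
                      prop=None, inverse_prop=None):
    """Return the dimension values that are not reachable from any root."""
    edges = parent_child_edges(triples, prop, inverse_prop)
    return set(dimension_values) - reachable_from_roots(roots, edges)
```

An empty result means every dimension value used in the observations sits somewhere in the hierarchy. The sketch merges both kinds of hop into one edge set, which is one possible reading of how rules 1 and 2 combine.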
Received on Monday, 4 March 2013 19:45:51 UTC