RE: ISSUE-29 (Well-formedness): Criteria for well-formedness [Data Cube Vocabulary] + FW: [publishing-statistical-data] Re: qb:DimensionProperty subClassOf qb:CodedProperty ?

Hi Dave,

> I'm slightly inclined to make it a separate issue and state it is closely linked to
> ISSUE-29, but either appropriate would be fine by me.
Tracking code lists and hierarchies for literal values as a separate issue may help us keep a more detailed overview of the issues, so that is fine with me.

Elaborating on the issue, I see one problem in particular: queries become more complex.

Assume we want to query for observations that have a certain skos:Concept as the value of a certain dimension.

We would select all observations that have as dimension value either 1) the literal value that is linked from the skos:Concept via skos:notation, or 2) the skos:Concept itself. Using skos:notation would make sure that the literal value uniquely identifies the concept within the scope of the concept scheme (code list / hierarchy) [1].

Since this would require a UNION I reckon it would not be that fast to compute.
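
To make the two cases concrete, here is a minimal sketch of such a query. It is Python that only assembles the SPARQL text; it does not run it. The helper name build_observation_query and the example.org IRIs are made up for illustration; the qb: and skos: prefixes are the usual namespaces. It also assumes that the literal used in the observation is exactly the skos:notation value of the concept, including its datatype.

```python
def build_observation_query(dimension_iri: str, concept_iri: str) -> str:
    """Return SPARQL text selecting observations whose dimension value is
    either the given concept or the literal linked to it via skos:notation.

    Hypothetical helper for illustration only; not part of QB or any library.
    """
    return f"""\
PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?obs WHERE {{
  ?obs a qb:Observation .
  {{
    # Case 1: the observation uses the literal given as skos:notation
    <{concept_iri}> skos:notation ?code .
    ?obs <{dimension_iri}> ?code .
  }}
  UNION
  {{
    # Case 2: the observation uses the skos:Concept itself
    ?obs <{dimension_iri}> <{concept_iri}> .
  }}
}}
"""


if __name__ == "__main__":
    # Made-up IRIs, standing in for a time dimension and a year concept.
    print(build_observation_query(
        "http://example.org/dim/time",
        "http://example.org/code/y2003",
    ))
```

An application that cannot know in advance which of the two forms a dataset uses would have to issue both branches for every constrained dimension, which is where the extra cost comes from.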

Best,

Benedikt


[1] <http://www.w3.org/TR/skos-reference/#notations> 



> -----Original Message-----
> From: Dave Reynolds [mailto:dave.e.reynolds@gmail.com]
> Sent: Thursday, March 22, 2012 11:44 AM
> To: Benedikt Kämpgen
> Cc: public-gld-wg@w3.org
> Subject: Re: ISSUE-29 (Well-formedness): Criteria for well-formedness [Data
> Cube Vocabulary] + FW: [publishing-statistical-data] Re:
> qb:DimensionProperty subClassOf qb:CodedProperty ?
> 
> Hi Benedikt,
> 
> On 22/03/12 10:07, Benedikt Kämpgen wrote:
> > Hello,
> >
> > I suggest to add the issue of creating hierarchies of literal values, which
> > we discussed a while ago (see below or at [1]) in the QB Google group, as an
> > own issue or as an aspect of ISSUE-29.
> 
> It's clearly an issue that has come up for you in practice so it is worth
> capturing.
> 
> I'm slightly inclined to make it a separate issue and state it is closely linked to
> ISSUE-29, but either appropriate would be fine by me.
> 
> Dave
> 
> >
> > The problem is the following:
> >
> > If observations are using literal values as dimension values, how can one
> > create a hierarchy (qb:codeList skos:ConceptScheme) of these values?
> >
> > One possible solution is to create instances of skos:Concept representing
> > and linking to those literal values using skos:notation.
> >
> > However, this makes it more difficult for applications to query for
> > observations, since they do not know whether the observations will actually
> > use literal values or skos:Concepts.
> >
> > Best,
> >
> > Benedikt
> >
> > [1]<http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/9903b29e670c5c94/c2386716f7b69cb4?lnk=gst>
> >
> >
> >
> > -----Original Message-----
> > From: publishing-statistical-data@googlegroups.com
> > [mailto:publishing-statistical-data@googlegroups.com] On Behalf Of
> > Dave Reynolds
> > Sent: Tuesday, December 06, 2011 11:53 PM
> > To: publishing-statistical-data@googlegroups.com
> > Cc: Dominik Siegele
> > Subject: Re: [publishing-statistical-data] Re: qb:DimensionProperty
> subClassOf qb:CodedProperty ?
> >
> > Hi Benedikt,
> >
> > I generally agree with your approach and with Richard's comments.
> >
> > The notion of defining skos:Concepts but then using the literal values in
> > the data is a little odd but I can see some point to it.
> >
> > The one thing I would point out is that for dates the Interval URI Set [1]
> > and associated service may be useful to you. We've tended to use that for
> > all the Data Cube sets that we've published. One advantage to using the
> > resources as the dimension values instead of date literals is that it makes
> > it possible to query the data via other properties of those resources. For
> > example, with data at a day resolution we can include the Interval Set
> > properties in the published data and so pick out values for a month or year
> > or government year without having to do time calculations in the SPARQL. If
> > your data is only at calendar year resolution that may be less relevant to you.
> >
> > Cheers,
> > Dave
> >
> > [1]
> > http://www.epimorphics.com/web/wiki/using-interval-set-uris-statistical-data
> >
> > On Tue, 2011-12-06 at 22:05 +0000, Richard Cyganiak wrote:
> >> Hi Benedikt,
> >>
> >> On 6 Dec 2011, at 20:37, Benedikt Kämpgen wrote:
> >>> Given the task to represent date as Date Literal, geo as specific instances
> of NUTSRegion, and sex as instances of skos:Concept for the
> male/female/total. We have this task e.g. at [1] where we are representing
> Eurostat [2] data using the RDF Data Cube Vocabulary (QB).
> >>>
> >>> The approach that we now consider to implement:
> >>> *Optional: rdfs:range for DimensionProperty in order to have an
> understanding of what kinds of things are represented by the members,
> e.g., xsd:date for dc:date and NUTSRegion for geo.
> >>
> >> That makes sense. I would always specify this when no qb:codeList is
> present.
> >>
> >>> *qb:codeList for DimensionProperty in order to list the possible
> >>> skos:Concepts that represent values of the dimension, e.g.,
> >>> estat:y2003 for one specific year, estat:AT for one specific
> >>> country, and estat:F for one specific gender
> >>
> >> I would use qb:codeList only with skos:ConceptSchemes. It looks like your
> intention is to create concept schemes for all dimensions, including time. I
> think that's ok.
> >>
> >>> *skos:Concepts have as rdfs:seeAlso instances linked that they
> >>> represent, e.g., estat:AT rdfs:seeAlso dbpedia:Austria
> >>
> >> I would use skos:closeMatch (or skos:exactMatch if you're a radical; or
> skos:relatedMatch if you're a coward) instead of rdfs:seeAlso.
> >>
> >> This has the consequence of typing dbpedia:Austria as a skos:Concept,
> but that surely is fine, given the definition of skos:Concept:
> >>
> >> [[
> >> A SKOS concept can be viewed as an idea or notion; a unit of thought.
> However, what constitutes a unit of thought is subjective, and this definition
> is meant to be suggestive, rather than restrictive.
> >> ]]
> >>
> >> Some might say: “A country is not an idea! It exists in the real world!” But I
> don't find that such arguments hold water. Countries are created and
> abolished through legislation and treaties; and decades can pass where large
> parts of mankind disagree on the question whether a particular entity is a
> country or not. Countries are really just the taxonomist's business objects of
> political geographers.
> >>
> >>> and as rdfs:label Literal values linked that they represent, e.g.,
> >>> estat:y2003 rdfs:label "2003"^^xsd:date
> >>
> >> Use skos:notation instead of rdfs:label. Note that "2003"^^xsd:date is
> >> ill-typed. It has to be "2003-01-01"^^xsd:date, or "2003"^^xsd:gYear.
> >>
> >>> * The observations can either use the represented instances
> >>> directly, e.g., dbpedia:Austria and "2003"^^xsd:date, or they can
> >>> use the skos:Concept representations, e.g., estat:F
> >>
> >> I agree that this makes sense in the case of literals (dates in particular).
> For URIs, it seems overly complicated. Why not just define a concept scheme
> that directly includes dbpedia:Austria as a concept using skos:inScheme?
> >>
> >>> This approach brings the following advantages:
> >>> * We can limit the number of literal values of a specific dimension
> >>
> >> Right, and I like this. The logic would be: If a dimension property has a
> qb:codeList and is used with literal values, then assume that the literal values
> are the skos:notations of the concepts in the code list.
> >>
> >>> * We can have relationships between dimension values, e.g., for
> >>> hierarchies, and still use the literal values or the non-information
> >>> URIs in the observations
> >>
> >> Yup.
> >>
> >>> * Publishers may still represent skos:Concepts as possible dimension
> values and can link them using owl:sameAs to the actual represented values.
> >>
> >> Do not *EVER* link to a skos:Concept using owl:sameAs! ;-)
> >>
> >> Seriously, skos:xxxMatch is always better for that purpose.
> >>
> >> Best,
> >> Richard
> >>
> >>
> >>
> >>> Although this may be wrong, as it would state the term (e.g.,
> skos:Concept Germany) and the actual thing (dbpedia:Germany) as being
> the same thing, applications that would work with the explained approach
> would also work here.
> >>>
> >>> I would be glad to hear your opinions on this.
> >>>
> >>> Regards,
> >>>
> >>> Benedikt
> >>>
> >>> [1]<http://estatwrap.ontologycentral.com/page/teilm020>
> >>> [2]<http://estatwrap.ontologycentral.com/>
> >>>
> >>> --
> >>> AIFB, Karlsruhe Institute of Technology (KIT)
> >>> Phone: +49 721 608-47946
> >>> Email: benedikt.kaempgen@kit.edu
> >>> Web: http://www.aifb.kit.edu/web/Hauptseite/en
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: publishing-statistical-data@googlegroups.com
> >>>> [mailto:publishing- statistical-data@googlegroups.com] On Behalf Of
> >>>> Benedikt Kämpgen
> >>>> Sent: Friday, October 28, 2011 11:39 AM
> >>>> To: publishing-statistical-data@googlegroups.com
> >>>> Cc: Dominik Siegele
> >>>> Subject: RE: [publishing-statistical-data] Re: qb:DimensionProperty
> >>>> subClassOf qb:CodedProperty ?
> >>>>
> >>>> Hi Dave,
> >>>>
> >>>> Thanks for your answer.
> >>>>
> >>>>> This is one area where I think the current QB vocabulary could do
> >>>>> with some extension. It would be nice to be able to define the
> >>>>> property that is used for hierarchical relationships between
> >>>>> dimensions values when those are not skos:Concepts (and thus
> skos:broader/narrower).
> >>>> Ditto.
> >>>>
> >>>> For example, we have now tried to model it for Eurostat correctly,
> >>>> not using skos:ConceptScheme, but the actual regions from
> >>>> nuts:NUTSRegion, see [1] and definition of geo dimension.
> >>>>
> >>>> However, with this approach we can no longer say that only
> >>>> certain regions are used in the dataset.
> >>>>
> >>>> Best,
> >>>>
> >>>> Benedikt
> >>>>
> >>>>
> >>>> [1] http://estatwrap.ontologycentral.com/dsd/tsieb010
> >>>>
> >>>>
> >>>> --
> >>>> AIFB, Karlsruhe Institute of Technology (KIT)
> >>>> Phone: +49 721 608-47946
> >>>> Email: benedikt.kaempgen@kit.edu
> >>>> Web: http://www.aifb.kit.edu/web/Hauptseite/en
> >>>>
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: publishing-statistical-data@googlegroups.com
> >>>>> [mailto:publishing- statistical-data@googlegroups.com] On Behalf
> >>>>> Of Dave Reynolds
> >>>>> Sent: Wednesday, October 19, 2011 11:16 AM
> >>>>> To: publishing-statistical-data@googlegroups.com
> >>>>> Subject: RE: [publishing-statistical-data] Re:
> >>>>> qb:DimensionProperty
> >>>> subClassOf
> >>>>> qb:CodedProperty ?
> >>>>>
> >>>>> On Wed, 2011-10-19 at 11:01 +0200, Benedikt Kämpgen wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have a follow-up question regarding dimensions and code lists:
> >>>>>>
> >>>>>> In QB, a dimension value used by an observation typically is an
> >>>>>> instance of skos:Concept from a skos:ConceptScheme.
> >>>>>
> >>>>> Not required, can also be instances of some defined
> >>>>> [rdfs|owl]:Class
> >>>>>
> >>>>>> I have seen some examples of
> >>>>>> datasets [1,2], that then link from such instances of
> >>>>>> skos:Concept with owl:sameAs to entities they represent, e.g.
> >>>>>> <http://dbpedia.org/resource/Spain>. I guess this is fine from a
> >>>>>> practical point of view, but is it not semantically incorrect;
> >>>>>
> >>>>> Indeed, not correct.
> >>>>>
> >>>>>> I am wondering whether
> >>>>>> this is really intended and will lead to problems, later,
> >>>>>
> >>>>> In the datasets we've published we've tended to use "normal"
> >>>>> resources directly for things like geographies and time periods
> >>>>> and only use skos:Concepts for things that are definitely
> >>>>> classification schemes - e.g. gender or age groups.
> >>>>>
> >>>>>> e.g., if we want
> >>>>>> to define hierarchies on dimension values.
> >>>>>
> >>>>> This is one area where I think the current QB vocabulary could do
> >>>>> with some extension. It would be nice to be able to define the
> >>>>> property that is used for hierarchical relationships between
> >>>>> dimensions values when those are not skos:Concepts (and thus
> skos:broader/narrower).
> >>>>>
> >>>>> Dave
> >>>>>
> >>>>> --
> >>>>> Epimorphics Ltd                        www.epimorphics.com
> >>>>> Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
> >>>>> Tel: 01275 399069                     Mobile: 07906 628814
> >>>>>
> >>>>> Epimorphics Ltd. is a limited company registered in England
> >>>>> (number
> >>>>> 7016688)
> >>>>> Registered address: Court Lodge, 105 High Street, Portishead,
> >>>>> Bristol
> >>>>> BS20 6PT, UK
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Benedikt
> >>>>>>
> >>>>>>
> >>>>>> [1]<http://estatwrap.ontologycentral.com/data/tsieb010>
> >>>>>> [2]<http://estatwrap.ontologycentral.com/dic/geo#ES>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> AIFB, Karlsruhe Institute of Technology (KIT)
> >>>>>> Phone: +49 721 608-47946
> >>>>>> Email: benedikt.kaempgen@kit.edu
> >>>>>> Web: http://www.aifb.kit.edu/web/Hauptseite/en
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: publishing-statistical-data@googlegroups.com
> >>>>>>> [mailto:publishing- statistical-data@googlegroups.com] On Behalf
> >>>>>>> Of Richard Cyganiak
> >>>>>>> Sent: Friday, September 23, 2011 8:02 PM
> >>>>>>> To: publishing-statistical-data@googlegroups.com
> >>>>>>> Subject: Re: [publishing-statistical-data] Re:
> >>>>>>> qb:DimensionProperty
> >>>>>> subClassOf
> >>>>>>> qb:CodedProperty ?
> >>>>>>>
> >>>>>>> Hi Bill,
> >>>>>>>
> >>>>>>> On 23 Sep 2011, at 08:05, BillRoberts wrote:
> >>>>>>>> But I see your point Richard. Maybe I'm thinking too much like
> >>>>>>>> a physicist instead of a statistician!
> >>>>>>>>
> >>>>>>>> In practice most of these continuous variables are 'chunked':
> >>>>>>>> time into years or months, space into a list of points or
> >>>>>>>> regions, age into
> >>>>>>>> 5 year bands etc etc
> >>>>>>>
> >>>>>>> Exactly. Statistics tend to be aggregate data, where many individual
> >>>>>>> “events” or “facts” (which often have continuous attributes) have been
> >>>>>>> lumped together into a single observation. The values along a number of
> >>>>>>> dimensions have been “classified” into discrete ranges, and everything
> >>>>>>> that falls into the same bucket (cube cell) has been “tabulated” into a
> >>>>>>> single total or average number, and we're interested only in these totals.
> >>>>>>>
> >>>>>>> This aggregation can remove a lot of valuable detail, but also makes it
> >>>>>>> easier to ask higher-level questions (especially for dimensions where
> >>>>>>> the classification is hierarchical), and may make the datasets smaller
> >>>>>>> and may anonymize the data to some extent.
> >>>>>>>
> >>>>>>>
> >>>>>>> If you have some values that you *truly* want to model as continuous,
> >>>>>>> then you should ask yourself if you aren't really looking at a measure
> >>>>>>> rather than a dimension.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Richard
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> So it's making a bit more sense to me now.
> >>>>>>>>
> >>>>>>>> On Sep 22, 7:08 pm, Richard Cyganiak<rich...@cyganiak.de>
> wrote:
> >>>>>>>>> On 22 Sep 2011, at 16:18, BillRoberts wrote:
> >>>>>>>>>
> >>>>>>>>>> But there are many dimension properties with values that are
> >>>>>>>>>> not coded or codelist-able.
> >>>>>>>>>
> >>>>>>>>> With the exception of time, I don't think that's true.
> >>>>>>>>>
> >>>>>>>>> Can you give an example of some other dimension whose values
> >>>>>>>>> don't come from a controlled/managed set of terms that ought to be
> >>>>>>>>> represented as a SKOS concept scheme or RDFS class?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Richard
> >>>>>>
> >>>>>
> >>>
> >>
> >
> >
> >
> >

Received on Tuesday, 1 May 2012 23:24:05 UTC