- From: Benedikt Kämpgen <kaempgen@fzi.de>
- Date: Wed, 2 May 2012 01:23:38 +0200
- To: 'Dave Reynolds' <dave.e.reynolds@gmail.com>, Benedikt Kämpgen <kaempgen@fzi.de>
- CC: <public-gld-wg@w3.org>
Hi Dave,

> I'm slightly inclined to make it a separate issue and state it is closely linked to ISSUE-29, but either appropriate would be fine by me.

Having code lists and hierarchies for literal values as a separate issue may help us keep a more detailed overview of the issues, so that is fine with me.

Elaborating on the issue, I see the problem of more complex queries. Assume we want to query for observations that have a certain skos:Concept as the value of a certain dimension. We would have to select all observations that have as dimension value either

1) the literal value that is linked from the skos:Concept via skos:notation. Using skos:notation would make sure that the literal value uniquely identifies the concept within the scope of the concept scheme (code list / hierarchy) [1]; or
2) the skos:Concept itself.

Since this would require a UNION, I reckon it would not be that fast to compute.

Best,

Benedikt

[1] <http://www.w3.org/TR/skos-reference/#notations>

> -----Original Message-----
> From: Dave Reynolds [mailto:dave.e.reynolds@gmail.com]
> Sent: Thursday, March 22, 2012 11:44 AM
> To: Benedikt Kämpgen
> Cc: public-gld-wg@w3.org
> Subject: Re: ISSUE-29 (Well-formedness): Criteria for well-formedness [Data Cube Vocabulary] + FW: [publishing-statistical-data] Re: qb:DimensionProperty subClassOf qb:CodedProperty ?
>
> Hi Benedikt,
>
> On 22/03/12 10:07, Benedikt Kämpgen wrote:
> > Hello,
> >
> > I suggest to add the issue of creating hierarchies of literal values, which we discussed a while ago (see below or at [1]) in the QB Google group, as an own issue or as an aspect of ISSUE-29.
>
> It's clearly an issue that has come up for you in practice so it is worth capturing.
>
> I'm slightly inclined to make it a separate issue and state it is closely linked to ISSUE-29, but either appropriate would be fine by me.
>
> Dave
>
> > The problem is the following:
> >
> > If observations are using literal values as dimension values, how can one create a hierarchy (qb:codeList skos:ConceptScheme) of these values?
> >
> > One possible solution is to create instances of skos:Concept representing and linking to those literal values using skos:notation.
> >
> > However, this makes it more difficult for applications to query for observations, since they do not know whether the observations will actually use literal values or skos:Concepts.
> >
> > Best,
> >
> > Benedikt
> >
> > [1]<http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/9903b29e670c5c94/c2386716f7b69cb4?lnk=gst>
> >
> > -----Original Message-----
> > From: publishing-statistical-data@googlegroups.com
> > [mailto:publishing-statistical-data@googlegroups.com] On Behalf Of Dave Reynolds
> > Sent: Tuesday, December 06, 2011 11:53 PM
> > To: publishing-statistical-data@googlegroups.com
> > Cc: Dominik Siegele
> > Subject: Re: [publishing-statistical-data] Re: qb:DimensionProperty subClassOf qb:CodedProperty ?
> >
> > Hi Benedikt,
> >
> > I generally agree with your approach and with Richard's comments.
> >
> > The notion of defining skos:Concepts but then using the literal values in the data is a little odd but I can see some point to it.
> >
> > The one thing I would point out is that for dates the Interval URI Set [1] and associated service may be useful to you. We've tended to use that for all the Data Cube sets that we've published. One advantage to using the resources as the dimension values instead of date literals is that it makes to possible to query the data via other properties of those resources. For example with data at a day resolution we can include the Interval Set properties in the published data and so pick out values for a month or year or government year without having to do time calculations in the sparql. If your data is only at calendar year resolution that may be less relevant to you.
> >
> > Cheers,
> > Dave
> >
> > [1] http://www.epimorphics.com/web/wiki/using-interval-set-uris-statistical-data
> >
> > On Tue, 2011-12-06 at 22:05 +0000, Richard Cyganiak wrote:
> >> Hi Benedikt,
> >>
> >> On 6 Dec 2011, at 20:37, Benedikt Kämpgen wrote:
> >>> Given the task to represent date as Date Literal, geo as specific instances of NUTSRegion, and sex as instances of skos:Concept for the male/female/total. We have this task e.g. at [1] where we are representing Eurostat [2] data using the RDF Data Cube Vocabulary (QB).
> >>>
> >>> The approach that we now consider to implement:
> >>> * Optional: rdfs:range for DimensionProperty in order to have an understanding of what kinds of things are represented by the members, e.g., xsd:date for dc:date and NUTSRegion for geo.
> >>
> >> That makes sense. I would always specify this when no qb:codeList is present.
> >>
> >>> * qb:codeList for DimensionProperty in order to list the possible skos:Concepts that represent values of the dimension, e.g., estat:y2003 for one specific year, estat:AT for one specific country, and estat:F for one specific gender
> >>
> >> I would use qb:codeList only with skos:ConceptSchemes. It looks like your intention is to create concept schemes for all dimensions, including time. I think that's ok.
> >>
> >>> * skos:Concepts have as rdfs:seeAlso instances linked that they represent, e.g., estat:AT rdfs:seeAlso dbpedia:Austria
> >>
> >> I would use skos:closeMatch (or skos:exactMatch if you're a radical; or skos:relatedMatch if you're a coward) instead of rdfs:seeAlso.
> >>
> >> This has the consequence of typing dbpedia:Austria as a skos:Concept, but that surely is fine, given the definition of skos:Concept:
> >>
> >> [[
> >> A SKOS concept can be viewed as an idea or notion; a unit of thought.
> >> However, what constitutes a unit of thought is subjective, and this definition is meant to be suggestive, rather than restrictive.
> >> ]]
> >>
> >> Some might say: “A country is not an idea! It exists in the real world!” But I don't find that such arguments hold water. Countries are created and abolished through legislation and treaties; and decades can pass where large parts of mankind disagree on the question whether a particular entity is a country or not. Countries are really just the taxonomist's business objects of political geographers.
> >>
> >>> and as rdfs:label Literal values linked that they represent, e.g., estat:y2003 rdfs:label "2003"^^xsd:date
> >>
> >> Use skos:notation instead of rdfs:label. Note that "2003"^^xsd:date is ill-typed. It has to be "2003-01-01"^^xsd:date, or "2003"^^xsd:gYear.
> >>
> >>> * The observations can either use the represented instances directly, e.g., dbpedia:Austria and "2003"^^xsd:date, or they can use the skos:Concept representations, e.g., estat:F
> >>
> >> I agree that this makes sense in the case of literals (dates in particular). For URIs, it seems overly complicated. Why not just define a concept scheme that directly includes dbpedia:Austria as a concept using skos:inScheme?
> >>
> >>> This approach brings the following advantages:
> >>> * We can limit the number of literal values of a specific dimension
> >>
> >> Right, and I like this. The logic would be: If a dimension property has a qb:codeList and is used with literal values, then assume that the literal values are the skos:notations of the concepts in the code list.
> >>
> >>> * We can have relationships between dimension values, e.g., for hierarchies, and still use the literal values or the non-information URIs in the observations
> >>
> >> Yup.
> >>
> >>> * Publishers may still represent skos:Concepts as possible dimension values and can link them using owl:sameAs to the actual represented values.
> >>
> >> Do not *EVER* link to a skos:Concept using owl:sameAs! ;-)
> >>
> >> Seriously, skos:xxxMatch is always better for that purpose.
> >>
> >> Best,
> >> Richard
> >>
> >>> Although this may be wrong, as it would state the term (e.g., skos:Concept Germany) and the actual thing (dbpedia:Germany) as being the same thing, applications that would work with the explained approach would also work here.
> >>>
> >>> I would be glad to hear your opinions on this.
> >>>
> >>> Regards,
> >>>
> >>> Benedikt
> >>>
> >>> [1]<http://estatwrap.ontologycentral.com/page/teilm020>
> >>> [2]<http://estatwrap.ontologycentral.com/>
> >>>
> >>> --
> >>> AIFB, Karlsruhe Institute of Technology (KIT)
> >>> Phone: +49 721 608-47946
> >>> Email: benedikt.kaempgen@kit.edu
> >>> Web: http://www.aifb.kit.edu/web/Hauptseite/en
> >>>
> >>>> -----Original Message-----
> >>>> From: publishing-statistical-data@googlegroups.com
> >>>> [mailto:publishing-statistical-data@googlegroups.com] On Behalf Of Benedikt Kämpgen
> >>>> Sent: Friday, October 28, 2011 11:39 AM
> >>>> To: publishing-statistical-data@googlegroups.com
> >>>> Cc: Dominik Siegele
> >>>> Subject: RE: [publishing-statistical-data] Re: qb:DimensionProperty subClassOf qb:CodedProperty ?
> >>>>
> >>>> Hi Dave,
> >>>>
> >>>> Thanks for your answer.
> >>>>
> >>>>> This is one area where I think the current QB vocabulary could do with some extension. It would be nice to be able to define the property that is used for hierarchical relationships between dimensions values when those are not skos:Concepts (and thus skos:broader/narrower).
> >>>> Dito.
> >>>>
> >>>> For example, we have now tried to model it for Eurostat correctly, not using skos:ConceptScheme, but the actual regions from nuts:NUTSRegion, see [1] and definition of geo dimension.
> >>>>
> >>>> However, with this approach we cannot say anymore, that only certain region are used in the dataset.
> >>>>
> >>>> Best,
> >>>>
> >>>> Benedikt
> >>>>
> >>>> [1] http://estatwrap.ontologycentral.com/dsd/tsieb010
> >>>>
> >>>> --
> >>>> AIFB, Karlsruhe Institute of Technology (KIT)
> >>>> Phone: +49 721 608-47946
> >>>> Email: benedikt.kaempgen@kit.edu
> >>>> Web: http://www.aifb.kit.edu/web/Hauptseite/en
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: publishing-statistical-data@googlegroups.com
> >>>>> [mailto:publishing-statistical-data@googlegroups.com] On Behalf Of Dave Reynolds
> >>>>> Sent: Wednesday, October 19, 2011 11:16 AM
> >>>>> To: publishing-statistical-data@googlegroups.com
> >>>>> Subject: RE: [publishing-statistical-data] Re: qb:DimensionProperty subClassOf qb:CodedProperty ?
> >>>>>
> >>>>> On Wed, 2011-10-19 at 11:01 +0200, Benedikt Kämpgen wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have a follow-up question regarding dimensions and code lists:
> >>>>>>
> >>>>>> In QB, a dimension value used by an observation typically is an instance of skos:Concept from a skos:ConceptScheme.
> >>>>>
> >>>>> Not required, can also be instances of some defined [rdfs|owl]:Class
> >>>>>
> >>>>>> I have seen some examples of datasets [1,2], that then link from such instances of skos:Concept with owl:sameAs to entities they represent, e.g. <http://dbpedia.org/resource/Spain>. I guess this is fine from a practical point of view, but is it not semantically incorrect;
> >>>>>
> >>>>> Indeed, not correct.
> >>>>>
> >>>>>> I am wondering whether this is really intended and will lead to problems, later,
> >>>>>
> >>>>> In the datasets we've published we've tended to use "normal" resources directly for things like geographies and time periods and only use skos:Concepts for things that are definitely classification schemes - e.g. gender or age groups.
> >>>>>
> >>>>>> e.g., if we want to define hierarchies on dimension values.
> >>>>>
> >>>>> This is one area where I think the current QB vocabulary could do with some extension. It would be nice to be able to define the property that is used for hierarchical relationships between dimensions values when those are not skos:Concepts (and thus skos:broader/narrower).
> >>>>>
> >>>>> Dave
> >>>>>
> >>>>> --
> >>>>> Epimorphics Ltd www.epimorphics.com
> >>>>> Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
> >>>>> Tel: 01275 399069 Mobile: 07906 628814
> >>>>>
> >>>>> Epimorphics Ltd. is a limited company registered in England (number 7016688)
> >>>>> Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT, UK
> >>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Benedikt
> >>>>>>
> >>>>>> [1]<http://estatwrap.ontologycentral.com/data/tsieb010>
> >>>>>> [2]<http://estatwrap.ontologycentral.com/dic/geo#ES>
> >>>>>>
> >>>>>> --
> >>>>>> AIFB, Karlsruhe Institute of Technology (KIT)
> >>>>>> Phone: +49 721 608-47946
> >>>>>> Email: benedikt.kaempgen@kit.edu
> >>>>>> Web: http://www.aifb.kit.edu/web/Hauptseite/en
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: publishing-statistical-data@googlegroups.com
> >>>>>>> [mailto:publishing-statistical-data@googlegroups.com] On Behalf Of Richard Cyganiak
> >>>>>>> Sent: Friday, September 23, 2011 8:02 PM
> >>>>>>> To: publishing-statistical-data@googlegroups.com
> >>>>>>> Subject: Re: [publishing-statistical-data] Re: qb:DimensionProperty subClassOf qb:CodedProperty ?
> >>>>>>>
> >>>>>>> Hi Bill,
> >>>>>>>
> >>>>>>> On 23 Sep 2011, at 08:05, BillRoberts wrote:
> >>>>>>>> But I see your point Richard. Maybe I'm thinking too much like a physicist instead of a statistician!
> >>>>>>>>
> >>>>>>>> In practice most of these continuous variables are 'chunked': time into years or months, space into a list of points or regions, age into 5 year bands etc etc
> >>>>>>>
> >>>>>>> Exactly. Statistics tend to be aggregate data, where many individual “events” or “facts” (which often have continuous attributes) have been lumped together into a single observation. The values along a number of dimensions have been “classified” into discrete ranges, and everything that falls into the same bucket (cube cell) has been “tabulated” into a single total or average number, and we're interested only in these totals.
> >>>>>>>
> >>>>>>> This aggregation can remove a lot of valuable detail, but also makes it easier to ask higher-level questions (especially for dimensions where the classification is hierarchical), and may make the datasets smaller and may anonymize the data to some extent.
> >>>>>>>
> >>>>>>> If you have some values that you *truly* want to model as continuous, then you should ask yourself if you aren't really looking at a measure rather than a dimension.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Richard
> >>>>>>>
> >>>>>>>> So it's making a bit more sense to me now.
> >>>>>>>>
> >>>>>>>> On Sep 22, 7:08 pm, Richard Cyganiak<rich...@cyganiak.de> wrote:
> >>>>>>>>> On 22 Sep 2011, at 16:18, BillRoberts wrote:
> >>>>>>>>>
> >>>>>>>>>> But there are many dimension properties with values that are not coded or codelist-able.
> >>>>>>>>>
> >>>>>>>>> With the exception of time, I don't think that's true.
> >>>>>>>>>
> >>>>>>>>> Can you give an example of some other dimension whose values don't come from a controlled/managed set of terms that ought to be represented as a SKOS concept scheme or RDFS class?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Richard
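
The query pattern described at the top of this message (select observations whose dimension value is either the skos:Concept itself or the literal linked from it via skos:notation) can be sketched as follows. This is only an illustration under stated assumptions: the function names build_union_query and matching_observations, the http://example.org/ URIs and the sample triples are hypothetical and do not come from the thread. To stay runnable without an RDF library, the second function imitates the two UNION branches over an in-memory list of triples instead of executing SPARQL.

```python
# Hypothetical sketch: all names, URIs and sample data below are illustrative only.

SKOS_NOTATION = "http://www.w3.org/2004/02/skos/core#notation"
EX = "http://example.org/"  # assumed namespace, not from the thread


def build_union_query(dimension: str, concept: str) -> str:
    """Return SPARQL text selecting observations whose dimension value is
    either the concept itself or a literal linked from it via skos:notation."""
    return f"""PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?obs WHERE {{
  ?obs a qb:Observation .
  {{ ?obs <{dimension}> <{concept}> . }}
  UNION
  {{ <{concept}> skos:notation ?code .
     ?obs <{dimension}> ?code . }}
}}"""


def matching_observations(triples, dimension, concept):
    """Imitate the two UNION branches over (subject, predicate, object) tuples.
    URIs are plain strings; literals are (lexical form, datatype) tuples.
    The rdf:type check on qb:Observation is omitted to keep the sketch short."""
    # Literals that the concept declares as its notation (second branch).
    notations = {o for s, p, o in triples if s == concept and p == SKOS_NOTATION}
    # An observation matches if its value is the concept or one of its notations.
    accepted = {concept} | notations
    return sorted(s for s, p, o in triples if p == dimension and o in accepted)


triples = [
    (EX + "y2003", SKOS_NOTATION, ("2003", "xsd:gYear")),
    (EX + "obs1", EX + "time", EX + "y2003"),           # uses the concept
    (EX + "obs2", EX + "time", ("2003", "xsd:gYear")),  # uses the literal
    (EX + "obs3", EX + "time", ("2004", "xsd:gYear")),  # different year
]

query_text = build_union_query(EX + "time", EX + "y2003")
result = matching_observations(triples, EX + "time", EX + "y2003")
```

Whether the UNION is actually slow depends on the store and its indexes; the sketch only shows the shape of the query an application would need when it cannot know which of the two representations the observations use.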
Received on Tuesday, 1 May 2012 23:24:05 UTC