Re: Hidden hierarchy example

On 30 May 2014 12:33, Dan Brickley <danbri@google.com> wrote:
> On 30 May 2014 12:24, Alf Eaton <eaton.alf@gmail.com> wrote:
>> On 27 May 2014 11:07, Dan Brickley <danbri@google.com> wrote:
>>
>>> Here's an example of a CSV structure that hides a hierarchy within cell values.
>>>
>>> My expectation is that we won't specify a way to access such
>>> complexity in our core work but it is worth bearing in mind when
>>> thinking about extensions, hooks for other languages etc.
>>>
>>> This link has raw CSV and prettified HTML,
>>> http://www.onetcenter.org/taxonomy/2010/list.html?d=1 ...
>>>
>>> Schema.org currently mentions this dataset as supplying possible
>>> values to use in http://schema.org/JobPosting in the
>>> http://schema.org/occupationalCategory property. It is very SKOS-like
>>> data, consisting of a controlled code, with short text, long text, and
>>> a hierarchy represented within the numeric structure of the codes. A
>>> simple CSV mapping could expand these out into SKOS Concept like
>>> structures; a fancy/custom mapping might figure out broader/narrower
>>> relations that show e.g. 11-9041.01,Biofuels/Biodiesel Technology and
>>> Product Development Managers as a specialization of
>>> 11-9041.00,Architectural and Engineering Managers...
>>>
>>> I haven't figured out the exact rules to parse a hierarchy yet, but at
>>> first look I'd guess it needs procedural code.
>>
>> I had a go at parsing the CSV into something that made the
>> organisation structure browseable, and this mapping seemed to work
>> quite well:
>>
>> 11-9041.01,Biofuels/Biodiesel Technology =>
>> {
>>   title: 'Biofuels/Biodiesel Technology',
>>   subsubcategory: '11-9041.01',
>>   subcategory: '11-9041',
>>   category: '11',
>> }
>>
>> This could be done declaratively: a regular expression
>> (/^(\d+)-(\d+)\.(\d+)$/) specifies how to parse the hierarchical code
>> into its constituent parts, then they just need to be combined one
>> part at a time to get the ids for each level of the hierarchy. In the
>> user interface, selecting category "11" shows only the items in that
>> category (for want of a better term), and selecting subcategory
>> "11-9041" shows only the items in that subcategory.
>
> Interesting - I was thinking of this mapping more directly into SKOS.
> But perhaps exploding from regex into this fixed structure would be
> enough to make the final step to SKOS feasible via SPARQL 1.1
> CONSTRUCT? Ok, maybe that's getting arcane, but at least it's an
> existing standard :)
>
>> In this particular case, there doesn't seem to be (as far as I can
>> tell) an ontology providing labels or relationships between each level
>> of the hierarchy, which would be useful.
>
> I believe this CSV is as close as we get to having such an ontology :)
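The regular-expression mapping described in the quoted thread can be sketched in Python roughly as follows. This is an illustration under assumptions: the function names `explode` and `in_group` are hypothetical, the output keys simply mirror the category/subcategory/subsubcategory names from the example mapping, and any code that does not match the pattern is rejected rather than guessed at.

```python
import re
from typing import Dict, List

# Pattern quoted in the thread: <category>-<subcategory digits>.<suffix>
CODE_PATTERN = re.compile(r"^(\d+)-(\d+)\.(\d+)$")


def explode(code: str, title: str) -> Dict[str, str]:
    """Split an O*NET-SOC style code into one id per hierarchy level."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognised code: {code!r}")
    major, detail, suffix = match.groups()
    # Combine the captured parts one at a time to get each level's id.
    return {
        "title": title,
        "subsubcategory": f"{major}-{detail}.{suffix}",
        "subcategory": f"{major}-{detail}",
        "category": major,
    }


def in_group(rows: List[Dict[str, str]], level: str, group_id: str) -> List[Dict[str, str]]:
    """Keep only rows whose id at `level` equals `group_id`, as a UI filter would."""
    return [row for row in rows if row[level] == group_id]
```

For example, `explode("11-9041.01", "Biofuels/Biodiesel Technology")` yields the ids "11", "11-9041" and "11-9041.01" for the three levels, and `in_group(rows, "subcategory", "11-9041")` mimics selecting that subcategory in the interface.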

If the aim is to build a SKOS ontology from the CSV data, then I guess
the end result would be something like this:

<onetsoc:11-9041.01> <skos:prefLabel> "Biofuels/Biodiesel Technology"
<onetsoc:11-9041.01> <skos:broader> <onetsoc:11-9041>
<onetsoc:11-9041> <skos:broader> <onetsoc:11>

What would the input data need to be, for SPARQL CONSTRUCT to be able
to build that output?
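Whatever the SPARQL CONSTRUCT input ends up looking like, the same three statements can also be produced procedurally, straight from each code and label. A minimal Python sketch, reusing the regular expression from the thread and writing Turtle-style prefixed names (rather than the angle-bracket shorthand above); it assumes the `onetsoc:` and `skos:` prefixes are declared elsewhere, and `skos_triples` is a hypothetical helper name.

```python
import re
from typing import List

CODE_PATTERN = re.compile(r"^(\d+)-(\d+)\.(\d+)$")


def skos_triples(code: str, label: str) -> List[str]:
    """Emit Turtle-style lines linking a code to its broader levels.

    Assumes `onetsoc:` and `skos:` prefixes are declared elsewhere. Label
    escaping is minimal: only backslashes and double quotes are escaped.
    """
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognised code: {code!r}")
    major, detail, _suffix = match.groups()
    subcategory = f"{major}-{detail}"
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'onetsoc:{code} skos:prefLabel "{escaped}" .',
        # Leaf -> subcategory -> category, as in the example output.
        f"onetsoc:{code} skos:broader onetsoc:{subcategory} .",
        f"onetsoc:{subcategory} skos:broader onetsoc:{major} .",
    ]
```

Run over every row of the CSV, the `skos:broader` lines for shared subcategories repeat; collecting them in a set before writing them out avoids duplicates.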

I note that while some other categorisation systems (e.g. MeSH) express
their hierarchy in a similar way, others (e.g. Dewey Decimal) use a
different system of identifiers that would be harder (impossible?) to
split into a hierarchy with just a regular expression:

<ddc:001.012> <skos:broader> <ddc:001.01>
<ddc:001.01> <skos:broader> <ddc:001>
<ddc:001> <skos:broader> <ddc:000>
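To give a feel for the kind of procedural code the Dewey case might need, here is a Python sketch that walks the example above with a couple of string rules. The rules are an assumption inferred from these three lines only, not the published DDC structure (which has exceptions this ignores), and `ddc_broader` and `ddc_chain` are hypothetical names.

```python
from typing import List, Optional


def ddc_broader(notation: str) -> Optional[str]:
    """Return an assumed parent for a Dewey-style notation, or None at the top.

    Simplified, assumed rules:
      * with a decimal part, drop its last digit, then strip trailing zeros
        and the point itself if nothing is left (001.012 -> 001.01 -> 001);
      * with only three digits, zero the last non-zero digit (001 -> 000,
        510 -> 500), unless only the first digit is non-zero (a main class).
    """
    base, _, fraction = notation.partition(".")
    if fraction:
        fraction = fraction[:-1].rstrip("0")
        return f"{base}.{fraction}" if fraction else base
    digits = list(base)
    nonzero = [i for i, d in enumerate(digits) if d != "0"]
    if not nonzero or nonzero[-1] == 0:
        return None  # 000, or a main class such as 500
    digits[nonzero[-1]] = "0"
    return "".join(digits)


def ddc_chain(notation: str) -> List[str]:
    """Follow ddc_broader upwards until it returns None."""
    chain = []
    parent = ddc_broader(notation)
    while parent is not None:
        chain.append(parent)
        parent = ddc_broader(parent)
    return chain
```

Under these assumptions `ddc_chain("001.012")` gives `["001.01", "001", "000"]`, matching the three triples above; the point is that each step depends on which digits are zero, which a single regular expression expresses awkwardly.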

Alf

Received on Friday, 30 May 2014 12:31:52 UTC