- From: Alf Eaton <eaton.alf@gmail.com>
- Date: Fri, 30 May 2014 13:31:04 +0100
- To: Dan Brickley <danbri@google.com>
- Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
On 30 May 2014 12:33, Dan Brickley <danbri@google.com> wrote:
> On 30 May 2014 12:24, Alf Eaton <eaton.alf@gmail.com> wrote:
>> On 27 May 2014 11:07, Dan Brickley <danbri@google.com> wrote:
>>
>>> Here's an example of a CSV structure that hides a hierarchy within cell values.
>>>
>>> My expectation is that we won't specify a way to access such
>>> complexity in our core work but it is worth bearing in mind when
>>> thinking about extensions, hooks for other languages etc.
>>>
>>> This link has raw CSV and prettified HTML,
>>> http://www.onetcenter.org/taxonomy/2010/list.html?d=1 ...
>>>
>>> Schema.org currently mentions this dataset as supplying possible
>>> values to use in http://schema.org/JobPosting in the
>>> http://schema.org/occupationalCategory property. It is very SKOS-like
>>> data, consisting of a controlled code, with short text, long text, and
>>> a hierarchy represented within the numeric structure of the codes. A
>>> simple CSV mapping could expand these out into SKOS Concept like
>>> structures; a fancy/custom mapping might figure out broader/narrower
>>> relations that show e.g. 11-9041.01,Biofuels/Biodiesel Technology and
>>> Product Development Managers as a specialization of
>>> 11-9041.00,Architectural and Engineering Managers...
>>>
>>> I haven't figured out the exact rules to parse a hierarchy yet, but at
>>> first look I'd guess it needs procedural code.
>>
>> I had a go at parsing the CSV into something that made the
>> organisation structure browseable, and this mapping seemed to work
>> quite well:
>>
>> 11-9041.01,Biofuels/Biodiesel Technology =>
>> {
>>   title: 'Biofuels/Biodiesel Technology',
>>   subsubcategory: '11-9041.01',
>>   subcategory: '11-9041',
>>   category: '11',
>> }
>>
>> This could be done declaratively: a regular expression
>> (/^(\d+)-(\d+)\.(\d+)$/) specifies how to parse the hierarchical code
>> into its constituent parts, then they just need to be combined one
>> part at a time to get the ids for each level of the hierarchy. In the
>> user interface, selecting category "11" shows only the items in that
>> category (for want of a better term), and selecting subcategory
>> "11-9041" shows only the items in that subcategory.
>
> Interesting - I was thinking of this mapping more directly into SKOS.
> But perhaps exploding from regex into this fixed structure would be
> enough to make the final step to SKOS feasible via SPARQL 1.1
> CONSTRUCT? Ok, maybe that's getting arcane, but at least it's an
> existing standard :)
>
>> In this particular case, there doesn't seem to be (as far as I can
>> tell) an ontology providing labels or relationships between each level
>> of the hierarchy, which would be useful.
>
> I believe this CSV is as close as we get to having such an ontology :)

If the aim is to build a SKOS ontology from the CSV data, then I guess
the end result would be something like this:

<onetsoc:11-9041.01> <skos:prefLabel> "Biofuels/Biodiesel Technology"
<onetsoc:11-9041.01> <skos:broader> <onetsoc:11-9041>
<onetsoc:11-9041> <skos:broader> <onetsoc:11>

What would the input data need to be, for SPARQL CONSTRUCT to be able
to build that output?

I note that although other categorisation systems (e.g. MeSH) express
their hierarchy in this way, others (e.g. Dewey Decimal) use a
different system of identifiers that would be harder (impossible?)
to split into a hierarchy with just a regular expression:

<ddc:001.012> <skos:broader> <ddc:001.01>
<ddc:001.01> <skos:broader> <ddc:001>
<ddc:001> <skos:broader> <ddc:000>

Alf
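
P.S. For concreteness, a rough Python sketch of the regex-based split
discussed in this thread. It uses the same pattern as the quoted
expression and emits skos:broader lines in the informal notation used
in this message. The function names (levels, broader_triples) and the
default "onetsoc" prefix argument are my own placeholders, not part of
any spec or existing tool, and this only handles codes shaped like
11-9041.01.

```python
import re

# Same pattern as the regular expression quoted in the thread.
CODE_RE = re.compile(r"^(\d+)-(\d+)\.(\d+)$")


def levels(code):
    """Return the ids for one code, most specific first.

    Hypothetical helper: "11-9041.01" -> ["11-9041.01", "11-9041", "11"].
    """
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"code does not fit the assumed pattern: {code!r}")
    major, minor, detail = match.groups()
    # Combine the parts one at a time to get the id for each level.
    return [f"{major}-{minor}.{detail}", f"{major}-{minor}", major]


def broader_triples(code, prefix="onetsoc"):
    """Pair each level with the next one up as a skos:broader statement."""
    ids = levels(code)
    return [
        f"<{prefix}:{narrower}> <skos:broader> <{prefix}:{broader}>"
        for narrower, broader in zip(ids, ids[1:])
    ]


if __name__ == "__main__":
    for line in broader_triples("11-9041.01"):
        print(line)
```

A Dewey-style identifier such as 001.012 does not match this pattern,
so levels() raises ValueError for it; that is the case where a single
regular expression stops being enough.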
Received on Friday, 30 May 2014 12:31:52 UTC