Re: SKOS for schema.org proposal for discussion

On 3 October 2013 20:09, Guha <guha@google.com> wrote:
> Just to be clear ... Schema.org 'assimilating' SKOS (or anything else) does
> not gate anything. You can most certainly go ahead and
>   "publish pages about concepts described in a controlled vocabulary and to
> describe the controlled vocabulary itself"
>
> today. Schema.org encourages the use of multiple vocabularies.

Yes, it's quite possible already, and there's a lot of SKOS out there
in RDF/XML, RDFa, Turtle, etc. using its own namespace.

I do believe there's value in giving schema.org a SKOS-oriented notion
of topic/concept/category, and I'd stick with the name "Concept" for
it. A lot of people who publish SKOS authority data will probably also
want to use the official W3C SKOS namespace (which is built into RDFa
1.1 btw, see skos: and skosxl: in
http://www.w3.org/2011/rdfa-context/rdfa-1.1 ). Schema.org can add
value by making it easier for these concept URLs to get deployed as
controlled property values more widely across the Web.

But let's try to walk through a couple of use cases.

1. JobPosting taxonomies
We have http://schema.org/JobPosting with
http://schema.org/occupationalCategory currently. (expected type:
Text)

"Category or categories describing the job. Use BLS O*NET-SOC
taxonomy: http://www.onetcenter.org/taxonomy.html. Ideally includes
textual label and formal code, with the property repeated for each
applicable value."


Over on the referenced site, there are a few links.
http://www.onetcenter.org/taxonomy/2010/list.html seems to be the
latest. There are also downloadable CSV and XLS versions, but there is
no canonical URL for each concept code. Looking at the HTML for the 2010
code list, we see:

<tr>
<td class="datapubrt" width="30%">13-1031.01</td>
<td class="datapub" width="70%">Claims Examiners, Property and
Casualty Insurance</td>
</tr>
<tr>
<td class="datapubrt" width="30%">13-1031.02</td>
<td class="datapub" width="70%">Insurance Adjusters, Examiners, and
Investigators</td>
</tr>...etc

The CSV version has two columns (code and title). In this dataset
there does appear to be hierarchy, but hidden in the structure of the
names of the codes. It would be good for the Web if we could surface
this structure and have the code list site tell us that insurance
adjusters and claims examiners are related, and that regulatory
affairs managers and compliance managers are both managers.

<td class="datapubrt" width="30%">11-9199.01</td>
<td class="datapub" width="70%">Regulatory Affairs Managers</td>
</tr>
<tr>
<td class="datapubrt" width="30%">11-9199.02</td>
<td class="datapub" width="70%">Compliance Managers</td>
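
To make the "hidden hierarchy" point concrete, here is a rough Python sketch of how broader codes could be read off the code strings themselves. It assumes the usual shape NN-NNNN.NN, with the part before the dot being the SOC occupation and NN-0000 the major group; the function name broader_codes is made up for illustration, and the real taxonomy has intermediate levels that this ignores.

```python
import re

# Assumed shape of an O*NET-SOC code: "NN-NNNN" with an optional ".NN" suffix.
CODE_RE = re.compile(r"^(\d{2})-(\d{4})(?:\.(\d{2}))?$")


def broader_codes(code):
    """Return the broader codes implied by the structure of the code string.

    Hypothetical helper: it only looks at the string, nearest parent first.
    """
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"not an O*NET-SOC style code: {code!r}")
    major, detail, extension = match.groups()
    chain = []
    if extension is not None:
        # e.g. 11-9199.01 refines the SOC occupation 11-9199
        chain.append(f"{major}-{detail}")
    if detail != "0000":
        # e.g. 11-9199 sits inside the major group 11-0000
        chain.append(f"{major}-0000")
    return chain


print(broader_codes("11-9199.01"))
print(broader_codes("13-1031.02"))
```

With this, 11-9199.01 and 11-9199.02 share both 11-9199 and the 11-0000 major group, while 13-1031.01 and 13-1031.02 share 13-1031. That is the kind of relationship skos:broader could state explicitly.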

So imagine onetcenter.org starts marking up with SKOS or schema.org
SKOS or both. There are some choices to make about what the entity IDs
are, whether they are different from the Web page, etc. Sticking with
classic SKOS for now,

<tr typeof="skos:Concept" resource="#concept11-9199.01">
<td class="datapubrt" width="30%" property="skos:notation">11-9199.01</td>
<td class="datapub" width="70%" property="skos:prefLabel">Regulatory
Affairs Managers</td>
</tr>
<tr typeof="skos:Concept" resource="#concept11-9199.02">
<td class="datapubrt" width="30%" property="skos:notation">11-9199.02</td>
<td class="datapub" width="70%" property="skos:prefLabel">Compliance
Managers</td>
</tr>
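
Since the code list is also available as a two-column CSV, rows like these could be generated rather than hand-written. A minimal Python sketch, assuming a CSV with just code and title columns and no header row; the sample data and the function names concept_row and rows_from_csv are invented for illustration, and the #concept... fragment IDs are the same hypothetical convention as above.

```python
import csv
import io
from html import escape

# Hypothetical sample in the two-column (code, title) shape described above.
SAMPLE_CSV = (
    '13-1031.01,"Claims Examiners, Property and Casualty Insurance"\n'
    "11-9199.01,Regulatory Affairs Managers\n"
)


def concept_row(code, title):
    """Render one table row carrying classic SKOS terms in RDFa 1.1."""
    return (
        f'<tr typeof="skos:Concept" resource="#concept{escape(code)}">\n'
        f'<td class="datapubrt" width="30%" property="skos:notation">{escape(code)}</td>\n'
        f'<td class="datapub" width="70%" property="skos:prefLabel">{escape(title)}</td>\n'
        f"</tr>"
    )


def rows_from_csv(text):
    """Turn CSV text into a list of RDFa table rows, one per concept."""
    return [concept_row(code, title) for code, title in csv.reader(io.StringIO(text))]


markup = rows_from_csv(SAMPLE_CSV)
print("\n".join(markup))
```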

So now what do we do with our original schema.org property, which
originally expected just textual values?

Should we say it expects an URL, for example
http://www.onetcenter.org/taxonomy/2010/list.html#concept11-9199.01 ?
Or that it expects a http://schema.org/Concept (which like any other
type could always be identified by URI/URL/IRI)? Aside: this raises a
general oddity in schema.org w.r.t. saying we expect the "URL" type.
Some people want to use 'expected type: URL' to distinguish between
a) something like http://schema.org/trailer on Movie, where the trailer
is a VideoObject described inline, and b) http://schema.org/thumbnail
on ImageObject, where we 'expect an URL' in one sense, but we also
expect it to be an ImageObject.

So one way or another we can extend our use of
http://schema.org/occupationalCategory so that it properly accepts an
URL like http://www.onetcenter.org/taxonomy/2010/list.html#concept11-9199.01
and we can encourage authority data publishers to use RDFa, SKOS
and/or maybe Schema.org SKOS to describe their vocabulary.
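
To illustrate the difference between the two kinds of value, here is a small Python sketch that puts the text-valued and URL-valued forms of occupationalCategory side by side as plain dictionaries. This is not any particular serialisation; the "type" key and the concept_url helper are made up for the example, and the fragment URL only resolves if the code list page adopts the hypothetical #concept... IDs.

```python
from urllib.parse import urldefrag

CODE_LIST_PAGE = "http://www.onetcenter.org/taxonomy/2010/list.html"
OCCUPATIONAL_CATEGORY = "http://schema.org/occupationalCategory"


def concept_url(code):
    """Build the (hypothetical) fragment URL for a concept on the code list page."""
    return f"{CODE_LIST_PAGE}#concept{code}"


# Today: the property carries a label and code as plain text.
text_valued = {
    "type": "http://schema.org/JobPosting",
    OCCUPATIONAL_CATEGORY: "11-9199.01 Regulatory Affairs Managers",
}

# Proposed: the property points at the concept's own URL.
url_valued = {
    "type": "http://schema.org/JobPosting",
    OCCUPATIONAL_CATEGORY: concept_url("11-9199.01"),
}

# The concept ID and the page that describes it stay distinguishable.
page, fragment = urldefrag(url_valued[OCCUPATIONAL_CATEGORY])
print(page)
print(fragment)
```

Splitting off the fragment shows one answer to the "is the entity ID different from the Web page" question: the concept gets its own identifier, and the page URL is recoverable from it.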

2. LRMI educational alignment
The second scenario I'll cover more quickly. The LRMI initiative came up
with vocabulary which we now include in schema.org. It includes the
notion that you describe the educational characteristics of
information resources through aligning them with standard code lists,
e.g. in the US, the Common Core, or in broader terms with the kind of
topical codes we see in the SKOS world, such as LCSH.

The key property here is http://schema.org/targetUrl which is
documented as expecting an URL.

You can see an example of it in
http://www.cteonline.org/portal/default/Curriculum/Viewer/Curriculum?action=2&cmobjid=177674&refcmobjid=132904
which uses http://purl.org/ASN/resources/S103AD27 (which has linked
machine-readable versions, but not RDFa, SKOS or schema.org).

I won't copy all the CTEOnline.org markup into this mail; here is just an excerpt:

<div class="contents">
<span itemprop="educationalAlignment" itemscope="" itemtype="http://schema.org/AlignmentObject">
<meta itemprop="name" content="ANR.C.C13.3 Use the scientific method to conduct agricultural experiments." />
<meta itemprop="description" content="Use the scientific method to conduct agricultural experiments." />
<meta itemprop="targetName" content="ANR.C.C13.3 Use the scientific method to conduct agricultural experiments." />
<meta itemprop="targetDescription" content="Use the scientific method to conduct agricultural experiments." />
<meta itemprop="alignmentType" content="teaches" />
<meta itemprop="targetUrl" content="http://purl.org/ASN/resources/S103AE77" />
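
For anyone who wants to see what a consumer gets out of markup like this, here is a minimal Python sketch that pulls the itemprop/content pairs out of a shortened copy of the excerpt using only the standard library. It is deliberately naive: the class name AlignmentExtractor is invented, and it ignores itemscope nesting, so it is not a real microdata parser.

```python
from html.parser import HTMLParser

# Shortened copy of the excerpt above (a fragment, so tags are left unclosed).
EXCERPT = """
<div class="contents"><span itemprop="educationalAlignment"
itemscope="" itemtype="http://schema.org/AlignmentObject">
<meta itemprop="targetName" content="ANR.C.C13.3 Use the scientific method to conduct agricultural experiments." />
<meta itemprop="alignmentType" content="teaches" />
<meta itemprop="targetUrl" content="http://purl.org/ASN/resources/S103AE77" />
"""


class AlignmentExtractor(HTMLParser):
    """Collect itemprop -> content from <meta> tags (no nesting support)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and "itemprop" in attributes:
            self.properties[attributes["itemprop"]] = attributes.get("content")


extractor = AlignmentExtractor()
extractor.feed(EXCERPT)
extractor.close()
print(extractor.properties["alignmentType"])
print(extractor.properties["targetUrl"])
```

The targetUrl value is the only machine-followable link to the standard being aligned to, which is why it matters what sits at the other end of that URL.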

... if you dig around the educational use case you see that topics for
educational content, topics for bibliographic description, and job and
skill taxonomies are all quite inter-related. It would be very
positive if schema.org could send a clear message for how all these
things fit together in terms of markup that is search-engine friendly.

My inclination is to add a basic Concept type (modelled on skos:Concept)
and broader/narrower links, but not necessarily to reflect all of SKOS
into schema.org. For schema.org we should focus on getting a lot more
instance data linked to these SKOS-describable vocabularies. For that
it helps if we can explicitly say ( within schema.org's self-contained
framework ) which schema.org properties can be used with SKOS-like
controlled lists. Adding a Concept type addresses this need.

Are there any important scenarios missing? rNews++/storyline? events
categories? Drupal 8? I'd like to get some agreement on motivations...

Dan

Received on Thursday, 3 October 2013 20:11:01 UTC