RE: Using schema.org Dataset metadata properties from Tandy, Jeremy on 2014-09-17 (public-csv-wg@w3.org from September 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Wed, 17 Sep 2014 07:35:20 +0000
To: Dan Brickley <danbri@google.com>, Jeni Tennison <jeni@jenitennison.com>, Thomas Baker <tom@tombaker.org>
CC: CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE20888004C@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>

Thanks Dan, I certainly found your note informative. My inclination is to support the use of schema.org, which we can see as a superset of the terms from DC, maintained by an active community with support from major infrastructure players.

...

Regarding the mechanics of using schema.org terms in the CSV metadata: in section 3.3.2 Links of the Metadata Vocabulary document [1], we state:

"""
Unlike the Dublin Core terms, link relations are an ever-expanding list and there may eventually be clashes between link relation terms and those defined above. That's why the above list uses QNames for all link relations, so that they look like link:relation rather than plain relation.
"""

Presumably, since schema.org is also an "ever-expanding list", we would need to present its terms as QNames too, e.g. "schema:event" rather than plain "event"?
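
A minimal Python sketch of why prefixed names help, assuming a hypothetical prefix map (`schema:` for http://schema.org/, `dc:` for http://purl.org/dc/terms/) and a tiny, invented excerpt of each vocabulary's term list. The hypothetical `expand` function resolves a QName to a full IRI unambiguously, but has to reject a plain term such as "description" that both vocabularies define. This is an illustration only, not how the CSVW metadata spec resolves terms.

```python
# Hypothetical prefix map; not taken from the CSVW metadata spec.
PREFIXES = {
    "schema": "http://schema.org/",
    "dc": "http://purl.org/dc/terms/",
}

# Invented excerpt of the local names each vocabulary defines.
# Real vocabularies are much larger and, in schema.org's case, keep growing.
TERMS = {
    "schema": {"event", "name", "description"},
    "dc": {"title", "description"},
}


def expand(term: str) -> str:
    """Expand 'prefix:local' to a full IRI; reject ambiguous plain terms."""
    if ":" in term:
        prefix, local = term.split(":", 1)
        if prefix not in PREFIXES:
            raise KeyError(f"unknown prefix: {prefix}")
        return PREFIXES[prefix] + local
    # A plain term is only safe if exactly one vocabulary defines it.
    owners = [p for p, names in TERMS.items() if term in names]
    if len(owners) != 1:
        raise ValueError(f"plain term {term!r} matches {owners or 'no vocabulary'}")
    return PREFIXES[owners[0]] + term


print(expand("schema:event"))  # http://schema.org/event
print(expand("dc:title"))      # http://purl.org/dc/terms/title
```

A plain "event" resolves here only because no other vocabulary in the map defines it; as soon as one does, it becomes ambiguous in the same way as "description", which is the risk the quoted text describes for link relations.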


Jeremy

[1] http://w3c.github.io/csvw/metadata/index.html#links


-----Original Message-----
From: Dan Brickley [mailto:danbri@google.com] 
Sent: 16 September 2014 12:32
To: Jeni Tennison; Thomas Baker
Cc: CSV on the Web Working Group
Subject: Re: Using schema.org Dataset metadata properties

+Cc: Tom Baker from Dublin Core

On 13 September 2014 17:28, Jeni Tennison <jeni@jenitennison.com> wrote:
> Hi,
>
> In the current metadata document here:
>
>   http://w3c.github.io/csvw/metadata/#common-properties
>
> the spec adopts the list of Dublin Core properties for describing tables etc. As ISSUE 6 says, this might not be the right choice: there might be other standard vocabularies that should be used instead or as well.
>
> On the call this week, Dan suggested using schema.org instead, namely the properties on Dataset here:
>
>   http://schema.org/Dataset
>
> The properties there are informed by DCAT which itself was informed by Dublin Core.
>
> Any thoughts?

As a WG co-chair, as a loyal member of the DC community, and as someone in large part responsible for schema.org day-to-day, I've stepped back from this conversation so far.

I first met Tom Baker (cc:'d) at the 5th Dublin Core meeting, October 6-8, 1997 in Helsinki, Finland. That was the week that W3C announced the first draft RDF spec, which also began the long and noble tradition of DC being used in W3C RDF-related spec examples:

http://www.w3.org/TR/WD-rdf-syntax-971002/

<?namespace href="http://purl.org/DublinCore/RDFschema" as="DC"?>
<?namespace href="http://www.w3.org/schemas/rdf-schema" as="RDF"?>
<RDF:serialization>
  <RDF:assertions href="http://www.webnuts.net/Jan97.html">
    <DC:subject>
      <RDF:resource id="subject_001">
        <DC:scheme>Dewey Decimal Code</DC:scheme>
        <DC:lang>English</DC:lang>
        <RDF:PropValue>020 - Library Science</RDF:PropValue>
      </RDF:resource>
    </DC:subject>
  </RDF:assertions>
</RDF:serialization>

This even pre-dates modern XML namespaces :)

Some aspects of this thread feel like a conversation that hasn't stopped since those days. "When do we describe something as a 'thing', or with a string? Or a link?". Both DC and schema.org navigate those tradeoffs, and in similar ways. They both try to gently encourage thing-centric data modeling, while acknowledging that many important repositories are full of information based on ambiguous or vague strings. We take what we can get.

There are good cases for both Dublin Core and Schema.org in CSV metadata files. If you have a repository/collection whose metadata is generally DC-based, it is entirely reasonable to want DC-based CSV metadata, and W3C CSVW metadata files absolutely should support that use case. JSON-LD makes that relatively easy.

My personal feeling for how DC and Schema.org should best relate (see also http://www.slideshare.net/danbri/what-is-left-to-do-dublin-core-2012-keynote ) is that schema.org is weakest on controlled values for properties, and this is an area where the DC community (through its connection to the digital library / GLAM world) can excel. In terms of the actual vocabulary terms (basic properties and types), their expressivity is similar, except that schema.org's vocabulary is much larger (I think around 1200 terms now, and growing). I don't expect schema.org as a project to add a lot of controlled enumerations, whereas DC could easily find a role doing just that, e.g. SKOS thesauri, controlled terms for educational technology publishing, etc. There is certainly room for both, even if there are overlaps.

In terms of explicit mappings, we did have a DC/schema.org mapping task force a couple of years ago, but we let it fizzle out without finalizing its output. More recently the schema.org site codebase has been open-sourced and posted on GitHub, and it has grown some features that make it worthwhile to revisit those mappings.

The master file defining schema.org is:
https://github.com/rvguha/schemaorg/blob/master/data/schema.rdfa

Recently we have been publishing frozen snapshots of that file with every release, although there is a need for more structure around those; for example:
http://schema.org/release/20140912/20140912-v1.91.rdfa.html

Schema.org is currently updated fairly often; see http://schema.org/docs/releases.html for a release history, or GitHub for the full details.

Within those machine-readable files there are already some basic mappings to DC, e.g. for Event:

<div typeof="rdfs:Class" resource="http://schema.org/Event">
  <span class="h" property="rdfs:label">Event</span>
  <span property="rdfs:comment">An event happening at a certain time and location, such as a concert, lecture, or festival. Ticketing information may be added via the &#39;offers&#39; property. Repeated events may be structured as separate Event objects.</span>
  <span>Subclass of: <a property="rdfs:subClassOf"
      href="http://schema.org/Thing">Thing</a></span>
  <link property="owl:equivalentClass"
      href="http://purl.org/dc/dcmitype/Event"/>
</div>

... which in turn gets re-published in per-term pages like http://schema.org/Event as follows:

<div id="mainContent" vocab="http://schema.org/" typeof="rdfs:Class"
    resource="http://schema.org/Event">
  <link property="owl:equivalentClass"
      href="http://purl.org/dc/dcmitype/Event"/>
</div>...
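
Because the mapping is ordinary RDFa markup, it can be pulled out mechanically. The Python sketch below uses only the standard library's `html.parser` and assumes the exact pattern shown above (a `div` with `typeof="rdfs:Class"` and a `resource` attribute, containing a `link` with `property="owl:equivalentClass"`). The `EquivalentClassFinder` class is hypothetical and is not a general RDFa processor, which would also need to handle prefixes, `vocab`, and nesting.

```python
from html.parser import HTMLParser

# Abridged copy of the schema.org master-file markup quoted above.
SNIPPET = """
<div typeof="rdfs:Class" resource="http://schema.org/Event">
  <span class="h" property="rdfs:label">Event</span>
  <span>Subclass of: <a property="rdfs:subClassOf"
      href="http://schema.org/Thing">Thing</a></span>
  <link property="owl:equivalentClass"
      href="http://purl.org/dc/dcmitype/Event"/>
</div>
"""


class EquivalentClassFinder(HTMLParser):
    """Collect (class IRI, equivalent class IRI) pairs from simple RDFa.

    Only recognises the literal attribute values used above; it does not
    expand CURIEs or follow the full RDFa processing rules.
    """

    def __init__(self):
        super().__init__()
        self.current = None  # resource IRI of the enclosing rdfs:Class div
        self.mappings = []   # list of (class IRI, equivalent IRI) tuples

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "div" and a.get("typeof") == "rdfs:Class":
            self.current = a.get("resource")
        elif tag == "link" and a.get("property") == "owl:equivalentClass":
            if self.current and a.get("href"):
                self.mappings.append((self.current, a["href"]))
        # Self-closing tags such as <link .../> also reach this method,
        # via HTMLParser's default handle_startendtag.

    def handle_endtag(self, tag):
        if tag == "div":
            self.current = None


finder = EquivalentClassFinder()
finder.feed(SNIPPET)
finder.close()
print(finder.mappings)
# [('http://schema.org/Event', 'http://purl.org/dc/dcmitype/Event')]
```

The same finder also picks up the mapping from the per-term page markup, since that `div` carries the same `typeof` and `resource` attributes.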

Now that this is possible, we should go back and add the rest of the draft mappings.

As a long-time Dublin Core person, I don't want to advocate against DC here, but I do think there are advantages to using schema.org:

1. When we map the actual payloads of CSV data into triples, schema.org's added depth will make it more useful than DC. So schema.org will reappear within mappings/templates anyway, whether normatively encouraged or not.

2. It has the attention of publishers and consumers at large scale. Schema.org went from nothing to being on 7+ million domains in 3 years, and is still being actively evolved. It is not a classic formal standards activity, but it builds on standards and conducts most of its discussion and collaboration in public, on GitHub and the W3C public-vocabs list.

3. It is relatively easy to get it extended. No promises, but if there are unaddressed use cases, at this moment a change request to schema.org is much more likely to result in changes or improvements than a change request to Dublin Core.

The downside is that schema.org is constantly evolving, which goes against some W3C instincts with respect to making normative references. It is also under the stewardship of the four sponsoring search engines (Yandex, Yahoo, Bing, Google), which is not everyone's preferred model.

I would be happy with either DC or schema.org as the default for CSVW metadata, but either way I'd like us to make sure publishers can choose the vocabulary they prefer, since CSVs are often a smaller part of a larger story. I wouldn't want to hold back DC-centric systems from using DC, or schema.org-centric systems from using schema.org.

Beyond that, I suggest we collect concrete metadata use cases, see what in practice is missing from DC and schema.org, and try to get those gaps filled in one or the other.

I've copied Tom, as we were chatting earlier and he may have thoughts to add. As an RDF-ish person, I'm just happy that both vocabularies at least share a common underlying data model (and overlapping communities...).

cheers,

Dan
Received on Wednesday, 17 September 2014 07:35:51 UTC