Re: The dcterm/schema.org issue: a proposal to move forward from Gregg Kellogg on 2014-10-05 (public-csv-wg@w3.org from October 2014)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Sun, 5 Oct 2014 12:46:13 -0700
To: Ivan Herman <ivan@w3.org>
Cc: Dan Brickley <danbri@google.com>, Rufus Pollock <rufus.pollock@okfn.org>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>, Jeni Tennison <jeni@jenitennison.com>
Message-Id: <4BB85274-AC34-45F3-9A1D-8A1C8FA2BDF8@greggkellogg.net>
On Oct 5, 2014, at 10:42 AM, Ivan Herman <ivan@w3.org> wrote:

> 
> Hey Dan,
> 
>> On 3 Oct 2014, at 19:51, Dan Brickley <danbri@google.com> wrote:
>> 
>>> On 3 October 2014 18:22, Rufus Pollock <rufus.pollock@okfn.org> wrote:
>>>> On 3 October 2014 10:42, Ivan Herman <ivan@w3.org> wrote:
>>>> 
>>>> Dear all,
>>>> 
>>>> I was wondering how to move ahead with the schema.org/dcmi question, their
>>>> normativeness, etc; we seem to be in an impasse at this moment. Here is a
>>>> strategy that may help us move forward. (Some of the issues below were
>>>> also motivated by giving some more thoughts to the RDF/JSON conversion
>>>> document and its possible implementation.)
>>>> 
>>>> 1. We define a small set of core properties that we consider to be
>>>> essential in the metadata. "We define" means that we specify the terms to be
>>>> used in the metadata specification as well as their data types and intended
>>>> meaning.  There is already a set of such terms defined at the end of 3.4.2:
>>>> 
>>>>       • created
>>>>       • creator
>>>>       • description
>>>>       • language
>>>>       • license
>>>>       • modified
>>>>       • provenance
>>>>       • publisher
>>>>       • rights
>>>>       • rightsHolder
>>>>       • source
>>>>       • spatial
>>>>       • subject
>>>>       • temporal
>>>>       • title
>> 
>> I made a quick run through mapping these to schema.org. It's a fairly good fit:
>> 
>> • created: http://schema.org/dateCreated
>> • creator: http://schema.org/creator or http://schema.org/author
>> • description: http://schema.org/description
>> • language: http://schema.org/language (definition applies to actions;
>> could be generalized)
>> • license: http://schema.org/license
>> • modified: http://schema.org/dateModified
>> • provenance: no direct mapping. http://schema.org/evidenceOrigin is related.
>> • publisher: http://schema.org/publisher
>> • rights: no direct mapping
>> • rightsHolder: http://schema.org/copyrightHolder
>> • source: no direct mapping (how does this compare to provenance?); not
>> http://schema.org/source, which is medical.
>> • spatial: https://schema.org/spatial
>> • subject: http://schema.org/about
>> • temporal: https://schema.org/temporal
>> • title: https://schema.org/name (rather than https://schema.org/title)
>> 
>> (I'm interested in the difference between source vs provenance...)
> 
> Note (as I told Rufus) this is just a starting shot; we can reduce it, enlarge it, rename terms, etc...
> 
>> 
>>> I think this seems pretty reasonable. The one item I always find a bit
>>> awkward is creator (vs author) but that's just me ;-) (creator esp for data
>>> always seems a bit odd whereas author is more neutral - cf
>>> https://github.com/dataprotocols/dataprotocols/issues/130)
>> 
>> Yeah, I believe 'creator' in Dublin Core was an early (1996ish)
>> replacement for 'author', to better fit images, media objects,
>> cultural heritage artifacts etc. Schema.org has both 'creator' and
>> 'author' fwiw.
>> 
>> 
>>>> we may start there (although I am not sure 'spatial' and 'temporal' should
>>>> be part of such core set of terms). (I actually believe that these terms
>>>> should be used _only_ as top level terms, at least in some cases; I am not
>>>> sure it makes sense to add, say, a license to a specific cell in the table.)
>>> 
>>> 
>>> I agree there too: I like them but they are not as regularly used or as clear
>>> in their usage. At the same time I occasionally find them useful ...
>> 
>> We should make it clear that this is only a "starter kit", if people
>> have reason to add more detail, that's all for the good.
> 
> Well, we have to be careful how we say this. The core set is standardized, in the sense that we define what those terms may be, and we require tools (validators, converters, etc.) to understand those. Any new term the user may use may not have the same behaviour (e.g. tools may not check them), so we should probably advise not to use non-qualified terms other than the ones we define (although we cannot avoid people doing that). The advice would be to use qualified names for any other terms.
> 
> Of course, we can revise the standard every few years if we want to add new terms.
> 
>> 
>>>> 2. The metadata already refers to @context. We would then say that
>>>> (JSON-LD compatible) context entries MAY be added by the author to assign
>>>> these terms to explicit URI-s in the vocabulary of their choice. The
>>>> metadata document would also include an informative appendix with @context
>>>> examples for a mapping on DCT or to schema.org. That being said, I foresee
>>>> that many authors would not really bother, in fact, and just use the terms
>>>> in JSON.
>>> 
>>> Agreed - seems very sensible.
>> 
>> From the above it looks like a subset that wrote '@context':
>> 'http://schema.org/' would work, and presumably people could do
>> fancier things with their own context file.
>> 
> 
> Yes and yes. However, if we decide to use a different term name for something, then the core schema.org context may not work. Also, I do not know whether there is a context set up for DCMI terms... (Gregg may know). Bottom line is that we may have to provide our own context file.

I'm not aware of any such context, but for the most part, the following would just work:

{
  "@context": {
    "@vocab": "http://purl.org/dc/terms/"
  }
}

IMO, getting DCMI to publish an "official" context, with perhaps some datatyping information (although DC Terms is light on this anyway), would be a good idea.
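
To make the effect of "@vocab" concrete, here is a rough Python sketch of how bare metadata terms could end up as URIs. It is only an illustration under stated assumptions: expand_term is a made-up helper, not part of any JSON-LD library, and it deliberately ignores most of the real JSON-LD expansion algorithm (keywords, compact IRIs, type coercion, and so on). The schema.org overrides are taken from Dan's mapping earlier in the thread; SCHEMA_CONTEXT itself is hypothetical.

```python
# Simplified illustration of JSON-LD "@vocab" handling.
# NOT the full JSON-LD expansion algorithm; expand_term is hypothetical.

# The DC Terms context from this message: every bare term is
# resolved against the vocabulary URI.
DC_CONTEXT = {"@vocab": "http://purl.org/dc/terms/"}

# Hypothetical schema.org context: "@vocab" covers terms whose names
# match, and explicit entries cover the ones that differ
# (per Dan's mapping above).
SCHEMA_CONTEXT = {
    "@vocab": "http://schema.org/",
    "created": "http://schema.org/dateCreated",
    "modified": "http://schema.org/dateModified",
    "rightsHolder": "http://schema.org/copyrightHolder",
    "subject": "http://schema.org/about",
    "title": "http://schema.org/name",
}


def expand_term(term, context):
    """Return the URI for a bare term, or None if there is no mapping."""
    if term.startswith("@"):
        return None  # JSON-LD keywords are out of scope for this sketch
    mapping = context.get(term)
    if isinstance(mapping, str):
        return mapping  # an explicit term definition wins
    vocab = context.get("@vocab")
    return vocab + term if vocab else None


if __name__ == "__main__":
    for term in ("title", "created", "description"):
        print(term, expand_term(term, DC_CONTEXT), expand_term(term, SCHEMA_CONTEXT))
```

With the DC context every core term resolves by name alone, whereas the schema.org variant needs explicit entries wherever the term names differ, which is the case Ivan raises above.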


>>>> 3. The current 3.3.1 section in the metadata document (listing the Dublin
>>>> Core terms) should be removed altogether.
>>>> 
>>>> 4. The metadata document should also make it possible to use any set of
>>>> properties anywhere in the metadata _in qualified form_. We should also refer
>>>> to a number of predefined prefixes; the best approach is, probably, to refer
>>>> to the RDFa predefined prefix set. Ie, people may add properties of the form
>>>> "dc:spatial" or "schema:author". However, authors may also add prefixes they
>>>> want, besides those that are predefined. For many users, that is where it
>>>> stops; others, who care about a proper RDF-ization of the metadata, may want
>>>> to add the proper mapping of the prefixes to URI-s; we should provide a
>>>> @context for the predefined prefixes as well.
>>> 
>>> 
>>> Again seems very sensible :-)
>> 
>> +1
>> 
>> Are we allowed to make a normative ref to
>> http://www.w3.org/2011/rdfa-context/rdfa-1.1 ? Under what
>> circumstances and in what ways (additive vs edits) does it change?
> 
> Well, RDFa has a reference to that file, obviously...
> 
> The three rules (which are described there) are:
> 
> - no prefix definition is ever removed
> - if there is a new term coming to the fore then the community (so far the RDFa, but that can be extended) may have a consensus to add this. The initial set was based on crawls done, independently, by Yahoo! and Sindice to evaluate which were the most widespread vocabularies and their prefixes and we took, I believe, the top 10. Remarkably, I do not know of any new vocabulary that may have come to the fore since.
> - W3C standard vocabularies are added to the mix when they become standard
> 
> I never had any complaint on the set since its creation. I guess it could work for us, and would avoid reinventing the wheel...

Agreed. I suspect that such definitions will eventually be used by other formats, and we considered having a default context for JSON-LD, but ultimately rejected it. Note that prefix.cc now supports a JSON-LD context defining all such prefixes [1].
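
As a rough illustration of what a predefined prefix set provides, the following Python sketch resolves qualified names such as "dc:spatial" or "schema:author" against a small prefix table. The table is deliberately partial and hand-copied (the two entries are assumed to match the RDFa 1.1 initial context), and expand_qname is a hypothetical helper, not an existing API.

```python
# Partial, hand-copied prefix table for illustration only; the RDFa 1.1
# initial context and prefix.cc define many more prefixes than this.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
}


def expand_qname(name, prefixes=PREFIXES):
    """Expand "prefix:local" to a full URI (hypothetical helper).

    Names without a colon, or with an unknown prefix, are returned
    unchanged so the caller can decide how to treat them.
    """
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


if __name__ == "__main__":
    for name in ("dc:spatial", "schema:author", "title"):
        print(name, "->", expand_qname(name))
```

An unqualified name like "title" passes through untouched here; in the proposal those are the core terms, which would be resolved by the metadata document or a @context rather than by a prefix.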

Gregg

[1] http://prefix.cc/context

> Thanks
> 
> Cheers
> 
> Ivan
> 
>> 
>>>> 
>>>> I believe this approach could work, and covers our issues:
>>>> 
>>>>       - users (authors, clients) who do not really care about URI-s,
>>>> linkage, RDF formats, or indeed vocabularies, could simply rely on the terms
>>>> that are defined as standard terms in the metadata document. (I believe this
>>>> is what Jeni proposed on our meeting 10 days ago.)
>>>> 
>>>>       - users who care about binding the terms to outside vocabularies
>>>> can choose to add a set of URI mappings through a @context; whether they
>>>> choose DC, schema.org, or some application area specific vocabulary is not
>>>> for the standard to define, although we make it easy to use the well known
>>>> vocabularies. That also means that neither the DC reference nor the
>>>> schema.org reference is normative.
>>>> 
>>>>       - with the appropriate contexts the metadata is proper JSON-LD,
>>>> ie, can serve as a 'glue' between the core CSV data and the Linked Data
>>>> Cloud. (It is important to note that, afaik, a context can be delivered to a
>>>> client via HTTP links, ie, the publisher of the data may not even care about
>>>> the @context but the data store may ensure the JSON-LD aspects
>>>> nevertheless.)
>>>> 
>>>>       - the RDF mapping of the CSV content would rely on @context if
>>>> present, otherwise these terms would be mapped against the "csv:" namespace.
>>>> That means the mapping to RDF becomes clearly defined. (The current CSV->RDF
>>>> document is hand-waving around the top level terms right now; it is not
>>>> really clean yet)
>>>> 
>>>>       - users can use any other types of vocabularies to their heart's
>>>> content, and the proper usage and mapping of these vocabularies can be
>>>> ensured through the usage of qualified names to separate those from the
>>>> terms that are defined by our standard.
>>>> 
>>>> 
>>>> How does that sound?
>>> 
>>> 
>>> Very sensible. I've also booted an issue for tracking this more:
>>> https://github.com/w3c/csvw/issues/29 (you may want to add your proposal
>>> there too).
>> 
>> Thanks,
>> 
>> Dan
>> 
>>> Rufus
>>> 
>>>> 
>>>> 
>
Received on Sunday, 5 October 2014 19:46:44 UTC