Re: The dcterm/schema.org issue: a proposal to move forward from Dan Brickley on 2014-10-03 (public-csv-wg@w3.org from October 2014)

From: Dan Brickley <danbri@google.com>
Date: Fri, 3 Oct 2014 18:51:21 +0100
To: Rufus Pollock <rufus.pollock@okfn.org>
Cc: Ivan Herman <ivan@w3.org>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>, Jeni Tennison <jeni@jenitennison.com>
Message-ID: <CAK-qy=6r6Qc20r4KSqxBb_kvEQs4vsm0QhRnqmQBP8zXRiH8Ng@mail.gmail.com>
On 3 October 2014 18:22, Rufus Pollock <rufus.pollock@okfn.org> wrote:
> On 3 October 2014 10:42, Ivan Herman <ivan@w3.org> wrote:
>>
>> Dear all,
>>
>> I was wondering how to move ahead with the schema.org/dcmi question, their
>> normativeness, etc; we seem to be in an impasse at this moment. Here is a
>> strategy that may help us move forward. (Some of the issues below were
>> also motivated by giving some more thoughts to the RDF/JSON conversion
>> document and its possible implementation.)
>>
>> 1. We define a small set of core properties that we consider to be
>> essential in the metadata. "We define" means that we specify the terms to be
>> used in the metadata specification as well as their data types and intended
>> meaning.  There is already a set of such terms defined at the end of 3.4.2:
>>
>>         • created
>>         • creator
>>         • description
>>         • language
>>         • license
>>         • modified
>>         • provenance
>>         • publisher
>>         • rights
>>         • rightsHolder
>>         • source
>>         • spatial
>>         • subject
>>         • temporal
>>         • title

I made a quick run through mapping these to schema.org. It's a fairly good fit:

• created: http://schema.org/dateCreated
• creator: http://schema.org/creator or http://schema.org/author
• description: http://schema.org/description
• language: http://schema.org/language (definition applies to actions;
could be generalized)
• license: http://schema.org/license
• modified: http://schema.org/dateModified
• provenance: no direct mapping. http://schema.org/evidenceOrigin is related.
• publisher: http://schema.org/publisher
• rights: no direct mapping
• rightsHolder: http://schema.org/copyrightHolder
• source: no direct mapping (how does this compare to provenance?); not
http://schema.org/source, which is medical.
• spatial: https://schema.org/spatial
• subject: http://schema.org/about
• temporal: https://schema.org/temporal
• title: https://schema.org/name (rather than https://schema.org/title)
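
To make the mapping above concrete, here is a rough Python sketch that
writes it out as a JSON-LD style @context and attaches it to a metadata
object. The names SCHEMA_ORG_CONTEXT and with_context, and the sample
metadata values, are made up for illustration; nothing here is agreed
syntax. It assumes 'creator' maps to schema.org/creator (author would do
equally well), leaves out the three terms with no direct mapping, and
writes every URI with http:// for consistency.

```python
import json

# Term -> schema.org URI, following the list above. Terms with no direct
# mapping (provenance, rights, source) are left out. 'language' is kept
# even though its current definition only applies to actions.
# Assumption: all URIs are written with http:// for consistency.
SCHEMA_ORG_CONTEXT = {
    "created": "http://schema.org/dateCreated",
    "creator": "http://schema.org/creator",
    "description": "http://schema.org/description",
    "language": "http://schema.org/language",
    "license": "http://schema.org/license",
    "modified": "http://schema.org/dateModified",
    "publisher": "http://schema.org/publisher",
    "rightsHolder": "http://schema.org/copyrightHolder",
    "spatial": "http://schema.org/spatial",
    "subject": "http://schema.org/about",
    "temporal": "http://schema.org/temporal",
    "title": "http://schema.org/name",
}


def with_context(metadata):
    """Return a new dict: the metadata plus an explicit @context."""
    doc = {"@context": dict(SCHEMA_ORG_CONTEXT)}
    doc.update(metadata)
    return doc


if __name__ == "__main__":
    # Made-up metadata values, for illustration only.
    sample = {"title": "Example table", "created": "2014-10-03"}
    print(json.dumps(with_context(sample), indent=2))
```

An author who doesn't care about any of this would just write the bare
terms and skip the @context entirely.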

(I'm interested in the difference between source vs provenance...)

> I think this seems pretty reasonable. The one item I always find a bit
> awkward is creator (vs author) but that's just me ;-) (creator esp for data
> always seems a bit odd whereas author is more neutral - cf
> https://github.com/dataprotocols/dataprotocols/issues/130)

Yeah, I believe 'creator' in Dublin Core was an early (1996ish)
replacement for 'author', to better fit images, media objects,
cultural heritage artifacts etc. Schema.org has both 'creator' and
'author' fwiw.


>> we may start there (although I am not sure 'spatial' and 'temporal' should
>> be part of such core set of terms). (I actually believe that these terms
>> should be used _only_ as top level terms, at least in some cases; I am not
>> sure it makes sense to add, say, a license to a specific cell in the table.)
>
>
> I agree there too: I like them but they are not as regularly used or as clear
> in their usage. At the same time I occasionally find them useful ...

We should make it clear that this is only a "starter kit", if people
have reason to add more detail, that's all for the good.

>> 2. The metadata already refers to @context. We would then say that
>> (JSON-LD compatible) context entries MAY be added by the author to assign
>> these terms to explicit URI-s in the vocabulary of their choice. The
>> metadata document would also include an informative appendix with @context
>> examples for a mapping on DCT or to schema.org. That being said, I foresee
>> that many authors would not really bother, in fact, and just use the terms
>> in JSON.
>
> Agreed - seems very sensible.

From the above it looks like, for a subset of these terms, writing
'@context': 'http://schema.org/' would work, and presumably people could do
fancier things with their own context file.
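
As a rough check of which terms fall in that subset, the sketch below
splits the core terms into those whose schema.org property has exactly the
same name and those that would need an entry in a custom context. It
assumes "would work" means the bare term name matches the property name;
the function names are made up for illustration.

```python
SCHEMA_ORG = "http://schema.org/"

# Term -> schema.org property name, from the mapping earlier in this mail.
# None marks terms with no direct mapping.
TERM_TO_PROPERTY = {
    "created": "dateCreated",
    "creator": "creator",
    "description": "description",
    "language": "language",
    "license": "license",
    "modified": "dateModified",
    "provenance": None,
    "publisher": "publisher",
    "rights": None,
    "rightsHolder": "copyrightHolder",
    "source": None,
    "spatial": "spatial",
    "subject": "about",
    "temporal": "temporal",
    "title": "name",
}


def same_name_subset(mapping):
    """Terms whose schema.org property has the identical name."""
    return sorted(t for t, p in mapping.items() if p == t)


def needs_own_context(mapping):
    """Mapped terms that differ in name, with the URI they would need."""
    return {
        t: SCHEMA_ORG + p
        for t, p in mapping.items()
        if p is not None and p != t
    }


if __name__ == "__main__":
    print(same_name_subset(TERM_TO_PROPERTY))
    print(needs_own_context(TERM_TO_PROPERTY))
```

On this reading, seven terms come for free, five need a custom context
entry, and three have nothing to map to yet.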

>> 3. The current 3.3.1 section in the metadata document (listing the Dublin
>> Core terms) should be removed altogether.
>>
>> 4. The metadata document should also make it possible to use any set of
>> properties anywhere in the metadata _in qualified form_. We should also refer
>> to a number of predefined prefixes; the best approach is, probably, to refer
>> to the RDFa predefined prefix set. Ie, people may add properties of the form
>> "dc:spatial" or "schema:author". However, authors may also add prefixes they
>> want, besides those that are predefined. For many users, that is where it
>> stops; others, who care about a proper RDF-ization of the metadata, may want
>> to add the proper mapping of the prefixes to URI-s; we should provide a
>> @context for the predefined prefixes as well.
>
>
> Again seems very sensible :-)

+1

Are we allowed to make a normative ref to
http://www.w3.org/2011/rdfa-context/rdfa-1.1 ? Under what
circumstances and in what ways (additive vs edits) does it change?
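
For what it's worth, expanding qualified names like "dc:spatial" or
"schema:author" is simple enough to sketch in a few lines of Python. The
two prefixes below are the ones I'd expect from the RDFa 1.1 initial
context; the expand function, its extra_prefixes parameter and the "ex"
prefix in the example are hypothetical, not anything from the drafts.

```python
# Two prefixes as defined in the RDFa 1.1 initial context. The full list
# lives at http://www.w3.org/2011/rdfa-context/rdfa-1.1 and is not
# reproduced here.
PREDEFINED_PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
}


def expand(name, extra_prefixes=None):
    """Expand 'prefix:local' to a full URI when the prefix is known.

    Names without a known prefix (bare core terms, full URIs) are
    returned unchanged.
    """
    prefixes = dict(PREDEFINED_PREFIXES)
    prefixes.update(extra_prefixes or {})  # author-supplied prefixes
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


if __name__ == "__main__":
    print(expand("dc:spatial"))
    print(expand("schema:author"))
    # Hypothetical author-defined prefix.
    print(expand("ex:station", {"ex": "http://example.org/ns#"}))
```

If the predefined list can change under us, a processor that bakes it in
like this would go stale, hence the question about how it evolves.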

>>
>> I believe this approach could work, and covers our issues:
>>
>>         - users (authors, clients) who do not really care about URI-s,
>> linkage, RDF formats, or indeed vocabularies, could simply rely on the terms
>> that are defined as standard terms in the metadata document. (I believe this
>> is what Jeni proposed on our meeting 10 days ago.)
>>
>>         - users who care about binding the terms to outside vocabularies
>> can choose to add a set of URI mappings through a @context; whether they
>> choose DC, schema.org, or some application area specific vocabulary is not
>> for the standard to define, although we make it easy to use the well known
>> vocabularies. That also means that neither the DC reference nor the
>> schema.org reference is normative.
>>
>>         - with the appropriate contexts the metadata is proper JSON-LD,
>> ie, can serve as a 'glue' between the core CSV data and the Linked Data
>> Cloud. (It is important to note that, afaik, a context can be delivered to a
>> client via HTTP links, ie, the publisher of the data may not even care about
>> the @context but the data store may ensure the JSON-LD aspects
>> nevertheless.)
>>
>>         - the RDF mapping of the CSV content would rely on @context if
>> present, otherwise these terms would be mapped against the "csv:" namespace.
>> That means the mapping to RDF becomes clearly defined. (The current CSV->RDF
>> document is hand-waving around the top level terms right now, it is not
>> really clean yet)
>>
>>         - users can use any other types of vocabularies at their heart's
>> content, and the proper usage and mapping of these vocabularies can be
>> ensured through the usage of qualified names to separate those from the
>> terms that are defined by our standard.
>>
>>
>> How does that sound?
>
>
> Very sensible. I've also booted an issue for tracking this more:
> https://github.com/w3c/csvw/issues/29 (you may want to add your proposal
> there too).
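
On Ivan's RDF mapping point above (use @context if present, otherwise fall
back to the "csv:" namespace), the rule could be sketched roughly as
follows. The namespace URI is a placeholder, since the thread does not say
what "csv:" expands to, and term_to_uri is a made-up name.

```python
# Placeholder only: the real URI behind "csv:" is not given in this thread.
CSV_NS = "http://example.org/csv#"


def term_to_uri(term, context=None):
    """Map a core metadata term to the URI used in the RDF output.

    If the author supplied a context entry for the term, use it;
    otherwise fall back to the csv: namespace.
    """
    if context and term in context:
        return context[term]
    return CSV_NS + term


if __name__ == "__main__":
    ctx = {"title": "http://schema.org/name"}
    print(term_to_uri("title", ctx))    # taken from the context
    print(term_to_uri("license", ctx))  # falls back to csv:
    print(term_to_uri("title"))         # no context at all
```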

Thanks,

Dan

> Rufus
>
>>
>>
Received on Friday, 3 October 2014 17:51:48 UTC