Re: Feedback on Dataset Schema

Hi Leigh,


On Thu, Jul 12, 2012 at 3:42 AM, Leigh Dodds <ld@talis.com> wrote:
> Hi,
>
> Some initial feedback on the new Dataset schema description. Overall I
> think this is a great first start that captures the essential
> information which appears to be common across various dataset
> description proposals, as well as in actual usage.


I appreciate the feedback!



> I think it's important to clarify the relationship of this proposal
> with existing work from the Linked Data community, which has already
> seen some adoption, in order to avoid confusion. The mapping between
> schema elements is an important first step there. It might be useful
> to also note in the documentation where a publisher might want to
> support more than one approach or where another approach might offer
> additional benefits.


I'll add some discussion about the mapping.  It's as much a discussion
of syntax (microdata vs. RDFa vs. Linked Data, etc.) as vocabulary.
With schema.org microdata, there is less of a question of mixing
vocabularies, so the mapping is more important for integration with
other data sources than it is for choosing among alternative terms.



> In my own work on dataset description, the key elements that
> developers and users have appreciated are:
>
> * Clear name and description of a dataset, with some indication of
> scope. The schema supports that, and includes spatial support
> * Clear provenance -- achieved through use of publisher markup


Good.


> * Clear publication dates: when was the dataset published,


Check.



> when was it last updated.


Check. schema.org's CreativeWork type distinguishes between
dateCreated, dateModified, and datePublished, the last two of which
are explicitly included in the vocab.
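
To make the distinction between the two dates concrete, here is a minimal
Python sketch, assuming both values are plain ISO 8601 calendar dates.  The
record and the helper name dates_consistent are hypothetical; only the
property names datePublished and dateModified come from schema.org.

```python
from datetime import date

# Hypothetical dataset record; the date values are made up for illustration.
record = {
    "datePublished": "2012-01-15",
    "dateModified": "2012-07-01",
}


def dates_consistent(item: dict) -> bool:
    """Return True when dateModified is not earlier than datePublished."""
    published = date.fromisoformat(item["datePublished"])
    modified = date.fromisoformat(item["dateModified"])
    return modified >= published


print(dates_consistent(record))
```

A consumer could run a check like this before trusting the "last updated"
value shown to users.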



> It might also be useful to indicate the time period to
> which the dataset applies, e.g. census data for UK for 1901.


Yes.  DCAT uses dc:temporal, and I have been leaning towards pulling
it into the extension (e.g. using schema.org's Duration type and ISO
8601 time intervals).  Now leaning a little harder.
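
As a rough illustration of the ISO 8601 interval idea, here is a minimal
Python sketch that splits a start/end interval into two dates.  The function
name parse_interval and the sample value are hypothetical, and only the
simple date/date form is handled; durations and open-ended intervals are
left out.

```python
from datetime import date
from typing import Tuple


def parse_interval(value: str) -> Tuple[date, date]:
    """Split an ISO 8601 'start/end' interval of calendar dates."""
    start_text, sep, end_text = value.partition("/")
    if not sep:
        raise ValueError(f"not a start/end interval: {value!r}")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError("interval ends before it starts")
    return start, end


# Hypothetical coverage value for the 1901 census example.
start, end = parse_interval("1901-01-01/1901-12-31")
print(start.isoformat(), end.isoformat())
```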



> * Pointers to downloads -- captured as DatasetDownload
>
> With the addition of a few new types and properties, and reuse of
> existing schema.org markup, these core use cases seem well covered.
>
> It would be useful to see more examples that cover each of these
> areas, including how to communicate downloads in various formats.


More examples are on the way.



> I note that the table in the wiki refers to ds:license but this is not
> called out anywhere.


Currently, the idea is simply to point to a WebPage about the license,
but I'm open to other suggestions.



> Does a generic license property apply to the
> Dataset schema or is there a more general term?


Well, not to the schema, but to the data...



> License might usefully
> be captured as an enumeration of, e.g. Creative Commons and Open
> Government licenses.


This sounds like a job for a License extension.  Any takers?  At
present, the closest equivalent appears to be the copyrightHolder and
copyrightYear properties.



> It might also be useful to be able to indicate where a user of a
> dataset can contact the publisher to report problems with a dataset,
> or ask questions about its usage.


That should be covered by providing a ContactPoint for the publisher.
An example would be helpful.
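
In that spirit, here is a minimal sketch of what such a description might
look like, written as a nested Python dictionary rather than actual
microdata.  Every name, address, and value is a hypothetical placeholder,
the "@type" key is only a convention for this sketch, and the contactPoint
property name on the publisher is an assumption about how the ContactPoint
would be attached.

```python
import json

# All names and addresses below are hypothetical placeholders.
dataset = {
    "@type": "Dataset",
    "name": "Example census tables",
    "publisher": {
        "@type": "Organization",
        "name": "Example Statistics Office",
        # Assumed property name linking the publisher to a ContactPoint.
        "contactPoint": {
            "@type": "ContactPoint",
            "contactType": "dataset support",
            "email": "data-support@example.org",
        },
    },
}


def contact_email(item: dict):
    """Return the publisher's contact email, or None if it is missing."""
    return item.get("publisher", {}).get("contactPoint", {}).get("email")


print(json.dumps(dataset, indent=2))
print(contact_email(dataset))
```

A user who wants to report a problem with the dataset would follow the
publisher to its contact point and use the address found there.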



> As well as a short description,
> pointers to fuller documentation are also useful.


Still thinking inside the schema.org box, discussionUrl might do the
trick, but now I'm making things up.  Will have to think about the
best way to express this.



> Hope that's useful!


Very much so.  Thanks!


Joshua


>
> Cheers,
>
> L.
> --
> Leigh Dodds
> Freelance Technologist
> e: leigh@ldodds.com
> t: @ldodds
> w: http://ldodds.com
>

Received on Thursday, 12 July 2012 09:40:18 UTC