thoughts on the dataset usage vocab from Annette Greiner on 2015-07-27 (public-dwbp-wg@w3.org from July 2015)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Mon, 27 Jul 2015 01:10:38 -0700
To: Eric Stephan <ericphb@gmail.com>, Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
Message-Id: <FEA50352-BA8A-414F-9758-6FFE44E03B5F@lbl.gov>

Hi Eric,
Following up on Friday's discussion about the DUV, I first want to say that you have done a great job of thinking through many of the relationships between existing vocabularies and the terms we would need in a dataset usage vocabulary. There is already a lot of good, useful material in there. I'd like to push things a little further toward the use cases that come to mind for me when I think about what a dataset usage vocabulary might offer. As a developer, I want to find out how others are using the data I make available, and a few aspects of those usages are of particular interest. I think it would be very helpful if the vocabulary provided a way to express them.

It would be interesting to know whether others are using the full dataset or only parts of it. That helps me understand what is deemed useful and helps prioritize future work. One reason I've been thinking of positioning an instance of dataset usage as an oa:Annotation is that annotations can target a resource at a fairly granular level, so it would be possible to express the usage of a subset of a dataset.
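Here is a minimal sketch of what subset-level usage might look like. It assumes the JSON-LD serialization of the W3C Web Annotation Data Model, the later successor to Open Annotation, and a CSV distribution of the dataset. The row range uses the RFC 7111 fragment syntax for text/csv. All IRIs and the helper name `subset_usage_annotation` are hypothetical placeholders, not part of any DUV draft.

```python
import json


def subset_usage_annotation(anno_iri, dataset_iri, first_row, last_row, usage_iri):
    """Describe a usage of rows first_row..last_row of a CSV dataset as an annotation.

    Hypothetical helper; every IRI passed in is an illustrative placeholder.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_iri,
        "type": "Annotation",
        # The body is the usage itself, e.g. an application or paper built on the data.
        "body": usage_iri,
        # The target is only part of the dataset: a SpecificResource whose
        # FragmentSelector uses the RFC 7111 row syntax for text/csv.
        "target": {
            "type": "SpecificResource",
            "source": dataset_iri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://tools.ietf.org/rfc/rfc7111",
                "value": f"row={first_row}-{last_row}",
            },
        },
    }


anno = subset_usage_annotation(
    "http://example.org/annotations/1",
    "http://example.org/data/materials.csv",
    5,
    120,
    "http://example.org/apps/band-gap-viewer",
)
print(json.dumps(anno, indent=2))
```

The same shape would work for other media types by swapping the selector. A whole-dataset usage would simply use the dataset IRI as the target.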

It would be useful to know whether others are using a dataset I've published as an ongoing dependency. That is, did they pull the data once and continue using it without needing to pull again, or are they calling the API at runtime? At least one project I've worked on (the Materials Project) commonly has users who pull from its API a single time to build a database of their own that they can work with locally. It is also possible for them to create a new web application that calls the API at runtime, which creates a dependency. If I needed to inform ongoing users of my API about some issue, knowing whose work depended on it would be a great help.

For reporting to granting agencies, it would be useful to know how a published dataset is being used: for analysis, republishing, visualization, remixing, citation, description, correction, rating, critique, or feedback. Some of these uses have much clearer value to a granting agency than others.

In the current model, feedback seems to be the only term that inherits from oa:Annotation. I think of feedback as just one type of usage, and it seems more logical to me to have all types of usage inherit from oa:Annotation, so that one can annotate the dataset with any of them. I imagine the original dataset would be the target and the new usage would be the body of an annotation, with a motivation like "commenting" or "describing", or an extension motivation such as "visualizing", "analyzing", or "remixing".
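A rough sketch of this model follows. It assumes the Web Annotation JSON-LD context, where short names like "commenting" and "describing" map to the standard oa: motivations. The extension motivations ("visualizing", "analyzing", and so on) are placed in a hypothetical namespace, since Open Annotation does not define them. The tally function hints at how such annotations could feed a report to a granting agency. Function names and IRIs are illustrative assumptions, not a proposed DUV design.

```python
from collections import Counter

# Motivations that already exist in the Open Annotation vocabulary;
# the anno.jsonld context maps these short names to oa:commenting and oa:describing.
STANDARD_MOTIVATIONS = {"commenting", "describing"}

# Hypothetical namespace for extension motivations such as "visualizing",
# "analyzing" or "remixing"; these would have to be defined by the DUV itself.
DUV_NS = "http://example.org/duv#"


def usage_annotation(anno_iri, dataset_iri, usage_iri, motivation):
    """Model any kind of dataset usage as an annotation: dataset = target, usage = body."""
    if motivation in STANDARD_MOTIVATIONS:
        motivated_by = motivation
    else:
        motivated_by = DUV_NS + motivation  # extension motivation (hypothetical IRI)
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_iri,
        "type": "Annotation",
        "motivation": motivated_by,
        "body": usage_iri,
        "target": dataset_iri,
    }


def count_usages_by_motivation(annotations):
    """Tally usages per motivation, e.g. for a report to a granting agency."""
    return Counter(a["motivation"] for a in annotations)


dataset = "http://example.org/data/materials"
usages = [
    usage_annotation("http://example.org/annotations/1", dataset,
                     "http://example.org/comments/42", "commenting"),
    usage_annotation("http://example.org/annotations/2", dataset,
                     "http://example.org/apps/viewer", "visualizing"),
    usage_annotation("http://example.org/annotations/3", dataset,
                     "http://example.org/papers/7", "analyzing"),
]
print(count_usages_by_motivation(usages))
```

If I read the Web Annotation vocabulary correctly, new motivations are expected to be declared as instances of oa:Motivation. They would ideally link via skos:broader to the closest standard motivation, so that generic clients can still interpret them.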

-Annette

Received on Monday, 27 July 2015 08:11:12 UTC