Re: thoughts on the dataset usage vocab

Hi Annette,

Sorry for the long delay in getting back to you.  I'll respond in two parts.

1)  On the question of granularity, I am wondering whether the
resolution for 169, discussed at the July 3 meeting, is sufficient to meet
your needs, including both the Dataset and Distribution in the model.  We
still need to update the model.  Does this meet your granularity needs?

2) From a usage standpoint the way you are describing usage and the
previous discussions (See
and search for "Eric Stephan: good comments about vocab reuse" for
beginning of discussion on
where we looked at reusing the PROV vocabulary to describe usage.  I did a
bit more digging and found a discussion of the class "prov:Usage".  The
example shows how "prov:Role" conveys the context of the usage.
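
To make the pattern concrete, here is a rough sketch of a qualified usage
written as plain Python triples.  It is only an illustration, not something
taken from the PROV documents: the prov: terms (prov:Activity,
prov:qualifiedUsage, prov:Usage, prov:entity, prov:hadRole, prov:Role) are
from PROV-O, but every ex: name and both helper functions are made up for
this example.

```python
# Illustrative triples only. The prov: terms come from PROV-O; every ex: name
# is a hypothetical placeholder, not part of any agreed vocabulary.
TRIPLES = [
    ("ex:visualizationRun", "rdf:type", "prov:Activity"),
    ("ex:visualizationRun", "prov:used", "ex:dataset1"),
    ("ex:visualizationRun", "prov:qualifiedUsage", "ex:usage1"),
    ("ex:usage1", "rdf:type", "prov:Usage"),
    ("ex:usage1", "prov:entity", "ex:dataset1"),
    ("ex:usage1", "prov:hadRole", "ex:sourceData"),
    ("ex:sourceData", "rdf:type", "prov:Role"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def usage_roles(activity, entity, triples=TRIPLES):
    """Return the roles under which `activity` used `entity`."""
    roles = []
    for usage in objects(activity, "prov:qualifiedUsage", triples):
        # Only keep usages that actually point at the entity of interest.
        if entity in objects(usage, "prov:entity", triples):
            roles.extend(objects(usage, "prov:hadRole", triples))
    return roles
```

Calling usage_roles("ex:visualizationRun", "ex:dataset1") walks from the
activity through its prov:Usage node to the role, which is the "context of
the usage" I mentioned above.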


On Mon, Jul 27, 2015 at 1:10 AM, Annette Greiner wrote:

> Hi Eric,
> Following up on Friday's discussion about the DUV, first I want to say
> that you have done a great job of thinking through a lot of the
> relationships between existing vocabularies and terms that we would need
> in a dataset usage vocabulary. There is already a lot of good information
> and useful stuff in there. I'd like to push things a little further toward
> addressing the use cases that come to mind for me when I think about what a
> dataset usage vocabulary might offer. As a developer, I want to find out
> about uses that others are making of the data that I make available, and
> there are a few aspects of those usages that are of particular interest. I
> think it would be very helpful if the vocabulary could provide means of
> expressing them.
> It would be interesting to know whether others are using the full dataset
> or parts thereof. That helps me understand what is deemed useful and helps
> prioritize future work. One of the reasons I've been thinking of
> positioning an instance of dataset usage as an oa:annotation is that those
> annotations can apply at a pretty granular level, so it would be possible
> to express the usage of a subset of a dataset.
> It would be useful to know whether others are using a dataset that I've
> published as an ongoing dependency or not. That is, did they pull the data
> once and are they using it without need to pull again, or are they calling
> the API at runtime? It's pretty common for at least one project I've worked
> on (the Materials Project) to have users that pull from their API a single
> time, to get a database of their own from which they can work locally. It
> is also possible for them to create a new web application that calls the
> API at runtime, which creates a dependency. If I needed to inform those who
> were using my API on an ongoing basis of some issue, knowing which people's
> work had dependencies on it would be a great help.
> It would be useful for reporting to granting agencies to know how a
> published dataset is being used, whether for analysis, republishing,
> visualization, remixing, citation, description, correction, rating,
> critique, or feedback. Some of these uses have much clearer value to the
> granting agency than others.
> In the current model, it seems that feedback is the sole term that
> inherits from oa:annotation. I think of feedback as just one type of usage,
> and it seems more logical to me to have all types of usage inherit from
> oa:annotation, so that one can annotate the dataset with any of them. I
> imagine the original dataset would be the target and the new usage would be
> the body of an annotation with a motivation like "commenting" or
> "describing", or an extension motivation such as "visualizing" or
> "analyzing" or "remixing".
> -Annette
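
As a way of picturing the last suggestion, here is a rough sketch of a usage
expressed as an annotation whose target is the original dataset and whose
body is the new usage.  It is an assumption-laden illustration, not a
proposal for the model: oa:Annotation, oa:hasTarget, oa:hasBody,
oa:motivatedBy, oa:commenting and oa:describing are Open Annotation terms,
while the ex: motivations, the ex: resources and the function name are
hypothetical.

```python
# Motivations mentioned in the thread. "oa:commenting" and "oa:describing"
# exist in Open Annotation; the ex: ones are hypothetical extensions.
STANDARD_MOTIVATIONS = {"oa:commenting", "oa:describing"}
EXTENSION_MOTIVATIONS = {"ex:visualizing", "ex:analyzing", "ex:remixing"}


def make_usage_annotation(target, body, motivation):
    """Build a dict describing one usage of a dataset as an annotation.

    target     -- the original dataset (or a part of it) being used
    body       -- the new usage, e.g. a visualization built on the data
    motivation -- one of the motivations listed above
    """
    allowed = STANDARD_MOTIVATIONS | EXTENSION_MOTIVATIONS
    if motivation not in allowed:
        raise ValueError("unknown motivation: " + motivation)
    return {
        "@type": "oa:Annotation",
        "oa:hasTarget": target,
        "oa:hasBody": body,
        "oa:motivatedBy": motivation,
    }


# Hypothetical example: a visualization that uses a published dataset.
example = make_usage_annotation(
    "ex:dataset1", "ex:myVisualization", "ex:visualizing"
)
```

Because the target can be any resource, the same shape would also cover
usage of a subset of a dataset, provided the subset can be identified.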

Received on Tuesday, 11 August 2015 21:54:49 UTC