RE: Different dataset views and services in Dataset Schema

An interesting and useful idea, but it drags the topic of data provenance
(and data governance) into the basic topic of describing the contents of a
dataset. It would be better to deal with data provenance as a separate
vocabulary effort as it may not be interesting to most users of public
datasets.

Thanks, Jim

 

From: Will Pugh [mailto:will.pugh@socrata.com] 
Sent: Saturday, July 14, 2012 4:53 PM
To: public-vocabs@w3.org
Subject: Different dataset views and services in Dataset Schema

 

Hi folks,

 

I'm new to schema.org <http://schema.org/>, but I just looked at the new
Datasets Schema.  The initial proposal looks great.  It seems very simple
(which is a good thing); however, there are a few concepts I wanted to run
by this group that I didn't see in there.

 

My understanding is that the main goal of schema.org is to create schemas
useful to search engines, rather than to pursue the broader goals of
projects like Linked Data that want to create a "Global Data Space".  Is
this a correct assessment?

 

With that assumption, I've got a few scenarios I wanted to ask about, with
the idea that these scenarios may describe relationships interesting to
search engines.  

 

1)  Is there a way to describe "derived datasets"?  For example, take the
dataset "2011-Report-to-Congress-on-White-House-Staff" on
opendata.socrata.com <http://opendata.socrata.com/>.  It is pretty
straightforward to model that in the Datasets schema.  However, now take
the different views people have built on top of this dataset, such as a
view that shows ONLY White House Staff with salaries greater than $100,000.
This view acts in every way like a dataset, and can be thought of as one.
It can be viewed as HTML, downloaded as CSV, JSON, etc.

 

It seems like it might be useful for this "derived dataset" to be able to
state that it comes from another dataset.  Something like a property:

    derivedFrom : Dataset
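
A minimal sketch of how such markup might look, written in Python and
serialized as JSON-LD.  The JSON-LD serialization, the helper name
derived_dataset, and the description text are assumptions made for
illustration; "derivedFrom" is only the property proposed above and is not
part of schema.org.

```python
import json

# The base dataset, described with the proposed Dataset type.
base = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "name": "2011-Report-to-Congress-on-White-House-Staff",
}


def derived_dataset(name, description, source):
    """Return markup for a view that states which dataset it comes from."""
    return {
        "@context": "http://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Hypothetical property: proposed in this message, not in schema.org.
        "derivedFrom": {"@type": "Dataset", "name": source["name"]},
    }


view = derived_dataset(
    "White House Staff with salaries greater than $100,000",
    "Filtered view of the 2011 staff report.",
    base,
)
print(json.dumps(view, indent=2))
```

Because the view is itself typed as a Dataset, a consumer that does not
understand derivedFrom could still treat it as an ordinary dataset.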

 

Without knowing too much about the internals of the big search engines, it
seems like this information could be useful for how they choose to either
cluster results together or present them as separate entries.

 

2)  Would it make sense to describe an API on top of a dataset, instead of
simply a dataset?  For example, one way to access a Dataset may be to
download a JSON or CSV file.  Another might be to call an API that applies
sort/filter/grouping clauses to the dataset.  How would this API be
represented?
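
To make the kind of API concrete, here is a rough Python sketch of a client
building such a request.  The endpoint URL and the parameter names
("filter", "sort", "group") are hypothetical and do not describe any real
service; the point is only that the same dataset is reached through a
parameterized query rather than a file download.

```python
from urllib.parse import urlencode


def build_query_url(endpoint, filter_clause=None, sort_by=None, group_by=None):
    """Build a query URL from optional filter, sort, and grouping clauses."""
    params = {}
    # Parameter names below are hypothetical, chosen for illustration only.
    if filter_clause:
        params["filter"] = filter_clause
    if sort_by:
        params["sort"] = sort_by
    if group_by:
        params["group"] = group_by
    if not params:
        return endpoint
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint on a reserved example domain.
url = build_query_url(
    "http://example.org/api/white-house-staff",
    filter_clause="salary > 100000",
    sort_by="salary",
)
print(url)
```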

 

3)  Would it make sense to have a type which refers to a view or a dataset?
For example, if I have a page that contains a graph showing the number of
people at different salary levels at the White House, would it make sense
to be able to express to a search engine that the graph is using the
"2011-Report-to-Congress-on-White-House-Staff" dataset?

 

 

 

    Thanks,

    --Will

Received on Monday, 16 July 2012 16:10:56 UTC