- From: Jim Rhyne <jrhyne@thematix.com>
- Date: Mon, 16 Jul 2012 09:10:06 -0700
- To: "'Will Pugh'" <will.pugh@socrata.com>, <public-vocabs@w3.org>
- Message-ID: <01b501cd636d$788a0a30$699e1e90$@com>
An interesting and useful idea, but it drags the topic of data provenance (and data governance) into the basic topic of describing the contents of a dataset. It would be better to deal with data provenance as a separate vocabulary effort, as it may not be interesting to most users of public datasets.

Thanks,
Jim

From: Will Pugh [mailto:will.pugh@socrata.com]
Sent: Saturday, July 14, 2012 4:53 PM
To: public-vocabs@w3.org
Subject: Different dataset views and services in Dataset Schema

Hi folks,

I'm new to schema.org, but I just looked at the new Datasets Schema. The initial proposal looks great. It seems very simple (which is a good thing). However, there were a few concepts I wanted to run by this group that I didn't see in it.

My understanding is that the main goal of schema.org is to create schemas useful to search engines, rather than the broader goals of projects like Linked Data that want to create a "Global Data Space". Is this a correct assessment?

With that assumption, I've got a few scenarios I wanted to ask about, on the idea that they may describe relationships interesting to search engines.

1) Is there a way to describe "derived datasets"? For example, take the dataset "2011-Report-to-Congress-on-White-House-Staff" on opendata.socrata.com. It is pretty straightforward to model that in the Datasets schema. However, now take the different views people have built on top of this dataset, such as a view that ONLY shows White House staff with salaries greater than $100,000. This view acts in every way like a dataset and can be thought of as one: it can be viewed as HTML, downloaded as CSV, JSON, etc. It seems like it might be useful for this "derived dataset" to be able to state that it comes from another dataset, with something like a property:

derivedFrom : Dataset

Without knowing too much about the internals of the big search engines, it seems like this information could help them decide whether to cluster such results together or show them as separate entries.

2) Would it make sense to describe an API on top of a dataset, instead of simply a dataset? For example, one way to access a dataset may be to download a JSON or CSV file. Another might be to call an API that takes sort/filter/grouping clauses on top of the dataset. How would this API be represented?

3) Would it make sense to have a type that refers to a view or a dataset? For example, if I have a page containing a graph of the number of people at the White House in different salary ranges, would it make sense to be able to tell a search engine that the graph uses the "2011-Report-to-Congress-on-White-House-Staff" dataset?

Thanks,
--Will
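To make the derivedFrom suggestion in scenario 1 concrete, here is a minimal sketch of how a filtered view might point back at its source dataset. It assumes schema.org-style markup serialized as JSON-LD for brevity; `derivedFrom` is only the property proposed in this thread (not a published schema.org term), and the view's name and the helper functions `dataset` and `to_jsonld` are hypothetical.

```python
import json

# Sketch only: "derivedFrom" is the property proposed in this thread,
# NOT a published schema.org term. The view name below is hypothetical.


def dataset(name, derived_from=None):
    """Return a schema.org-style Dataset node as a plain dict."""
    node = {"@type": "Dataset", "name": name}
    if derived_from is not None:
        # Point the derived view back at the dataset it was built from.
        node["derivedFrom"] = derived_from
    return node


def to_jsonld(node):
    """Add the schema.org context to a top-level node and serialize it."""
    return json.dumps({"@context": "http://schema.org/", **node}, indent=2)


source = dataset("2011-Report-to-Congress-on-White-House-Staff")

# Hypothetical name for the view showing only salaries over $100,000.
view = dataset("White House Staff with salaries over $100,000",
               derived_from=source)

if __name__ == "__main__":
    print(to_jsonld(view))
```

Embedding the parent node (rather than only its name) keeps the lineage machine-readable, which is what would let a consumer such as a search engine group the view with its source.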
Received on Monday, 16 July 2012 16:10:56 UTC