
Different dataset views and services in Dataset Schema

From: Will Pugh <will.pugh@socrata.com>
Date: Sat, 14 Jul 2012 16:53:15 -0700
Message-ID: <CAEhPSgjbL+pW1jPzOaVZta=mibbrDqNyA9JQdpR74P5mvJhMOA@mail.gmail.com>
To: public-vocabs@w3.org
Hi folks,

I'm new to schema.org, but I just looked at the new Datasets Schema.  The
initial proposal looks great.  It seems very simple (which is a good thing).
However, there were a few concepts I wanted to run by this group that I
didn't see in there.

My understanding is that the main goal of schema.org is to create
schemas useful to search engines, rather than the broader goals of projects
like Linked Data that want to create a "Global Data Space".  Is this a
correct assessment?

With that assumption, I've got a few scenarios I wanted to ask about, with
the idea that these scenarios may describe relationships interesting to
search engines.

1)  Is there a way to describe "derived datasets"?  So, for example, take
the dataset "2011-Report-to-Congress-on-White-House-Staff" on
opendata.socrata.com.  It is pretty straightforward how to model that in
the Datasets schema.  However, now take the different views people have
built on top of this dataset, such as a view that ONLY shows White House
Staff with salaries greater than $100,000.  This view acts in every way
like a dataset, and can be thought of as one.  It can be viewed as HTML,
downloaded as CSV, JSON, etc.

It seems like it might be useful for this "derived dataset" to be able to
state that it comes from another dataset.  Something like a property:
    derivedFrom : Dataset

Without knowing too much about the internals of the big search engines, it
seems like this information could be useful for how they choose to either
cluster results together or show them as separate entries.
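
To make the idea concrete, here is a rough sketch in Python of how a derived
view might point back at its source.  The "derivedFrom" key is the property
suggested above and is not part of the current proposal.  The JSON-LD-style
"@type"/"@id" keys, the example.org URLs, and the derive_view helper are all
assumptions made up for illustration.

```python
import json

# Source dataset. The "@id" is a placeholder URL, not the real page.
source = {
    "@type": "Dataset",
    "@id": "https://example.org/datasets/white-house-staff-2011",
    "name": "2011-Report-to-Congress-on-White-House-Staff",
}


def derive_view(parent, view_id, name):
    """Describe a view as a Dataset that links back to its parent.

    "derivedFrom" is the hypothetical property proposed in this message.
    """
    return {
        "@type": "Dataset",
        "@id": view_id,
        "name": name,
        "derivedFrom": {"@id": parent["@id"]},
    }


view = derive_view(
    source,
    "https://example.org/views/salaries-over-100k",  # placeholder URL
    "White House Staff with salaries greater than $100,000",
)

print(json.dumps(view, indent=2))
```

A consumer that sees both descriptions could follow "derivedFrom" from the
view to the source and decide whether to group the two together.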

2)  Would it make sense to describe an API on top of a dataset instead of
simply a dataset?  For example, one way to access a Dataset may be to
download a JSON or CSV file.  Another might be to call an API that takes
sort/filter/grouping clauses on top of the dataset.  How would this API be
represented?
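
As a loose illustration of the kind of API I have in mind, here is a small
Python sketch of a query function that takes filter, sort, and grouping
clauses over a dataset's rows.  The function name, its parameters, and the
sample rows are invented for this example; this is not any real service's
API.

```python
from collections import defaultdict


def query(rows, where=None, order_by=None, group_by=None):
    """Apply optional filter, sort, and grouping clauses to a list of rows."""
    # Filter: keep rows for which the predicate holds (or all rows if none).
    selected = [row for row in rows if where is None or where(row)]
    # Sort: ascending by the named column.
    if order_by is not None:
        selected = sorted(selected, key=lambda row: row[order_by])
    if group_by is None:
        return selected
    # Group: bucket the remaining rows by the named column.
    groups = defaultdict(list)
    for row in selected:
        groups[row[group_by]].append(row)
    return dict(groups)


# Made-up rows standing in for the dataset's contents.
rows = [
    {"name": "A", "status": "Employee", "salary": 120000},
    {"name": "B", "status": "Detailee", "salary": 90000},
    {"name": "C", "status": "Employee", "salary": 150000},
]

high_paid = query(rows, where=lambda row: row["salary"] > 100000,
                  order_by="salary")
by_status = query(rows, group_by="status")
```

The open question is how a page would tell a search engine that an endpoint
like this exists for a given Dataset, alongside the plain file downloads.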

3)  Would it make sense to have a type which refers to a view or a dataset?
For example, if I have a page that contains a graph showing the number
of people with different salaries at the White House, would it make sense
to be able to express to a search engine that the graph is using
the "2011-Report-to-Congress-on-White-House-Staff" dataset?



    Thanks,
    --Will
Received on Monday, 16 July 2012 05:28:27 GMT
