Re: Different dataset views and services in Dataset Schema

From: Will Pugh <will.pugh@socrata.com>
Date: Wed, 18 Jul 2012 00:28:44 -0700
Message-ID: <CAEhPSgjWjFnvLj7icA4-WCKXNJ_isjMeuEo24JYDomr9tfDMvQ@mail.gmail.com>
To: Joshua Shinavier <josh@fortytwo.net>
Cc: public-vocabs@w3.org
On Mon, Jul 16, 2012 at 9:35 AM, Joshua Shinavier <josh@fortytwo.net> wrote:

> Hi Will,
> Thanks for your suggestions.  I would have replied sooner, but I
> missed your email the first time around.
> On Sat, Jul 14, 2012 at 7:53 PM, Will Pugh <will.pugh@socrata.com> wrote:
> [...]
> > My understanding is that the main goal of schema.org is to create
> > schemas useful to search engines, rather than the broader goals of
> > projects like Linked Data that want to create a "Global Data Space".  Is
> > this a correct assessment?
> I believe so, but there are others on this list who could give a more
> authoritative and complete answer.
> > With that assumption, I've got a few scenarios I wanted to ask about,
> > with the idea that these scenarios may describe relationships
> > interesting to search engines.
> >
> > 1)  Is there a way to describe "derived datasets"?
> No, although I think this is a good idea.  I imagine this would be
> useful from a licensing perspective (however, schema.org does not deal
> with licensing) as well as for making the related / super-dataset
> discoverable.  However, I don't think it's very specific to datasets;
> IMHO, it would make more sense at the CreativeWork level.  If such a
> term were in DCAT, perhaps it would make sense to include an
> equivalent term in the extension and then propose that it be moved up
> to CreativeWork.

Interesting point.  A derived work in CreativeWork does make sense.  The
reason I was leaning towards something in Dataset, though, is that
"DerivedWork" might have a different meaning for a general CreativeWork than
for a Dataset.  Specifically, a general "DerivedWork" could imply changes
on top of the original.  E.g., if I have a photo of the president and
photoshop myself in, the result could be a "DerivedWork" of the original
photo.

However, the concept I was trying to express is one where the data itself
is not changed.  Instead, filters, sorts or aggregations are layered on top
of the dataset to "tell a story" from the data, or to make some point with
it.  Perhaps a different name would be better, like OriginalDataset or
ParentDataset?

> > 2)  Would it make sense to describe an API on top of a dataset instead of
> > simply a dataset?
> This is a very important question.  It would be reasonable to allow a
> Dataset distribution to be either a data download, a web service, or a
> feed, as in DCAT, *if* there were a straightforward mapping to
> schema.org types and properties.  However, schema.org does not have an
> equivalent of DCAT's Distribution class (which is a superclass of
> Download, WebService, and Feed), and I don't even see a proposal for
> feed or web service types.  That means that in order to allow the
> distribution property to point to any of the three types of resources,
> either schema.org would need to allow multiple types in the range of a
> property, or we would have to add four new types to schema.org just
> for distributions.  Alternatively, separate properties could be added
> for feeds and web services.  In any case, two additional types would
> need to be added to schema.org.  Since those types are relatively
> fundamental, I suspect they would need to be the subject of other,
> individual proposals.
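The modeling trade-off described above can be sketched in Python by treating each schema as a map from property name to the types allowed in its range. This is an illustration under assumptions, not schema.org's actual vocabulary: `Distribution` mirrors DCAT's class, and the property names `webService` and `feed` are hypothetical. For brevity the same classes are reused in all three options; without a superclass, `Distribution` would simply not exist.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Type


@dataclass
class Distribution:
    """Hypothetical superclass, mirroring DCAT's Distribution."""
    url: str


@dataclass
class Download(Distribution):
    pass


@dataclass
class WebService(Distribution):
    pass


@dataclass
class Feed(Distribution):
    pass


# A schema maps each property name to the tuple of types allowed in its range.
Schema = Dict[str, Tuple[Type, ...]]

# Option A: one property whose range is the common superclass.
SUPERCLASS_SCHEMA: Schema = {"distribution": (Distribution,)}

# Option B: no superclass, so one property needs a multi-type range.
MULTI_RANGE_SCHEMA: Schema = {"distribution": (Download, WebService, Feed)}

# Option C: separate single-type properties (names are hypothetical).
SEPARATE_PROPS_SCHEMA: Schema = {
    "distribution": (Download,),
    "webService": (WebService,),
    "feed": (Feed,),
}


def is_valid(schema: Schema, prop: str, value: object) -> bool:
    """True if `prop` is defined in `schema` and `value` fits its range."""
    return prop in schema and isinstance(value, schema[prop])
```

Under option A a feed is a valid `distribution`; under option C it is only valid via the separate `feed` property, which is why that route still needs the feed and web service types but avoids both the superclass and multi-type ranges.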
> > 3)  Would it make sense to have a type which refers to a view of a
> > dataset?  For example, if I have a page that contains a graph showing the
> > number of people with different salaries at the White House, would it
> > make sense to be able to express to a search engine that the graph is
> > using the "2011-Report-to-Congress-on-White-House-Staff" dataset?

This could be a more general "derived from" concept in CreativeWork,
although the reason I thought it might make sense to specifically call out
a view or a presentation of a dataset would be to help search engines
cluster results better.  For example, when I google "German Shepherd", one
thing I get is a list of images of German Shepherds.  If we made it easy for
search engines to recognize charts or visualizations for a specific
dataset, then you could imagine them doing the same thing with results.

If I Google "White House Salaries", you would expect it to not only find
"2011-Report-to-Congress-on-White-House-Staff", but also show images from a
number of the visualizations created from it.
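As a rough illustration of the clustering idea, here is a minimal Python sketch of how a consumer could group result pages under the dataset they declare they were derived from. It assumes each page exposes a hypothetical `derived_from` value (standing in for the "derivedFrom" property discussed in this thread) holding the dataset's URL; `Page` and `cluster_by_dataset` are illustrative names, not any search engine's actual logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    url: str
    kind: str  # e.g. "dataset" or "visualization"
    derived_from: Optional[str] = None  # hypothetical "derivedFrom" target URL


def cluster_by_dataset(results: List[Page]) -> Dict[str, List[str]]:
    """Group visualization URLs under the dataset URL they point to."""
    clusters: Dict[str, List[str]] = {}
    for page in results:
        if page.kind == "dataset":
            # Make sure every dataset in the results gets a cluster, even if empty.
            clusters.setdefault(page.url, [])
        elif page.derived_from is not None:
            clusters.setdefault(page.derived_from, []).append(page.url)
    # Pages without a declared source are left out of the clusters.
    return clusters
```

A results page could then show the dataset itself with its clustered charts beneath it, much like the image strip returned for an ordinary query.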


> This looks like another use for the derivedFrom property you suggested
> above.
> Best regards,
> Joshua
> >
> >
> >
> >     Thanks,
> >     --Will
Received on Wednesday, 18 July 2012 07:29:14 UTC