- From: Will Pugh <will.pugh@socrata.com>
- Date: Wed, 18 Jul 2012 00:28:44 -0700
- To: Joshua Shinavier <josh@fortytwo.net>
- Cc: public-vocabs@w3.org
- Message-ID: <CAEhPSgjWjFnvLj7icA4-WCKXNJ_isjMeuEo24JYDomr9tfDMvQ@mail.gmail.com>
On Mon, Jul 16, 2012 at 9:35 AM, Joshua Shinavier <josh@fortytwo.net> wrote:

> Hi Will,
>
> Thanks for your suggestions. I would have replied sooner, but I
> missed your email the first time around.
>
> On Sat, Jul 14, 2012 at 7:53 PM, Will Pugh <will.pugh@socrata.com> wrote:
> [...]
> > My understanding is that the main goal of schema.org is to create
> > schemas useful to search engines, rather than the broader goals of
> > projects like Linked Data that want to create a "Global Data Space".
> > Is this a correct assessment?
>
> I believe so, but there are others on this list who could give a more
> authoritative and complete answer.
>
> > With that assumption, I've got a few scenarios I wanted to ask about,
> > with the idea that these scenarios may describe relationships
> > interesting to search engines.
> >
> > 1) Is there a way to describe "derived datasets"?
>
> No, although I think this is a good idea. I imagine this would be
> useful from a licensing perspective (however, schema.org does not deal
> with licensing) as well as for making the related / super-dataset
> discoverable. However, I don't think it's very specific to datasets;
> IMHO, it would make more sense at the CreativeWork level. If such a
> term were in DCAT, perhaps it would make sense to include an
> equivalent term in the extension and then propose that it be moved up
> to CreativeWork.

Interesting point. A derived-work term in CreativeWork does make sense.
The reason I was leaning towards something in Dataset, though, is that
"DerivedWork" might have a different meaning for a general CreativeWork
than for a Dataset specifically. In particular, a general "DerivedWork"
could imply changes on top of the original. E.g., if I have a photo of
the president and photoshop myself in, the result could be a
"DerivedWork" from the original photo. However, the concept I was
trying to express is one where the data itself is not changed.
Instead, it is one where filters, sorts, or aggregations are layered on
top of the dataset to "tell a story" from the data, or to make some
point from the data. Perhaps a different name would be better, like
OriginalDataset or ParentDataset?

> > 2) Would it make sense to describe an API on top of a dataset instead
> > of simply a dataset?
>
> This is a very important question. It would be reasonable to allow a
> Dataset distribution to be either a data download, a web service, or a
> feed, as in DCAT, *if* there were a straightforward mapping to
> schema.org types and properties. However, schema.org does not have an
> equivalent of DCAT's Distribution class (which is a superclass of
> Download, WebService, and Feed), and I don't even see a proposal for
> feed or web service types. That means that in order to allow the
> distribution property to point to any of the three types of resources,
> either schema.org would need to allow multiple types in the range of a
> property, or we would have to add four new types to schema.org just
> for distributions. Alternatively, separate properties could be added
> for feeds and web services. In any case, two additional types would
> need to be added to schema.org. Since those types are relatively
> fundamental, I suspect they would need to be the subject of other,
> individual proposals.
>
> > 3) Would it make sense to have a type which refers to a view or a
> > dataset? For example, if I have a page that contains a graph that
> > contains number of people with different salaries at the White
> > House, would it make sense to be able to express to a search engine
> > that the graph is using the
> > "2011-Report-to-Congress-on-White-House-Staff" dataset?

This could be a more general derived-from concept in CreativeWork,
although the reason I thought it might make sense to specifically call
out a view or a presentation for a dataset would be to help search
engines cluster results better.
For example, when I google "German Shepherd", one thing I get is a list
of images of German Shepherds. If we made it easy for search engines to
recognize charts or visualizations for a specific dataset, then you
could imagine they could do the same thing with results. If I google
"White House Salaries", you would expect the search engine to not only
find "2011-Report-to-Congress-on-White-House-Staff", but also show
images from a number of the visualizations created from it.

Thanks,
--Will

> This looks like another use for the derivedFrom property you suggested
> above.
>
> Best regards,
>
> Joshua
>
> > Thanks,
> > --Will
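
For illustration, the "derived dataset" idea from point 1 (a view that
layers a filter and an aggregation over a dataset without changing the
data) might be sketched roughly as follows. This is only a sketch under
assumptions: the dataset name, the rows, the derive_view helper, and the
"parentDataset" property are all hypothetical and are not schema.org or
DCAT terms.

```python
import json
from collections import Counter

# Hypothetical parent dataset; the name and rows are made-up placeholders.
ORIGINAL = {
    "@type": "Dataset",
    "name": "example-staff-salaries",
    "rows": [
        {"title": "Analyst", "salary": 42000},
        {"title": "Adviser", "salary": 55000},
        {"title": "Counsel", "salary": 61000},
        {"title": "Director", "salary": 120000},
    ],
}


def derive_view(parent, name, predicate, bucket):
    """Build a view by filtering and aggregating the parent's rows.

    The parent is only read, never modified, so the view "tells a story"
    on top of the data rather than changing it.
    """
    counts = Counter(bucket(row) for row in parent["rows"] if predicate(row))
    return {
        "@type": "Dataset",
        "name": name,
        # Hypothetical property pointing back at the unmodified source.
        "parentDataset": parent["name"],
        "rows": [{"band": b, "count": c} for b, c in sorted(counts.items())],
    }


# Count staff per 50,000-wide salary band.
view = derive_view(
    ORIGINAL,
    "Staff count by salary band",
    predicate=lambda row: row["salary"] > 0,
    bucket=lambda row: row["salary"] // 50000 * 50000,
)

if __name__ == "__main__":
    print(json.dumps(view, indent=2))
```

A chart page could carry the same back-reference, which is what would let
a search engine group visualizations under the dataset they were built
from.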
Received on Wednesday, 18 July 2012 07:29:14 UTC