- From: Dan Scott <denials@gmail.com>
- Date: Mon, 9 Dec 2013 13:51:44 -0500
- To: Thad Guidry <thadguidry@gmail.com>
- Cc: Owen Stephens <owen@ostephens.com>, Henry Andrews <hha1@cornell.edu>, "public-schemabibex@w3c.org" <public-schemabibex@w3c.org>, Peter Olson <polson@marvel.com>
On Mon, Dec 9, 2013 at 1:29 PM, Thad Guidry <thadguidry@gmail.com> wrote:
> That's how the Type schema is set up in Freebase as well.
> Where an Imprint is a Publisher as well. And we type an Imprint as a
> Publisher.
>
> Please read the description along the top of our Publisher Type in Freebase:
> https://www.freebase.com/book/publishing_company?schema=
>
> Publisher:
> This type is used for any company or organization that has published a book.
> It should also be used for imprints, which might sometimes be subsidiaries
> of a larger company, sometimes be a division of a company or organization,
> or sometimes just a brand.

Nice. +1

> On Mon, Dec 9, 2013 at 11:42 AM, Owen Stephens <owen@ostephens.com> wrote:
>>
>> I agree from a data/modelling point of view - I'm just pointing out that
>> when one has a label in hand like "Pergamon" it is difficult to know whether
>> this is the Publisher or the Imprint (in this case it could be either as
>> Pergamon was a publisher and is now an Imprint owned by Elsevier I think).
>> How to handle the situation where the data publisher either doesn't know or
>> doesn't care about this distinction?
>>
>> My argument is that if we create a property of 'imprint' or 'publisher'
>> then we have to accept and expect that the values put into these will
>> inevitably be a mixture of both.

Yes, but I think that describes life on the structured data web, and to
be honest I don't see how that really impacts our designs. Low quality
data is going to be published no matter what we do; if we just had one
type with two properties (like "Book" with "author" and "title"), I'm
sure there would be examples in the wild where the properties got each
other's values.

At the scale of the web, it's up to the search engines and other
processors to figure out quality and authority metrics (like "Oh,
comics.org has lots of high quality stuff: A+ for its data; 3 pages
with a handful of types on Dan Scott's website? E for effort, thanks
for trying--at least until we find signals that this is high-quality
niche stuff from, say, links from other pages") and how that impacts
what they do with the data.

For one thing, I assume that the search engines are going to do a lot
of contrast & compare to find agreement on what the "truth" is across
all of the pages they have crawled (and any other sources to which they
have access). As one example, Freebase already has high quality
standards set for accepting incoming data from automated systems (can't
remember offhand if the bar is 99% or 99.9% quality); Google's
Knowledge Graph builds from there.

So I don't think we should worry too much about the example of a
professor augmenting their bibliography page with some schema.org types
and properties. While applications (like, say, a Zotero translator to
extract citations) will have to do their best with what they find on a
page like that, we should not deliberately oversimplify our design and
hamstring the high-quality, high-volume data publishers like comics.org
who have demonstrated why something like "imprint" should be handled
separately from "publisher" and who already publish that data
separately in their systems today; similarly, we should not hobble the
search engines and applications that can be built on top of that data.

Thanks,
Dan
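
As a rough illustration of the "contrast & compare" idea above, here is a
minimal sketch of weighting conflicting values by source quality. It is not
a description of how any search engine actually works; the function name
`consensus`, the source names, the weights, and the default weight for
unknown sources are all hypothetical.

```python
from collections import defaultdict


def consensus(claims, source_weight, default_weight=0.1):
    """Pick the value with the highest total source weight.

    claims: list of (source, value) pairs, e.g. who each page says
            the publisher of a given book is.
    source_weight: hypothetical per-source quality score (0..1).
    default_weight: assumed score for sources we know nothing about.
    """
    totals = defaultdict(float)
    for source, value in claims:
        totals[value] += source_weight.get(source, default_weight)
    # max() returns the first value seen among any exact ties.
    return max(totals, key=totals.get)


# Hypothetical data: one trusted source disagrees with two weak ones.
weights = {"high-quality.example": 0.95, "small-site.example": 0.2}
claims = [
    ("high-quality.example", "Elsevier"),
    ("small-site.example", "Pergamon"),
    ("unknown.example", "Pergamon"),
]
print(consensus(claims, weights))
```

In this sketch a single well-regarded source outweighs several weak ones,
which is why mixed-up values on small sites need not force a simpler design.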
Received on Monday, 9 December 2013 18:52:12 UTC