Re: ComicSeries publisher/imprint

On Mon, Dec 9, 2013 at 1:29 PM, Thad Guidry <thadguidry@gmail.com> wrote:
> That's how the Type schema is set up in Freebase as well.
> Where an Imprint is a Publisher as well.  And we type an Imprint as a
> Publisher.
>
> Please read the description along the top of our Publisher Type in Freebase:
> https://www.freebase.com/book/publishing_company?schema=
>
> Publisher:
> This type is used for any company or organization that has published a book.
> It should also be used for imprints, which might sometimes be subsidiaries
> of a larger company, sometimes be a division of a company or organization,
> or sometimes just a brand.

Nice. +1

> On Mon, Dec 9, 2013 at 11:42 AM, Owen Stephens <owen@ostephens.com> wrote:
>>
>> I agree from a data/modelling point of view - I'm just pointing out that
>> when one has a label in hand like "Pergamon" it is difficult to know whether
>> this is the Publisher or the Imprint (in this case it could be either as
>> Pergamon was a publisher and is now an Imprint owned by Elsevier I think).
>> How to handle the situation where the data publisher either doesn't know or
>> doesn't care about this distinction?
>>
>> My argument is that if we create a property of 'imprint' or 'publisher'
>> then we have to accept and expect that the values put into these will
>> inevitably be a mixture of both.

Yes, but I think that describes life on the structured data web, and
to be honest I don't see how that really impacts our designs. Low
quality data is going to be published no matter what we do; if we just
had one type with two properties (like "Book" with "author" and
"title"), I'm sure there would be examples in the wild where the
properties got each other's values. At the scale of the web, it's up to
the search engines and other processors to figure out quality and
authority metrics (like "Oh, comics.org has lots of high quality
stuff: A+ for its data; 3 pages with a handful of types on Dan Scott's
website? E for effort, thanks for trying--at least until we find
signals that this is high-quality niche stuff from, say, links from
other pages") and how that impacts what they do with the data. For one
thing, I assume that the search engines are going to do a lot of
comparing and contrasting to find agreement on what the "truth" is across all
of the pages they have crawled (and any other sources to which they
have access). As one example, Freebase already has high quality
standards set for accepting incoming data from automated systems
(can't remember offhand if the bar is 99% or 99.9% quality); Google's
Knowledge Graph builds from there.
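
To make the "find agreement across pages" idea concrete, here is a
minimal sketch of trust-weighted voting over conflicting statements. It
is purely illustrative: the site names, the trust scores, the default
weight for unknown sources, and the reconcile() helper are all
assumptions made up for this example, not a description of how any
search engine or Freebase actually works.

```python
from collections import defaultdict

# Hypothetical claims harvested from three sites, as
# (source, subject, property, value). The third site has swapped
# the publisher and imprint values.
claims = [
    ("site-a.example", "book:1", "publisher", "Elsevier"),
    ("site-a.example", "book:1", "imprint", "Pergamon"),
    ("site-b.example", "book:1", "publisher", "Elsevier"),
    ("site-b.example", "book:1", "imprint", "Pergamon"),
    ("site-c.example", "book:1", "publisher", "Pergamon"),
    ("site-c.example", "book:1", "imprint", "Elsevier"),
]

# Assumed per-source trust scores; a real processor would derive
# these from its own quality and authority signals.
trust = {"site-a.example": 1.0, "site-b.example": 1.0, "site-c.example": 0.2}


def reconcile(claims, trust, default_trust=0.1):
    """For each (subject, property), keep the value with the most
    trust-weighted support across all sources."""
    scores = defaultdict(lambda: defaultdict(float))
    for source, subject, prop, value in claims:
        scores[(subject, prop)][value] += trust.get(source, default_trust)
    return {key: max(values, key=values.get) for key, values in scores.items()}


resolved = reconcile(claims, trust)
print(resolved[("book:1", "publisher")])
print(resolved[("book:1", "imprint")])
```

The point is only that a consumer holding many overlapping statements
can outvote a low-quality page that mixed up the two properties, which
is possible only if the properties are kept separate in the first place.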

So I don't think we should worry too much about the example of a
professor augmenting their bibliography page with some schema.org
types and properties. While applications (like, say, a Zotero
translator to extract citations) will have to do their best with what
they find on a page like that, we should not deliberately oversimplify
our design and hamstring the high-quality, high-volume data publishers
like comics.org who have demonstrated why something like "imprint"
should be handled separately from "publisher" and who already publish
that data separately in their systems today; similarly, we should not
hobble the search engines and applications that can be built on top of
that data.

Thanks,
Dan

Received on Monday, 9 December 2013 18:52:12 UTC