
RE: schema.org and ONIX... AND THEMA

From: Bill Kasdorf <bkasdorf@apexcovantage.com>
Date: Tue, 8 Apr 2014 15:59:38 +0000
To: "Madans, Phil" <Phil.Madans@hbgusa.com>, Ivan Herman <ivan@w3.org>
CC: W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <fd1a19db378b4ad4bd9a27358c26a532@CO2PR06MB572.namprd06.prod.outlook.com>
Actually, based on the interviews I've had, I think Thema is _exactly_ what is needed, from a global point of view. BISAC is just US/North American-centric, and it is way too large and complex to work in the context of schema.org.

What I think we will actually wind up recommending, though, are schema.org _properties_ that can accommodate various vocabularies. Ideally I'd like to see perhaps a recommended vocabulary (as is the case for the new educational metadata in schema.org) but the ability to accommodate any vocabulary (Thema, BISAC, BIC, PRISM, IPTC NewsCodes, etc.).

For publishers that already have BISAC subjects at the title level, it is of course obvious that the first priority is to give them the means to incorporate that. But remember we are talking about W3C, not EPUB, here. What I am hearing is not that publishers need to transmit subject metadata at the title level with products (they already do that), but that they need a very simple, streamlined, globally understood way to communicate subject information at a very granular level. Schema.org enables that, down to the <span> level (but of course would most commonly be used at the chapter or other section level).
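To make the granular, span-level idea concrete, here is a minimal sketch of what such markup could look like, generated with a small Python helper. It is an illustration only, not something proposed in this thread: the helper name `subject_span` and the subject code `EXAMPLE-CODE` are hypothetical placeholders, and the use of schema.org's `about` property with a `DefinedTerm` item (`termCode`, `inDefinedTermSet`) is an assumption about one way to carry a code from any vocabulary (Thema, BISAC, BIC, and so on).

```python
from html import escape


def subject_span(text, code, scheme):
    """Wrap a text fragment in a <span> carrying a subject code as microdata.

    Hypothetical helper for illustration. `code` is a subject code and
    `scheme` names the vocabulary it comes from (e.g. "Thema" or "BISAC").
    The result is meant to sit inside an enclosing item (for example a
    chapter marked up as a schema.org CreativeWork), since itemprop="about"
    needs a parent itemscope to attach to.
    """
    return (
        '<span itemprop="about" itemscope '
        'itemtype="https://schema.org/DefinedTerm">'
        f'<meta itemprop="termCode" content="{escape(code, quote=True)}">'
        f'<meta itemprop="inDefinedTermSet" content="{escape(scheme, quote=True)}">'
        f'<span itemprop="name">{escape(text)}</span>'
        '</span>'
    )


# EXAMPLE-CODE is a placeholder, not a real Thema code.
print(subject_span("matzoh ball soup", "EXAMPLE-CODE", "Thema"))
```

Because the vocabulary name travels alongside the code, the same markup pattern could hold a recommended vocabulary or any other one without changing the property itself.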

Another reason I am suggesting starting with Thema is that I heard over and over that "metadata is just too damn complex" and "ONIX is just too damn complex." I of course always point out the Graham Bell / Mark Bide observation (citing them appropriately, of course!) that it is complex because what people want to do is complex.

But one of the big insights from my interviews is that people are looking for something FAR more streamlined.

Keep in mind that schema.org is based on built-in recognition from all the browsers and search engines.

The example I usually cite is recipes. PRISM has a very rich recipe vocabulary because the publishers that got together to create it understand all the complexities and nuances involved in fully characterizing recipe content. But what is in schema.org is the Recipe type (much like the hRecipe microformat), which is a small subset of what is in the PRISM recipe vocabulary. And that's why when you ask Google for recipes that include matzoh, it finds some. Publishers want that "Google will bring people to my content" functionality out of the box.

--Bill K

-----Original Message-----
From: Madans, Phil [mailto:Phil.Madans@hbgusa.com] 
Sent: Tuesday, April 08, 2014 11:46 AM
To: Bill Kasdorf; Ivan Herman
Cc: W3C Digital Publishing IG
Subject: RE: schema.org and ONIX... AND THEMA


Thema is a subject classification that is global, yes, but fundamentally Thema is no different from BISAC Subjects. Thema would be transmitted in ONIX like any other subject classification. The real issue, though, is that Thema would also have the same limitations as BISAC categories--lack of granularity from a contextual standpoint--as pointed out in your interviews. I think it is still too high level. Thema is certainly a good tool, but I wouldn't see it as an answer in itself.

Phil Madans | Executive Director of Digital Publishing Technology | Hachette Book Group | 237 Park Avenue NY 10017 |212-364-1415 | phil.madans@hbgusa.com

-----Original Message-----
From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
Sent: Tuesday, April 08, 2014 11:09 AM
To: Ivan Herman
Cc: W3C Digital Publishing IG
Subject: RE: schema.org and ONIX... AND THEMA

Hi, Ivan--

This was actually going to be one of my recommendations as a result of the interviewing I've been doing. That this "super simple schema.org" subset would be useful is clear. What I do want to point out, though, is that I think Thema rather than ONIX would be the better choice. Thema is about subject classifications (plus it is a global standard) and that is the kind of metadata publishers need to embed at a granular level in their content. ONIX is really a messaging format that contains a huge amount of stuff that would not be used in that way at all--it is primarily product-level metadata. So while it might make sense to look at portions of ONIX, I think Thema is the place to start, and I think it would find very quick adoption.

First of all, apologies for the delayed response. I am head-down on a big report for the EU that I have to finish by tomorrow, and I have been neglecting non-critical e-mails (and cancelling meetings!). I see that there are a whole bunch of replies to this that I have not yet looked at. But I wanted to go on record for saying that this was exactly what I was thinking would be called for (whether ONIX or Thema).

-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org]
Sent: Monday, April 07, 2014 11:23 PM
To: Bill Kasdorf
Cc: W3C Digital Publishing IG
Subject: schema.org and ONIX...


I am currently at a Linked Data Workshop at a conference in Seoul, which had a keynote from R. Guha, who is, in some sense, the "father" of schema.org. Listening to him (and combining that with my past experience), and also referring to the note I sent around earlier this morning [1], I am more and more seriously thinking that a stripped-down version of ONIX defined in schema.org might be a great idea. Of course, we have to see whether there is a business interest and business case for this: is there a use case for publishers as well as for search engines? But if the answer is yes on both, then this may be an important thing to do.

I know Guha personally relatively well, as well as Dan Brickley, who is the other person running schema.org's vocabulary development. I would be happy to make the links and go into the discussions but, of course, the question is whether publishers, as well as institutions like Bowker, would be interested in something like that. I think that clarifying this, i.e., setting up the use cases, would be perfectly in line with the IG's charter (although we would probably have to spawn a different group to write the specification itself, but that is all right).

What do you think?


[1] http://www.publishersweekly.com/pw/by-topic/international/london-book-fair/article/61722-london-book-fair-2014-publishers-and-internet-standards.html

Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf

Received on Tuesday, 8 April 2014 16:00:56 UTC
