RE: schema.org and ONIX... AND THEMA

Okay, we're in sync, Phil--thanks! I think the only open question is the schema.org issue, which is what started this thread. Maybe ONIX is the right fit there; I'm certainly open to that. But my guess is that there would need to be an extremely complicated discussion of what to include and what to leave out, and I can see that becoming a bit of a black hole. . . . ;-)

Associating an ONIX record with a document (as EPUB does) might be a simple solution, but that is not what schema.org is looking for, unless I'm misunderstanding. I still think that, because Thema is a global subject vocabulary, there will be a strong inclination for schema.org to say "that works for our purposes."

BTW we are having _exactly_ this same problematic discussion in the EDUPUB world, where the LRMI metadata in schema.org has been deemed insufficient by IMS Global, which advocates a MUCH more extensive vocabulary--because they are in the thick of all the nuances and distinctions needed for educational content. Just as you're saying Thema is really not rich enough for you, IMS Global is saying LRMI (thus schema.org) is not rich enough for them. I'm sure the magazine folks will say the same thing about hrecipe (again, schema.org).

--Bill

-----Original Message-----
From: Madans, Phil [mailto:Phil.Madans@hbgusa.com] 
Sent: Tuesday, April 08, 2014 12:33 PM
To: Bill Kasdorf; Ivan Herman
Cc: W3C Digital Publishing IG
Subject: RE: schema.org and ONIX... AND THEMA

Bill,

I absolutely agree with you about ONIX--see my comments on the interviews you had about it. But if you really want to get down to the chapter level, I do think you are going to need something more granular and contextually relevant than Thema.  Thema is a subject classification for the product. It was based on BIC and is not any more streamlined than BISAC. I've worked with it.  We are implementing it here.

If you are talking about streamlined delivery, I agree schema.org is something we should look at, certainly.

On the other hand, there is also precedent for using ONIX elements in other contexts. The Metadata Book Extension for GS1's Global Data Synchronization Network is a small subset of ONIX 3.0 elements used under a license agreement with EDItEUR. It was done that way so that if publishers wanted to start using GDSN they could use existing systems and not have to do new development work, which is always a major roadblock to adoption and implementation. As we know, this is a prime issue with ONIX 3.0 adoption here in the US and other places right now. I think we should keep this in mind in our discussions.

Phil

------------------------------------------------------------
Phil Madans | Executive Director of Digital Publishing Technology | Hachette Book Group | 237 Park Avenue NY 10017 |212-364-1415 | phil.madans@hbgusa.com

-----Original Message-----
From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
Sent: Tuesday, April 08, 2014 12:00 PM
To: Madans, Phil; Ivan Herman
Cc: W3C Digital Publishing IG
Subject: RE: schema.org and ONIX... AND THEMA

Actually, based on the interviews I've had, I think Thema is _exactly_ what is needed, from a global point of view. BISAC is just US/North American-centric, and it is way too large and complex to work in the context of schema.org.

What I think we will actually wind up recommending, though, are schema.org _properties_ that can accommodate various vocabularies. Ideally I'd like to see perhaps a recommended vocabulary (as is the case for the new educational metadata in schema.org) but the ability to accommodate any vocabulary (Thema, BISAC, BIC, PRISM, IPTC NewsCodes, etc.).

For publishers that already have BISAC subjects at the title level, it is of course obvious that the first priority is to give them the means to incorporate that. But remember we are talking about W3C, not EPUB, here. What I am hearing is not that publishers need to transmit subject metadata at the title level with products (they already do that), but that they need a very simple, streamlined, globally understood way to communicate subject information at a very granular level. Schema.org enables that, down to the <span> level (but of course would most commonly be used at the chapter or other section level).
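To make the "properties that can accommodate various vocabularies" idea concrete, here is a rough sketch in Python that builds schema.org JSON-LD for a single chapter and tags it with subjects from two different schemes. Several things here are assumptions, not anything agreed in this thread: the choice of the schema.org DefinedTerm type to carry the scheme name and code, the helper names subject_term and chapter_markup, the chapter title, and the subject codes themselves, which are illustrative placeholders that should be checked against the published Thema and BISAC lists.

```python
import json


def subject_term(scheme, code, label):
    """Build one subject entry as a schema.org DefinedTerm (assumed modeling choice)."""
    return {
        "@type": "DefinedTerm",
        "inDefinedTermSet": scheme,  # which vocabulary the code comes from
        "termCode": code,            # the code within that vocabulary
        "name": label,               # human-readable label
    }


def chapter_markup(title, subjects):
    """Describe a chapter and attach subjects from any number of vocabularies."""
    return {
        "@context": "https://schema.org",
        "@type": "Chapter",
        "name": title,
        "about": subjects,
    }


# Illustrative placeholder codes; verify against the actual scheme lists.
chapter = chapter_markup(
    "Passover Baking",
    [
        subject_term("Thema", "WB", "Cookery / food and drink"),
        subject_term("BISAC", "CKB000000", "COOKING / General"),
    ],
)

print(json.dumps(chapter, indent=2))
```

The point of the sketch is only that the property stays the same while the vocabulary varies: a publisher with BISAC subjects and one with Thema subjects would both fill in the same "about" slot, and a recommended vocabulary could sit alongside any others.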

Another reason I am suggesting starting with Thema is that I heard over and over that "metadata is just too damn complex" and "ONIX is just too damn complex." I of course always point out the Graham Bell / Mark Bide observation (citing them appropriately, of course!) that it is complex because what people want to do is complex.

But one of the big insights from my interviews is that people are looking for something FAR more streamlined.

Keep in mind that schema.org is based on built-in recognition from all the browsers and search engines.

The example I usually cite is recipes. PRISM has a very rich recipe vocabulary because the publishers that got together to create it understand all the complexities and nuances involved in fully characterizing recipe content. But what is in schema.org is hrecipe, which is a small subset of what is in the PRISM recipe vocabulary. And that's why when you ask Google for recipes that include matzoh, it finds some. Publishers want that "Google will bring people to my content" functionality out of the box.

--Bill K

-----Original Message-----
From: Madans, Phil [mailto:Phil.Madans@hbgusa.com]
Sent: Tuesday, April 08, 2014 11:46 AM
To: Bill Kasdorf; Ivan Herman
Cc: W3C Digital Publishing IG
Subject: RE: schema.org and ONIX... AND THEMA

Bill,

Thema is a subject classification that is global, yes, but fundamentally Thema is no different from BISAC Subjects. Thema would be transmitted in ONIX like any other subject classification. The real issue, though, is that Thema would have the same limitations as BISAC categories--lack of granularity from a contextual standpoint--as pointed out in your interviews. I think it is still too high level. Thema is certainly a good tool, but I wouldn't see it as an answer in itself.

Phil
------------------------------------------------------------
Phil Madans | Executive Director of Digital Publishing Technology | Hachette Book Group | 237 Park Avenue NY 10017 |212-364-1415 | phil.madans@hbgusa.com

-----Original Message-----
From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
Sent: Tuesday, April 08, 2014 11:09 AM
To: Ivan Herman
Cc: W3C Digital Publishing IG
Subject: RE: schema.org and ONIX... AND THEMA

Hi, Ivan--

This was actually going to be one of my recommendations as a result of the interviewing I've been doing. That this "super simple schema.org" subset would be useful is clear. What I do want to point out, though, is that I think Thema rather than ONIX would be the better choice. Thema is about subject classifications (plus it is a global standard), and that is the kind of metadata publishers need to embed at a granular level in their content. ONIX is really a messaging format that contains a huge amount of stuff that would not be used in that way at all--it is primarily product-level metadata. So while it might make sense to look at portions of ONIX, I think Thema is the place to start, and I think it would find very quick adoption.

First of all, apologies for the delayed response. I am head-down on a big report for the EU that I have to finish by tomorrow, and I have been neglecting non-critical e-mails (and cancelling meetings!). I see that there are a whole bunch of replies to this that I have not yet looked at. But I wanted to go on record as saying that this was exactly what I was thinking would be called for (whether ONIX or Thema).

-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org]
Sent: Monday, April 07, 2014 11:23 PM
To: Bill Kasdorf
Cc: W3C Digital Publishing IG
Subject: schema.org and ONIX...

Bill,

I am currently at a Linked Data Workshop at a conference in Seoul, which had a keynote from R. Guha, who is, in some sense, the "father" of schema.org. Listening to him (and drawing on my past experience), and also referring to the note I sent around earlier this morning [1], I am more and more seriously thinking that a stripped-down version of ONIX defined in schema.org might be a great idea. Of course, we have to see whether there is a business interest and business case for this: is there a use case for publishers as well as for search engines? But if the answer is yes on both, then this may be an important thing to do.

I do know Guha personally relatively well, as well as Dan Brickley, who is the other person running schema.org's vocabulary development. I would be happy to make the links and go into the discussions but, of course, the question is whether publishers, as well as institutions like Bowker, would be interested in something like that. I think that clarifying this, i.e., setting up the use cases, would be perfectly in line with the IG's charter (although we would probably have to spawn a different group to write the specification itself, but that is all right).

What do you think?

Ivan

[1] http://www.publishersweekly.com/pw/by-topic/international/london-book-fair/article/61722-london-book-fair-2014-publishers-and-internet-standards.html

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf






This may contain confidential material. If you are not an intended recipient, please notify the sender, delete immediately, and understand that no disclosure or reliance on the information herein is permitted. Hachette Book Group may monitor email to and from our network.

Received on Tuesday, 8 April 2014 17:12:47 UTC