RE: schema.org and ONIX... from Bill Kasdorf on 2014-04-08 (public-digipub-ig@w3.org from April 2014)

From: Bill Kasdorf <bkasdorf@apexcovantage.com>
Date: Tue, 8 Apr 2014 21:51:25 +0000
To: Ivan Herman <ivan@w3.org>
CC: Luc Audrain <LAUDRAIN@hachette-livre.fr>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <4ac07f0919494e6cafdbd86d9f29551d@CO2PR06MB572.namprd06.prod.outlook.com>

I will have to comment later on the meatier parts of this message, but:

--Re "We should not underestimate the amount of work": This is why I was suggesting starting with Thema. It is actually just a vocabulary for subject classifications, so it probably just pertains to an already-existing property of schema.org. What I was hearing from several of my interviews was the need to associate subject metadata below the level of the publication, which schema.org gets us (remember, not all of these publications are "on the Web", though they should still be able to use the OWP). As Phil Madans pointed out, Thema is pretty "bare bones" compared to BISAC, but I would suggest that that's a virtue in this context. BISAC is so huge and complex that publishers often don't "get it right", and recipients like Bowker and Nielsen feel they have to "fix" it (Apex has done this work for both of them for many years). Thema can't describe things at as meaningful a level of detail, but on the other hand it would be easy to implement and has the big virtue of being a long-needed global subject vocabulary. And compared to ONIX: well, there's another gigantic set of metadata; Thema is just one tiny slice of what is in ONIX. It's not an either/or; Thema (that is, subject classifications in general) is one of many things that ONIX accommodates, but ONIX is not the _only_ place Thema (or BISAC, or BIC, etc.) is used. Strikes me as a good place to start. PLUS (here's a big one): ONIX (as we are normally thinking about it) is just for BOOKS!!!! (It's supply chain metadata, a messaging format for the book supply chain.) I keep pointing out that we are talking about PUBLICATIONS. Journals and magazines and newspapers and corporate publications etc. don't know from ONIX; they have their own schemes. But I think Thema subject classifications might be useful to them as well (e.g. I have gotten IPTC interested in it; their news schemes are not the same thing).

--Re timing of a call: I'm back next Tuesday and am available the rest of next week and all the following week (gone again most of the last week of the month). My main concern is that I would prefer this NOT be discussed in detail in this coming Monday's call because I will not be able to join that one.
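
On the Thema point above, here is a rough sketch of what subject metadata below the level of the publication might look like as schema.org JSON-LD. Carrying the code in the existing "about" property is an assumption (nothing above names the property), the function name is made up for illustration, and the Thema code is a placeholder, not a vetted classification.

```python
import json

def chapter_with_subject(chapter_title, book_title, thema_code):
    """Describe one chapter of a book, carrying its own subject code."""
    return {
        "@context": "http://schema.org",
        "@type": "CreativeWork",
        "name": chapter_title,
        # The chapter points at the publication that contains it.
        "isPartOf": {"@type": "Book", "name": book_title},
        # Assumption: the Thema code travels in the existing "about" property.
        "about": {"@type": "Thing", "name": "Thema:" + thema_code},
    }

# Placeholder titles and code, for illustration only.
chapter = chapter_with_subject("An Example Chapter", "An Example Book", "XX")
print(json.dumps(chapter, indent=2))
```

The point of the sketch is only that the subject hangs off the chapter rather than the whole book; the same shape would apply to an article in a journal or magazine.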

-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org] 
Sent: Tuesday, April 08, 2014 5:32 PM
To: Bill Kasdorf
Cc: Luc Audrain; W3C Digital Publishing IG
Subject: Re: schema.org and ONIX...

Wow, I see I did strike a chord here :-) which is great.

On a very practical level: yes, I believe having a separate call to discuss this would be good and useful. Like Bill, I am out this week; being at the WWW2014 conference in Seoul is obviously an obstacle (as an aside, I will speak about digital publishing this afternoon, as well as on Friday at another local event, so I continue my preaching :-). I will also have some days off around the Easter weekend. Roughly when could we have a call? We could set up a Doodle if we have some available periods: next week, the week after, both, neither?

I cannot judge the Thema/ONIX issue; I leave this to you guys. My question is different, though. Where do ONIX data reside these days? As I said, if they are hidden in databases only, then they are invisible to Google, hence schema.org may be useless. Put another way, are there enough pages on the Web, usually crawled by Google, that do or may include ONIX data? I would certainly hope so, but we have to be sure (and you have to tell me...).

Another point worth knowing about. When schema.org came about, it was focused on HTML pages that use microdata syntax to add schema.org terms (RDFa Lite followed after a while). This is of course possible but, for many sites, it was a bit awkward: systems may have that type of metadata in databases, with the HTML pages generated automatically, and artificially adding microdata to the pages was an extra hassle. As a result, about a year ago, schema.org added the possibility to add JSON-LD into an HTML page using a special <script> tag. That made life way easier for such systems, and I suspect that this is also something that this industry may take advantage of. (Schema.org has recently renewed their pages with examples in three syntaxes everywhere; e.g., scroll to the bottom of [1].)
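
As a minimal sketch of that JSON-LD route, assuming a hypothetical database record with made-up field names and values: a page generator could serialize the record as schema.org JSON-LD and wrap it in a script element of type application/ld+json. The properties used (name, author, isbn) exist on schema.org/Book; the function name and the record are illustrative only.

```python
import json

def book_jsonld_script(record):
    """Render a <script> element holding schema.org JSON-LD for one book."""
    data = {
        "@context": "http://schema.org",
        "@type": "Book",
        "name": record["title"],
        "author": {"@type": "Person", "name": record["author"]},
        "isbn": record["isbn"],
    }
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a stray "</script>" inside the data cannot close the
    # element early; "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"

# Hypothetical record, as it might come out of a publisher's database.
record = {
    "title": "An Example Title",
    "author": "A. N. Author",
    "isbn": "9780000000002",
}
print(book_jsonld_script(record))
```

The generated element can be dropped into the page template as-is, so the visible HTML does not have to be touched at all.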

Finally, we have to realize one more thing. The work to be done is not 'simply' to convert a mini-ONIX into schema.org. The work is to harmonize this, whenever possible, with what is already in schema.org (see [1] below) and add the missing properties and classes or modify the description of existing ones. We should not underestimate the amount of work...
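
To make the harmonization step a bit more concrete, here is a rough sketch. The "mini-ONIX" field labels on the left are hypothetical, not real ONIX element names, and the mapping itself is an assumption for illustration; the schema.org properties on the right (name, author, isbn, publisher, numberOfPages) do exist on schema.org/Book. Fields the sketch leaves unmapped stand for the cases where one would have to check whether schema.org already covers them, or whether a property has to be added or its description modified.

```python
# Hypothetical "mini-ONIX" field labels mapped to existing schema.org/Book
# properties. None marks a field this sketch leaves unmapped; whether
# schema.org already covers it is exactly the open question.
MAPPING = {
    "title": "name",
    "contributor": "author",
    "isbn13": "isbn",
    "publisher_name": "publisher",
    "page_count": "numberOfPages",
    "sales_rights": None,
    "supplier_id": None,
}

def harmonize(record):
    """Split a record into schema.org-ready properties and unmapped fields."""
    mapped, missing = {}, []
    for field, value in record.items():
        prop = MAPPING.get(field)
        if prop is None:
            # Either explicitly unmapped or not known to the mapping at all.
            missing.append(field)
        else:
            mapped[prop] = value
    return mapped, sorted(missing)

# Made-up record, for illustration only.
mapped, missing = harmonize(
    {"title": "An Example Title", "page_count": 320, "sales_rights": "WORLD"}
)
print(mapped)
print(missing)
```

The list of unmapped fields is the interesting output: that is where the real work of adding or adjusting properties and classes would be.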

Cheers

Ivan


[1] http://schema.org/Book



On 09 Apr 2014, at 24:10 , Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:

> Just going through the responses . . . and as for this one, regrettably, Luc, I will not be able to attend LBF this year. So if you've been looking for me, you can stop trying . . . ;-) but I would love to talk with you about this in any case. BTW I will have to miss the DPUB call next week.
> 
> -----Original Message-----
> From: AUDRAIN LUC [mailto:LAUDRAIN@hachette-livre.fr] 
> Sent: Tuesday, April 08, 2014 4:12 AM
> To: Ivan Herman
> Cc: Bill Kasdorf; W3C Digital Publishing IG
> Subject: Re: schema.org and ONIX...
> 
> Hi Ivan and Bill,
> 
> That's a very good exercise, and I will share thoughts with Bill at London Book Fair if possible.
> I'm really interested, as I'm wondering what it will bring for ebook discoverability on the Web beyond the ONIX feeds we already provide to distributors and digital bookstores.
> 
> Best,
> Luc
> 
> 
>> On 8 Apr 2014, at 05:24, "Ivan Herman" <ivan@w3.org> wrote:
>> 
>> Bill,
>> 
>> I am currently at a Linked Data Workshop at a conference in Seoul, which had a keynote from R. Guha, who is, in some sense, the "father" of schema.org. Listening to him (combining also with my past experience), and also referring to the note I sent around earlier this morning[1], I am more and more serious in thinking that a stripped-down version of ONIX defined in schema.org might be a great idea. Of course, we have to see whether there is a business interest and business case for this: is there a use case for publishers as well as for search engines? But if the answer is yes on both, then this may be an important thing to do.
>> 
>> I know Guha personally relatively well, as well as Dan Brickley, who is the other person running schema.org's vocabulary development. I would be happy to make the links and go into the discussions but, of course, the question is whether publishers, as well as institutions like Bowker, would be interested in something like that. I think that clarifying this, i.e., setting up the use cases, would be perfectly in line with the IG's charter (although we probably would have to spawn a different group to make the specification itself, but that is all right).
>> 
>> What do you think?
>> 
>> Ivan
>> 
>> [1] http://www.publishersweekly.com/pw/by-topic/international/london-book-fair/article/61722-london-book-fair-2014-publishers-and-internet-standards.html
>> 
>> ----
>> Ivan Herman, W3C 
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> GPG: 0x343F1A3D
>> FOAF: http://www.ivan-herman.net/foaf
>> 


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf
Received on Tuesday, 8 April 2014 21:51:56 UTC