
RE: schema.org and ONIX...

From: Julie Morris <Julie@bisg.org>
Date: Mon, 14 Apr 2014 14:19:54 +0000
To: Gerardo Capiel <gerardoc@benetech.org>, Ivan Herman <ivan@w3.org>
CC: Madi Weland Solomon <madi.solomon@pearson.com>, Bill Kasdorf <bkasdorf@apexcovantage.com>, "Madans, Phil" <Phil.Madans@hbgusa.com>, "Luc Audrain" <LAUDRAIN@hachette-livre.fr>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <7DE8F103C3FB7F43BBB9AAC838CB59F23AE3066F@mbx022-w1-ca-4.exch022.domain.local>
Gerardo, this is excellent!  http://www.a11ymetadata.org/the-specification/metadata-crosswalk/

In our conversations about schema.org and ONIX within BISG, we've been looking at both accessibility and LRMI metadata in schema.org in particular, because it seems to map quite easily to ONIX, and use cases are immediately apparent. It looks like your work with the crosswalk accomplishes exactly what we had in mind for the accessibility portion.

Our Common Core Working Group is about to publish a document with recommendations for Common Core in ONIX, which contains a table mapping essential ONIX metadata for educational standards to the schema.org/LRMI properties. It should be ready within a month, and I will share it once it's complete.

One thing that we came across within the Common Core work, and that I think will come up again as more ONIX elements are mapped to schema.org, is that because of the distinct purposes of ONIX and schema.org, a 1:1 mapping is not always possible, or useful. This may be obvious, but I wanted to point it out in any case.

Looking forward to continuing this conversation.


Julie Morris
Project Manager: Standards & Best Practices
Book Industry Study Group | BISG.org<http://www.bisg.org/>
Tel: 646-336-7141 Ext 14
Email: julie@bisg.org

From: Gerardo Capiel [mailto:gerardoc@benetech.org]
Sent: Friday, April 11, 2014 9:33 PM
To: Ivan Herman
Cc: Madi Weland Solomon; Bill Kasdorf; Madans, Phil; Luc Audrain; W3C Digital Publishing IG
Subject: Re: schema.org and ONIX...

I've been meaning to reply to this thread all week, and have finally had a free moment.  Benetech recently went through the process of proposing to Schema.org new properties related to accessibility (e.g., http://schema.org/accessibilityFeature).  EDItEUR was partially involved in our working group for defining the accessibility properties based on their experience with accessibility fields in ONIX (Code List 196<http://www.editeur.org/files/ONIX%20for%20books%20-%20code%20lists/ONIX_BookProduct_CodeLists_Issue_24.html>).  We created a crosswalk between the ONIX accessibility properties and the Schema.org properties and recommended terms/enumerations:

EPUB 3.0.1 also supports Schema.org properties and the accessibility properties are included as part of the EPUB 3 Accessibility Guidelines:

The primary use cases for Schema.org revolve around search.  At this time, it seems Google is choosing to provide filters on most of the new Schema.org properties via its Custom Search Engine product:


For example, here's a search that leverages both accessibility<http://a11ymetadata.org> and Learning Resource Metadata Initiative (LRMI<http://lrmi.net>) properties (cited by Madi below) to find science books which have image descriptions and are targeted at students aged 13:

Here's another search for any resource where the LRMI AlignmentObject type property targetName is narrowed to "Physics" and the keyword "light" is used:

The Schema.org/LRMI AlignmentObject is quite interesting in that it allows for multiple types of vocabularies for defining educational standards (e.g. Common Core):
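To make the AlignmentObject idea more concrete, here is a minimal Python sketch that attaches two alignments from different vocabularies to one resource and prints the result as JSON-LD. `AlignmentObject`, `alignmentType`, `educationalFramework`, `targetName`, `educationalAlignment` and `typicalAgeRange` are schema.org/LRMI terms; the helper name `alignment`, the sample title and the framework and target labels are invented purely for illustration.

```python
import json


def alignment(alignment_type, framework, target_name):
    """Build one schema.org AlignmentObject as a plain dict (hypothetical helper)."""
    return {
        "@type": "AlignmentObject",
        "alignmentType": alignment_type,    # e.g. "teaches" or "educationalSubject"
        "educationalFramework": framework,  # names the vocabulary being used
        "targetName": target_name,          # the node within that vocabulary
    }


# Illustrative resource: every value below is a placeholder, not real catalogue data.
resource = {
    "@context": "http://schema.org",
    "@type": "Book",
    "name": "An Example Physics Primer",
    "typicalAgeRange": "13-14",
    "educationalAlignment": [
        alignment("teaches", "Common Core State Standards", "Example standard label"),
        alignment("educationalSubject", "Example subject scheme", "Physics"),
    ],
}

print(json.dumps(resource, indent=2))
```

Because each alignment names its own `educationalFramework`, a consumer can filter on one vocabulary and ignore the others.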

The folks at Centre for Educational Technology, Interoperability and Standards in the U.K. recently published a blog post regarding CSE and LRMI:

It's likely that Schema.org already has many of the properties we need, and then the question is what guidance we should give to publishers regarding the proper use of the properties and what vocabularies to use (e.g., Thema, Bowker, LCC), since Schema.org does not dictate vocabularies.  There are a number of new proposals to Schema.org that may be of interest:


My recommendation, if we were to add additional properties, is to have Ivan involve Dan Brickley, who was instrumental in accessibility gaining Schema.org adoption.


Gerardo Capiel
VP of Engineering

650-644-3405 - Twitter: @gcapiel<http://twitter.com/gcapiel> - GPG: 0x859F11C4
Fork, Code, Do Social Good: http://benetech.github.com/

On Apr 11, 2014, at 2:26 PM, Ivan Herman <ivan@w3.org> wrote:

On 11 Apr 2014, at 22:49, Solomon, Madi <madi.solomon@pearson.com> wrote:

Thanks Ivan for sparking this.  I'm with Bill on the Thema subject-vocab starter, and can offer Use Cases around schema.org.  Pearson is committed to the Learning Resource Metadata, the educational extension of schema.org, and recognises Subject as a major entry point for education.

On a related note, there has been some recent activity in the Open Linked Education Data Community Group, which I chair but have woefully neglected, that involves the Open University and Open Knowledge Foundation.  Details to share once I have them, but there are possibilities here.

Ivan and I have approached Graham Bell and EDItEUR to explore the possibility of providing Thema subject terms with URIs, an idea he found intriguing but was hesitant about.  Might be good to check in with him again?

I think that it is absolutely necessary to contact him indeed. Maybe we should have our own ideas clarified first; I am not really in a position to make a choice (if there is such a choice) between Thema and ONIX, for example...

Look forward to finding our way together on this.   Count me in.

It is still not clear when we should have a chat to speed up things. Bill?


Madi Solomon

Madi Weland Solomon
Director, Semantic Platforms and Metadata
From US: (011 44) 207 010 2335
D: +44 (0)20 7010 2335
M: +44 (0)79 7077 3449

On 9 April 2014 15:48, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
Thanks, Phil, very helpful as always.

This thread has turned a lightbulb on for me (in fact a couple).

First: we are really talking about two distinctly different use cases here:
--Transmitting publication-level metadata (for which a subset of ONIX in schema.org is what we are looking at doing).
--Embedding subject metadata at all levels in a publication to make it discoverable and enable drilling down to points within a publication based on that subject metadata (for which I was suggesting Thema would be the place to start, using the schema.org mechanism).

Those are really two related but different things, and I think they are both important to do.

Thanks so much for your clarification on Thema! I have not studied it, and I had always understood it to be much simpler than BISAC. Glad to be corrected on that!

One important issue we have here is what I think of as the "comprehensive vs. concise" dilemma.

--I personally always gravitate to "comprehensive" solutions, e.g., "publishers want more precise descriptions, which require much more extensive vocabularies"; "different types of publishers use different schemes and vocabularies (most of them extensive for the above reason) and we need to let them do that"; "keywords, without a controlled vocabulary, are something many publishers want to use"; etc. Let a thousand flowers bloom! ;-) (AKA "good luck with that.")

--The problem is that from the point of view of any receiving system, this quickly becomes unworkable. Systems want things that are clear, specific, and simple so that functionality can be reliably delivered in a programmatic fashion. That's why schema.org vocabularies are typically so much more bare-bones than the vocabularies used by the various interest groups (book publishers, magazine publishers, educational publishers, news publishers, journal publishers, etc.). The receiving system says "don't tell me what I _might_ get, tell me, if you want me to do X, what I _will_ get."

A classic example for which I must assume at least part of the blame: the metadata model in EPUB 3. That model can actually _already_ express all of the above. No problem. It's already there. But guess what? No reading system that I know of actually does _anything_ with that metadata. Being Mr. Idealistic, I still hope they will. And within certain closed systems (known sources, known recipients, agreed-upon process and vocabulary) it can work just fine. But if I had held my breath for our wonderful <meta> and prefix mechanism to get any actual use in the real world I would have been dead long ago. ;-)

--Bill K

-----Original Message-----
From: Madans, Phil [mailto:Phil.Madans@hbgusa.com]
Sent: Wednesday, April 09, 2014 10:09 AM
To: Bill Kasdorf; Ivan Herman
Cc: Luc Audrain; W3C Digital Publishing IG
Subject: RE: schema.org and ONIX...

As far as a separate meeting to discuss this goes, I am out most of next week but have some availability Tuesday and Wednesday.  Otherwise I'll be back on the 22nd and free after that.

A couple of other points.  ONIX is a message transmitted among trading partners, so it does mostly reside in those databases.  Also, ONIX needs to be parsed.  A lot of the data is transmitted using code lists, including BISAC Categories.  You can send the literals if you want, of course. One of the issues with ONIX is that ONIX records vary wildly by sender. By the way, ONIX for Books is only one of the available ONIX messages.  There is also ONIX for Subscription Products and ONIX for Licensing Terms and Rights. But Bill's point is spot on.  There is no metadata scheme used by Publishing as a whole.
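As a rough illustration of what "ONIX needs to be parsed" means for subjects, here is a small Python sketch that reads the Subject composites out of an ONIX 3.0-style fragment and resolves the scheme identifier against a lookup table. The element names are ONIX reference tags, and scheme identifiers 10 (BISAC) and 93 (Thema) come from ONIX Code List 27. The sample fragment, its subject codes and the function name `extract_subjects` are made up for illustration, and the sketch leaves out the XML namespace that real ONIX 3.0 messages usually declare.

```python
import xml.etree.ElementTree as ET

# Two entries from ONIX Code List 27 (subject scheme identifiers); not the full list.
SUBJECT_SCHEMES = {
    "10": "BISAC Subject Heading",
    "93": "Thema subject category",
}

# Illustrative fragment only: real records vary by sender and carry far more detail.
SAMPLE = """
<Product>
  <DescriptiveDetail>
    <Subject>
      <SubjectSchemeIdentifier>10</SubjectSchemeIdentifier>
      <SubjectCode>SCI055000</SubjectCode>
    </Subject>
    <Subject>
      <SubjectSchemeIdentifier>93</SubjectSchemeIdentifier>
      <SubjectCode>PH</SubjectCode>
      <SubjectHeadingText>Physics</SubjectHeadingText>
    </Subject>
  </DescriptiveDetail>
</Product>
"""


def extract_subjects(xml_text):
    """Return one dict per Subject composite, tolerating a missing code or literal."""
    root = ET.fromstring(xml_text.strip())
    subjects = []
    for subj in root.iter("Subject"):
        scheme = subj.findtext("SubjectSchemeIdentifier", default="")
        subjects.append({
            "scheme": SUBJECT_SCHEMES.get(scheme, "unknown scheme " + scheme),
            "code": subj.findtext("SubjectCode"),            # None if sender omitted it
            "literal": subj.findtext("SubjectHeadingText"),  # None if only a code was sent
        })
    return subjects


for subject in extract_subjects(SAMPLE):
    print(subject)
```

The first sample subject carries only a code and the second also carries the literal, which is the kind of sender-to-sender variation a receiving system has to allow for.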

Bill, maybe I misstated my thoughts on Thema.  Thema is not more bare bones than BISAC categories. BISAC has 3822 codes.  Thema has 2497 codes plus another 2000 qualifiers for geography, etc. (Thanks, Dave Cramer, for counting:)).  It is actually more complex than BISAC in that sense.  There are mappings from the existing Subject Classifications to Thema, but they are necessarily high level and so even less granular. This is not a push for BISAC by any means.  I don't think any of the Subject Classifications are what we are looking for.  They are all good for what they do; I just don't think they are what we want. Although when we were talking in BISG about creating a new vocabulary more geared to online search, Google was mentioned as having a very good one, which makes sense.  We never went further in the conversation and decided to create a Best Practice for Keyword creation instead, which should be published in the next month or two.

Keywords should be part of our discussion.  There is going to be a lot of activity around Keywords here in the US very shortly. Book Publishers are looking at Keywords to help search and discovery.


Phil Madans | Executive Director of Digital Publishing Technology | Hachette Book Group | 237 Park Avenue NY 10017 | 212-364-1415 | phil.madans@hbgusa.com

-----Original Message-----
From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
Sent: Tuesday, April 08, 2014 5:51 PM
To: Ivan Herman
Cc: Luc Audrain; W3C Digital Publishing IG
Subject: RE: schema.org and ONIX...

I will have to comment later on the meatier parts of this message, but:

--Re "We should not underestimate the amount of work": This is why I was suggesting starting with Thema. It is actually just a vocabulary for subject classifications, so it probably just pertains to an already-existing property of schema.org. What I was hearing from several of my interviews was the need to associate subject metadata below the level of the publication, which schema.org gets us (remember not all of these publications are "on the Web" though they should still be able to use the OWP). As Phil Madans pointed out, Thema is pretty "bare bones" compared to BISAC, but I would suggest that that's a virtue in this context. BISAC is so huge and complex that publishers often don't "get it right" and recipients like Bowker and Nielsen feel they have to "fix" it (Apex has done this work for both of them for many years). Thema can't describe things at as meaningful a level of detail but on the other hand it would be easy to implement and has the big virtue of being a long-needed global subject vocabulary. And compared to ONIX: well, there's another gigantic set of metadata; Thema is just one tiny slice of what is in ONIX. It's not an either/or; Thema (that is, subject classifications in general) is one of many things that ONIX accommodates, but ONIX is not the _only_ place Thema (or BISAC, or BIC, etc.) are used. Strikes me as a good place to start. PLUS (here's a big one): ONIX (as we are normally thinking about it) is just for BOOKS!!!! (It's supply chain metadata, a messaging format for the book supply chain.) I keep pointing out that we are talking about PUBLICATIONS. Journals and magazines and newspapers and corporate publications etc. don't know from ONIX, they have their own schemes. But I think Thema subject classifications might be useful to them as well (e.g. I have gotten IPTC interested in it; their news schemes are not the same thing).

--Re timing of a call: I'm back next Tuesday and am available the rest of next week and all the following week (gone again most of the last week of the month). My main concern is that I would prefer this NOT be discussed in detail in this coming Monday's call because I will not be able to join that one.

-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org]
Sent: Tuesday, April 08, 2014 5:32 PM
To: Bill Kasdorf
Cc: Luc Audrain; W3C Digital Publishing IG
Subject: Re: schema.org and ONIX...

Wow, I see I did strike a chord here:-) which is great.

On a very practical level: yes, I believe having a separate call discussing this would be good and useful. Like Bill, I am out this week; being at the WWW2014 conference in Seoul is obviously an obstacle (as an aside, I will speak about digital publishing this afternoon as well as on Friday at another local event, so I continue doing my preaching:-). I will also have some days off around Easter week-end. When, roughly, could we have a call? We could set up a doodle if we have some available periods: next week, the week after, both, neither?

I cannot judge the THEMA/ONIX issue, I leave this to you guys. My question is different, though. Where do ONIX data reside these days? As I said, if it is hidden in databases only, then it is invisible to Google, hence schema.org may be useless. Put another way, are there enough pages on the Web, usually crawled by Google, that do or may include ONIX data? I would certainly hope so, but we have to be sure (and you have to tell me...).

Another point worth knowing about. When schema.org came about, it was focused on HTML pages that use microdata syntax to add schema.org terms (RDFa Lite followed after a while). This is of course possible, but, for many sites, this was a bit awkward: systems may have that type of metadata in databases with the HTML pages generated automatically, and artificially adding microdata to the pages was an extra hassle. As a result, about a year ago, schema.org added the possibility to add JSON-LD into an HTML page using a special <script> tag. That made life for such systems way easier and I suspect that this is also something that this industry may take advantage of. (Schema.org has recently renewed their pages with examples in three syntaxes everywhere; e.g., scroll to the bottom of [1].)
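As a rough illustration of the JSON-LD route, here is a minimal Python sketch of a system that holds book metadata in a database record and emits it as a `<script type="application/ld+json">` block for a generated page. `Book`, `name`, `author`, `isbn` and `accessibilityFeature` are schema.org terms; the record layout, the sample values (including the placeholder ISBN) and the function name `book_to_jsonld_script` are assumptions made up for this example.

```python
import json


def book_to_jsonld_script(record):
    """Render a (hypothetical) database record as an embeddable JSON-LD block."""
    data = {
        "@context": "http://schema.org",
        "@type": "Book",
        "name": record["title"],
        "author": {"@type": "Person", "name": record["author"]},
        "isbn": record["isbn"],
    }
    # Optional property: only emitted when the record actually carries values.
    if record.get("accessibility_features"):
        data["accessibilityFeature"] = list(record["accessibility_features"])
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder record, not real catalogue data.
record = {
    "title": "An Example Physics Primer",
    "author": "A. N. Author",
    "isbn": "9780000000002",
    "accessibility_features": ["alternativeText", "longDescription"],
}

print(book_to_jsonld_script(record))
```

The page template only needs to drop the returned string into the HTML; the visible markup does not have to be annotated element by element, which is the hassle microdata imposed on generated pages.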

Finally, we have to realize one more thing. The work to be done is not 'simply' to convert a mini-ONIX into schema.org. The work is to harmonize this, whenever possible, with what is already in schema.org (see [1] below) and add the missing properties and classes or modify the description of existing ones. We should not underestimate the amount of work...



[1] http://schema.org/Book

On 09 Apr 2014, at 24:10 , Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:

Just going through the responses . . . and as for this one, regrettably, Luc, I will not be able to attend LBF this year. So if you've been looking for me, you can stop trying . . . ;-) but I would love to talk with you about this in any case. BTW I will have to miss the DPUB call next week.

-----Original Message-----
From: AUDRAIN LUC [mailto:LAUDRAIN@hachette-livre.fr]
Sent: Tuesday, April 08, 2014 4:12 AM
To: Ivan Herman
Cc: Bill Kasdorf; W3C Digital Publishing IG
Subject: Re: schema.org and ONIX...

Hi Ivan and Bill,

That's a very good exercise and I will share thoughts with Bill at London Book Fair if possible.
I'm really interested, as I'm wondering what it will bring for ebook discoverability on the Web beyond the ONIX feeds we already provide to distributors and digital bookstores.


On 8 Apr 2014, at 05:24, "Ivan Herman" <ivan@w3.org> wrote:


I am currently at a Linked Data Workshop at a conference in Seoul, which had a keynote from R. Guha, who is, in some sense, the "father" of schema.org. Listening to him (combining also with my past experience), and also referring to the note I sent around earlier this morning[1], I am more and more serious in thinking that a stripped-down version of ONIX defined in schema.org might be a great idea. Of course, we have to see whether there is a business interest and business case for this: is there a use case for publishers as well as for search engines? But if the answer is yes on both, then this may be an important thing to do.

I do know Guha personally relatively well, as well as Dan Brickley, who is the other person running schema.org's vocabulary development. I would be happy to make the links and go into the discussions but, of course, the question is whether publishers, as well as institutions like Bowker, would be interested in something like that. I think that clarifying this, i.e., setting up the use cases, would be perfectly in line with the IG's charter (although we probably would have to spawn a different group to make the specification itself, but that is all right.)

What do you think?


[1] http://www.publishersweekly.com/pw/by-topic/international/london-book-fair/article/61722-london-book-fair-2014-publishers-and-internet-standards.html

Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf

Received on Monday, 14 April 2014 14:20:29 UTC
