Re: schema.org and ONIX... from Gerardo Capiel on 2014-04-12 (public-digipub-ig@w3.org from April 2014)

From: Gerardo Capiel <gerardoc@benetech.org>
Date: Sat, 12 Apr 2014 01:32:33 +0000
To: Ivan Herman <ivan@w3.org>
CC: Madi Weland Solomon <madi.solomon@pearson.com>, Bill Kasdorf <bkasdorf@apexcovantage.com>, "Madans, Phil" <Phil.Madans@hbgusa.com>, "Luc Audrain" <LAUDRAIN@hachette-livre.fr>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <AF42D6D2-5D21-47CB-A30A-88AF82DA7FC6@benetech.org>
I've been meaning to reply to this thread all week and have finally had a free moment.  Benetech recently went through the process of proposing to Schema.org new properties related to accessibility (e.g., http://schema.org/accessibilityFeature).  EDItEUR was partially involved in our working group for defining the accessibility properties, based on their experience with accessibility fields in ONIX (Code List 196).  We created a crosswalk between the ONIX accessibility properties and the Schema.org properties and recommended terms/enumerations:
http://www.a11ymetadata.org/the-specification/metadata-crosswalk/

EPUB 3.0.1 also supports Schema.org properties and the accessibility properties are included as part of the EPUB 3 Accessibility Guidelines:
http://www.idpf.org/epub/301/spec/epub-changes.html#sec-pub-reserved-prefixes
http://www.idpf.org/accessibility/guidelines/content/meta/schema.org.php
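
As a rough illustration of what this looks like in an EPUB package document, the sketch below generates <meta> elements that use the reserved schema: prefix with the accessibilityFeature property. The helper name accessibility_meta is hypothetical, and the two feature values are simply the ones used in the search example in this message; the guidelines linked above remain the authority on which values to use.

```python
import xml.etree.ElementTree as ET


def accessibility_meta(features):
    """Return one EPUB 3 <meta> element (as a string) per feature value.

    Hypothetical helper: it only shows the shape of the markup and does
    not validate the values against any vocabulary.
    """
    elements = []
    for feature in features:
        meta = ET.Element("meta", {"property": "schema:accessibilityFeature"})
        meta.text = feature
        elements.append(ET.tostring(meta, encoding="unicode"))
    return elements


if __name__ == "__main__":
    for line in accessibility_meta(["alternativeText", "longDescription"]):
        print(line)
```

Each returned string would sit inside the <metadata> section of the package document, one element per feature.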

The primary use cases for Schema.org revolve around search.  At this time, it seems Google is choosing to provide filters on most of the new Schema.org properties via its Custom Search Engine product:

http://googlecustomsearch.blogspot.com/2014/03/create-search-engine-with-schemaorg.html
http://googlecustomsearch.blogspot.com/2013/12/use-your-expertise-build-topical-search.html

For example, here's a search that leverages both accessibility and Learning Resource Metadata Initiative (LRMI) properties (cited by Madi below) to find science books which have image descriptions and are targeted at students aged 13:
http://www.google.com/cse?cx=001043429226464649088%3As0bmhsefbzq&ie=UTF-8&q=science+more%3Ap%3Abook-typicalAgeRange%3A13+more%3Ap%3Abook-accessibilityfeature%3Aalternativetext%2Clongdescription

Here's another search for any resource where the LRMI AlignmentObject type property targetName is narrowed to "Physics" and the keyword "light" is used:
http://www.google.com/cse?cx=001043429226464649088%3As0bmhsefbzq&ie=UTF-8&q=light+more%3Ap%3Aalignmentobject-targetName%3APhysics
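
The two search URLs above follow the same pattern: free-text keywords followed by "more:p:<type>-<property>:<value>" restrictions, all URL-encoded into the q parameter. A minimal sketch that rebuilds them is shown here; the function name cse_url and the (field, values) filter layout are assumptions made for illustration, while the engine id and field names are copied from the URLs above.

```python
from urllib.parse import urlencode

CSE_BASE = "http://www.google.com/cse"
# Engine id copied verbatim from the example URLs in this message.
CSE_ID = "001043429226464649088:s0bmhsefbzq"


def cse_url(keywords, filters):
    """Build a Custom Search URL from keywords and structured-data filters.

    `filters` is a list of (field, values) pairs (a hypothetical layout);
    each pair becomes one "more:p:field:value1,value2" term in the query.
    """
    terms = [keywords]
    for field, values in filters:
        terms.append("more:p:" + field + ":" + ",".join(values))
    query = {"cx": CSE_ID, "ie": "UTF-8", "q": " ".join(terms)}
    return CSE_BASE + "?" + urlencode(query)


if __name__ == "__main__":
    print(cse_url("science", [
        ("book-typicalAgeRange", ["13"]),
        ("book-accessibilityfeature", ["alternativetext", "longdescription"]),
    ]))
    print(cse_url("light", [("alignmentobject-targetName", ["Physics"])]))
```

Swapping in a different keyword or filter list yields other searches against the same engine, provided the pages it indexes carry the corresponding Schema.org markup.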

The Schema.org/LRMI AlignmentObject is quite interesting in that it allows for multiple types of vocabularies for defining educational standards (e.g. Common Core):
http://schema.org/AlignmentObject
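
To make the shape of an AlignmentObject concrete, here is a small sketch that describes a book with accessibility, age-range, and alignment properties and wraps the result in the JSON-LD <script> embedding that Ivan mentions further down the thread. The title and framework name are placeholders rather than real data, and the function names are hypothetical; only the Schema.org type and property names are taken from the vocabulary itself.

```python
import json


def book_jsonld():
    """Return a Schema.org Book description as a plain dict (JSON-LD)."""
    return {
        "@context": "http://schema.org",
        "@type": "Book",
        "name": "Example Physics Text",  # placeholder title
        "typicalAgeRange": "13",
        "accessibilityFeature": ["alternativeText", "longDescription"],
        "educationalAlignment": {
            "@type": "AlignmentObject",
            "alignmentType": "teaches",
            # Placeholder: any framework (e.g. Common Core) can be named here.
            "educationalFramework": "Example Framework",
            "targetName": "Physics",
        },
    }


def as_script_tag(data):
    """Wrap JSON-LD data in a <script> element for embedding in HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(as_script_tag(book_jsonld()))
```

Because educationalFramework is a free-text field, the same structure works for whichever standards vocabulary a publisher aligns to.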

The folks at Centre for Educational Technology, Interoperability and Standards in the U.K. recently published a blog post regarding CSE and LRMI:
https://groups.google.com/forum/#!topic/lrmi/S0dkMahh2UU

It's likely that Schema.org already has many of the properties we need, and then the question is what guidance we should give to publishers regarding the proper use of the properties and what vocabularies to use (e.g., Thema, Bowker, LCC), since Schema.org does not dictate vocabularies.  There are a number of new proposals to Schema.org that may be of interest:

http://www.w3.org/wiki/WebSchemas/ScholarlyArticle
http://www.w3.org/wiki/WebSchemas/Audiobook
http://www.w3.org/community/schemabibex/

My recommendation, if we were to add additional properties, is to have Ivan involve Dan Brickley, who was instrumental in accessibility gaining Schema.org adoption.

Gerardo

Gerardo Capiel
VP of Engineering
benetech

650-644-3405 - Twitter: @gcapiel - GPG: 0x859F11C4
Fork, Code, Do Social Good: http://benetech.github.com/

On Apr 11, 2014, at 2:26 PM, Ivan Herman <ivan@w3.org> wrote:

> 
> On 11 Apr 2014, at 22:49 , Solomon, Madi <madi.solomon@pearson.com> wrote:
> 
>> Thanks Ivan for sparking this.  I'm with Bill on the Thema subject-vocab starter, and can offer Use Cases around schema.org.  Pearson is committed to the Learning Resource Metadata, the educational extension of schema.org, and recognises Subject as a major entry point for education.
>> 
>> On a related note, there has been some recent activity in the Open Linked Education Data Community Group, which I Chair but have woefully neglected, that involves the Open University and Open Knowledge Foundation.  Details to share once I have them, but there are possibilities here.
>> 
>> Ivan and I have approached Graham Bell and EDItEUR to explore the possibility of providing Thema subject terms with URIs; he was intrigued by the idea but hesitant.  Might be good to check in with him again?
> 
> I think that it is absolutely necessary to contact him indeed. Maybe we should have our own ideas clarified first; I am not really in a position to make a choice (if there is such a choice) between Thema and ONIX, for example...
> 
>> 
>> Look forward to finding our way together on this.   Count me in.
> 
> It is still not clear when we should have a chat to speed up things. Bill?
> 
> Ivan
> 
>> 
>> Madi Solomon
>> 
>> Madi Weland Solomon
>> Director, Semantic Platforms and Metadata
>> From US: (011 44) 207 010 2335
>> D: +44 (0)20 7010 2335
>> M: +44 (0)79 7077 3449
>> 
>> 
>> 
>> 
>> 
>> On 9 April 2014 15:48, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
>> Thanks, Phil, very helpful as always.
>> 
>> This thread has turned a lightbulb on for me (in fact a couple).
>> 
>> First: we are really talking about two distinctly different use cases here:
>> --Transmitting publication-level metadata (for which a subset of ONIX in schema.org is what we are looking at doing).
>> --Embedding subject metadata at all levels in a publication to make it discoverable and enable drilling down to points within a publication based on that subject metadata (for which I was suggesting Thema would be the place to start, using the schema.org mechanism).
>> 
>> Those are really two related but different things, and I think they are both important to do.
>> 
>> Thanks so much for your clarification on Thema! I have not studied it, and I had always understood it to be much simpler than BISAC. Glad to be corrected on that!
>> 
>> One important issue we have here is what I think of as the "comprehensive vs. concise" dilemma.
>> 
>> --I personally always gravitate to "comprehensive" solutions, e.g., "publishers want more precise descriptions, which require much more extensive vocabularies"; "different types of publishers use different schemes and vocabularies (most of them extensive for the above reason) and we need to let them do that"; "keywords, without a controlled vocabulary, are something many publishers want to use"; etc. Let a thousand flowers bloom! ;-) (AKA "good luck with that.")
>> 
>> --The problem is that from the point of view of any receiving system, this quickly becomes unworkable. Systems want things that are clear, specific, and simple so that functionality can be reliably delivered in a programmatic fashion. That's why schema.org vocabularies are typically so much more bare-bones than the vocabularies used by the various interest groups (book publishers, magazine publishers, educational publishers, news publishers, journal publishers, etc.). The receiving system says "don't tell me what I _might_ get, tell me, if you want me to do X, what I _will_ get."
>> 
>> A classic example for which I must assume at least part of the blame: the metadata model in EPUB 3. That model can actually _already_ express all of the above. No problem. It's already there. But guess what? No reading system that I know of actually does _anything_ with that metadata. Being Mr. Idealistic, I still hope they will. And within certain closed systems (known sources, known recipients, agreed-upon process and vocabulary) it can work just fine. But if I had held my breath for our wonderful <meta> and prefix mechanism to get any actual use in the real world I would have been dead long ago. ;-)
>> 
>> --Bill K
>> 
>> -----Original Message-----
>> From: Madans, Phil [mailto:Phil.Madans@hbgusa.com]
>> Sent: Wednesday, April 09, 2014 10:09 AM
>> To: Bill Kasdorf; Ivan Herman
>> Cc: Luc Audrain; W3C Digital Publishing IG
>> Subject: RE: schema.org and ONIX...
>> 
>> As far as a separate meeting to discuss.  I am out most of next week but have some availability Tuesday and Wednesday.  Otherwise I'll be back on the 22nd and free after that.
>> 
>> A couple of other points.  ONIX is a message transmitted among trading partners, so it does mostly reside in those databases.  Also ONIX needs to be parsed.  A lot of the data is transmitted using code lists, including BISAC Categories.  You can use the literals if you want, of course. One of the issues with ONIX is that ONIX records vary wildly by sender. By the way, ONIX for Books is only one of the available ONIX messages.  There is ONIX for Subscription Products and ONIX for Licensing Terms and Rights. But Bill's point is spot on.  There is no metadata scheme used by Publishing as a whole.
>> 
>> Bill, maybe I misstated my thoughts on Thema.  Thema is not more bare bones than BISAC categories. BISAC has 3822 codes.  Thema has 2497 codes plus another 2000 qualifiers for geography, etc. (Thanks, Dave Cramer, for counting:)).  It is actually more complex than BISAC in that sense.  There are mappings from the existing Subject Classifications to Thema, but they are necessarily high level and so even less granular. This is not a push for BISAC by any means.  I don't think any of the Subject Classifications are what we are looking for.  They are all good for what they do; I just don't think they are what we want. Although when we were talking in BISG about creating a new vocabulary more geared to online search, Google was mentioned as having a very good one, which makes sense.  We never went further in the conversation and decided to create a Best Practice for Keyword creation instead--which should be published in the next month or two.
>> 
>> Keywords should be part of our discussion.  There is going to be a lot of activity around Keywords here in the US very shortly. Book Publishers are looking at Keywords to help search and discovery.
>> 
>> Phil
>> 
>> ------------------------------------------------------------
>> Phil Madans | Executive Director of Digital Publishing Technology | Hachette Book Group | 237 Park Avenue NY 10017 |212-364-1415 | phil.madans@hbgusa.com
>> 
>> -----Original Message-----
>> From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
>> Sent: Tuesday, April 08, 2014 5:51 PM
>> To: Ivan Herman
>> Cc: Luc Audrain; W3C Digital Publishing IG
>> Subject: RE: schema.org and ONIX...
>> 
>> I will have to comment later on the meatier parts of this message, but:
>> 
>> --Re "We should not underestimate the amount of work": This is why I was suggesting starting with Thema. It is actually just a vocabulary for subject classifications, so it probably just pertains to an already-existing property of schema.org. What I was hearing from several of my interviews was the need to associate subject metadata below the level of the publication, which schema.org gets us (remember not all of these publications are "on the Web" though they should still be able to use the OWP). As Phil Madans pointed out, Thema is pretty "bare bones" compared to BISAC, but I would suggest that that's a virtue in this context. BISAC is so huge and complex that publishers often don't "get it right" and recipients like Bowker and Nielsen feel they have to "fix" it (Apex has done this work for both of them for many years). Thema can't describe things at as meaningful a level of detail but on the other hand it would be easy to implement and has the big virtue of being a long-needed global subject vocabulary. And compared to ONIX: well, there's another gigantic set of metadata; Thema is just one tiny slice of what is in ONIX. It's not an either/or; Thema (that is, subject classifications in general) is one of many things that ONIX accommodates, but ONIX is not the _only_ place Thema (or BISAC, or BIC, etc.) are used. Strikes me as a good place to start. PLUS (here's a big one): ONIX (as we are normally thinking about it) is just for BOOKS!!!! (It's supply chain metadata, a messaging format for the book supply chain.) I keep pointing out that we are talking about PUBLICATIONS. Journals and magazines and newspapers and corporate publications etc. don't know from ONIX, they have their own schemes. But I think Thema subject classifications might be useful to them as well (e.g. I have gotten IPTC interested in it; their news schemes are not the same thing).
>> 
>> --Re timing of a call: I'm back next Tuesday and am available the rest of next week and all the following week (gone again most of the last week of the month). My main concern is that I would prefer this NOT be discussed in detail in this coming Monday's call because I will not be able to join that one.
>> 
>> -----Original Message-----
>> From: Ivan Herman [mailto:ivan@w3.org]
>> Sent: Tuesday, April 08, 2014 5:32 PM
>> To: Bill Kasdorf
>> Cc: Luc Audrain; W3C Digital Publishing IG
>> Subject: Re: schema.org and ONIX...
>> 
>> Wow, I see I did strike a chord here:-) which is great.
>> 
>> On a very practical level: yes, I believe having a separate call discussing this would be good and useful. Like Bill, I am out this week; being at the WWW2014 conference in Seoul is obviously an obstacle (as an aside, I will speak about digital publishing this afternoon as well as on Friday at another local event, so I continue doing my preaching:-). I will also have some days off around Easter week-end. When could we, roughly, have a call? We could set up a doodle if we have some available periods: next week, the week after, both, neither?
>> 
>> I cannot judge the THEMA/ONIX issue; I leave this to you guys. My question is different, though. Where do ONIX data reside these days? As I said, if they are hidden in databases only, then they are invisible to Google, hence schema.org may be useless. Put another way, are there enough pages on the Web, usually crawled by Google, that do or may include ONIX data? I would certainly hope so, but we have to be sure (and you have to tell me...).
>> 
>> Another point worth knowing about. When schema.org came about, it was focused on HTML pages that use microdata syntax to add schema.org terms (RDFa Lite followed after a while). This is of course possible, but, for many sites, this was a bit awkward: systems may have that type of metadata in databases with the HTML pages generated automatically, and artificially adding microdata to the pages was an extra hassle. As a result, about a year ago, schema.org added the possibility to add JSON-LD into an HTML page using a special <script> tag. That made life for such systems way easier and I suspect that this is also something that this industry may take advantage of. (Schema.org has recently renewed their pages with examples in three syntaxes everywhere; e.g., scroll to the bottom of [1].)
>> 
>> Finally, we have to realize one more thing. The work to be done is not 'simply' to convert a mini-ONIX into schema.org. The work is to harmonize this, whenever possible, with what is already in schema.org (see [1] below) and add the missing properties and classes or modify the description of existing ones. We should not underestimate the amount of work...
>> 
>> Cheers
>> 
>> Ivan
>> 
>> 
>> [1] http://schema.org/Book
>> 
>> 
>> 
>> On 09 Apr 2014, at 24:10 , Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
>> 
>>> Just going through the responses . . . and as for this one, regrettably, Luc, I will not be able to attend LBF this year. So if you've been looking for me, you can stop trying . . . ;-) but I would love to talk with you about this in any case. BTW I will have to miss the DPUB call next week.
>>> 
>>> -----Original Message-----
>>> From: AUDRAIN LUC [mailto:LAUDRAIN@hachette-livre.fr]
>>> Sent: Tuesday, April 08, 2014 4:12 AM
>>> To: Ivan Herman
>>> Cc: Bill Kasdorf; W3C Digital Publishing IG
>>> Subject: Re: schema.org and ONIX...
>>> 
>>> Hi Ivan and Bill,
>>> 
>>> That's a very good exercise and I will share thoughts with Bill at London Book Fair if possible.
>>> I'm really interested, as I'm wondering what it will bring in terms of ebook discoverability on the Web beyond the ONIX feeds we already provide to distributors and digital bookstores.
>>> 
>>> Best,
>>> Luc
>>> 
>>> 
>>>> Le 8 avr. 2014 à 05:24, "Ivan Herman" <ivan@w3.org> a écrit :
>>>> 
>>>> Bill,
>>>> 
>>>> I am currently at a Linked Data Workshop at a conference in Seoul, which had a keynote from R. Guha, who is, in some sense, the "father" of schema.org. Listening to him (combining also with my past experience), and also referring to the note I sent around earlier this morning[1], I am more and more serious in thinking that a stripped-down version of ONIX defined in schema.org might be a great idea. Of course, we have to see whether there is a business interest and business case for this: is there a use case for publishers as well as for search engines? But if the answer is yes on both, then this may be an important thing to do.
>>>> 
>>>> I do know Guha personally relatively well, as well as Dan Brickley, who is the other person running schema.org's vocabulary development. I would be happy to make the links and go into the discussions but, of course, the question is whether publishers, as well as institutions like Bowker, would be interested in something like that. I think that clarifying this, i.e., setting up the use cases, would be perfectly in line with the IG's charter (although we probably would have to spawn a different group to make the specification itself, but that is all right.)
>>>> 
>>>> What do you think?
>>>> 
>>>> Ivan
>>>> 
>>>> [1] http://www.publishersweekly.com/pw/by-topic/international/london-book-fair/article/61722-london-book-fair-2014-publishers-and-internet-standards.html
>>>> 
>>>> ----
>>>> Ivan Herman, W3C
>>>> Digital Publishing Activity Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> GPG: 0x343F1A3D
>>>> FOAF: http://www.ivan-herman.net/foaf
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
> 
> 
Received on Saturday, 12 April 2014 01:33:07 UTC