Re: schema.org and ONIX... from Solomon, Madi on 2014-04-11 (public-digipub-ig@w3.org from April 2014)

From: Solomon, Madi <madi.solomon@pearson.com>
Date: Fri, 11 Apr 2014 14:49:34 +0100
To: Bill Kasdorf <bkasdorf@apexcovantage.com>
Cc: "Madans, Phil" <Phil.Madans@hbgusa.com>, Ivan Herman <ivan@w3.org>, Luc Audrain <LAUDRAIN@hachette-livre.fr>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <CAJ=MLAr27M2LxqeUHDvgCigNo2LsLad-Mx5ixcpsD8w8P95g4Q@mail.gmail.com>
Thanks, Ivan, for sparking this.  I'm with Bill on the Thema subject-vocab
starter, and can offer use cases around schema.org.  Pearson is committed
to the Learning Resource Metadata Initiative (LRMI), the educational extension of
schema.org, and recognises subject as a major entry point for education.

On a related note, there has been some recent activity in the Open Linked
Education Data Community Group, which I chair but have woefully neglected,
involving the Open University and the Open Knowledge Foundation.  I will share
details once I have them, but there are possibilities here.

Ivan and I have approached Graham Bell and EDItEUR to explore the
possibility of providing Thema subject terms with URIs; he was
intrigued but hesitant.  Might it be good to check in with him again?

I look forward to finding our way together on this.  Count me in.

Madi Solomon

*Madi Weland Solomon*
Director, Semantic Platforms and Metadata
From US: (011 44) 207 010 2335
D: +44 (0)20 7010 2335
M: +44 (0)79 7077 3449





On 9 April 2014 15:48, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:

> Thanks, Phil, very helpful as always.
>
> This thread has turned a lightbulb on for me (in fact a couple).
>
> First: we are really talking about two distinctly different use cases here:
> --Transmitting publication-level metadata (for which we are looking at
> expressing a subset of ONIX in schema.org).
> --Embedding subject metadata at all levels in a publication to make it
> discoverable and enable drilling down to points within a publication based
> on that subject metadata (for which I was suggesting Thema would be the
> place to start, using the schema.org mechanism).
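A minimal sketch of the second use case, in Python: a publication described as nested schema.org-style JSON-LD (here a plain dict) in which each part carries its own subject terms, plus a helper that drills down to every part tagged with a given subject. Using `hasPart` and `about` this way, the `Chapter` type, and the placeholder codes `SUBJ-A` to `SUBJ-C` are illustrative assumptions; no agreed encoding of Thema in schema.org is implied, and real markup would likely use richer objects than bare strings for `about`.

```python
# Hypothetical publication description in schema.org-style JSON-LD, as a Python dict.
# The subject codes are placeholders, NOT real Thema codes.
publication = {
    "@context": "http://schema.org",
    "@type": "Book",
    "name": "An Example Handbook",
    "about": ["SUBJ-A"],
    "hasPart": [
        {"@type": "Chapter", "name": "Chapter 1", "about": ["SUBJ-B"]},
        {
            "@type": "Chapter",
            "name": "Chapter 2",
            "about": ["SUBJ-A", "SUBJ-C"],
            "hasPart": [
                {"@type": "CreativeWork", "name": "Section 2.1", "about": ["SUBJ-C"]},
            ],
        },
    ],
}


def find_parts(node, subject, path=()):
    """Yield the chain of names leading to every node (at any depth) tagged with `subject`."""
    here = path + (node.get("name", "?"),)
    if subject in node.get("about", []):
        yield here
    # Recurse into sub-parts so subject metadata can be found below the publication level.
    for part in node.get("hasPart", []):
        yield from find_parts(part, subject, here)


for hit in find_parts(publication, "SUBJ-C"):
    print(" > ".join(hit))
# An Example Handbook > Chapter 2
# An Example Handbook > Chapter 2 > Section 2.1
```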
>
> Those are really two related but different things, and I think they are
> both important to do.
>
> Thanks so much for your clarification on Thema! I have not studied it, and
> I had always understood it to be much simpler than BISAC. Glad to be
> corrected on that!
>
> One important issue we have here is what I think of as the "comprehensive
> vs. concise" dilemma.
>
> --I personally always gravitate to "comprehensive" solutions, e.g.
> "publishers want more precise descriptions, which require much more
> extensive vocabularies"; "different types of publishers use different
> schemes and vocabularies (most of them extensive for the above reason) and
> we need to let them do that"; "keywords, without a controlled vocabulary,
> are something many publishers want to use"; etc. Let a thousand flowers
> bloom! ;-) (AKA "good luck with that.")
>
> --The problem is that from the point of view of any receiving system, this
> quickly becomes unworkable. Systems want things that are clear, specific,
> and simple so that functionality can be reliably delivered in a
> programmatic fashion. That's why schema.org vocabularies are typically so
> much more bare-bones than the vocabularies used by the various interest
> groups (book publishers, magazine publishers, educational publishers, news
> publishers, journal publishers, etc.). The receiving system says "don't
> tell me what I _might_ get, tell me, if you want me to do X, what I _will_
> get."
>
> A classic example for which I must assume at least part of the blame: the
> metadata model in EPUB 3. That model can actually _already_ express all of
> the above. No problem. It's already there. But guess what? No reading
> system that I know of actually does _anything_ with that metadata. Being
> Mr. Idealistic, I still hope they will. And within certain closed systems
> (known sources, known recipients, agreed-upon process and vocabulary) it
> can work just fine. But if I had held my breath for our wonderful <meta>
> and prefix mechanism to get any actual use in the real world I would have
> been dead long ago. ;-)
>
> --Bill K
>
> -----Original Message-----
> From: Madans, Phil [mailto:Phil.Madans@hbgusa.com]
> Sent: Wednesday, April 09, 2014 10:09 AM
> To: Bill Kasdorf; Ivan Herman
> Cc: Luc Audrain; W3C Digital Publishing IG
> Subject: RE: schema.org and ONIX...
>
> As for a separate meeting to discuss this: I am out most of next week but
> have some availability Tuesday and Wednesday.  Otherwise I'll be back on
> the 22nd and free after that.
>
> A couple of other points.  ONIX is a message transmitted among trading
> partners, so it does mostly reside in those databases.  Also, ONIX needs to
> be parsed.  A lot of the data is transmitted using code lists, including
> BISAC Categories.  You can send the literals if you want, of course. One of the
> issues with ONIX is that ONIX records vary wildly by sender. By the way,
> ONIX for Books is only one of the available ONIX messages.  There are also ONIX
> for Subscription Products and ONIX for Licensing Terms and Rights. But
> Bill's point is spot on.  There is no metadata scheme used by publishing as
> a whole.
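To illustrate Phil's point that ONIX has to be parsed and that much of it arrives as code-list values, here is a minimal Python sketch that reads ONIX 3.0 `<Subject>` composites and resolves the subject scheme identifier. The scheme table is a partial excerpt of ONIX code list 26 as understood here (verify it against the current EDItEUR issue), the sample record and its subject codes are hypothetical, and the ONIX reference-tag namespace is omitted for brevity; a real feed would need it.

```python
import xml.etree.ElementTree as ET

# Partial excerpt of ONIX code list 26 (subject scheme identifier).
# Assumption: values as published by EDItEUR; check the current code list issue.
SUBJECT_SCHEMES = {
    "10": "BISAC Subject Heading",
    "12": "BIC subject category",
    "20": "Keywords",
    "93": "Thema subject category",
}

# Hypothetical ONIX 3.0 fragment (namespace omitted; "FBA" is only an illustrative code).
SAMPLE = """
<Product>
  <DescriptiveDetail>
    <Subject>
      <MainSubject/>
      <SubjectSchemeIdentifier>10</SubjectSchemeIdentifier>
      <SubjectCode>FIC000000</SubjectCode>
    </Subject>
    <Subject>
      <SubjectSchemeIdentifier>93</SubjectSchemeIdentifier>
      <SubjectCode>FBA</SubjectCode>
    </Subject>
    <Subject>
      <SubjectSchemeIdentifier>20</SubjectSchemeIdentifier>
      <SubjectHeadingText>space opera; first contact</SubjectHeadingText>
    </Subject>
  </DescriptiveDetail>
</Product>
"""


def extract_subjects(product_xml):
    """Return one dict per <Subject>, with the scheme code resolved to a readable name."""
    root = ET.fromstring(product_xml.strip())
    subjects = []
    for subj in root.iter("Subject"):
        scheme_id = subj.findtext("SubjectSchemeIdentifier", default="").strip()
        code = subj.findtext("SubjectCode")
        text = subj.findtext("SubjectHeadingText")
        subjects.append({
            # Unknown or missing scheme codes are reported rather than raising an error.
            "scheme": SUBJECT_SCHEMES.get(scheme_id, f"unknown scheme {scheme_id or '?'}"),
            "code": code.strip() if code else None,
            "text": text.strip() if text else None,
            "main": subj.find("MainSubject") is not None,
        })
    return subjects


for s in extract_subjects(SAMPLE):
    print(s["scheme"], s["code"] or s["text"], "(main)" if s["main"] else "")
```

Because records vary by sender, the helper tolerates missing elements and unrecognized scheme codes instead of failing on them.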
>
> Bill, maybe I misstated my thoughts on Thema.  Thema is not more bare-bones
> than BISAC categories. BISAC has 3822 codes.  Thema has 2497 codes
> plus another 2000 qualifiers for geography, etc. (Thanks, Dave Cramer, for
> counting :)).  It is actually more complex than BISAC in that sense.  There
> are mappings from the existing subject classifications to Thema, but they
> are necessarily high level and so even less granular. This is not a push
> for BISAC by any means.  I don't think any of the subject classifications
> are what we are looking for.  They are all good for what they do; I just
> don't think they are what we want. That said, when we were talking in BISG
> about creating a new vocabulary more geared to online search, Google was
> mentioned as having a very good one, which makes sense.  We never took the
> conversation further; instead we decided to create a Best Practice for
> keyword creation, which should be published in the next month or
> two.
>
> Keywords should be part of our discussion.  There is going to be a lot of
> activity around keywords here in the US very shortly. Book publishers are
> looking at keywords to help search and discovery.
>
> Phil
>
> ------------------------------------------------------------
> Phil Madans | Executive Director of Digital Publishing Technology |
> Hachette Book Group | 237 Park Avenue NY 10017 |212-364-1415 |
> phil.madans@hbgusa.com
>
> -----Original Message-----
> From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com]
> Sent: Tuesday, April 08, 2014 5:51 PM
> To: Ivan Herman
> Cc: Luc Audrain; W3C Digital Publishing IG
> Subject: RE: schema.org and ONIX...
>
> I will have to comment later on the meatier parts of this message, but:
>
> --Re "We should not underestimate the amount of work": This is why I was
> suggesting starting with Thema. It is actually just a vocabulary for
> subject classifications, so it probably just pertains to an
> already-existing property of schema.org. What I was hearing from several
> of my interviews was the need to associate subject metadata below the level
> of the publication, which schema.org gets us (remember that not all of these
> publications are "on the Web", though they should still be able to use the
> Open Web Platform). As Phil Madans pointed out, Thema is pretty "bare bones" compared to
> BISAC, but I would suggest that that's a virtue in this context. BISAC is
> so huge and complex that publishers often don't "get it right" and
> recipients like Bowker and Nielsen feel they have to "fix" it (Apex has
> done this work for both of them for many years). Thema can't describe
> things at as meaningful a level of detail but on the other hand it would be
> easy to implement and has the big virtue of being a long-needed global
> subject vocabulary. And compared to ONIX: well, there's another gigantic
> set of metadata; Thema is just one tiny slice of what is in ONIX. It's not
> an either/or; Thema (that is, subject classifications in general) is one of
> many things that ONIX accommodates, but ONIX is not the _only_ place Thema
> (or BISAC, or BIC, etc.) are used. Strikes me as a good place to start.
>
> PLUS (here's a big one): ONIX (as we are normally thinking about it) is
> just for BOOKS!!!! (It's supply chain metadata, a messaging format for the
> book supply chain.) I keep pointing out that we are talking about
> PUBLICATIONS. Journals and magazines and newspapers and corporate
> publications etc. don't know from ONIX, they have their own schemes. But I
> think Thema subject classifications might be useful to them as well (e.g. I
> have gotten IPTC interested in it; their news schemes are not the same
> thing).
>
> --Re timing of a call: I'm back next Tuesday and am available the rest of
> next week and all the following week (gone again most of the last week of
> the month). My main concern is that I would prefer this NOT be discussed in
> detail in this coming Monday's call because I will not be able to join that
> one.
>
> -----Original Message-----
> From: Ivan Herman [mailto:ivan@w3.org]
> Sent: Tuesday, April 08, 2014 5:32 PM
> To: Bill Kasdorf
> Cc: Luc Audrain; W3C Digital Publishing IG
> Subject: Re: schema.org and ONIX...
>
> Wow, I see I have struck a chord here :-) which is great.
>
> On a very practical level: yes, I believe having a separate call
> discussing this would be good and useful. Like Bill, I am out this week;
> being at the WWW2014 conference in Seoul is obviously an obstacle (as an
> aside, I will speak about digital publishing this afternoon, as well as on
> Friday at another local event, so I continue my preaching :-). I will
> also have some days off around the Easter weekend. Roughly when could we
> have a call? We could set up a Doodle poll if we have some available
> periods: next week, the week after, both, neither?
>
> I cannot judge the Thema/ONIX issue; I leave this to you guys. My question
> is a different one, though. Where do ONIX data reside these days? As I said, if
> they are hidden in databases only, then they are invisible to Google, and
> schema.org may be useless. To put it another way, are there enough pages on
> the Web, routinely crawled by Google, that do or could include ONIX data? I
> would certainly hope so, but we have to be sure (and you have to tell
> me...).
>
> Another point worth knowing about. When schema.org came about, it was
> focused on HTML pages that use microdata syntax to add schema.org terms
> (RDFa Lite followed after a while). This of course works, but for
> many sites it was a bit awkward: systems may hold that type of metadata
> in databases, with the HTML pages generated automatically, and artificially
> adding microdata to the pages was an extra hassle. As a result, about a
> year ago, schema.org made it possible to add JSON-LD to an HTML page
> using a <script type="application/ld+json"> element. That made life much
> easier for such systems, and I suspect this is also something this industry
> could take advantage of. (Schema.org has recently revamped its pages with
> examples in all three syntaxes; e.g., scroll to the bottom of [1].)
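A minimal sketch of the JSON-LD approach Ivan describes, assuming the metadata already sits in a database record: the Python below serializes a schema.org `Book` description into a `<script type="application/ld+json">` element that a page template could emit. All field values, including the ISBN, are placeholders, and `jsonld_script` is a hypothetical helper, not part of any library.

```python
import json


def jsonld_script(data):
    """Serialize a schema.org description as an HTML JSON-LD <script> element."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" as "<\/" (still valid JSON) so a value containing "</script>"
    # cannot close the element early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical record, e.g. pulled from a publisher's product database.
book = {
    "@context": "http://schema.org",
    "@type": "Book",
    "name": "An Example Title",
    "author": {"@type": "Person", "name": "A. N. Author"},
    "isbn": "9780000000002",  # placeholder ISBN
    "bookFormat": "http://schema.org/EBook",
    "inLanguage": "en",
    "datePublished": "2014-04-08",
}

print(jsonld_script(book))
```

Generating the block from the same record that drives the page avoids hand-editing microdata attributes into the HTML, which is the hassle the JSON-LD option was meant to remove.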
>
> Finally, we have to realize one more thing. The work to be done is not
> 'simply' to convert a mini-ONIX into schema.org. The work is to harmonize
> this, whenever possible, with what is already in schema.org (see [1]
> below) and add the missing properties and classes or modify the description
> of existing ones. We should not underestimate the amount of work...
>
> Cheers
>
> Ivan
>
>
> [1] http://schema.org/Book
>
>
>
> On 09 Apr 2014, at 24:10 , Bill Kasdorf <bkasdorf@apexcovantage.com>
> wrote:
>
> > Just going through the responses . . . and as for this one, regrettably,
> Luc, I will not be able to attend LBF this year. So if you've been looking
> for me, you can stop trying . . . ;-) but I would love to talk with you
> about this in any case. BTW I will have to miss the DPUB call next week.
> >
> > -----Original Message-----
> > From: AUDRAIN LUC [mailto:LAUDRAIN@hachette-livre.fr]
> > Sent: Tuesday, April 08, 2014 4:12 AM
> > To: Ivan Herman
> > Cc: Bill Kasdorf; W3C Digital Publishing IG
> > Subject: Re: schema.org and ONIX...
> >
> > Hi Ivan and Bill,
> >
> > That's a very good exercise, and I will share thoughts with Bill at the
> London Book Fair if possible.
> > I'm really interested, as I'm wondering what it could add to ebook
> discoverability on the Web beyond the ONIX feeds we already provide
> to distributors and digital bookstores.
> >
> > Best,
> > Luc
> >
> >
> >> On 8 Apr 2014 at 05:24, "Ivan Herman" <ivan@w3.org> wrote:
> >>
> >> Bill,
> >>
> >> I am currently at a Linked Data Workshop at a conference in Seoul,
> which had a keynote from R. Guha, who is, in some sense, the "father" of
> schema.org. Listening to him (combined with my past experience),
> and also referring to the note I sent around earlier this morning [1], I am
> increasingly convinced that a stripped-down version of ONIX
> defined in schema.org might be a great idea. Of course, we have to see
> whether there is a business interest and business case for this: is there a
> use case for publishers as well as for search engines? But if the answer is
> yes on both, then this may be an important thing to do.
> >>
> >> I know Guha personally fairly well, as I do Dan Brickley, who
> is the other person running schema.org's vocabulary development. I would
> be happy to make the links and go into the discussions but, of course, the
> question is whether publishers, as well as institutions like Bowker, would
> be interested in something like that. I think that clarifying this, i.e., setting
> up the use cases, would be perfectly in line with the IG's charter
> (although we would probably have to spawn a different group to write the
> specification itself, but that is all right).
> >>
> >> What do you think?
> >>
> >> Ivan
> >>
> >> [1]
> http://www.publishersweekly.com/pw/by-topic/international/london-book-fair/article/61722-london-book-fair-2014-publishers-and-internet-standards.html
> >>
> >> ----
> >> Ivan Herman, W3C
> >> Digital Publishing Activity Lead
> >> Home: http://www.w3.org/People/Ivan/
> >> mobile: +31-641044153
> >> GPG: 0x343F1A3D
> >> FOAF: http://www.ivan-herman.net/foaf
> >>
> >>
> >>
> >>
> >>
>
>
>
>
>
>
>
>
>
>
Received on Friday, 11 April 2014 13:59:31 UTC