- From: Young,Jeff (OR) <jyoung@oclc.org>
- Date: Sat, 6 Jul 2013 15:32:58 +0000
- To: "<vls@tusco.net>" <vls@tusco.net>
- CC: "kcoyle@kcoyle.net" <kcoyle@kcoyle.net>, "public-schemabibex@w3.org" <public-schemabibex@w3.org>, "Wallis,Richard" <Richard.Wallis@oclc.org>, "David.Newman@wellsfargo.com" <David.Newman@wellsfargo.com>, "Godby,Jean" <godby@oclc.org>, "em@zepheira.com" <em@zepheira.com>
- Message-ID: <71E70FA5-9643-4C6E-AE08-68479B722C5D@oclc.org>
Note that a collection of library-centric examples has been started:

http://www.w3.org/community/schemabibex/wiki/Examples/mylib

Jeff

Sent from my iPad

On Jul 6, 2013, at 10:53 AM, "Tom Adamich" <vls@tusco.net> wrote:

Thanks, Karen, for leading this discussion back to the "library-centric" mission of both SchemaBibEx and BIBFRAME. Yes, the metadata has the potential to be leveraged in other environments (including commercial enterprise); however, I agree with your request to remain on task and with your reminder of the timeframe associated with this group's work.

...Lead on:)

Tom

Tom Adamich, MLS
President
Visiting Librarian Service
P.O. Box 932
New Philadelphia, OH 44663
330-364-4410
vls@tusco.net

-----Original Message-----
From: Karen Coyle [mailto:kcoyle@kcoyle.net]
Sent: Friday, July 05, 2013 5:35 PM
To: public-schemabibex@w3.org
Subject: Re: Kill the Record! (Was: BIBFRAME and schema.org)

Corey,

I share your fear about over-engineering. I tend to put use of productOntology in that category, though, because examples I've seen make use of greater detail than I think we currently represent in library data online -- and I'm not convinced that more detail is needed. Users seem to care about whether something is print, online, or on disk (DVD, CD). We've started mixing books and articles (print and online) in our discovery systems, and users seem comfortable with that. I suspect that they favor "can I get it now?" as a primary selection criterion. Hardback and paperback? Not so much.

This is why I'd like to understand better what publishers need, since they have a different use case: different versions and formats have different prices, and they need to show that. For a library, I doubt if "paperback" and "hardback" are deciding selection factors for users.
When I see examples that have these in them it is a bit jarring, especially since that data isn't reliably coded in our records.

I would prefer to initially base schema.org thinking on library *displays* rather than library *records*. It's rather astonishing how little of what is coded in MARC ends up on the screen in the basic user displays, as well as how little of it feeds indexing. I second an earlier comment by Ed Summers that we should concentrate on what we can do today with schema.org, and add to it as library data online undergoes changes that require new capabilities. Current displays are a place to start, and once we have conquered those we can move on. Remember, this group is supposed to disband in Fall of 2013.

Thus, once again, can we look at holdings displays and come up with a reasonable solution? I think that schema.org has a good 90% or more of what we need for basic bibliographic description. But getting users to library holdings isn't yet covered.

kc

On 7/5/13 1:16 PM, Corey A Harper wrote:

Hi Karen,

I take your point, and agree that it's really a question of what we intend to convey. I just worry very much that this group has been inclined to over-engineer much of this, and as a result will render it not very useful to anyone outside of a very small group -- ostensibly the same very small group that are perfectly comfortable with MARC now. If that's what we're trying to do, then honestly, my vote becomes to just stick with MARC -- we don't gain much if we decide to build something new from whole cloth instead of looking seriously at the patterns that others--those we want to work with--are already using.

That said, I checked some schema.org deployments of books (kmart & B&N) and found no product typing at all, so it could be that common usage hasn't been established yet.

I agree re: availability of statistics.
I suspect we may have to rely on ourselves for that. I often mention commoncrawl here, but will again, as they make 40 TB worth of data from over 5 billion web pages available, have it hosted on AWS, and even provide tutorials for running EC2 Map Reduce jobs against it:

http://aws.amazon.com/datasets/41740
http://commoncrawl.org/mapreduce-for-the-masses/

I suspect searching for the productontology.org prefix somewhere in microdata or rdfa across the full set would probably cost a couple hundred bucks on EC2, though. If someone had 40TB of space kicking around in a hadoop cluster of their own, though....

My gut feeling, regardless, is that YES, we should use that "Monographic Series" article, as well as others. If we make this a prominent usage pattern, I believe the library community will spend the time cleaning these articles up, and adding new ones where there are gaps. Perhaps in the process we make both Wikipedia AND the Product Ontology AND schema.org better than they are now.

-Corey

On Fri, Jul 5, 2013 at 3:01 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:

Corey,

I don't think that what I propose is "non-conforming." I think we need to make choices amongst the conforming ones. I assume that we will be making some kind of cross-walk from library data to schema.org, and that best practice will be that coded format x (e.g. from the LDR or 007 in MARC) will have a defined value in schema.org that means approximately the same thing. Do we choose "paperback", "mass paperback" or just "book"? It really is a question of what we intend to convey with the schema.org data, what we see it linking to most usefully, what is most accurate, and what is going to be easiest to produce.
As an example, if you look at that list on WP you see that it has "book series", which is primarily what libraries would call "readers' series" - Harry Potter, "A is for Alibi...," "Narnia", etc. So although it says "series" it isn't the same as what is in an 8XX field. There IS an article for "monographic series". The monographic series article is pretty piss-poor, however, and needs a serious amount of work. Should we use it as is? Does it represent the same concept as the 8XX fields?

I love WP, I do, but there's a great variation in the quality of the pages. Nothing on WP can be taken at face value - we need to be smart about it, and even pro-active, if we are to take WP links to be *definitional* of our data elements. I'm not comfortable with assuming that any page on WP is by definition authoritative. (I'm in the midst of a huge revision of the DDC pages which were TOTALLY inaccurate, so this is something I'm painfully aware of at the moment.) In addition, we will have to make choices when WP divides the world differently from us.

Finally, although productontology is available for use, it isn't the only possibility. I know that Jeff favors it, but we need to keep an eye on practice to see if it becomes standard practice, and if it is used by search engines. I hope that some statistics will be available that provide guidance.

kc

On 7/5/13 10:57 AM, Corey A Harper wrote:

Hi Karen,

Can you say a bit more about "I'm not convinced, having looked at some of the pages, that WP shares the conceptual model that we'll find in our data."? I'm not sure I understand what problems you foresee, nor what you believe the ramifications of those problems to be.

I struggle with the idea that "..we then need to develop some best practices for library data, knowing that non-library data will take its own direction."
I'm rather averse to maintaining our own little, non-conforming corner of the Web without a really clear understanding of the impact--on users--of this perceived conceptual incompatibility.

Thanks,
-Corey

On Fri, Jul 5, 2013 at 1:47 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:

Yes, Jeff, I realize that. I had rather hoped for a link that you had found useful for books, like:

http://en.wikipedia.org/wiki/Category:Books_by_type

Naturally, this is a mish-mosh of physical types (paperback), product types (mass-market paperback), genres (airport novel) and topics (book size). I don't know if there is a better approach within WP.

While it is great that these Wikipedia pages exist, I think before using them we should look beyond their titles to the content of the pages to make sure that WP and our metadata are talking about the same thing. I'm not convinced, having looked at some of the pages, that WP shares the conceptual model that we'll find in our data. With that as a starting point, we then need to develop some best practices for library data, knowing that non-library data will take its own direction.

I would like to hear from anyone in the publishing community about their needs for specification of product types. I assume that the preferred list would originate in ONIX.

kc

On 7/5/13 8:50 AM, Young,Jeff (OR) wrote:

You can think of the option like this: Anything in Wikipedia can be treated as an owl:Class by changing the URI prefix.
For example, this Wikipedia page describes murals:

http://en.wikipedia.org/wiki/Mural

In contrast, you can say something *is* a mural by using this hacked URI in an rdf:type:

http://www.productontology.org/id/Mural

Jeff

Sent from my iPad

On Jul 5, 2013, at 11:42 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:

What are the options provided by productontology?

kc

On 7/5/13 8:26 AM, Young,Jeff (OR) wrote:

True. This list has always seemed simplistic to me, though. As you've suggested, EBook in particular deserves to be treated as a class so more detailed properties can be included. The other two are just the tip of the iceberg.

Sent from my iPad

On Jul 5, 2013, at 11:20 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:

Note that schema.org has http://schema.org/BookFormatType, which has

Ebook
Hardback
Paperback

kc

On 7/5/13 7:43 AM, Young,Jeff (OR) wrote:

For paperbacks and similar things, I've started using Product Ontology to tag the item/manifestation descriptions, for example:

@prefix schema: <http://schema.org/> .
@prefix pto: <http://www.productontology.org/id/> .
:book1 a schema:Book, schema:ProductModel, pto:Paperback ;
etc.

The coverage isn't perfect, but it has the advantage of being backed up by Wikipedia.

Jeff

Sent from my iPad

On Jul 5, 2013, at 10:35 AM, "Ross Singer" <rxs@talis.com> wrote:

On Jul 5, 2013, at 10:25 AM, "Young,Jeff (OR)" <jyoung@oclc.org> wrote:

Aside, I would argue that the defining characteristic of Item is that it has "location". For physical items that location can be determined by geolocation (for example). For Web items (aka Web documents), the location can be determined by its URL.

+1

I would say there are arguably more defining characteristics than that (I'm still going to argue that "paperback" isn't actually a part of the manifestation, simply an inference of the sum of the format of the items), but this, I would argue, is definitely the least common denominator and applies well for our entity model in schema.org.

-Ross.
Jeff

Sent from my iPad

On Jul 5, 2013, at 9:55 AM, "Ross Singer" <rxs@talis.com> wrote:

But this is all really how many angels can fit on the head of a pin, isn't it? We've already established that we're not interested in defining any strict interpretation of FRBR in schema.org: we're just trying to define a way to describe things in HTML that computers can parse.

Yes, I think we need to establish what an item is, no I don't think we have to use FRBR as a strict guide.

-Ross.

On Jul 5, 2013, at 8:51 AM, James Weinheimer <weinheimer.jim.l@gmail.com> wrote:

On 05/07/2013 13:30, Ross Singer wrote:

<snip>
I guess I don't understand why offering epub, pdf, and html versions of the same resource doesn't constitute "items". If you look at an article in arxiv.org, for example, where else in WEMI would you put the available file formats? Basically, format should be tied to the item, although for physical items, any manifestation's item will generally be the same format (although I don't see why a scan of a paperback would become a new endeavor, honestly).
In the end, I don't see how digital is any different than print in this regard.
</snip>

Because manifestations are defined by their format (among other things). Therefore, a movie of, e.g. Moby Dick that is a videocassette is considered to be a different manifestation from that of a DVD. Each one is described separately. So, if you have multiple copies of the same format for the same content those are called copies. But if you have different formats for the same content, those are different manifestations.

The examples in arxiv.org are just like I mentioned in archive.org and they follow a different sort of structure. You do not see this in a library catalog, where each format will get a different manifestation, so that each format can be described. As a result, things work quite differently.

Look for e.g. Moby Dick in Worldcat, and you will see all kinds of formats available in the left-hand column.

https://www.worldcat.org/search?qt=worldcat_org_all&q=moby+dick

When you click on an individual record,

http://www.worldcat.org/oclc/62208367

you will see where all of the copies of this particular format of this particular expression are located. This is the manifestation. And its purpose is to organize all of the *copies*, as is done here.
In the IA, we see something different:

http://archive.org/details/mobydickorwhale02melvuoft

where this display brings together the different manifestations: pdf, text, etc. There is no corresponding concept in FRBR for what we see in the Internet Archive, or in arxiv.org.

I am not complaining or finding fault, but what I am saying is that the primary reason this sort of thing works for digital materials is because there are no real "duplicates". (There are other serious problems that I won't mention here) In my opinion, introducing the Internet Archive-type structure into a library-type catalog based on physical materials with multitudes of copies would result in a completely incoherent hash.

This is why I am saying that FRBR does not translate well to digital materials on the internet. Getting rid of the concept of the "record" has been the supposed remedy, but it seems to me that the final result (i.e. what the user will experience) will still be the incoherent mash I mentioned above: where innumerable items and multiple manifestations will be mashed together.

Perhaps somebody could come up with a way to make this coherent and useful, but I have never seen anything like it and cannot imagine how it could work.
--
*James Weinheimer* weinheimer.jim.l@gmail.com
*First Thus* http://catalogingmatters.blogspot.com/
*First Thus Facebook Page* https://www.facebook.com/FirstThus
*Cooperative Cataloging Rules* http://sites.google.com/site/opencatalogingrules/
*Cataloging Matters Podcasts* http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html

--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet

--
Corey A Harper
Metadata Services Librarian
New York University Libraries
20 Cooper Square, 3rd Floor
New York, NY 10003-7112
212.998.2479
corey.harper@nyu.edu
Received on Saturday, 6 July 2013 15:33:46 UTC
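
The URI-prefix swap Jeff describes in the thread (a Wikipedia article URL becomes a Product Ontology class URI by changing the prefix) can be sketched roughly as below. This is only an illustration under stated assumptions: the function names `wikipedia_to_pto` and `type_statement` and the `http://example.org/book1` subject are hypothetical and do not come from the thread, and the sketch handles only English Wikipedia URLs in the exact form quoted above.

```python
# Prefixes taken from the URLs quoted in the thread.
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
PTO_PREFIX = "http://www.productontology.org/id/"


def wikipedia_to_pto(url: str) -> str:
    """Swap the Wikipedia article prefix for the Product Ontology class prefix.

    Hypothetical helper: it only rewrites the prefix and does not check
    that the article exists or that it means what our data means.
    """
    if not url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError(f"not an English Wikipedia article URL: {url}")
    return PTO_PREFIX + url[len(WIKIPEDIA_PREFIX):]


def type_statement(subject: str, class_uris: list[str]) -> str:
    """Build one Turtle rdf:type statement using full IRIs (no prefixes)."""
    objects = ", ".join(f"<{uri}>" for uri in class_uris)
    return f"<{subject}> a {objects} ."


if __name__ == "__main__":
    paperback = wikipedia_to_pto("http://en.wikipedia.org/wiki/Paperback")
    # The subject IRI is a placeholder for illustration only.
    print(type_statement(
        "http://example.org/book1",
        ["http://schema.org/Book", paperback],
    ))
```

The swap is purely mechanical, which is the concern Karen raises: nothing in it checks whether the Wikipedia article actually describes the same concept as the library data element being typed.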