- From: Young,Jeff (OR) <jyoung@oclc.org>
- Date: Sat, 6 Jul 2013 15:32:58 +0000
- To: "<vls@tusco.net>" <vls@tusco.net>
- CC: "kcoyle@kcoyle.net" <kcoyle@kcoyle.net>, "public-schemabibex@w3.org" <public-schemabibex@w3.org>, "Wallis,Richard" <Richard.Wallis@oclc.org>, "David.Newman@wellsfargo.com" <David.Newman@wellsfargo.com>, "Godby,Jean" <godby@oclc.org>, "em@zepheira.com" <em@zepheira.com>
- Message-ID: <71E70FA5-9643-4C6E-AE08-68479B722C5D@oclc.org>
Note that a collection of library-centric examples has been started:
http://www.w3.org/community/schemabibex/wiki/Examples/mylib
Jeff
Sent from my iPad
On Jul 6, 2013, at 10:53 AM, "Tom Adamich" <vls@tusco.net> wrote:
Thanks, Karen, for leading this discussion back to the "library-centric"
mission of both SchemaBibEx and BIBFRAME. Yes, the metadata has the
potential to be leveraged in other environments (including commercial
enterprise); however, I agree with your request to remain on task and
with your reminder of the timeframe associated with this group's work.
...Lead on:)
Tom
Tom Adamich, MLS
President
Visiting Librarian Service
P.O. Box 932
New Philadelphia, OH 44663
330-364-4410
vls@tusco.net
-----Original Message-----
From: Karen Coyle [mailto:kcoyle@kcoyle.net]
Sent: Friday, July 05, 2013 5:35 PM
To: public-schemabibex@w3.org
Subject: Re: Kill the Record! (Was: BIBFRAME and schema.org)
Corey, I share your fear about over-engineering. I tend to put use of
productOntology in that category, though, because examples I've seen make
use of greater detail than I think we currently represent in library data
online -- and I'm not convinced that more detail is needed.
Users seem to care about whether something is print, online, or on disk
(DVD, CD). We've started mixing books and articles (print and online) in our
discovery systems, and users seem comfortable with that. I suspect that they
favor "can I get it now?" as a primary selection criterion.
Hardback and paperback? Not so much.
This is why I'd like to understand better what publishers need, since they
have a different use case: different versions and formats have different
prices, and they need to show that. For a library, I doubt that "paperback"
and "hardback" are deciding selection factors for users.
When I see examples that have these in them it is a bit jarring, especially
since that data isn't reliably coded in our records.
I would prefer to base our initial schema.org thinking on library
*displays* rather than library *records*. It's rather astonishing how little
of what is coded in MARC ends up on the screen in the basic user displays,
as well as how little of it feeds indexing. I second an earlier comment by
Ed Summers that we should concentrate on what we can do today with
schema.org, and add to it as library data online undergoes changes that
require new capabilities. Current displays are a place to start, and once we
have conquered those we can move on. Remember, this group is supposed to
disband in the fall of 2013.
Thus, once again, can we look at holdings displays and come up with a
reasonable solution? I think that schema.org has a good 90% or more of what
we need for basic bibliographic description. But getting users to library
holdings isn't yet covered.
kc
On 7/5/13 1:16 PM, Corey A Harper wrote:
Hi Karen,
I take your point, and agree that it's really a question of what we
intend to convey. I just worry very much that this group has been
inclined to over-engineer much of this, and as a result will render it
not very useful to anyone outside of a very small group -- ostensibly
the same very small group that are perfectly comfortable with MARC now.
If that's what we're trying to do, then honestly, my vote becomes to
just stick with MARC -- we don't gain much if we decide to build
something new from whole cloth instead of looking seriously at the
patterns that others--those we want to work with--are already using.
That said, I checked some schema.org deployments of
books (kmart & B&N) and found no product typing at all, so it could be
that common usage hasn't been established yet.
I agree re: availability of statistics. I suspect we may have to rely on
ourselves for that. I often mention Common Crawl here, but will do so
again: they make 40 TB worth of data from over 5 billion web pages
available, have it hosted on AWS, and even provide tutorials for running
MapReduce jobs against it on EC2:
http://aws.amazon.com/datasets/41740
http://commoncrawl.org/mapreduce-for-the-masses/
I suspect that searching the full set for the productontology.org
prefix in microdata or RDFa would cost a couple hundred dollars on EC2.
If someone had 40 TB of space kicking around in a Hadoop cluster
of their own, though....
My gut feeling, regardless, is that YES, we should use that "Monographic
Series" article, as well as others. If we make this a prominent usage
pattern, I believe the library community will spend the time cleaning
these articles up, and adding new ones where there are gaps. Perhaps in
the process we make Wikipedia AND the Product Ontology AND
schema.org better than they are now.
-Corey
On Fri, Jul 5, 2013 at 3:01 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
Corey, I don't think that what I propose is "non-conforming." I think
we need to make choices amongst the conforming ones. I assume that
we will be making some kind of crosswalk from library data to
schema.org, and that best practice will be that coded format x (e.g.
from the LDR or 007 in MARC) will have a defined value in schema.org
that means approximately the same thing. Do we choose "paperback",
"mass paperback" or just "book"? It really is a question of what we
intend to convey with the schema.org data, what we see it linking to
most usefully, what is most accurate, and what is going to be easiest
to produce.
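A minimal Turtle sketch of the three crosswalk targets named above, for a single MARC-derived description. The `ex:` namespace and resource names are hypothetical, the MARC code that would select among the options is deliberately left open, and the Product Ontology class assumes (as Jeff describes elsewhere in this thread) that a Wikipedia article title such as "Mass-market paperback" can be reused as a class name.

```turtle
@prefix schema: <http://schema.org/> .
@prefix pto:    <http://www.productontology.org/id/> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace for this sketch

# Option 1: say only that the resource is a book.
ex:optionBook
    a schema:Book .

# Option 2: use schema.org's own BookFormatType enumeration value.
ex:optionPaperback
    a schema:Book ;
    schema:bookFormat schema:Paperback .

# Option 3: use a Product Ontology class derived from the Wikipedia
# article "Mass-market paperback" (assumed to exist as a pto: class).
ex:optionMassMarket
    a schema:Book, pto:Mass-market_paperback .
```

Each option asserts progressively more detail; which one a crosswalk should emit is exactly the open question.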
As an example, if you look at that list on WP you see that it has
"book series", which is primarily what libraries would call
"readers' series" - Harry Potter, "A is for Alibi...," "Narnia",
etc. So although it says "series" it isn't the same as what is in an
8XX field. There IS an article for "monographic series". The
monographic series article is pretty piss-poor, however, and needs a
serious amount of work. Should we use it as is? Does it represent
the same concept as the 8XX fields?
I love WP, I do, but there's great variation in the quality of the
pages. Nothing on WP can be taken at face value - we need to be
smart about it, and even pro-active, if we are to take WP links to
be *definitional* of our data elements. I'm not comfortable with
assuming that any page on WP is by definition authoritative. (I'm in
the midst of a huge revision of the DDC pages which were TOTALLY
inaccurate, so this is something I'm painfully aware of at the
moment.) In addition, we will have to make choices when WP divides
the world differently from us.
Finally, although productontology is available for use, it isn't the
only possibility. I know that Jeff favors it, but we need to keep an
eye on practice to see if it becomes standard practice, and if it is
used by search engines. I hope that some statistics will be
available that provide guidance.
kc
On 7/5/13 10:57 AM, Corey A Harper wrote:
Hi Karen,
Can you say a bit more about "I'm not convinced, having looked at some
of the pages, that WP shares the conceptual model that we'll find in our
data."? I'm not sure I understand what problems you foresee, nor what
you believe the ramifications of those problems to be.
I struggle with the idea that "...we then need to develop some best
practices for library data, knowing that non-library data will take its
own direction." I'm rather averse to maintaining our own little,
non-conforming corner of the Web without a really clear understanding of
the impact--on users--of this perceived conceptual incompatibility.
Thanks,
-Corey
On Fri, Jul 5, 2013 at 1:47 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
Yes, Jeff, I realize that. I had rather hoped for a link that you had
found useful for books, like:
http://en.wikipedia.org/wiki/Category:Books_by_type
Naturally, this is a mish-mosh of physical types (paperback), product
types (mass-market paperback), genres (airport novel) and topics (book
size). I don't know if there is a better approach within WP.
While it is great that these Wikipedia pages exist, I think before using
them we should look beyond their titles to the content of the pages to
make sure that WP and our metadata are talking about the same thing. I'm
not convinced, having looked at some of the pages, that WP shares the
conceptual model that we'll find in our data.
With that as a starting point, we then need to develop some best
practices for library data, knowing that non-library data will take its
own direction.
I would like to hear from anyone in the publishing community about their
needs for specification of product types. I assume that the preferred
list would originate in ONIX.
kc
On 7/5/13 8:50 AM, Young,Jeff (OR) wrote:
You can think of the option like this: Anything in Wikipedia can be
treated as an owl:Class by changing the URI prefix. For example, this
Wikipedia page describes murals:
http://en.wikipedia.org/wiki/Mural
In contrast, you can say something *is* a mural by using this hacked URI
in an rdf:type:
http://www.productontology.org/id/Mural
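A minimal Turtle sketch of the prefix swap described above. Only the two URIs come from the message; `ex:mural1` and the `ex:` namespace are hypothetical placeholders.

```turtle
@prefix pto: <http://www.productontology.org/id/> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace for this sketch

# <http://en.wikipedia.org/wiki/Mural> is a document *about* murals.
# Keeping the local name "Mural" but swapping in the Product Ontology
# prefix yields a class; using it with rdf:type ("a") states that
# ex:mural1 *is* a mural.
ex:mural1 a pto:Mural .
```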
Jeff
On Jul 5, 2013, at 11:42 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
What are the options provided by productontology?
kc
On 7/5/13 8:26 AM, Young,Jeff (OR) wrote:
True. This list has always seemed simplistic to me, though. As you've
suggested, EBook in particular deserves to be treated as a class so that
more detailed properties can be included. The other two are just the tip
of the iceberg.
On Jul 5, 2013, at 11:20 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
Note that schema.org has http://schema.org/BookFormatType, which has:
Ebook
Hardback
Paperback
kc
On 7/5/13 7:43 AM, Young,Jeff (OR) wrote:
For paperbacks and similar things, I've started using Product Ontology
to tag item/manifestation descriptions. For example:
@prefix schema: <http://schema.org/> .
@prefix pto:    <http://www.productontology.org/id/> .
@prefix :       <http://example.org/> .   # base namespace assumed for this example
# :book1 is typed as a book, a product model, and a paperback
:book1
    a schema:Book, schema:ProductModel, pto:Paperback .
# etc.: further properties of :book1 omitted
The coverage isn't perfect, but it has the advantage of being backed up
by Wikipedia.
Jeff
On Jul 5, 2013, at 10:35 AM, "Ross Singer" <rxs@talis.com> wrote:
On Jul 5, 2013, at 10:25 AM, "Young,Jeff (OR)" <jyoung@oclc.org> wrote:
As an aside, I would argue that the defining characteristic of Item is
that it has "location". For physical items, that location can be
determined by geolocation (for example). For Web items (aka Web
documents), the location can be determined by its URL.
+1
I would say there are arguably more defining characteristics than that
(I'm still going to argue that "paperback" isn't actually a part of the
manifestation, simply an inference from the sum of the formats of the
items), but this, I would argue, is definitely the least common
denominator and applies well to our entity model in schema.org.
-Ross.
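A minimal Turtle sketch of the "location" test for Items discussed above. `ex:heldAt` is a hypothetical property (schema.org defines no such term) standing in for "where a physical copy is"; `schema:Library`, `schema:geo`, `schema:GeoCoordinates`, and `schema:url` are existing schema.org terms. All names, URLs, and coordinates are placeholders.

```turtle
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace for this sketch

# A physical copy: its location is a place with geocoordinates.
# ex:heldAt is HYPOTHETICAL, not a schema.org property.
ex:copy1
    a schema:Book ;
    ex:heldAt ex:branchLibrary .

ex:branchLibrary
    a schema:Library ;
    schema:name "Example Branch Library" ;   # placeholder name
    schema:geo [
        a schema:GeoCoordinates ;
        schema:latitude  40.0 ;              # placeholder coordinates
        schema:longitude -81.0
    ] .

# A Web copy: its location is simply its URL.
ex:copy2
    a schema:Book ;
    schema:url <http://example.org/books/copy2.html> .
```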
Jeff
On Jul 5, 2013, at 9:55 AM, "Ross Singer" <rxs@talis.com> wrote:
But this is all really about how many angels can fit on the head of a
pin, isn't it?
We've already established that we're not interested in defining any
strict interpretation of FRBR in schema.org: we're just trying to define
a way to describe things in HTML that computers can parse.
Yes, I think we need to establish what an item is; no, I don't think we
have to use FRBR as a strict guide.
On Jul 5, 2013, at 8:51 AM, James Weinheimer <weinheimer.jim.l@gmail.com> wrote:
On 05/07/2013 13:30, Ross Singer wrote:
<snip>
I guess I don't understand why offering epub, pdf, and html versions of
the same resource doesn't constitute "items".
If you look at an article in arxiv.org, for example, where else in WEMI
would you put the available file formats?
Basically, format should be tied to the item, although for physical
items, any manifestation's item will generally be the same format
(although I don't see why a scan of a paperback would become a new
endeavor, honestly).
In the end, I don't see how digital is any different than print in this
regard.
</snip>
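A minimal Turtle sketch of the reading quoted above, in which each file format attaches to one description rather than forming a new manifestation. It uses the current schema.org terms `schema:encoding`, `schema:encodingFormat`, and `schema:contentUrl`; the article, its title, and its URLs are hypothetical.

```turtle
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace for this sketch

# One description of the article (placeholder title).
ex:article1
    a schema:ScholarlyArticle ;
    schema:name "An example article" ;
    # Each available file format is a separate encoding of the same resource.
    schema:encoding ex:article1-pdf, ex:article1-epub, ex:article1-html .

ex:article1-pdf
    a schema:MediaObject ;
    schema:encodingFormat "application/pdf" ;
    schema:contentUrl <http://example.org/article1.pdf> .

ex:article1-epub
    a schema:MediaObject ;
    schema:encodingFormat "application/epub+zip" ;
    schema:contentUrl <http://example.org/article1.epub> .

ex:article1-html
    a schema:MediaObject ;
    schema:encodingFormat "text/html" ;
    schema:contentUrl <http://example.org/article1.html> .
```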
Because manifestations are defined by their format (among other
things). Therefore, a movie of, e.g., Moby Dick on videocassette is
considered a different manifestation from one on DVD. Each one is
described separately. So, if you have multiple copies in the same format
for the same content, those are called copies. But if you have different
formats for the same content, those are different manifestations.
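A minimal Turtle sketch of the FRBR reading described above, in which each format is its own manifestation and copies belong to a manifestation. `ex:manifestationOf` and `ex:copyOf` are hypothetical stand-ins for FRBR relationships, not schema.org terms, and the `pto:` classes assume that the Wikipedia articles "Videocassette" and "DVD" map to Product Ontology classes.

```turtle
@prefix schema: <http://schema.org/> .
@prefix pto:    <http://www.productontology.org/id/> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace for this sketch

# ex:manifestationOf and ex:copyOf are HYPOTHETICAL properties.
ex:mobyDickFilm
    a schema:Movie ;
    schema:name "Moby Dick" .

# Each format is described separately as its own manifestation...
ex:mobyDickVHS
    a pto:Videocassette ;
    ex:manifestationOf ex:mobyDickFilm .

ex:mobyDickDVD
    a pto:DVD ;
    ex:manifestationOf ex:mobyDickFilm .

# ...while multiple copies in the same format hang off one manifestation.
ex:dvdCopy1 ex:copyOf ex:mobyDickDVD .
ex:dvdCopy2 ex:copyOf ex:mobyDickDVD .
```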
The examples in arxiv.org are just like the ones I mentioned in
archive.org, and they follow a different sort of structure. You do not
see this in a library catalog, where each format gets a different
manifestation so that each format can be described.
As a result, things work quite differently. Search for, e.g., Moby Dick
in WorldCat, and you will see all kinds of formats available in the
left-hand column:
https://www.worldcat.org/search?qt=worldcat_org_all&q=moby+dick
When you click on an individual record,
http://www.worldcat.org/oclc/62208367
you will see where all of the copies of this particular format of this
particular expression are located. This is the manifestation, and its
purpose is to organize all of the *copies*, as is done here.
In the IA, we see something different:
http://archive.org/details/mobydickorwhale02melvuoft, where this display
brings together the different manifestations: pdf, text, etc. There is
no corresponding concept in FRBR for what we see in the Internet
Archive, or in arxiv.org.
I am not complaining or finding fault, but what I am saying is that the
primary reason this sort of thing works for digital materials is that
there are no real "duplicates". (There are other serious problems that I
won't mention here.) In my opinion, introducing the Internet
Archive-type structure into a library-type catalog based on physical
materials with multitudes of copies would result in a completely
incoherent hash.
This is why I am saying that FRBR does not translate well to digital
materials on the internet. Getting rid of the concept of the "record"
has been the supposed remedy, but it seems to me that the final result
(i.e. what the user will experience) will still be the incoherent mash I
mentioned above, where innumerable items and multiple manifestations are
mashed together. Perhaps somebody could come up with a way to make this
coherent and useful, but I have never seen anything like it and cannot
imagine how it could work.
--
*James Weinheimer*
weinheimer.jim.l@gmail.com
*First Thus*
http://catalogingmatters.blogspot.com/
*First Thus Facebook Page*
https://www.facebook.com/FirstThus
*Cooperative Cataloging Rules*
http://sites.google.com/site/opencatalogingrules/
*Cataloging Matters Podcasts*
http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html
--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
--
Corey A Harper
Metadata Services Librarian
New York University Libraries
20 Cooper Square, 3rd Floor
New York, NY 10003-7112
212.998.2479
corey.harper@nyu.edu
Received on Saturday, 6 July 2013 15:33:46 UTC