Re: Content-Carrier Proposal

Hi Ed,
On 13/02/2013 09:44, "Ed Summers" <ehs@pobox.com> wrote:

> I'm not sure if it is still relevant, but I took a look at the
> Content-Carrier proposal. I am uncomfortable with the idea of
> introducing the additionalType schema.org property for Microdata
> output. Do we know for a fact that itemtype can't take multiple URIs,
> similar to RDFa's typeof? If that is the case I would prefer seeing if
> we can get Microdata changed (hey it's a living standard) rather than
> introducing additionalType, which will put us on a slippery slope of
> making generic Microdata <-> RDF tooling more difficult.

In principle I agree with you - it is a kludge of a solution and Microdata
could be improved by the ability to support multiple type URIs.

In practice, this is something that the public-vocabs list covered when
additionalType was proposed.  If you want to raise it again, that is the
place, or do it in the Microdata community and then recommend that
additionalType is deprecated in Schema when successful.

Pragmatically, it is the 'current' way in Schema to indicate multiple types
for Microdata, so it serves the purpose.  We can use it, and the resultant
recommended mark-up can be deprecated when the Microdata spec changes.
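
For illustration, here is a minimal Python sketch of the two encodings as I
understand them from Ivan's note further down this thread: Microdata carries
one itemtype plus <link> elements with additionalType, while RDFa folds every
type into @typeof.  The function names, the <div> wrapper and the vocab
attribute are my own assumptions for the sketch, not part of the proposal.

```python
from html import escape

SCHEMA = "http://schema.org/"
PTO = "http://www.productontology.org/id/"


def microdata(main_type, additional_types):
    """Microdata: one itemtype, extra types as <link itemprop="additionalType">."""
    lines = [f'<div itemscope itemtype="{escape(main_type)}">']
    for uri in additional_types:
        lines.append(f'  <link itemprop="additionalType" href="{escape(uri)}">')
    lines.append("</div>")
    return "\n".join(lines)


def rdfa(main_type, additional_types):
    """RDFa: all types folded into a single space-separated @typeof."""
    types = " ".join(escape(t) for t in [main_type, *additional_types])
    return f'<div vocab="{SCHEMA}" typeof="{types}"></div>'


# The audiobook-in-WMA-on-CD example from earlier in the thread.
extra = [PTO + "Audiobook", PTO + "Windows_Media_Audio", PTO + "Compact_Disc"]
print(microdata(SCHEMA + "Book", extra))
print(rdfa(SCHEMA + "Book", extra))
```

If Microdata later gains a way to state several types directly, only the
microdata() half of a sketch like this would need to change.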


> 
> Also, I would like to see the proposal outline which actual types are
> needed that are not currently present in Schema.org. Perhaps there are
> some types that are missing from a bibliographic perspective, and it
> would be useful to have them in Schema.org? If so let's get them into
> the proposal. A wholesale import of carrier types from MARC-land is
> something we should try to avoid however.

The issue, from my point of view, is that the list of possible types is
large and evolving.  Identifying which types are missing is a classic
'proving a negative' task.

As you suggest, Schema.org, by their adoption of external enumerations, have
demonstrated they would not want to manage such a list.  If someone wants to
step up to manage a comprehensive authority for these and publish them in an
easily consumable way, that could be useful.

> If OCLC cares about full
> fidelity transformation between MARC and RDFa/Microdata I suggest they
> put up their own type vocabulary at oclc.org and start using it. I
> don't think it belongs at schema.org.

The original 'Library' extension proposal that accompanied the OCLC WorldCat
linked data release last year highlighted some of the carrier types
(catalogued by libraries which contribute records to WorldCat) that were
missing from Schema.  I am confident that that proposal will be superseded
by recommendations from this group.

I see the proposal we are discussing here as having the objective of
enabling the description of types relevant to a data publisher, now and in
the future, by the linking to authoritative descriptions from one or more
sources in a way that can be understood by the search engines committed to
Schema.org.

> I mean Kit, really? :-)

I could not possibly comment on libraries who need to reference types such
as Kit in their cataloguing processes ;-)

> 
> Lastly, the Product Types Ontology definitely seems useful, but can't
> we use it already?

Yes.

> Why does schema.org need to change?

It doesn't.  Which is why this proposal is one for recommended practice for
our community - not changes to Schema.  From the proposal: "In that way
this approach also has the benefit of not requiring any extensions to
Schema.org."

> I personally
> feel that if alignment with Wikipedia is really what you want then

I am identifying one or more sources of type URIs that are maintainable by
someone and recognisable by the organisations behind Schema.

There will obviously be several sources for such identifiers (mime-types,
Marc types, etc.) available both inside the library community and on the
wider web - I hope we are not suggesting that librarians are the only
community capable of describing types of things.

If someone describing their data cannot find an authoritative identifier
for a type, they stand a good chance of finding one via Product Ontology in
Wikipedia.  If they feel they can enhance the description they can do that in
Wikipedia; if they cannot find one they can create one.

The elegant feature of productontology.org is that, if you interrogate the
URIs it returns, it identifies the resource as a sub-type of
http://schema.org/Product, which allows you to use it as-is in a typeof or
additionalType.
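
As a rough sketch of how a publisher might arrive at such a URI, the Python
below builds one from an English Wikipedia article title.  It assumes the
identifier after /id/ is the Wikipedia page name with underscores for spaces,
as in the Laser_printer example; the function name and the set of characters
left unescaped are my own choices for illustration.

```python
from urllib.parse import quote

PTO_BASE = "http://www.productontology.org/id/"


def product_ontology_uri(wikipedia_title):
    """Build a productontology.org type URI from a Wikipedia article title."""
    # Wikipedia page names use underscores in place of spaces.
    slug = wikipedia_title.strip().replace(" ", "_")
    # Percent-encode anything unsafe in a URI path; keep common title punctuation.
    return PTO_BASE + quote(slug, safe="_()-,.'")


print(product_ontology_uri("Laser printer"))
# http://www.productontology.org/id/Laser_printer
```

The resulting string can then be dropped into a typeof or an additionalType
value unchanged.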

> Wikidata URIs will prove to be more useful in the long run as it is
> mainlined into Wikipedia, e.g. http://www.wikidata.org/wiki/Q199769
> instead of http://www.productontology.org/id/Laser_printer ... In
> principle listing the Product Type Ontology as one source of possible
> additional types seems fine.

Good

> But I would much prefer us talk about
> what types need to be in Schema.org that aren't already present,

I think it is clear that Schema.org has no desire to maintain such lists of
types - I get the impression that BookFormatType was a step too far for some
;-)

I believe that this is a wider discussion about authoritative type
identifiers, their curation and maintenance, which is for the library
community in general.  Obviously it is of interest to this group and those
involved in BIBFRAME in particular.

> and
> for us to document best practices for using schema.org as a vocabulary
> for bibliographic data.

This is what this proposal is - a recommendation for best practice using
what is available now.

~Richard
> 
> So I guess that's a -1 from me on the Content-Carrier proposal as it stands.
> 
> //Ed
> 
> [1] http://www.w3.org/community/schemabibex/wiki/Content-Carrier
> 
> On Thu, Feb 7, 2013 at 11:36 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>> All,
>> 
>> It turns out that I was confusing music and movies. Music cataloging does
>> consider the score and each performance different expressions of the same
>> work. Movie cataloging considers the screenplay and the movie to be
>> different works -- there's an example in an LC training document [1]. I
>> don't think this changes our approach. We still have the option of hanging
>> audiobook off of book or making it a separate entry under creativeWork. If
>> we hang it off of book then we also will need to add it to the value list
>> http://schema.org/BookFormatType which so far has
>>  - eBook
>>  - hardcover
>>  - paperback
>> 
>> Then there is the question of abridgement. I don't think this is the
>> information intended for Version in creativeWork, but the definition of that
>> is quite terse. In /Book there is "bookEdition" which I suppose could
>> contain this information. Note that there are abridged versions of texts as
>> well as audio readings, but there are also many other types of information
>> that one could consider for editions, like annotated editions, illustrated
>> editions, adapted versions (e.g. for children), etc.
>> 
>> I looked into ONIX, although I can't guarantee that I found all the right
>> places in that complex documentation. If you haven't ever looked at the ONIX
>> code lists -- do take a look. [2] ONIX makes MARC look abbreviated. In the
>> set of lists for ONIX for Books, list 81 is the "Product type" and
>> "Audiobook" is one of those product types. So is "game" "musical recording"
>> "software" and other things that go beyond books... so I'm not sure this
>> helps us.
>> 
>> 
>> We also will need a property for the narrator/reader. ONIX uses "Read by"
>> rather than "Narrator", although the audible.com site displays "Narrator" so
>> that's what I'm used to seeing. I think we should use what ONIX uses since
>> that will be what the audiobook providers are used to. Plus, it seems clear.
>> I did not find a specific term in the MARC relator list.[3]
>> 
>> I will draft up "audiobook" under creativeWork and see how it goes. If we
>> don't like it, we can try the other way.
>> 
>> kc
>> 
>> p.s. And we haven't even gotten to the hard stuff yet: serials! EEEK!
>> 
>> [1]
>> http://www.loc.gov/marc/marc-functional-analysis/multiple-versions.html#english-patient
>> [2] http://www.editeur.org/ONIX/book/codelists/current.html
>> [3] http://loc.gov/marc/relators/relaterm.html
>> 
>> 
>> On 2/7/13 9:09 AM, Karen Coyle wrote:
>>> 
>>> Difference between an audiobook and a book or ebook is the same as the
>>> difference between a recording of a symphony and the printed score for
>>> that symphony. The audiobook is a performance; it has a performer; it
>>> has a separate copyright; it may be abridged; other liberties may have
>>> been taken. An ebook is a new carrier for the same text as the paper
>>> book. It (presumably) has the same words (and thus same ISTC), same
>>> copyright, same list of creators. I see book/ebook as a classic
>>> content/carrier difference. I see book/audiobook as a larger difference
>>> than a carrier change.
>>> 
>>> I believe that music folks would consider a score and a performance to
>>> be different FRBR:Works. Two different performances would be different
>>> expressions. However, audiobook is probably the same Work in the minds
>>> of most users, albeit different expressions. So calling it both a "Book"
>>> and an "Audiobook" makes sense to me. But it will need *at least* one
>>> additional field for performer. It turns out that in public libraries
>>> (and on audiobook sites online) users are as interested in the performer
>>> as they are the actual author of the text. There are folks who would
>>> listen to a grocery list if it were read by Simon Prebble ;-).
>>> 
>>> kc
>>> 
>>> On 2/7/13 7:52 AM, Richard Wallis wrote:
>>>> 
>>>> Karen,
>>>> 
>>>> I don't think it is a format property we are talking about.  I don't
>>>> think it is about the arbitrary separation of attributes into Content
>>>> or Carrier.
>>>> 
>>>> We are trying, in this approach, to identify the sum of basic types of
>>>> thing that the composite thing we are describing is.
>>>> 
>>>> So sticking with our example of an audiobook in WMA format on a CD :
>>>> 
>>>>   * It is a CreativeWork
>>>>   * It may be considered a Book
>>>>   * It is an AudioBook
>>>>   * It is WMA
>>>>   * It is a CD
>>>>   * It has the attributes of a MediaObject
>>>> 
>>>> 
>>>> Sum together the properties you get from picking one of those as the
>>>> main type (some might choose CD, others Audiobook, or Book - all valid
>>>> ways to describe our thing) and adding the remainder as additionalType
>>>> properties.  Which elements that you think are missing are then not
>>>> available to describe it?
>>>> 
>>>> You may be right that an audiobook is something that deserves its own
>>>> sub-type of Book - in which case does Ebook?  Or do we just recommend a
>>>> new BookFormatType?  The current Schema answer for Ebook is to do just
>>>> that, which delivers no extra properties to describe the Ebook-specific
>>>> attributes.
>>>> 
>>>> ~Richard.
>>>> 
>>>> 
>>>> 
>>>> On 07/02/2013 13:30, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
>>>> 
>>>>> I'm fine with tossing in a whole list of "types", but I don't see what
>>>>> this has to do with content/carrier if it can contain both. So maybe
>>>>> what we're talking about here, instead, is a more general "format"? And
>>>>> it would include "book" "picture book" "large print" "MP3" "movie"
>>>>> "Blu-ray" "Operetta" "Map" and whatever else? If so, I would rename the
>>>>> page to reflect that.
>>>>> 
>>>>> Also, audio book is going to need some very specific data elements that
>>>>> we don't have yet in schema.org. So I still maintain that audiobook is
>>>>> its own thing, not just an additional format on metadata for a book.
>>>>> 
>>>>> kc
>>>>> 
>>>>> On 2/7/13 4:39 AM, Laura Dawson wrote:
>>>>>> 
>>>>>> This is essentially how it is accomplished in ONIX as well. There's a
>>>>>> series of composite tags that can describe the "format" quite
>>>>>> adequately.
>>>>>> 
>>>>>> From: Richard Wallis <richard.wallis@oclc.org>
>>>>>> Date: Thursday, February 7, 2013 5:27 AM
>>>>>> To: Karen Coyle <kcoyle@kcoyle.net>, <public-schemabibex@w3.org>
>>>>>> Subject: Re: Content-Carrier Proposal
>>>>>> Resent-From: <public-schemabibex@w3.org>
>>>>>> Resent-Date: Thu, 07 Feb 2013 10:29:17 +0000
>>>>>> 
>>>>>> Sticking with the Product Ontology approach for a moment - an audiobook
>>>>>> in WMA on a CD would just be a combination of multiple types thus:
>>>>>> 
>>>>>> http://schema.org/Book
>>>>>>     additionalType:http://www.productontology.org/id/Audiobook
>>>>>>     additionalType:http://www.productontology.org/id/Windows_Media_Audio
>>>>>>     additionalType:http://www.productontology.org/id/Compact_Disc
>>>>>> 
>>>>>> The sub-types of MediaObject, as you suggest, may also be fertile
>>>>>> ground for other types to combine. So by adding:
>>>>>> 
>>>>>>     additionalType:http://schema.org/MediaObject
>>>>>> 
>>>>>> To the example above, you could utilise the duration, region, etc.
>>>>>> properties that come with it to helpfully expand the description.
>>>>>> 
>>>>>> I think part of the issue is the natural [librarian] urge to identify
>>>>>> what is content and what is carrier.  In some of the examples we are
>>>>>> discussing there are three or more elements - audiobook, mp3, CD -
>>>>>> film, iso file, DVD - resulting in confusion about what to do with the
>>>>>> middle ones.  Personally I believe trying to enforce that
>>>>>> categorisation of attributes is not helpful.  MP3, paperback, European
>>>>>> region DRM protected, DVD, punched card, Kindle format, and/or a box
>>>>>> set are all, often, cumulative attributes of equal weight and
>>>>>> importance.
>>>>>> 
>>>>>> Within the library metadata community, deciding what are content vs
>>>>>> what are carrier attributes has been a topic of much, often
>>>>>> inconclusive, discussion that surfaces as each new format, device or
>>>>>> encoding emerges.  I get the feeling that whatever is decided, the rest
>>>>>> of the world just treats them as attributes of the thing.  Libraries
>>>>>> have used these categorisations to help them build [facets in] user
>>>>>> interfaces, which they could continue to do based on their local
>>>>>> practices, but without enforcing that view on the non-library consumers
>>>>>> of bib data.
>>>>>> 
>>>>>> So what I am trying to say in my long-winded way is that I don't
>>>>>> believe we need content/carrier specific properties adding to
>>>>>> Schema.org types to adequately describe these features.  We can achieve
>>>>>> the same by using the additionalType property, combining schema types
>>>>>> onto CreativeWork sub-types, and external types such as those sourced
>>>>>> from productontology.org, to build a description of the thing in
>>>>>> question.
>>>>>> 
>>>>>> ~Richard.
>>>>>> 
>>>>>> On 05/02/2013 19:25, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
>>>>>> 
>>>>>>     I've looked again at the content-carrier proposal and I believe
>>>>>>     that it confounds content and carrier, so maybe we need a bit more
>>>>>>     clarification.
>>>>>> 
>>>>>>     The proposal uses "audiobook on CD" for carrier. Clearly, however,
>>>>>>     "audiobook" is a creative work with producers, a reader (very
>>>>>>     important - audio book readers are becoming famed for their
>>>>>>     performances), a date of creation, not to mention information like
>>>>>>     "abridged/unabridged" and separate copyrights. An audiobook can
>>>>>>     have a number of carriers, including being digital in WMA or MP3
>>>>>>     format, with or without specific DRM.
>>>>>> 
>>>>>>     Carrier needs to be defined much like mime types -- very strictly
>>>>>>     limited to the physical form or digital encoding of the content,
>>>>>>     but not the content genre. If this makes sense to folks, then
>>>>>>     perhaps we can come up with a shared definition and some examples.
>>>>>> 
>>>>>>     The difficulty, as I see it, is with the combination of physical
>>>>>>     carrier ("Compact Disc") and encoding ("MP3 w. Overdrive DRM"). To
>>>>>>     what extent can we make assumptions that a "CD" is a "CD" for all
>>>>>>     purposes? For example, with DVDs, there are those horrid region
>>>>>>     codes that you must specify or people don't know if they can play
>>>>>>     the DVD in their player. So "DVD" alone does not define the encoded
>>>>>>     DRM; instead, there are two parts: physical carrier (DVD) and
>>>>>>     encoding (region-limited DRM). Or I can copy a large file to DVD
>>>>>>     that is a .iso file. Are these both carrier?
>>>>>> 
>>>>>>     We might want to look at the sub-types of
>>>>>>     http://schema.org/MediaObject
>>>>>> 
>>>>>>     These appear to be intended only for online/embedded media, but
>>>>>>     probably have some overlap with our case.
>>>>>> 
>>>>>>     kc
>>>>>> 
>>>>>>     On 2/4/13 4:22 AM, Ivan Herman wrote:
>>>>>>> 
>>>>>>> Richard,
>>>>>>> 
>>>>>>> as also discussed off-line, I changed the microdata/RDFa coding a
>>>>>>> bit. The previous solution in microdata was
>>>>>>> 
>>>>>>> <span property="additionalType" href="..." >
>>>>>>> 
>>>>>>> but that is invalid HTML5 (@href can appear on <link> and <a> elements
>>>>>>> only). I added <link> to the encoding instead (microdata allows the
>>>>>>> usage of <link> anywhere, not only in the header).
>>>>>>> 
>>>>>>> I have also changed the RDFa part to be more in line with that
>>>>>>> version of microdata by folding the type specification into @typeof
>>>>>>> directly (RDFa allows that, the usage of explicit rdf:type or
>>>>>>> schema:additionalType is, though correct, unnecessary...)
>>>>>>> 
>>>>>>> Cheers
>>>>>>> 
>>>>>>> Ivan
>>>>>>> 
>>>>>>> On Feb 2, 2013, at 22:04 , Richard Wallis <richard.wallis@oclc.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I have just added a Content-Carrier proposal to the Wiki.
>>>>>>>> 
>>>>>>>> It does not propose extension of the vocabulary as such, but I
>>>>>>>> have linked it from the Vocabulary Proposals page
>>>>>>>> <http://www.w3.org/community/schemabibex/wiki/Vocabulary_Proposals>
>>>>>>>> as it is a proposal as to a recommended way to apply the current
>>>>>>>> vocabulary to address an issue that concerns this group.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ~Richard.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ----
>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>> Home:http://www.w3.org/People/Ivan/
>>>>>>> mobile: +31-641044153
>>>>>>> FOAF:http://www.ivan-herman.net/foaf.rdf
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>>     --
>>>>>>     Karen Coyle
>>>>>> kcoyle@kcoyle.net http://kcoyle.net
>>>>>>     ph: 1-510-540-7596
>>>>>>     m: 1-510-435-8234
>>>>>>     skype: kcoylenet
>>>>>> 
>>>>>> 
>>>>>> 
>>> 
>> 
>> --
>> Karen Coyle
>> kcoyle@kcoyle.net http://kcoyle.net
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet
>> 
> 
> 

Received on Wednesday, 13 February 2013 11:58:00 UTC