Re: Why we want to have separate Periodical and (Periodical)Issu(e|ance) types from Dan Scott on 2013-11-22 (public-schemabibex@w3.org from November 2013)

From: Dan Scott <denials@gmail.com>
Date: Fri, 22 Nov 2013 11:43:36 -0500
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: "public-schemabibex@w3.org" <public-schemabibex@w3.org>
Message-ID: <CAAY5AM1mJcagyvEbdPW0kxgEZ6hZwdrc43T2TDuF8qk3cg_mnA@mail.gmail.com>
On Fri, Nov 22, 2013 at 9:53 AM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>
>
> On 11/21/13 5:42 PM, Dan Scott wrote:
>
>>
>> Yes, you have mentioned this a number of times now. As I said on the
>> call, we're working with structured data. One benefit lies in being
>> able to define Periodical as an entity in and of itself, then refer to
>> it from the separate issues, instead of repeating the core Periodical
>> information in each instance of an issue (and worse, in each instance
>> of an article in each instance of a Periodical). If you refer to two
>> separate issues of the same periodical on the same page, and you
>> haven't broken Periodical out separately from Issue, then you have to
>> repeat all that core Periodical information with slightly different
>> volume / number / date information. You could determine that they're
>> the same Periodical by comparing their ISSN and name, I suppose, but
>> that seems like a very twisted and artificial way to achieve what
>> should be a very basic operation.
>
>
> I honestly don't see this as a mark-up use case. So I would like to see an
> example (preferably of a real web page) where this type of structuring would
> be used in the mark-up.

I have included such an example in the Periodical proposal from the
very start; see "Example 1: A list of the issues of a given
periodical, and the articles that were published in each issue." The
example uses an ellipsis to indicate that further articles would
follow. Perhaps I need to make that clear.

For another example, on November 1 I wrote: "One use case is "here's
everything we know about <Time magazine> or <Laurentian University
student newspaper>", listing all of the issues and articles contained
in those issues and linking to the articles where feasible. I can
imagine search engines and discovery layers greedily gobbling that
up." (http://lists.w3.org/Archives/Public/public-schemabibex/2013Nov/0005.html)

And for another example that points at a real web page, in a separate
email on November 1, I wrote: "The citation for "International Journal
of Sustainable Development & World Ecology" just lists 2007, for
example, but as we can see at
http://www.tandfonline.com/loi/tsdw20?open=14&repitition=0#vol_14
there were six issues published in 2007. [...] In many cases, the
publishers themselves know a lot (see the Taylor & Francis link
above), and could augment the mechanical issues/articles lists that
they already publish with structured data fairly easily _if_ we help
them with a reasonable set of types & properties and provide a
reasonable pattern to follow. (Do we have any periodical publishers on
this list? That would be fantastic!) The obvious motivation for
publishers would be to drive more traffic to their "Pay us $$ to
access a copy of this article..." business model. Heh."
(http://lists.w3.org/Archives/Public/public-schemabibex/2013Nov/0006.html)

> I do not see a problem with having some repetition in marking up a page
> like:

It's not a problem if someone chooses to repeat the information. It's
a problem if we _force_ them to repeat the information because we
failed to give them a way to cleanly and rationally provide structure
for their data.

>
> Le Boeuf, P. (2012). Foreword. Cataloging & Classification Quarterly,
> 50(5-7), 355–359. doi:10.1080/01639374.2012.682001
>
> MADISON, Olivia M.A. The origins of the IFLA study on Functional
> Requirements for Bibliographic Records. In: LE BŒUF, Patrick. Ed. Functional
> Requirements for Bibliographic Records (FRBR): Hype, or Cure-All? .
> Binghamton, NY: the Haworth Press, 2005.
>
> Le Boeuf, P. (2005). Musical Works in the FRBR Model or "Quasi la Stessa
> Cosa": Variations on a Theme by Umberto Eco. Cataloging & Classification
> Quarterly, 39(3-4), 103-124. doi:10.1080/01639374.2012.682001
>
> Schmidt, R. (2012). Composing in Real Time: Jazz Performances as “Works” in
> the FRBR Model. Cataloging & Classification Quarterly, 50(5-7), 653–669.
> doi:10.1080/01639374.2012.68160
>
> Are you saying that you feel a need to have "Cataloging & Classification
> Quarterly, 50(5-7)" coded in only a single entry on that page? I am assuming
> that each entry stands alone, and it needs to be marked up something like
> (and, yes, this is very pseudo-codey):
>
> Article
>   author "Le Boeuf, P."
>   name "Foreword"
>   Periodical
>     name "Cataloging..."
>     volume "50"
>     issue "5-7"
>   pages "355-359"
>   id "doi:10.1080/01639374.2012.682001"

Note that you have inlined "author" here as a text value, where a more
structured approach would inline or link to a Person or Organization.
That's because schema.org supports structured data. But processors do
accept that humans will sometimes be lazy or confused and will do
their best to deal with non-structured data.

The use case for separating issue/volume out of Periodical is
absolutely parallel. I want to support non-lazy, clear-thinking
development of systems. So in actual RDFa Lite markup, that would look
something like:

<div vocab="http://schema.org/" typeof="Article">
  <span property="author" typeof="Person"><link property="url"
href="http://viaf.org/viaf/22193216" /><span property="name">Le Boeuf,
P.</span></span>
  (<span property="datePublished">2005</span>).
  <span property="name">Musical Works in the FRBR Model or "Quasi la
Stessa Cosa": Variations on a Theme by Umberto Eco</span>.
  <span property="partOfIssue" typeof="Issuance">
    <link property="url" href="http://www.tandfonline.com/toc/wccq20/39/3-4" />
    <span property="partOfPeriodical" typeof="Periodical">
      <link property="url" href="http://www.tandfonline.com/loi/wccq20" />
      <span property="name">Cataloging &amp; Classification Quarterly</span>,
    </span>
    <span property="issueVolume">39</span>(<span
property="issueNumber">3-4</span>), <span
property="pagination">103-124</span>.
  </span>
  <a property="url"
href="http://dx.doi.org/10.1080/01639374.2012.682001">doi:10.1080/01639374.2012.682001</a>
</div>

Notice:

* The "author" property is a full-fledged Person with a link to an
authoritative URL. Linked data, win!
* The "partOfPeriodical" property is a full-fledged Periodical with a
link to a URL (not going to say it is authoritative, but it certainly
works as an identifier). Linked data, win!
* The "partOfIssue" property is a full-fledged Issuance that is, in
this case, described inline and links to
http://www.tandfonline.com/toc/wccq20/39/3-4 for a URL. (Oh, hey look,
publishers feel that it's important to separate out and describe their
issues on separate web pages!) Linked data, win! We could also link
to http://catalogingandclassificationquarterly.com/ccq39nr3-4.html via
sameAs but that just ends up being a table of contents (aside: seems
like schema.org could use a property and type(s) for ToCs) so it's not
a very satisfying place to link (maybe for "description"...)
* We've turned that DOI into a clickable link and authoritative URL
for the article.

And the markup works. Toss it into Google's Structured Data Testing
Tool or http://rdfa.info/play. Now that I've taken the time to mark it
up and test it, I'll add that to the examples in the proposal so that
we can knock off the "enhanced citation" use case.

> And in the case where the page represents an issue with, say, its table of
> contents, then:
>
> Periodical
>   name
>   volume
>   issue
>   date
>     Article1
>       author...
>     Article2
>       author
>
> Which tells me that we don't have a hierarchical structure between
> Periodical and Article, but have two things that can be used together in
> various ways.

Except, of course, that "issue" can simply be a single Issu(e|ance)
entity that wraps all of the Articles. Bump issue, and everything below
it, over to the right one level and it works quite nicely.
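
As a rough, untested sketch of that nesting for a table-of-contents page:
the type and property names (Issuance, partOfIssue, partOfPeriodical,
issueVolume, issueNumber, pagination) are the ones used in the citation
markup earlier in this message and are still only proposed, and the
"#issue-50-5-7" identifier is a hypothetical local name chosen for
illustration. Each Article sits inside the issue's block and points back
at it explicitly, because nesting alone does not create a relationship in
RDFa.

```html
<div vocab="http://schema.org/" typeof="Issuance" resource="#issue-50-5-7">
  <!-- The periodical is described once, as part of the issue -->
  <span property="partOfPeriodical" typeof="Periodical">
    <span property="name">Cataloging &amp; Classification Quarterly</span>
  </span>,
  <span property="issueVolume">50</span>(<span property="issueNumber">5-7</span>)

  <!-- Each article refers back to the enclosing issue by identifier -->
  <div typeof="Article">
    <link property="partOfIssue" href="#issue-50-5-7" />
    <span property="name">Foreword</span>,
    <span property="pagination">355-359</span>
  </div>
  <div typeof="Article">
    <link property="partOfIssue" href="#issue-50-5-7" />
    <span property="name">Composing in Real Time: Jazz Performances as “Works” in the FRBR Model</span>,
    <span property="pagination">653-669</span>
  </div>
</div>
```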

I agree that we have things (but not just two) that can be used
together in various ways. As meaningless as that statement is, I'll
try to make it concrete: even before yesterday's call began, the
Periodical proposal let an Article link directly to the containing
Periodical, for those arxiv.org use cases that require only two types.
But the Periodical proposal also supports more traditional periodical
relationships.

> (This also helps the case where there is an article that is
> not associated with a periodical. One of the examples above is an article
> reprinted in a book.)

Okay, then, let's think that example through. By adopting the
"Collection" proposal [1], as we agreed to do on the call, Article can
use its new "isPartOf" relationship that it will inherit from
CreativeWork to point at the Book for that exact use case.
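
A minimal, untested sketch of the reprinted-in-a-book citation quoted
above, assuming the proposed "isPartOf" property is adopted as described
in [1]. The nesting and the choice of "editor", "publisher", and
"datePublished" on the Book are illustrative assumptions, not part of
either proposal.

```html
<div vocab="http://schema.org/" typeof="Article">
  <span property="author" typeof="Person">
    <span property="name">Madison, Olivia M.A.</span></span>
  <span property="name">The origins of the IFLA study on Functional Requirements for Bibliographic Records</span>.
  <!-- isPartOf is the proposed property; here it points at a Book -->
  In: <span property="isPartOf" typeof="Book">
    <span property="editor" typeof="Person">
      <span property="name">Le Bœuf, Patrick</span></span>, ed.
    <span property="name">Functional Requirements for Bibliographic Records (FRBR): Hype, or Cure-All?</span>
    Binghamton, NY: <span property="publisher" typeof="Organization">
      <span property="name">Haworth Press</span></span>,
    <span property="datePublished">2005</span>.
  </span>
</div>
```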

> The structure we need to address is that of the web
> page, which very well may be repetitive.

I disagree. We need to add structured data to the web page that
enables search engines and other schema.org processors to do
intelligent things with the data on the page. The data might repeat on
a given web page, but being able to give it a @resource ID in RDFa and
refer to that thereafter, or use microdata's @itemref / @itemid
mechanisms, or point to authoritative URLs, will enable the processors
to say "Ah, okay, so this article and this article belong to the same
issue in the same periodical, and I know from crawling these other
pages that these are all the other issues for this same periodical,
and now I can do intelligent things with this data like generate my
own much more usable table of contents with links to the open-access
versions of these articles that I know about from crawling
institutional repositories..." etc.
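
To illustrate the @resource approach, here is an untested sketch in which
a Periodical is described once and two separate issues on the same page
refer to it. The "#ccq", "#ccq-39-3-4", and "#ccq-50-5-7" identifiers are
hypothetical local names, and the types and properties are the proposed
ones used in the citation markup earlier in this message.

```html
<div vocab="http://schema.org/">
  <!-- Describe the periodical once and give it a local identifier -->
  <span typeof="Periodical" resource="#ccq">
    <link property="url" href="http://www.tandfonline.com/loi/wccq20" />
    <span property="name">Cataloging &amp; Classification Quarterly</span>
  </span>

  <!-- Two different issues point at the same Periodical -->
  <div typeof="Issuance" resource="#ccq-39-3-4">
    <link property="partOfPeriodical" href="#ccq" />
    <span property="issueVolume">39</span>(<span property="issueNumber">3-4</span>)
  </div>
  <div typeof="Issuance" resource="#ccq-50-5-7">
    <link property="partOfPeriodical" href="#ccq" />
    <span property="issueVolume">50</span>(<span property="issueNumber">5-7</span>)
  </div>

  <!-- An article refers to its issue by identifier -->
  <div typeof="Article">
    <link property="partOfIssue" href="#ccq-50-5-7" />
    <span property="name">Foreword</span>,
    <span property="pagination">355-359</span>
  </div>
</div>
```

A processor that resolves those identifiers can tell that both issues
belong to the same periodical without comparing names or ISSNs.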

1. http://www.w3.org/community/schemabibex/wiki/Collection
Received on Friday, 22 November 2013 16:44:09 UTC