Re: Why we want to have separate Periodical and (Periodical)Issu(e|ance) types from Dan Scott on 2013-11-22 (public-schemabibex@w3.org from November 2013)

From: Dan Scott <denials@gmail.com>
Date: Fri, 22 Nov 2013 17:06:34 -0500
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: "public-schemabibex@w3.org" <public-schemabibex@w3.org>
Message-ID: <CAAY5AM2QNYwQNnHy__2D+ZkhVcn5boopm=CkeziizFyO23NKsg@mail.gmail.com>
On Fri, Nov 22, 2013 at 1:16 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
> I have replied privately to Dan on this, but to save everyone else time, you
> do not have to explain linked data or RDF to me. I already know.

My sincere apologies if you felt that I was saying or implying that
you do not know linked data or RDF.

For what it's worth, I am personally still very much in learning mode
(for example, don't ask me what the other RDFa attributes are outside
of RDFa Lite, except for @about which I always wish was part of RDFa
Lite...). When I typed some RDFa examples earlier today, I was typing
"typeOf" instead of "typeof" and spent more time than I care to admit
trying to figure out why things weren't working as anticipated.

Going through the exercise of marking up the citation in full RDFa
Lite was an approach that draws on my philosophy background of
starting from first principles and building an argument from there. I
found this effective in philosophy because this gives everyone a
chance to critique the flaws at any point in the development of a
given argument. It also helps reinforce my own skills. I thought that,
since we had seemingly reached an impasse about the issue of whether
we need an Issu(e|ance) type or not, adopting a back-to-basics
approach might help us move forward.

So I do hope that you will share your objections to my last response
with the list so that we can continue this conversation.

Dan

> On 11/22/13 8:43 AM, Dan Scott wrote:
>>
>> On Fri, Nov 22, 2013 at 9:53 AM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>>>
>>>
>>>
>>> On 11/21/13 5:42 PM, Dan Scott wrote:
>>>
>>>>
>>>> Yes, you have mentioned this a number of times now. As I said on the
>>>> call, we're working with structured data. One benefit lies in being
>>>> able to define Periodical as an entity in and of itself, then refer to
>>>> it from the separate issues, instead of repeating the core Periodical
>>>> information in each instance of an issue (and worse, in each instance
>>>> of an article in each instance of a Periodical). If you refer to two
>>>> separate issues of the same periodical on the same page, and you
>>>> haven't broken Periodical out separately from Issue, then you have to
>>>> repeat all that core Periodical information with slightly different
>>>> volume / number / date information. You could determine that they're
>>>> the same Periodical by comparing their ISSN and name, I suppose, but
>>>> that seems like a very twisted and artificial way to achieve what
>>>> should be a very basic operation.
>>>
>>>
>>>
>>> I honestly don't see this as a mark-up use case. So I would like to see
>>> an example (preferably of a real web page) where this type of structuring
>>> would be used in the mark-up.
>>
>>
>> I have included such an example in the Periodical proposal from the
>> very start; see "Example 1: A list of the issues of a given
>> periodical, and the articles that were published in each issue." The
>> example uses an ellipsis to indicate that further articles would
>> follow. Perhaps I need to make that clear.
>>
>> For another example, on November 1 I wrote: "One use case is "here's
>> everything we know about <Time magazine> or <Laurentian University
>> student newspaper>", listing all of the issues and articles contained
>> in those issues and linking to the articles where feasible. I can
>> imagine search engines and discovery layers greedily gobbling that
>> up."
>> (http://lists.w3.org/Archives/Public/public-schemabibex/2013Nov/0005.html)
>>
>> And for another example that points at a real web page, in a separate
>> email on November 1, I wrote: "The citation for "International Journal
>> of Sustainable Development & World Ecology" just lists 2007, for
>> example, but as we can see at
>> http://www.tandfonline.com/loi/tsdw20?open=14&repitition=0#vol_14
>> there were six issues published in 2007. [...] In many cases, the
>> publishers themselves know a lot (see the Taylor & Francis link
>> above), and could augment the mechanical issues/articles lists that
>> they already publish with structured data fairly easily _if_ we help
>> them with a reasonable set of types & properties and provide a
>> reasonable pattern to follow. (Do we have any periodical publishers on
>> this list? That would be fantastic!) The obvious motivation for
>> publishers would be to drive more traffic to their "Pay us $$ to
>> access a copy of this article..." business model. Heh."
>> (http://lists.w3.org/Archives/Public/public-schemabibex/2013Nov/0006.html)
>>
>>> I do not see a problem with having some repetition in marking up a page like:
>>
>>
>> It's not a problem if someone chooses to repeat the information. It's
>> a problem if we _force_ them to repeat the information because we
>> failed to give them a way to cleanly and rationally provide structure
>> for their data.
>>
>>>
>>> Le Boeuf, P. (2012). Foreword. Cataloging & Classification Quarterly,
>>> 50(5-7), 355–359. doi:10.1080/01639374.2012.682001
>>>
>>> MADISON, Olivia M.A. The origins of the IFLA study on Functional
>>> Requirements for Bibliographic Records. In: LE BŒUF, Patrick. Ed.
>>> Functional Requirements for Bibliographic Records (FRBR): Hype, or
>>> Cure-All? Binghamton, NY: the Haworth Press, 2005.
>>>
>>> Le Boeuf, P. (2005). Musical Works in the FRBR Model or "Quasi la Stessa
>>> Cosa": Variations on a Theme by Umberto Eco. Cataloging & Classification
>>> Quarterly, 39(3-4), 103-124. doi:10.1080/01639374.2012.682001
>>>
>>> Schmidt, R. (2012). Composing in Real Time: Jazz Performances as “Works”
>>> in the FRBR Model. Cataloging & Classification Quarterly, 50(5-7), 653–669.
>>> doi:10.1080/01639374.2012.68160
>>>
>>> Are you saying that you feel a need to have "Cataloging & Classification
>>> Quarterly, 50(5-7)" coded in only a single entry on that page? I am
>>> assuming that each entry stands alone, and it needs to be marked up
>>> something like (and, yes, this is very pseudo-codey):
>>>
>>> Article
>>>    author "Le Boeuf, P."
>>>    name "Foreword"
>>>    Periodical
>>>      name "Cataloging..."
>>>      volume "50"
>>>      issue "5-7"
>>>    pages "355-359"
>>>    id "doi:10.1080/01639374.2012.682001"
>>
>>
>> Note that you have inlined "author" here as a text value, where a more
>> structured approach would inline or link to a Person or Organization.
>> That's because schema.org supports structured data. But processors do
>> accept that humans will sometimes be lazy or confused and will do
>> their best to deal with non-structured data.
>>
>> The use case for separating issue/volume out of Periodical is
>> absolutely parallel. I want to support non-lazy, clear-thinking
>> development of systems. So in actual RDFa Lite markup, that would look
>> something like:
>>
>> <div vocab="http://schema.org/" typeof="Article">
>>    <span property="author" typeof="Person">
>>      <link property="url" href="http://viaf.org/viaf/22193216" />
>>      <span property="name">Le Boeuf, P.</span>
>>    </span>
>>    (<span property="datePublished">2005</span>).
>>    <span property="name">Musical Works in the FRBR Model or "Quasi la Stessa Cosa": Variations on a Theme by Umberto Eco</span>.
>>    <span property="partOfIssue" typeof="Issuance">
>>      <link property="url" href="http://www.tandfonline.com/toc/wccq20/39/3-4" />
>>      <span property="partOfPeriodical" typeof="Periodical">
>>        <link property="url" href="http://www.tandfonline.com/loi/wccq20" />
>>        <span property="name">Cataloging &amp; Classification Quarterly</span>,
>>      </span>
>>      <span property="issueVolume">39</span>(<span property="issueNumber">3-4</span>),
>>      <span property="pagination">103-124</span>.
>>    </span>
>>    <a property="url" href="http://dx.doi.org/10.1080/01639374.2012.682001">doi:10.1080/01639374.2012.682001</a>
>> </div>
>>
>> Notice:
>>
>> * The "author" property is a full-fledged Person with a link to an
>> authoritative URL. Linked data, win!
>> * The "partOfPeriodical" property is a full-fledged Periodical with a
>> link to a URL (not going to say it is authoritative, but it certainly
>> works as an identifier). Linked data, win!
>> * The "partOfIssue" property is a full-fledged Issuance that is, in
>> this case, described inline and links to
>> http://www.tandfonline.com/toc/wccq20/39/3-4 for a URL. (Oh, hey look,
>> publishers feel that it's important to separate out and describe their
>> issues onto separate web pages!). Linked data win! We could also link
>> to http://catalogingandclassificationquarterly.com/ccq39nr3-4.html via
>> sameAs but that just ends up being a table of contents (aside: seems
>> like schema.org could use a property and type(s) for ToCs) so it's not
>> a very satisfying place to link (maybe for "description"...)
>> * We've turned that DOI into a clickable link and authoritative URL
>> for the article.
>>
>> And the markup works. Toss it into Google's Structured Data Testing
>> Tool or http://rdfa.info/play. Now that I've taken the time to mark it
>> up and test it, I'll add that to the examples in the proposal so that
>> we can knock off the "enhanced citation" use case.
>>
>>> And in the case where the page represents an issue with, say, its table
>>> of contents, then:
>>>
>>> Periodical
>>>    name
>>>    volume
>>>    issue
>>>    date
>>>      Article1
>>>        author...
>>>      Article2
>>>        author
>>>
>>> Which tells me that we don't have a hierarchical structure between
>>> Periodical and Article, but have two things that can be used together in
>>> various ways.
>>
>>
>> Except of course "issue" can simply be a single Issu(e|ance) type that
>> wraps all of the Article types. Bump issue and everything over to the
>> right one level and it works quite nicely.
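
One way to picture the "bump everything over one level" suggestion is as three nested record types. This is only a rough sketch: the class and field names (Periodical, Issue, Article, issue_number, pagination) are assumptions made up for illustration and are not the property names from the proposal. The sample values come from two of the citations quoted earlier in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    name: str
    pagination: str


@dataclass
class Issue:
    volume: str
    issue_number: str
    # The issue wraps the articles it contains.
    articles: List[Article] = field(default_factory=list)


@dataclass
class Periodical:
    name: str
    issues: List[Issue] = field(default_factory=list)


# The periodical's name is stated once; volume and number live on the issue.
ccq = Periodical(
    name="Cataloging & Classification Quarterly",
    issues=[
        Issue(
            volume="50",
            issue_number="5-7",
            articles=[
                Article(name="Foreword", pagination="355-359"),
                Article(
                    name='Composing in Real Time: Jazz Performances as "Works" in the FRBR Model',
                    pagination="653-669",
                ),
            ],
        )
    ],
)
```

With this shape, adding a second issue means appending one more Issue to ccq.issues rather than restating the periodical's name for every article.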
>>
>> I agree that we have things (but not just two) that can be used
>> together in various ways. As meaningless as that statement is, I'll
>> try to make it concrete: even before yesterday's call began, the
>> Periodical proposal let an Article link directly to its containing
>> Periodical, for those arxiv.org use cases that require only two
>> types. But the Periodical proposal also supports more traditional
>> periodical relationships.
>>
>>> (This also helps the case where there is an article that is
>>> not associated with a periodical. One of the examples above is an article
>>> reprinted in a book.)
>>
>>
>> Okay, then, let's think that example through. By adopting the
>> "Collection" proposal [1], as we agreed to do on the call, Article can
>> use its new "isPartOf" relationship that it will inherit from
>> CreativeWork to point at the Book for that exact use case.
>>
>>> The structure we need to address is that of the web
>>> page, which very well may be repetitive.
>>
>>
>> I disagree. We need to add structured data to the web page that
>> enables search engines and other schema.org processors to do
>> intelligent things with the data on the page. The data might repeat on
>> a given web page, but being able to give it a @resource ID in RDFa and
>> refer to that thereafter, or use microdata's @itemref / @itemid
>> mechanisms, or point to authoritative URLs, will enable the processors
>> to say "Ah, okay, so this article and this article belong to the same
>> issue in the same periodical, and I know from crawling these other
>> pages that these are all the other issues for this same periodical,
>> and now I can do intelligent things with this data like generate my
>> own much more usable table of contents with links to the open-access
>> versions of these articles that I know about from crawling
>> institutional repositories..." etc.
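
As a rough sketch of the kind of processing described above, a consumer that has already extracted (article, issue, periodical) statements from several pages could group them by identifier instead of comparing names or ISSNs. Everything here is an assumption for illustration: the record layout, the table_of_contents helper, the two "hypothetical" article titles, and the example.org issue URL are invented. Only the two tandfonline.com URLs and the first article title come from the markup earlier in this thread.

```python
from collections import defaultdict

# Identifiers taken from the RDFa example in this thread.
PERIODICAL = "http://www.tandfonline.com/loi/wccq20"
ISSUE = "http://www.tandfonline.com/toc/wccq20/39/3-4"
# Hypothetical identifier for some other issue of the same periodical.
OTHER_ISSUE = "http://example.org/issue/other"


def table_of_contents(records):
    """Group article titles by periodical identifier, then by issue identifier."""
    toc = defaultdict(lambda: defaultdict(list))
    for rec in records:
        toc[rec["periodical"]][rec["issue"]].append(rec["article"])
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {periodical: dict(issues) for periodical, issues in toc.items()}


# Statements as they might arrive from different crawled pages.
records = [
    {"article": "Musical Works in the FRBR Model", "issue": ISSUE, "periodical": PERIODICAL},
    {"article": "A second article (hypothetical)", "issue": ISSUE, "periodical": PERIODICAL},
    {"article": "A third article (hypothetical)", "issue": OTHER_ISSUE, "periodical": PERIODICAL},
]

toc = table_of_contents(records)
```

Because every record carries the same periodical and issue identifiers, the grouping is a plain dictionary lookup; no fuzzy matching on titles is needed.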
>>
>> 1. http://www.w3.org/community/schemabibex/wiki/Collection
>>
>
> --
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet
>
Received on Friday, 22 November 2013 22:07:04 UTC