Re: Completing schema:article

On Fri, Nov 1, 2013 at 8:37 AM, Karen Coyle <kcoyle@kcoyle.net> wrote:
> On 11/1/13 8:04 AM, Dan Scott wrote:
>
>>
>> The range of http://schema.org/datePublished is
>> http://schema.org/Date, which is an ISO 8601 date; "2013-11-01" or
>> "2013-W40" or the like.
>
> Yes, although of course ranges in schema are "suggestions". It would be
> interesting to look at the actual use of this property and see what
> percentage actually conform to ISO 8601. (My guess: not many, unless it's
> just a year.) In any case, I don't think it makes sense to add a new date
> property since one already exists. But maybe that's a more philosophical
> question for the public-vocabs list.
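
One way to put a number on that guess is to run the harvested datePublished values through a shape check. The following is only a minimal sketch: it covers a small subset of ISO 8601 date forms (year, year-month, calendar date, week, week date), it checks the shape of the string rather than calendar validity, and the names looks_like_iso8601_date and conformance_rate are made up for illustration.

```python
import re

# Subset of ISO 8601 date shapes; an assumption for illustration, not the full standard.
ISO_DATE_PATTERNS = [
    r"\d{4}",                                        # year only, e.g. 2007
    r"\d{4}-(0[1-9]|1[0-2])",                        # year-month, e.g. 2013-11
    r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])",  # calendar date, e.g. 2013-11-01
    r"\d{4}-W(0[1-9]|[1-4]\d|5[0-3])",               # week, e.g. 2013-W40
    r"\d{4}-W(0[1-9]|[1-4]\d|5[0-3])-[1-7]",         # week date, e.g. 2013-W40-5
]
_ISO_DATE_RE = re.compile("|".join(f"(?:{p})" for p in ISO_DATE_PATTERNS))


def looks_like_iso8601_date(value):
    """Return True if the whole string has one of the shapes listed above."""
    return _ISO_DATE_RE.fullmatch(value.strip()) is not None


def conformance_rate(values):
    """Fraction of values that pass the shape check (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(looks_like_iso8601_date(v) for v in values) / len(values)


sample = ["2013-11-01", "2013-W40", "2007", "Summer, 1985"]
print(conformance_rate(sample))
```

A string such as "2013-02-31" would still pass, since only the shape is checked; a real survey would probably want stricter validation.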

I would really rather figure it out here, first, and come to a solid,
well thought-out consensus. It's easy to throw questions around. It's
much harder to create solid solutions.

> Although, as you note below, things like "Summer, 1985" could also be
> considered issues. I think we'll have to assume that some folks might use
> one property, some might use the other.
>

As I noted below, a "datePublished" property is quite distinct from
the identifier of a particular issue. If you take a look at citations
in the wild (https://owl.english.purdue.edu/owl/resource/747/12/ for
example), you'll see that the publication date might consist solely of
a year, but there are still volume and edition identifiers. The
citation for "International Journal of Sustainable Development & World
Ecology" just lists 2007, for example, but as we can see at
http://www.tandfonline.com/loi/tsdw20?open=14&repitition=0#vol_14
there were six issues published in 2007. And the publisher does not
provide any more granularity than that for the date. This journal is
not particularly unusual in that approach.

This is why I believe that datePublished is fine for the publication
date, but that to properly support the use of schema:citation for
articles, we need to provide some mechanism for individually
identifying issues. I also think that the kind of faceting that Thad
demonstrated with the Custom Search Engine testing tool would benefit
from granularity in the volumes & issues. I think we have a couple of
options:

On Periodical, define:
* issueIdentifier - identifies the issue of this periodical if it does
not use issueVolume or issueNumber (for example, "Winter 2007")
* issueVolume - identifies the volume of this periodical (for example,
"X" or "21")
* issueNumber - identifies the issue of this periodical (for example,
"iii" or "2")

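To make the first option concrete, here is a rough sketch that builds JSON-LD-style descriptions as Python dicts. The property names issueIdentifier, issueVolume and issueNumber are the ones proposed in this message and should be treated as hypothetical; the helper name periodical_flat, the "Example Quarterly" title and the issue number "2" are made up for illustration.

```python
import json


def periodical_flat(name, date_published, volume=None, number=None, identifier=None):
    """Describe one issue using the flat properties proposed for Periodical."""
    doc = {
        "@context": "http://schema.org",
        "@type": "Periodical",
        "name": name,
        "datePublished": date_published,
    }
    # issueVolume, issueNumber and issueIdentifier are proposed names (hypothetical).
    if volume is not None:
        doc["issueVolume"] = volume
    if number is not None:
        doc["issueNumber"] = number
    # Fall back to a free-text identifier only when there is no volume or number.
    if identifier is not None and volume is None and number is None:
        doc["issueIdentifier"] = identifier
    return doc


numbered = periodical_flat(
    "International Journal of Sustainable Development & World Ecology",
    "2007",
    volume="14",
    number="2",  # issue number chosen for illustration only
)
seasonal = periodical_flat("Example Quarterly", "2007", identifier="Winter 2007")

print(json.dumps(numbered, indent=2))
print(json.dumps(seasonal, indent=2))
```

With this shape, the year-only datePublished and the volume/issue identifiers sit side by side on the same node, so a citation can use both.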
Or on Periodical, define:
* issueIdentifier - range: Issue (or Issuance, or PeriodicalIssue, tbd)
- identifies the issue of this periodical

Issue: Identifies an individual issue of a Periodical, often with a
volume and issue number. If the identifier is a simple string such as
"Winter 2007", then use the name property to identify the issue.
* issueVolume - identifies the volume of this periodical (for example,
"X" or "21")
* issueNumber - identifies the issue of this periodical (for example,
"iii" or "2")

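The second option nests a separate node under the periodical. Again this is only a sketch under assumptions: the Issue type (its name is still to be decided above), the issueIdentifier, issueVolume and issueNumber properties are proposals from this message, and the helper names make_issue and periodical_with_issue, the "Example Quarterly" title and the issue number "2" are invented for illustration.

```python
import json


def make_issue(volume=None, number=None, name=None):
    """Build a node of the proposed (hypothetical) Issue type."""
    node = {"@type": "Issue"}
    if volume is not None:
        node["issueVolume"] = volume
    if number is not None:
        node["issueNumber"] = number
    # A plain-text identifier such as "Winter 2007" goes in the name property.
    if name is not None:
        node["name"] = name
    return node


def periodical_with_issue(name, date_published, issue_node):
    """Attach an Issue node to a Periodical through issueIdentifier."""
    return {
        "@context": "http://schema.org",
        "@type": "Periodical",
        "name": name,
        "datePublished": date_published,
        "issueIdentifier": issue_node,
    }


numbered = periodical_with_issue(
    "International Journal of Sustainable Development & World Ecology",
    "2007",
    make_issue(volume="14", number="2"),  # issue number chosen for illustration only
)
seasonal = periodical_with_issue(
    "Example Quarterly", "2007", make_issue(name="Winter 2007")
)

print(json.dumps(numbered, indent=2))
print(json.dumps(seasonal, indent=2))
```

The trade-off compared with the flat properties is one extra level of nesting in exchange for a node that can carry its own name, which is what makes the "Winter 2007" fallback work without a catch-all property.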
>
>>
>>> I realize that some (many?) publication patterns are more complex than
>>> that, but somehow we've managed with these few in most systems for a
>>> good long time, and most people seem to have some understanding of what
>>> they mean. I don't think we can take it one more level without creating
>>> great confusion.
>>>
>>> 2) I don't see a particular need for the intervening "issuance" level.
>>> Date, volume and number should do it.
>>
>>
>> I think that is and should be distinct from the purpose of an issue
>> identifier which is more likely to be used in a citation. So if we go
>> ahead and define schema:Issue, it could normally contain an
>> issueVolume and issueNumber, but optionally could fall back to plain
>> text (or schema:name?) like "Summer 2014" if we don't feel the need to
>> provide an additional catch-all property.
>>
>>> 3) are you thinking that this is the idea?
>>>
>>> <periodical (or some such term)
>>> <scholarlyArticle>
>>>     <author>
>>>     <name>
>>>     <Journal>
>>>        <name>
>>>        <issn>
>>>        <publishedDate>
>>>        <volume>
>>>        <number>
>>>     <pages>
>>>
>>> Or would the outer wrapper be the journal (or whatever we call it), with
>>> the article within that? (That makes sense to me, but is generally the
>>> opposite of citation formats and displays.)
>>
>>
>> There are at least two different use cases. One use case is "here's
>> everything we know about <Time magazine> or <Laurentian University
>> student newspaper>", listing all of the issues and articles contained
>> in those issues and linking to the articles where feasible. I can
>> imagine search engines and discovery layers greedily gobbling that up.
>
>
> Isn't this the area of "holdings" that we deferred because we all mostly
> hate the idea of dealing with serials?

We deferred serials holdings from the discussion of Holdings-as-Offers
because we did not want to bog down what was otherwise a working
recommendation that we could stake down as "done/good enough" (at
least for now) with a related but separate problem area that we knew
was going to be thornier.

> AFAIK, the only source of "everything
> we know about..." would be library data. (I believe that CONSER has a
> database of publication patterns for different journals, but I don't know if
> it includes a listing of all of the known volumes/issues.)

In many cases, the publishers themselves know a lot (see the Taylor &
Francis link above), and could augment the mechanical issues/articles
lists that they already publish with structured data fairly easily
_if_ we help them with a reasonable set of types & properties and
provide a reasonable pattern to follow. (Do we have any periodical
publishers on this list? That would be fantastic!) The obvious
motivation for publishers would be to drive more traffic to their "Pay
us $$ to access a copy of this article..." business model. Heh.

I'm not going to address the citation use case in this email because
this email is already long enough and satisfying the citation use case
probably deserves to be its own separate thread.

Received on Friday, 1 November 2013 16:18:56 UTC