Re: Comics and periodicals in schema.org (was Re: journal article for next call?) from Henry Andrews on 2013-12-08 (public-schemabibex@w3.org from December 2013)

From: Henry Andrews <hha1@cornell.edu>
Date: Sun, 8 Dec 2013 15:06:07 -0800 (PST)
To: Dan Scott <denials@gmail.com>
Cc: "Olson, Peter" <polson@marvel.com>, "public-schemabibex@w3.org" <public-schemabibex@w3.org>, Henry Andrews <hha1@cornell.edu>
Message-ID: <1386543967.26728.YahooMailNeo@web162606.mail.bf1.yahoo.com>
More replies below!  A quick disclaimer: this thread just appeared in my inbox recently, I guess because someone (Peter?) remembered me and cc'd me.  So I'm missing some context, and I apologize if I lecture folks on things they already know or that have already been worked out.

> From: Dan Scott <denials@gmail.com>
>Subject: Re: Comics and periodicals in schema.org (was Re: journal article for next call?)
> 
>
>On Sat, Dec 7, 2013 at 2:16 AM, Henry Andrews <hha1@cornell.edu> wrote:
>> From: Dan Scott <denials@gmail.com>
>>>On Thu, Dec 5, 2013 at 11:57 PM, Olson, Peter <polson@marvel.com> wrote:


>> This illustrates a point of ongoing discussion at the GCD,
>> which is the idea of treating "periodicals" (i.e. things
>> like U.S. monthly-ish comics, or UK weeklies) vs "albums"
>> (the common European format) vs "books" (collected editions,
>> longer single publications, etc.) in separate ways.  Right
>> now we treat them all the same, primarily based on the U.S.
>> monthly periodical.  Of course, defining what is and isn't
>> a "book" is incredibly contentious, up to and including the
>> question of whether the categories are distinct or overlap.
>> So that one has been going around in circles for years.
>
>Hah! It's good to know that there's no simple solution that we've been
>missing, at least :)


Oh, definitely not :-P

>One (albeit slightly complex) option that schema.org / RDFa offer is
>the ability to mix multiple types, so that (for example) a collected
>edition of comic issues could have a Comic type as its primary type
>and Book as a secondary type, using properties from both. The current
>proposal has a specific "GraphicNovel" type that inherits properties
>from Book and adds in the Comic properties that come whole cloth from
>the original Comics & Periodicals proposal, but if that section of the
>proposal was rejected by the schema.org partners, the "mix Comic +
>Book types" approach would still let you express what you need, I
>think.


For quite some time at the GCD I championed an approach of defining several attributes, and defining what I called "classifications" as groupings of those attributes.  The idea was that the attributes were clear-cut, and the classifications gave us a way to be more flexible about borders instead of trying for an absolute partition between books and periodicals (at that stage, we hadn't yet agreed that "albums" were a distinct thing).

So you could have something that behaves as both a periodical and a book.  For instance, the series of Spirit collections, which were sequentially numbered and came out every so often (if not regularly), had some periodical-ish attributes, but in terms of binding and presentation they were unquestionably books.  Conversely, there are book-ish bindings on things that are much more like periodicals ("prestige format" series from the late 80s).
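
A minimal sketch of the "classifications as groupings of attributes" idea, for illustration only.  The attribute names and the two classification definitions below are hypothetical, not anything the GCD adopted; the point is just that one publication can satisfy more than one classification, or none.

```python
from dataclasses import dataclass

# Hypothetical attribute names, chosen only for illustration.
@dataclass(frozen=True)
class Publication:
    title: str
    attributes: frozenset


# Assumption: a classification is a named set of required attributes.
CLASSIFICATIONS = {
    "periodical": frozenset({"sequentially_numbered", "recurring_release"}),
    "book": frozenset({"squarebound", "spine_title"}),
}


def classify(pub):
    """Return every classification whose required attributes the publication has."""
    return sorted(
        name
        for name, required in CLASSIFICATIONS.items()
        if required <= pub.attributes  # subset test
    )


# A numbered, recurring, squarebound collection matches both groupings.
spirit = Publication(
    "Spirit collection",
    frozenset(
        {"sequentially_numbered", "recurring_release", "squarebound", "spine_title"}
    ),
)

# A stapled one-shot matches neither, so no label is forced onto it.
one_shot = Publication("One-shot floppy", frozenset({"stapled"}))

print(classify(spirit))
print(classify(one_shot))
```

Because the classifications overlap rather than partition, the borderline cases do not have to be pushed into exactly one bucket.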

This also helped with European albums, which didn't fit neatly into US-oriented definitions of either book or periodical.  Also, "periodical" was controversial as some people didn't like it being applied to one-shot publications that otherwise fit the "floppy comic book" description.

Other questions included how to distinguish a single issue of something that was meant to be a series but got canceled or re-titled immediately vs something that was truly intended as a one-off (Marvel helpfully sometimes prints "published as a one-shot" in their indicia :-)


Some of this is kind of specific to the GCD in terms of how it all gets presented on the web site- a single-issue series that really should have been a series should show series data (in particular, tracking data is often relevant).  A single issue (whether it looks like a book or not) that was published as a single issue should get a consolidated series/issue page instead of the mess of having two pages for one object.

Anyway, people liked bits and pieces of this idea but the whole thing was deemed too un-intuitive, which for an end-user-oriented indexing system like the GCD is a valid concern.

>> Several of the "Muppet Show" series matched there are collections or otherwise book-like.
>
>Ah, I see - so the particular examples I was looking at were 1.
>serialized across four issues then 2. collected in a trade paperback
>format and 3. collected in a hardcover edition. Got it.

The GCD remains very split on how to handle softcover vs hardcover versions.  Looking at the voting record, there were three votes taken:
* hardcover/softcover versions are variants of the same "issue"
* hardcover/softcover versions are separate "issues" in a single series
* hardcover/softcover versions are not separate "issues" in a single series

All three narrowly failed (this was during my time away from active participation so I don't know the context).  I can understand reservations about all three options.

This also brings up the topic of variants, which is very important in U.S. comics, mostly due to the collector's market.  A given issue is often simultaneously printed with several different covers, possibly with minor variations in content.  If it goes back to press immediately, second and later printings usually have their own covers (often recolored versions of the first printing).

>And the latter
>two are where the Book type properties would come in handy; looking at
>http://www.amazon.ca/Muppet-Show-Comic-Book-Muppets/dp/1934506850, for
>example, "isbn" and "bookFormat" and "numberOfPages" from Book might
>be useful for some, along with the Comic-specific artist / penciler /
>etc properties.

Yes, if we'd gone with classifications, it would have rearranged the display for both data input and browsing.  Underneath we probably wouldn't have put a page count field on the series table, we would have just pulled that and other fields on the issue table up into the series display.  I'm not familiar enough with the system here to say whether that's relevant or not. (although I think I'd like to learn more)

"bookFormat" sounds mildly terrifying if it's anything like our old "format" field.  I may have this completely wrong in which case never mind, but while there are some well-defined formats (mass market paperback) there are others that have extremely loose meanings (trade paperback).  Some are targeted more towards where a bookseller should display them which is interesting data but not always the intuitive meaning of the field.

>> [edit by Henry: this is in regards to the multiple "The Amazing Spider-Man" series.]
>I apologize for explaining myself poorly; what I meant was that the
>ComicSeries proposal includes the note: "At Marvel we use the start
>year as the volume number". So I had expected to see one different
>series listed per year.


I see.  The GCD uses year as the primary distinguisher, but sometimes that isn't sufficient.  Although we don't have a secondary distinguisher in terms of regular display, so, um... it's not common.  It usually means there was a successful mini-series which got picked up as an ongoing but was given a new #1 for marketing reasons.  Or the hardcover/softcover thing.


>> [editing out my ramble on unreliable volume numbers]
>
>Right, in the world of periodicals (both academic and comics) I think
>we have all learned that we cannot rely on anything. However, the core
>part of my question is: does it make sense, as I've laid out in the
>current synthesized proposal at
>http://www.w3.org/community/schemabibex/wiki/Periodicals_and_Comics_synthesis#Comic_Schemata,
>to have a "Comic" type that is separate from the "ComicSeries" type,
>so that we can handle those cases where we have the same title (Comic
>level) with a different volume number (ComicSeries level) that then
>collects one or more issues (ComicIssue level)?


I see what you're getting at here.  I would say "no" with the current definition, as titles are too often repurposed for tenuously related or even unrelated content.  Unless your goal is to track title usage specifically, but then you get weird things like "X-Men" applying to some early part of what's usually thought of as "Uncanny X-Men", plus several titles that ran alongside "Uncanny X-Men" later.  (Do you consider the presence or absence of a leading "The" to distinguish series titles?  Because that frequently comes and goes with alarming irregularity).
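
To make the leading-"The" question concrete, here is a small sketch of one possible answer: a comparison key that ignores a leading article.  The function name `title_key` and the rule itself are assumptions for illustration, not something either proposal specifies.

```python
import re


def title_key(title):
    """Build a case-insensitive comparison key that ignores a leading 'The'."""
    # Drop a leading "The" followed by whitespace, if present.
    stripped = re.sub(r"^\s*the\s+", "", title, flags=re.IGNORECASE)
    # Collapse runs of whitespace and fold case.
    return " ".join(stripped.split()).casefold()


# The article comes and goes, but the key stays the same.
print(title_key("The Amazing Spider-Man") == title_key("Amazing Spider-Man"))

# Genuinely different titles still get different keys.
print(title_key("X-Men") == title_key("Uncanny X-Men"))
```

A key like this only settles the article question; it does nothing about a title being reused for unrelated content.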

>To handle those comics that don't have a volume number, there is a
>direct Comic -> ComicIssue relationship via "hasComicIssue" that
>supports that structure.


That feels weird to me but I'll read the proposal and comment when I have more of a clue.

>> Also, note that comic book publishers, particularly early on, often did weird things with both volume and issue numbers, sometimes as a postal regulation dodge, and sometimes just because nobody cared.  My favorite pathological example is Cat-Man comics:  http://www.comics.org/series/61787/
>
>I love the note in the Holyoke series about the "notoriously
>complicated numbering scheme"!


:-) That thing gave me and several other researchers fits!  It's not only bizarre in reality, but it was also noted incorrectly in several early sources which made it seem even more bizarre.  And a number of issues are fantastically rare.

[edited stuff out]
>> The GCD defines a series more-or-less as a set of sequentially published things having the same indicia (formal/legal) title and the same "master publisher".  Where "master publisher" is another source of endless research and debate (look at the publisher for Cat-Man Comics for an example).  Particularly before about 1960 it's extremely difficult to tell whether certain publishing companies are "the same" and by what measure.  Publishing was a dodgy business.
>
> And corners of it are still dodgy, I'm sure!  I suspect in the short
>term that we'll hold off on trying to mint related work relationships
>as part of the periodicals & comics proposal and bring that in as a
>separate proposal. And I hope that you'll be part of that
>conversation, too :)


I'd love to.  This is all fascinating, and I've been kind of looking for another comics-related project that doesn't require a ton of programming right now.  If you're specifically talking about something like a "master publisher" concept (but preferably more precise and useful) to determine related publications, that is a particular area of interest to me.

Actually, how are you handling publisher?  I guess I'll go read the proposal.  Coming up with a definition that will both successfully group the whole history of Marvel from Marvel Comics #1 to the present day *and* deal with the early fly-by-night publishers in any useful way is extremely difficult.  And that's cutting several other key examples.  I imagine this problem is not limited to comics, since most of the relevant behavior came out of the pulp magazine publishing practices anyway.

>>>How would that have been handled in the original proposal: separate
>>>ComicSeries for each title change, I guess?
>>>
>>>> Comic Stories - because stories can be and are reprinted, the original comic issue in which they appeared should probably be identified in the schema.  For example, the Spider-Man origin story has been reprinted hundreds of times, but it's always "from" Amazing Fantasy #15.
>>>
>>>That sounds very reasonable; so something like an
>>>"originallyPublishedIn" property that should only be used if there are
>>>is more than one "partOfComicIssue" / "partOfPeriodicalIssue" property,
>>>to identify the ur-comic (or periodical, as that could be useful for
>>>non-comic articles as well)?
>
>Henry - do you feel strongly (either way) about an
>"originallyPublishedIn" property for ComicStory?


Reprints definitely need to be handled.  I feel strongly on that based on the significant reprint-oriented subgroup of the GCD.  Do you have anyone from the European comic scene involved here?  Reprints are in many ways more significant to them, both for reprinted U.S. work (especially Disney) as well as works published concurrently in multiple languages.

Take a look at how complex the reprint rules are (skip below the tutorial videos to the case examples): http://docs.comics.org/wiki/Reprints

Generally, the reprint link goes to the reprint source, which may or may not be the ur-comic.  A good example is translations.  The first translation links to the ur-comic.  Reprints of that translation link to the first translation.  A second independent translation to the same language would, however, link to the ur-comic.

There are several less intuitive modifications that can affect the links similarly.  This is generally based on the assumption that you can walk the link tree easily, so a link back from a reprint of a translation of a translation can go through several hops, and you don't need a direct link at each level.
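
The link-walking assumption can be sketched as follows.  This is an illustration only: the `source_of` mapping and its identifiers are hypothetical and do not reflect the GCD's actual tables.  Each printing points at its immediate reprint source, and the ur-comic is found by following links until a printing has no source.

```python
# Hypothetical data: each printing maps to the printing it was reprinted from.
source_of = {
    "first_translation": "original",                # first translation -> ur-comic
    "reprint_of_translation": "first_translation",  # reprint -> its translation
    "second_translation": "original",               # independent translation -> ur-comic
}


def find_ur_comic(printing, links):
    """Follow reprint links hop by hop until a printing with no source is reached."""
    seen = set()
    while printing in links:
        if printing in seen:
            # Guard against bad data that links in a circle.
            raise ValueError(f"reprint links form a cycle at {printing!r}")
        seen.add(printing)
        printing = links[printing]
    return printing


# Two hops: reprint -> first translation -> original.
print(find_ur_comic("reprint_of_translation", source_of))
```

Storing only the immediate source keeps each record simple, at the cost of a walk whenever the original publication is wanted.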

Also, keep in mind that multiple editions are often simultaneously published for different markets in Europe, making it difficult to declare a "first" publication.

>>>Hey, there is a "Story" table in the schema. That makes me feel better
>>>about having a ComicStory type, then!
>>
>> A story type is very important- for decades anthologies were the norm, not single-story issues, and in some places they still are.  One of the biggest online databases, I.N.D.U.C.K.S. (specializing in Disney comics) is centered around the story rather than the GCD's issue-centric model.
>
>This is very good to know, thank you.


You're welcome :-)  It also gets into the reprint thing.  I.N.D.U.C.K.S. is a European database, and I think one of the main reasons it is story-based is that it's more important to be able to track down a particular story in the language of interest than to know the series/issue in which that story otherwise appears.

Stories there and to some degree at the GCD can be tracked down by job code- some sequence of letters and/or numbers assigned by the publisher for tracking the story work.  This concept is not universal, but where it appears it is generally heavily used by researchers.  There are some folks who can tell you a ton of things about a given Timely/Atlas/Marvel story's place in history just from its job code.

>> Anyway, I apologize for rambling, hope some of this helps someone- I'm happy to answer any questions about the GCD's data model.  I'm not actively writing code for them right now but I still keep an eye on developments and may get back to it.
>
>Please don't apologize; rather, thank you so much for your help and
>your patience! Based on what you and Peter have said, I _think_ the
>current proposal at
>http://www.w3.org/community/schemabibex/wiki/Periodicals_and_Comics_synthesis#Comic_Schemata
>can handle most of the core use cases for comics, and does not remove
>any of the capabilities that were offered by the original proposal at
>http://www.w3.org/wiki/WebSchemas/PeriodicalsComics.


I'll take a look at these and see if it knocks loose any other ideas.  I've also subscribed to the mailing list here, at least for now, so folks don't have to remember to cc me.

thanks,
-henry
Received on Sunday, 8 December 2013 23:06:35 UTC