Re: Comics and periodicals in schema.org (was Re: journal article for next call?)

Hi folks- for those who don't know me, I'm the former tech lead for the GCD (comics.org), and in particular I was the tech lead when the current iteration of the database was set up.  Comments below:

> From: Dan Scott <denials@gmail.com>
>Sent: Thursday, December 5, 2013 9:54 PM
>Subject: Re: Comics and periodicals in schema.org (was Re: journal article for next call?)
> 
>
>Hi Peter!
>
>On Thu, Dec 5, 2013 at 11:57 PM, Olson, Peter <polson@marvel.com> wrote:
>> Hi Dan -
>>
>
>Okay, I was following the basic tutorial at comics.org
>(http://docs.comics.org/wiki/OI_Tutorial#How_do_I_create_my_first_index.3F)
>where it mentions searching for "Muppet Show" by Series Name, and
>three of the first four results are series for the same "The Muppet
>Show" comic published by Boom! in 2009, with the first series having
>four issues, and the next two series having one issue each. As I
>warned in my initial email, I worried that I might be drawing too much
>from that example!

This illustrates a point of ongoing discussion at the GCD, which is the idea of treating "periodicals" (i.e. things like U.S. monthly-ish comics, or UK weeklies) vs "albums" (the common European format) vs "books" (collected editions, longer single publications, etc.) in separate ways.  Right now we treat them all the same, primarily based on the U.S. monthly periodical.  Of course, defining what is and isn't a "book" is incredibly contentious, up to and including the question of whether the categories are distinct or overlap.  So that one has been going around in circles for years.

Several of the "Muppet Show" series matched there are collections or otherwise book-like.

[quoted out of order -henry]
>>We have two distinct Amazing Spider-Man series, one which started
>>in 1963 and one which started in 1999
>>(http://marvel.com/comics/series/1987/amazing_spider-man_1963_-_1998
>>and http://marvel.com/comics/series/454/amazing_spider-man_1999_-_2013).

>
>(For what it's worth, I had looked up "The Amazing
>Spider-Man" at the time and saw that it had one huge series starting
>in 1963, so I was confused!)

Not sure what happened there; the GCD splits the series the same way Marvel does:
http://www.comics.org/series/1570/ 1963-1998
http://www.comics.org/series/11288/ 1999-2013

>I was worrying that perhaps there was no need for a Comic /
>ComicSeries split after all. Do cases like "7 Brothers"
>(http://www.comics.org/series/name/7%20brothers/sort/alpha/) where a
>set of 5 issues was published in 2007, and another set of 5 issues was
>published in 2008 justify continuing to have ComicSeries match with
>PeriodicalVolume, and to have a separate Comic as a peer of
>Periodical? Maybe.

No, that's not something you can rely on.  Volume numbers vary widely in comics.  Early Golden Age U.S. comics would have a volume per year and reset the issue number each year.  For decades, DC would increment the volume number each year *without* resetting the issue number.  European series do something involving calendar years.  (I'm not sure if that's a formal volume or just the European GCD indexers' notational convention.  Sadly, a fair chunk of what ought to be schema is still done through notation, because there aren't enough tech volunteers to migrate the more complex notation.)

Also, note that comic book publishers, particularly early on, often did weird things with both volume and issue numbers, sometimes as a postal regulation dodge, and sometimes just because nobody cared.  My favorite pathological example is Cat-Man comics:  http://www.comics.org/series/61787/
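
To make the two numbering conventions above concrete, here is a minimal Python sketch. It is not the GCD's schema: the IssueNumbering record, the function names, and the years and issue counts are all made up for illustration. It only shows why a volume number cannot be assumed to mark a series boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueNumbering:
    """Hypothetical record: volume and issue number are stored independently."""
    year: int
    volume: int
    number: int


def volume_resets_number(start_year, years, issues_per_year):
    """Early Golden Age style: new volume each year, issue number restarts at 1."""
    return [
        IssueNumbering(start_year + y, y + 1, n)
        for y in range(years)
        for n in range(1, issues_per_year + 1)
    ]


def volume_keeps_number(start_year, years, issues_per_year):
    """DC style: new volume each year, issue number keeps counting."""
    issues = []
    number = 0
    for y in range(years):
        for _ in range(issues_per_year):
            number += 1
            issues.append(IssueNumbering(start_year + y, y + 1, number))
    return issues


# Made-up years and frequencies, purely for illustration.
golden_pairs = [(i.volume, i.number) for i in volume_resets_number(1940, 2, 3)]
dc_pairs = [(i.volume, i.number) for i in volume_keeps_number(1960, 2, 3)]

# With resets, the bare issue number repeats, so it cannot identify an issue.
golden_numbers_unique = len({n for _, n in golden_pairs}) == len(golden_pairs)
# Without resets, the volume changes but the issues form one continuous run.
dc_numbers_unique = len({n for _, n in dc_pairs}) == len(dc_pairs)

print(golden_pairs)
print(dc_pairs)
```

In the first scheme the (volume, number) pair is needed to identify an issue; in the second the volume is close to decorative. A model that treats each volume as its own container would split the second kind of run into pieces that readers and publishers consider one series.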


>> (Storylines generally are complicated because they often don't stay neatly within the comics' bibliographic structures.  I gave a talk a while back that touches on some of these issues here, if you're morbidly curious: http://new.livestream.com/hugeinc/events/2474611)
>
>I am morbidly curious and will check that out.


I'll have to take a look at that!  It is certainly a very complicated relationship.

>> Definition of Comic - There's some potential for ambiguity here so I wanted to dig down on some specific examples.  Often several comic series are published simultaneously with very similar names.  For example, we currently publish the following:
>> X-Men
>> Uncanny X-Men
>> Ultimate X-Men
>> X-Men Legacy
>>
>> If I'm reading the proposal right, each of those would be distinct comics (each containing one or more distinct series).
>
>Yes, that's what I was thinking.
>
>> Another example - over the years we published a series of Comic Series in which the titles changed but the numbering was continuous: X-Men -> New X-Men  -> X-Men -> X-Men Legacy -> X-Men (again see the talk, which lists out a few more examples).  Under the definition in the proposal each distinct title would be a distinct Comic, correct?
>
>Fascinating! Yes, I think each title would be a distinct Comic in that
>case. Maybe we'll need some sort of relatedWork mechanism sooner
>rather than later after all. From http://docs.comics.org/wiki/Tracking
>it looks like "Continues from" / "Continues in" covers the
>relationships that comics.org cares about, although it carries series
>name, publisher, and date with each relationship.

I had a more fully-featured notion of tracking links that I've never had time to get through policy and implement.  Tracking links are one of those things that are still notation-based.  There's the "numbering continues from/in" concept, which is well-established (and btw, may not just link two series at each end- see: http://www.comics.org/series/177/ for a double continuation, and http://www.comics.org/series/236/ for a rather infamous numbering shell game).

An important corollary of the renumbering thing is that the same name can show up again as part of a *different* comic (for lack of a better term).  The first "New X-Men" was very much a retitling of "X-Men", but the second was a book focused on the junior team, with unrelated numbering (it re-launched out of "New Mutants", itself a re-used name).
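
One way to picture "numbering continues from/in" links is as a directed graph keyed by series identity rather than by title. The Python sketch below is an assumption-laden illustration, not how the GCD stores tracking links (which, as noted, are still notation-based). The Series and TrackingLinks classes and all identifiers are hypothetical; the X-Men chain follows the title sequence quoted earlier in this thread, and the "Example" series are invented to show a double continuation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Series:
    series_id: str  # hypothetical identifier, not a real GCD id
    title: str


class TrackingLinks:
    """Directed 'numbering continues in' links; either end may have several."""

    def __init__(self):
        self._next = defaultdict(list)
        self._prev = defaultdict(list)

    def continues_in(self, src, dst):
        self._next[src.series_id].append(dst)
        self._prev[dst.series_id].append(src)

    def successors(self, series):
        return list(self._next[series.series_id])

    def predecessors(self, series):
        return list(self._prev[series.series_id])

    def lineage(self, start):
        """Every series reachable from start via 'continues in' (depth-first)."""
        seen, order, stack = set(), [], [start]
        while stack:
            series = stack.pop()
            if series.series_id in seen:
                continue
            seen.add(series.series_id)
            order.append(series)
            stack.extend(reversed(self._next[series.series_id]))
        return order


links = TrackingLinks()
x_men_a = Series("s1", "X-Men")
new_x_men_a = Series("s2", "New X-Men")
x_men_b = Series("s3", "X-Men")
legacy = Series("s4", "X-Men Legacy")
new_x_men_b = Series("s5", "New X-Men")  # same title, unrelated numbering

links.continues_in(x_men_a, new_x_men_a)
links.continues_in(new_x_men_a, x_men_b)
links.continues_in(x_men_b, legacy)

# Invented example of a double continuation: one series feeds two successors.
parent = Series("s6", "Example Comics")
links.continues_in(parent, Series("s7", "Example Adventures"))
links.continues_in(parent, Series("s8", "Example Mysteries"))

lineage_titles = [s.title for s in links.lineage(x_men_a)]
print(lineage_titles)
print(new_x_men_b in links.lineage(x_men_a))
print(len(links.successors(parent)))
```

Because links hang off the identifier, the second "New X-Men" stays outside the first lineage even though the titles match, and a series can carry more than one "continues in" link without any special casing.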

The GCD defines a series more-or-less as a set of sequentially published things having the same indicia (formal/legal) title and the same "master publisher", where "master publisher" is another source of endless research and debate (look at the publisher for Cat-Man Comics for an example).  Particularly before about 1960 it's extremely difficult to tell whether certain publishing companies are "the same" and by what measure.  Publishing was a dodgy business.

>How would that have been handled in the original proposal: separate
>ComicSeries for each title change, I guess?
>
>> Comic Stories - because stories can be and are reprinted, the original comic issue in which they appeared should probably be identified in the schema.  For example, the Spider-Man origin story has been reprinted hundreds of times, but it's always "from" Amazing Fantasy #15.
>
>That sounds very reasonable; so something like an
>"originallyPublishedIn" property that should only be used if there are
>more than one "partOfComicIssue" / "partOfPeriodicalIssue" properties,
>to identify the ur-comic (or periodical, as that could be useful for
>non-comic articles as well)?
>
>> It might be worthwhile looking at the comics.org schema as well: http://docs.comics.org/wiki/Current_Schema
>
>As someone who cut his first-career teeth developing a relational
>database for 8 years, *yes*, it's always worthwhile looking at
>database schemas (I will pretend that I'm not seeing the "recalculated
>by code on data updates" statements) :)
>
>Hey, there is a "Story" table in the schema. That makes me feel better
>about having a ComicStory type, then!


A story type is very important- for decades anthologies were the norm, not single-story issues, and in some places they still are.  One of the biggest online databases, I.N.D.U.C.K.S. (specializing in Disney comics) is centered around the story rather than the GCD's issue-centric model.
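
As a rough illustration of a story-centric record that also carries the reprint information discussed above, here is a small Python sketch. The Issue and Story classes are hypothetical, the reprint issue is invented, and the property names "partOfComicIssue" and "originallyPublishedIn" are just the ones floated in this thread, not established schema.org terms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Issue:
    series_title: str
    number: str  # kept as text, since issue numbers are not always integers

    def label(self):
        return f"{self.series_title} #{self.number}"


@dataclass
class Story:
    title: str
    first_printing: Issue
    reprints: List[Issue] = field(default_factory=list)

    def appearances(self):
        return [self.first_printing, *self.reprints]

    def to_markup(self):
        """Dict using property names proposed in this thread (assumptions)."""
        data = {
            "@type": "ComicStory",
            "name": self.title,
            "partOfComicIssue": [i.label() for i in self.appearances()],
        }
        # Only needed when the story appears in more than one issue.
        if self.reprints:
            data["originallyPublishedIn"] = self.first_printing.label()
        return data


origin = Story(
    title="Spider-Man origin story",
    first_printing=Issue("Amazing Fantasy", "15"),
    reprints=[Issue("Example Reprint Collection", "1")],  # invented reprint
)
markup = origin.to_markup()
print(markup)
```

Keeping the first printing as its own field, instead of inferring it from the order of a list, means the "from" issue survives even when the reprint list is incomplete or unsorted.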

I should also warn that the schema on the GCD wiki is out of date, although not drastically misleading.  For instance, the current tech lead just implemented a two-layer publisher's branding scheme to deal with the many difficulties we had with a single-layer system (minor logo changes caused a lot of confusion).

Anyway, I apologize for rambling, hope some of this helps someone- I'm happy to answer any questions about the GCD's data model.  I'm not actively writing code for them right now but I still keep an eye on developments and may get back to it.

cheers,
-henry

Received on Saturday, 7 December 2013 08:22:18 UTC