Hi folks,
  I've wandered over here from the discussion Peter started on the GCD's (comics.org's) tech list.  I do not officially represent comics.org, and I don't currently hold any formal position with the non-profit foundation that runs comics.org; anything I say is my own opinion, and nothing more.  But I was the lead developer for the creation of the current form of the site (as deployed in late 2009) and held various positions from 2008 to early 2011, so I have quite a bit of experience organizing comic book data and trying to get people to agree on how to manage it :-)

Here are a few thoughts on the schema as it currently stands, along with questions for not just Peter but schema.org folks in general.  I'm still pretty unfamiliar with how things work here, so if I say something that makes no sense, please point it out!  Generally I think this is a great project and the schema is looking good.  My feedback mostly involves odd cases and how to handle them.

As I understand it, schema.org does not try to enforce any particular usage of the various fields.  In many cases (like "penciler"), this seems fine: any confusion about what to put in that field would likely just be the result of confusing credits in the source.  The field itself is clear enough.  But for others, is it possible to offer guidance?  For instance:

numberOfPages:  The GCD convention is to count all pages including the covers.  This is to avoid confusion when stories are printed on the (outside or inside) covers, or comics lack outer sheets that are differentiated from the inner pages (newspaper-like comics).  So in the GCD, the standard U.S. golden age comic book was 68 pages in length, not 64.  We extend the same convention to book-form comics, although with the separate GraphicNovel type, schema.org could make a distinction and exclude covers there, I suppose.
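
As a rough illustration of that counting convention (the helper name is hypothetical, and the four-page cover wrap of front, inside front, inside back, and back is an assumption that matches the 64 vs. 68 example; this is not GCD code):

```python
# Assumed: a standard wrapped comic has four cover pages
# (front, inside front, inside back, back).
COVER_PAGES = 4


def gcd_style_page_count(interior_pages: int, has_separate_cover: bool = True) -> int:
    """Count every page, covers included (hypothetical helper).

    For newspaper-like comics with no distinct outer sheet, pass
    has_separate_cover=False: every page is already in interior_pages.
    """
    if interior_pages < 0:
        raise ValueError("interior_pages must be non-negative")
    return interior_pages + (COVER_PAGES if has_separate_cover else 0)


print(gcd_style_page_count(64))                            # 68
print(gcd_style_page_count(16, has_separate_cover=False))  # 16
```

A site that excludes covers would report 64 for the same book, which is why some guidance on the field would help.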

imprint: This term is much abused in the discussion of comics history.  Sometimes it means imprint in the modern publishing sense (this is OK).  But often people use it to mean the company formally listed in the indicia (a U.S.-centric concept) as the publisher. By this definition, Marvel had well over 60 "imprints" between 1939 and the late 1960s, and they shifted with dizzying frequency.  It's not clear to me what the desired use of this field is for schema.org, or how practical any recommendation would be.

issueNumber and subtitle:  How will this handle cases where numbers are not numeric, but are not really subtitles either?  For instance, issue 1/2, which shouldn't really be written as 0.5.  (Is the "number" type an int, a float, or restricted to non-negative numbers?)  Also, how about situations where duplicate numbers are disambiguated by some fairly arbitrary convention?  Human Torch #5 and #5a are a classic example, sticking with the Marvel theme.  Also, one issue is #4 in the indicia, but #3 on the cover.  And actually, #5 and #5a are Marvel's convention; the GCD uses #5 [a] and #5 [b].  http://www.comics.org/series/178/  For even more complex numbering, try Cat-Man Comics http://www.comics.org/series/61787/

There are definitely more outlandish issue "numbers" out there, although I'd have to go ask some folks where they are.  I just picked some relatively simple exceptions.
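
To make the simpler cases above concrete, here is a minimal sketch that treats the issue number as text rather than a numeric type.  The class and field names are hypothetical, invented for illustration; they are not part of the proposed schema or of the GCD's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IssueNumber:
    """Hypothetical record: the 'number' is kept as text, not an int or float."""
    indicia: str                          # as printed in the indicia, e.g. "1/2", "5"
    disambiguator: Optional[str] = None   # e.g. "a" for a GCD-style "#5 [a]"
    cover: Optional[str] = None           # set only when the cover disagrees

    def label(self) -> str:
        """Render a display label using the bracketed convention."""
        text = f"#{self.indicia}"
        if self.disambiguator:
            text += f" [{self.disambiguator}]"
        return text


def is_plain_integer(text: str) -> bool:
    """True only for strings made up entirely of ASCII digits."""
    return text.isascii() and text.isdigit()


half = IssueNumber("1/2")
dup = IssueNumber("5", disambiguator="a")
mismatch = IssueNumber("4", cover="3")    # #4 in the indicia, #3 on the cover

print(half.label(), dup.label(), mismatch.label())
print(is_plain_integer("1/2"), is_plain_integer("5a"), is_plain_integer("5"))
```

Neither "1/2" nor "5a" passes the integer check, so a strictly numeric field would have to reject or distort them.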

One more item of international concern: once you get outside the U.S., you may find issues that are distributed by multiple distributors (across different European markets, for instance).  I have almost no knowledge of this, but it came up when we looked at adding simple distributor fields, and it was one of the reasons why we tabled that for the time being.


From: "Olson, Peter" <polson@marvel.com>
To: public-vocabs@w3.org
Sent: Tuesday, February 21, 2012 4:25 PM
Subject: Comics Schema Update and Open Questions

Hi Group –
Some housekeeping – I’ve made a number of tweaks to the comics and periodicals schema based on feedback from a number of industry resources (including comics.org and a large number of digital and brick-and-mortar retailers).  I’m waiting on feedback from a number of other sources (including Diamond Comics) and I’m going to start reaching out to non-US sites and publishers. All changes to the Wiki entry are tracked in the discussion page.
There seems to be a recurring question about identifiers.  For example, I included the distributor code (the Diamond Comics code, which is used by US books) as one of the identifiers for comic issues and graphic novels.  I could see a role for the internal ID used by publishers: the Marvel ID is used by a lot of our digital retailers as a unique identifier, and I’m sure other publishers do the same.  Comics.org suggested that the schema add their local ID as an element (it’s used by a number of other sites and they have as comprehensive a list of books as anyone).
Basically, between publishers, distributors and fan sites, what is the best practice around identifiers?
What qualifies as ubiquitous enough to be included as an identifier in a schema such as this one?  Is it better to include more (with fewer actually used) or fewer (but more frequently populated)?
-       Peter
Peter Olson | VP, Web and Application Development | Marvel Entertainment