Re: Comic Schemata from Dan Scott on 2013-12-09 (public-schemabibex@w3.org from December 2013)

From: Dan Scott <denials@gmail.com>
Date: Mon, 9 Dec 2013 07:35:15 -0500
To: Henry Andrews <hha1@cornell.edu>, Peter Olson <polson@marvel.com>
Cc: "public-schemabibex@w3c.org" <public-schemabibex@w3c.org>
Message-ID: <CAAY5AM1Y1NmG+4buM2A_G8BdLs5m9SdDQD_E3yJ017Rbg_=zLw@mail.gmail.com>
Hi Henry:

These are good questions to be asking; see inline, below, for my
attempted responses :)

On Mon, Dec 9, 2013 at 2:17 AM, Henry Andrews <hha1@cornell.edu> wrote:
> Hi folks,
>   I took a look at the
> http://www.w3.org/community/schemabibex/wiki/Periodicals_and_Comics_synthesis
> as suggested and have a few questions and comments.  Some of these are
> really basic questions about the schema goals and process.  I did go through
> the comics-related emails in the archive for the past few months to catch up
> a bit, but I haven't read the entire rather large volume of emails that
> didn't specifically say "comics" in the subject line.  So feel free to tell
> me to go read some other documentation, or to send some answers off-list.
>
> This definitely does a good job of covering the essentials, which I gather
> is the goal.  So now I'll nitpick at details :-P
>
> One really basic question is how much precision are you going for here?  I
> am guessing less than the GCD, which wants all the precision :-)  Do you
> have a feel for the point at which it's fine to stuff things in a "notes"
> field?  In practice, this generally boils down to "should people be able to
> search for this thing?"

We're actually aiming for a fair bit of precision, although we can
work towards refining the precision over time. schema.org is capable
of modeling complex relationships between different types, as well as
between different instances of the same type. I'll try to provide some
examples below.

> Some general concerns about the definition:
> ================================
> The description of "Comics" given at
> http://www.w3.org/community/schemabibex/wiki/Periodicals_and_Comics_synthesis#Comics
> , if read literally, is extremely specific to typical modern U.S. periodical
> comics.
>
> The restrictions on binding and size, while helpful to give the general
> idea, will break down pretty rapidly looking over the entire history of US
> comics.  It also doesn't fit typical European (nor, I imagine, Asian)
> formats all that well, although perhaps some of those fit better under
> GraphicNovel?  Is the goal here to handle published sequential art in
> general, or just the US market and things that are similar enough, with
> other schemas for bande-dessinée, manga, etc.?

Hmm. Most of the comics-related portions of the proposal you're
looking at are based on either the original January 2012 proposal that
was floated to schema.org
(http://www.w3.org/wiki/WebSchemas/PeriodicalsComics) or taken from
Peter's earlier reply on this list. The particular example that I
suspect is bothering you ("short form, saddle-stitched, usually comes
in pamphlet form") was also used in the introductory material to the
original proposal.

However (and this is hopefully good news), that was meant only to
serve as an introduction for the proposal. The actual content that
users of schema.org would see if the proposal were adopted is the
description under the actual definition of the type; so, for
ComicIssue, down at
http://www.w3.org/community/schemabibex/wiki/Periodicals_and_Comics_synthesis#Thing_.3E_CreativeWork_.3E_PeriodicalIssue_.3E_ComicIssue
the description is:

"Individual comic issues are serially published as part of a larger
series (for the sake of consistency, even one-shot issues belong to a
series comprised of a single issue). All comic issues can be uniquely
identified by the combination of the name and volume number of the
series to which the issue belongs; the issue number; and the variant
description of the issue (if it exists)."

> Hierarchy:
> =======
> I see that each level (Comic, ComicSeries, ComicIssue, ComicStory) can link
> to all of the levels above or below it.  Is this just to support the full
> range of possible "joins" (to borrow from SQL) more easily?  Or do you
> expect that some levels will be omitted?  Would a comic published as a
> one-shot (per the indicia) with only one story in it just have a Comic and
> a ComicIssue and no ComicSeries or ComicStory?

Peter had made it clear earlier that many comics do not follow a
strict Comic / ComicSeries / ComicIssue hierarchy. In the original
Comics proposal, there was only ComicSeries and ComicIssue, but in
digging into the possible permutations it seemed to me as though there
was a need to break out ComicStory as its own thing (to support the
description of multiple stories published in a single issue, as well
as to support individual stories that get republished elsewhere). I
have been less sure about the need for a separate Comic vs.
ComicSeries type, but wanted to start with that distinction and then
collapse it if it does not hold up under scrutiny.

I expect that an automated publishing system like comics.org would
simply mark up that one-shot example on the series page
(http://www.comics.org/series/76838/) using the full Comic /
ComicSeries / ComicIssue / ComicStory hierarchy, something like:

<div vocab="http://schema.org/" typeof="Comic">
  <h1 property="name">All-New X-Men Special</h1>
  <div property="hasComicSeries" typeof="ComicSeries">
    <span property="volumeNumber">2013</span> Series
    <div property="hasComicIssue" typeof="ComicIssue">
      Issue #<span property="issueNumber">1</span>
      <div property="hasComicStory" typeof="ComicStory">
        <span property="description">The X-Men are in the arms of the Octopus...</span>
      </div>
    </div>
  </div>
</div>

Although that could also be flattened out as follows:

<div vocab="http://schema.org/" typeof="Comic">
  <h1 property="name">All-New X-Men Special</h1>
  <p property="hasComicSeries" typeof="ComicSeries"><span property="volumeNumber">2013</span> Series</p>
  <p property="hasComicIssue" typeof="ComicIssue">Issue #<span property="issueNumber">1</span></p>
  <p property="hasComicStory" typeof="ComicStory"><span property="description">The X-Men are in the arms of the Octopus...</span></p>
</div>

That said, the existing proposal would also let you mark it up as just
a Comic and ComicIssue. A publishing system like comics.org would then
need a separate template for one-shots, and search engines might find
it a little harder to infer the intended semantics ('If the ComicIssue
has no described ComicStory, then assume that the descriptive
properties like "description" and "name" actually describe an implicit
ComicStory...').
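
For comparison, here is a rough sketch of that shorter form for the
same one-shot, using only the types and properties from the examples
above. The description sits directly on the ComicIssue, which is
exactly the ambiguity a consumer would have to resolve. Treat it as an
illustration of the draft proposal, not settled markup:

```html
<div vocab="http://schema.org/" typeof="Comic">
  <h1 property="name">All-New X-Men Special</h1>
  <!-- No ComicSeries or ComicStory: the issue carries the description itself -->
  <p property="hasComicIssue" typeof="ComicIssue">
    Issue #<span property="issueNumber">1</span>:
    <span property="description">The X-Men are in the arms of the Octopus...</span>
  </p>
</div>
```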

> Is this why ComicIssue and
> ComicStory have many duplicate fields?

That's one of the reasons, yes. Note that the original Comics proposal
also duplicated most of those fields across ComicSeries and
ComicIssue, so I was just attempting to maintain the status quo there.

> How are searches expected to handle creator data being at either of two
> possible levels?  (again, apologies if this is obvious to folks who have
> been working on this stuff for a while).

Each search engine that crawls pages marked up with schema.org will
implement this differently, but in Google Custom Search, for example,
you can filter on properties of different types. So if you wanted to
search for ComicIssues or ComicStories where the creator was "Joss
Whedon", that would avoid pulling up results where the Comic was
created by Joss Whedon but the issues and stories were actually
written by someone else. Alternately, I believe you can search for the
same property across a number of types.

> I see in the examples that these things are shown nesting in XML.  Does this
> mean that none of the connections are many-to-many?

You can actually repeat the "has*" and "partOf*" properties, so you
can (for example) link a given story to many different issues, series,
and graphic novels. And you can link a given Comic to many different
series and issues and stories. I _think_ this offers the flexibility
you're looking for.
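
As a rough sketch of what repeating a property might look like: the
property name "partOfComicIssue" below is my assumption, formed by
analogy with the "has*" names used earlier (the draft may spell it
differently), and the story and issue names are placeholders rather
than real publications.

```html
<div vocab="http://schema.org/" typeof="ComicStory">
  <h2 property="name">Example Story</h2>
  <!-- First appearance (placeholder data) -->
  <p property="partOfComicIssue" typeof="ComicIssue">
    <span property="name">Example Series A #1</span>
  </p>
  <!-- Same property repeated for a later reprint (placeholder data) -->
  <p property="partOfComicIssue" typeof="ComicIssue">
    <span property="name">Example Reprint Series B #12</span>
  </p>
</div>
```

Each repetition adds another link from the one story, so nesting in
the markup does not limit a story to a single parent.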

> With issues and stories
> that can be useful for variants (although that's not how the GCD does it on
> the back-end).  There are examples of issues as part of multiple series (the
> GCD has never implemented that, although there are intentions).  Of course
> duplicating data is also an option- if that's the plan for variants then
> you're probably fine with it for the series case- it's fairly rare.  I can
> pull up examples if anyone wants some, though.
>
> I've already commented that I think the "Comic" concept as stated is a bit
> problematic, although I'm still contemplating that.  Probably worth its own
> thread, maybe tomorrow.

Agreed (both that it is a bit problematic, and deserving of a separate thread).

> ComicIssue and ComicStory:
> =====================
> I noticed discussion of an Article type.  Is there a particular reason why
> ComicStory does not correlate to Article?

I think I originally made ComicStory a subclass of Article, then opted
against that for the sake of simplicity. The only property I was
really interested in for ComicStory that would have come from Article
was "pagination", so it seemed easier to just define it as a direct
descendant of CreativeWork. However, that could easily be changed!

> At the GCD we have found it much easier to model the cover as a type of
> ComicStory.  They need the entire set of credits (especially when you get to
> cases like http://www.comics.org/issue/85/cover/4/ where the cover is a
> complete story, or ones where the cover is the first page of a story that
> continues inside (I can't recall an example off the top of my head)).

Hah! The cover as a complete story is an awesome example. You may be
interested in my proposal for enabling better markup of cover art at
http://lists.w3.org/Archives/Public/public-schemabibex/2013Nov/0091.html
(in short, this would add a coverArt property to all CreativeWorks that
points at a full type, which would allow distinguishing between
variants, for example; in the case of cover-as-story you could apply
the additional type "ComicStory" to it as well).
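
A minimal sketch of the cover-as-story case, with loud caveats: the
type name "CoverArt" is an assumption on my part (check the linked
proposal for the actual name), and the name and description are
placeholders. The point is only that RDFa lets typeof carry more than
one type:

```html
<div vocab="http://schema.org/" typeof="ComicIssue">
  Issue #<span property="issueNumber">1</span>
  <!-- "CoverArt" is an assumed type name; ComicStory is added as a second type -->
  <div property="coverArt" typeof="CoverArt ComicStory">
    <span property="name">Example cover story</span>
    <span property="description">Placeholder description of the story told on the cover.</span>
  </div>
</div>
```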

> Looks like multiple contributors of the same role should work fine here.
> What about pen names?  Is this intended to record it as credited, by an
> authoritative name, or both?  I'm assuming "Person" handles some notion of
> name changes or nicknames/sobriquets.  I should probably go find the
> definition of the Person type...

Yeah... in an ideal world, you link to a Person which can in turn link
to something like http://viaf.org/viaf/34481281/ or the ISNI
equivalent so that you (or more accurately the search engine) can see
that Alan Moore also published works as Curt Vile and Jill de Ray and
do intelligent things with that.
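
Something like the following sketch, where the credit is recorded as
printed and the Person points at the authority record. Using "sameAs"
for that link is my assumption about how a site might do it, and the
story name is a placeholder:

```html
<div vocab="http://schema.org/" typeof="ComicStory">
  <span property="name">Example Story</span>
  <!-- Credit as printed, linked to an authority record for the person -->
  <div property="creator" typeof="Person">
    <span property="name">Curt Vile</span>
    <link property="sameAs" href="http://viaf.org/viaf/34481281/" />
  </div>
</div>
```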

> Is there any interest in capturing information about editors or other roles?

Good news: because all of the Comic types are subclasses of
http://schema.org/CreativeWork, we get all the roles defined there
(such as "editor") for free! Of course there is always room for more
roles.
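
For instance, an issue-level editor credit might be sketched like this
(the editor's name is a placeholder; "editor" itself comes straight
from CreativeWork):

```html
<div vocab="http://schema.org/" typeof="ComicIssue">
  Issue #<span property="issueNumber">1</span>
  <!-- editor is inherited from CreativeWork; the name is a placeholder -->
  <p>Editor:
    <span property="editor" typeof="Person">
      <span property="name">Example Editor</span>
    </span>
  </p>
</div>
```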

> I think it would be a good idea to allow job codes as a local ID space on
> stories similar to distributor codes on issues.
> Here's an illustration of INDUCKS' prominent use of job/story codes:
> http://coa.inducks.org/index.php
> AtlasTales http://atlastales.com/search and the GCD also allow searching by
> job codes.

There is a way of defining an external enumeration that would be more
restrictive, but that external enumeration needs to exist and be
openly available to point at. Alternately, you can define an
enumeration within schema.org, but that causes more churn for
schema.org and does not seem to be working out all that well thus far. Are
the comic-related job codes openly available and reasonably broadly
used?

> Is the "format" field free-form?  That's given the GCD a lot of headache
> over the years, although it depends on the attempted resolution.  If you're
> just going for comic vs book vs album vs (I don't know what the Asian formats
> are) you'll probably get reasonable data.

Yes, at this point (following the original Comics proposal), format is
free-form "Text". Again, if there's a good external enumeration we
could point at that already exists and covers most use cases, that
would be a great enhancement!

> =====
> I had another section on "imprint" but decided that it could use its own
> thread.  I'll post that shortly.  I'll also write separately on GraphicNovel
> at some point.

Great, thanks Henry!
Received on Monday, 9 December 2013 12:35:44 UTC