Re: Comic Schemata

> From: Dan Scott <denials@gmail.com>
>Subject: Re: Comic Schemata
> 
>
>Hi Henry:
>
>These are good questions to be asking; see inline, below, for my
>attempted responses :)
>
>On Mon, Dec 9, 2013 at 2:17 AM, Henry Andrews <hha1@cornell.edu> wrote:
>> One really basic question is how much precision are you going for here?  I
>> am guessing less than the GCD, which wants all the precision :-)  Do you
>> have a feel for the point at which it's fine to stuff things in a "notes"
>> field?  In practice, this generally boils down to "should people be able to
>> search for this thing?"
>
>We're actually aiming for a fair bit of precision, although we can
>work towards refining the precision over time. schema.org is capable
>of modeling complex relationships between different types, as well as
>between different instances of the same type. I'll try to provide some
>examples below.


OK, this makes sense.  So basically we don't need to get super-precise now, but we need to avoid making that difficult in the future.

[editing out the bit about the definition of "Comics" as that all sounds good]

>> Hierarchy:
>> =======
>> I see that each level (Comic, ComicSeries, ComicIssue, ComicStory) can link
>> to all of the levels above or below it.  Is this just to support the full
>> range of possible "joins" (to borrow from SQL) more easily?  Or do you
>> expect that some levels will be omitted?  Would a comic published as a
>> one-shot (per the indicia) with only one story in it just have a Comic and
>> a ComicIssue and no ComicSeries or ComicStory?
>
>Peter had made it clear earlier that many comics do not follow a
>strict Comic / ComicSeries / ComicIssue hierarchy. In the original
>Comics proposal, there was only ComicSeries and ComicIssue, but in
>digging into the possible permutations it seemed to me as though there
>was a need to break out ComicStory as its own thing (to support the
>description of multiple stories published in a single issue, as well
>as to support individual stories that get republished elsewhere). I
>have been less sure about the need for a separate Comic vs.
>ComicSeries type, but wanted to start with that distinction and then
>collapse it if it does not hold up under scrutiny.
>
>I expect that an automated publishing system like comics.org would
>simply mark up that one-shot example on the series page
>(http://www.comics.org/series/76838/) using the full Comic /
>ComicSeries / ComicIssue / ComicStory - something like:
>
><div vocab="http://schema.org/" typeof="Comic">
>  <h1 property="name">All-New X-Men Special</h1>
>  <div property="hasComicSeries" typeof="ComicSeries"><span
>property="volumeNumber">2013</span> Series
>    <div property="hasComicIssue" typeof="ComicIssue">Issue #<span
>property="issueNumber">1</span>
>      <div property="hasComicStory" typeof="ComicStory"><span
>property="description">The X-Men are in the arms of the
>Octopus...</span></div>
>    </div>
>  </div>
></div>


Yeah, we would probably do the full nested thing.  Flattening might happen if the GCD also starts treating certain kinds of publications as not having all the levels, i.e. single books don't really have series, although a multi-volume collection or novel series does.

>Although that could also be flattened out as follows:
>
><div vocab="http://schema.org/" typeof="Comic">
>  <h1 property="name">All-New X-Men Special</h1>
>  <p property="hasComicSeries" typeof="ComicSeries"><span
>property="volumeNumber">2013</span> Series</p>
>  <p property="hasComicIssue" typeof="ComicIssue">Issue #<span
>property="issueNumber">1</span></p>
>  <p property="hasComicStory" typeof="ComicStory"><span
>property="description">The X-Men are in the arms of the
>Octopus...</span></p>
></div>
>
>That said, the existing proposal would let you mark it up as just a
>Comic and ComicIssue. For a publishing system like comics.org, they
>would have to use a separate template for one-shots, and it might be a
>little more difficult for search engines to retrieve the intended
>semantics ('If the ComicIssue has no described ComicStory, then assume
>that the descriptive properties like "description" and "name" actually
>describe an implicit ComicStory...').


Yeah, I don't have a good feel for where the complexity should be here (data structure or data indexing/retrieval).
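
For comparison, here is a rough sketch of the reduced Comic + ComicIssue form described above, reusing the values from the one-shot example. It assumes the draft allows "description" directly on ComicIssue; a consumer would then have to infer the implicit ComicStory.

```html
<!-- Sketch only: one-shot marked up without ComicSeries or ComicStory. -->
<!-- Assumes the draft proposal's Comic type and hasComicIssue property. -->
<div vocab="http://schema.org/" typeof="Comic">
  <h1 property="name">All-New X-Men Special</h1>
  <div property="hasComicIssue" typeof="ComicIssue">Issue #<span
property="issueNumber">1</span>
    <!-- The description sits on the issue because no ComicStory is described. -->
    <span property="description">The X-Men are in the arms of the
Octopus...</span>
  </div>
</div>
```

This keeps the markup small, but it moves the work to whoever indexes it.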

>> Is this why ComicIssue and
>> ComicStory have many duplicate fields?
>
>That's one of the reasons, yes. Note that the original Comics proposal
>also duplicated most of those fields across ComicSeries and
>ComicIssue, so I was just attempting to maintain the status quo there.


Makes sense.

>> How are searches expected to handle creator data being at either of two
>> possible levels?  (again, apologies if this is obvious to folks who have
>> been working on this stuff for a while).
>
>Implementations will differ by each search engine that crawls the
>pages marked up with schema.org, but in the Google Custom Search, for
>example, you can filter on properties of different types - so if you
>wanted to search for ComicIssues or ComicStories where the creator was
>"Joss Whedon", that would avoid pulling up results where the Comic was
>created by Joss Whedon but the issues and stories were actually
>written by someone else. Alternately, I believe you can search for the
>same property across a number of types.


Hm... we rarely think of a series as having a creator, but I see the point here (the Buffy/Angel comics do kind of fit this, don't they?)
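
To check my understanding, a sketch of credits at two levels might look like the following. The names are placeholders, and I am assuming the draft's Comic, hasComicIssue and hasComicStory names plus the existing "creator" and "author" properties from CreativeWork.

```html
<!-- Sketch only: placeholder names, draft type and property names. -->
<div vocab="http://schema.org/" typeof="Comic">
  <h1 property="name">Example Comic</h1>
  <!-- Credit for the comic as a whole -->
  <p>Created by <span property="creator" typeof="Person"><span
property="name">Series Creator</span></span></p>
  <div property="hasComicIssue" typeof="ComicIssue">Issue #<span
property="issueNumber">1</span>
    <div property="hasComicStory" typeof="ComicStory">
      <span property="name">Example Story</span>
      <!-- Credit for this story only -->
      <p>Written by <span property="author" typeof="Person"><span
property="name">Story Writer</span></span></p>
    </div>
  </div>
</div>
```

A search restricted to ComicStory would then match "Story Writer" but not "Series Creator".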

>> I see in the examples that these things are shown nesting in XML.  Does this
>> mean that none of the connections are many-to-many?
>
>You can actually repeat the "has*" and "partOf*" properties, so you
>can (for example) link a given story to many different issues, series,
>and graphic novels. And you can link a given Comic to many different
>series and issues and stories. I _think_ this offers the flexibility
>you're looking for.


It does seem to.
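
Here is a sketch of how I read that for a reprinted story. The property name "partOfComicIssue" is my guess at one of the "partOf*" properties, and the titles and numbers are placeholders.

```html
<!-- Sketch only: "partOfComicIssue" is an assumed property name. -->
<div vocab="http://schema.org/" typeof="ComicStory">
  <h2 property="name">Example Story</h2>
  <!-- First link from the story to an issue -->
  <p>First printed in
    <span property="partOfComicIssue" typeof="ComicIssue">
      <span property="name">Example Comics</span>
      #<span property="issueNumber">12</span>
    </span>
  </p>
  <!-- The same property repeated for a second issue -->
  <p>Reprinted in
    <span property="partOfComicIssue" typeof="ComicIssue">
      <span property="name">Example Comics Annual</span>
      #<span property="issueNumber">3</span>
    </span>
  </p>
</div>
```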

>> ComicIssue and ComicStory:
>> =====================
>> I noticed discussion of an Article type.  Is there a particular reason why
>> ComicStory does not correlate to Article?
>
>I think I originally made ComicStory a subclass of Article, then opted
>against that for the sake of simplicity. The only property I was
>really interested in for ComicStory that would have come from Article
>was "pagination", so it seemed easier to just define it as a direct
>descendant of CreativeWork. However, that could easily be changed!


Thanks for the explanation.  I don't have a good enough feel for the pros and cons to weigh in yet :-)

>> At the GCD we have found it much easier to model the cover as a type of
>> ComicStory.  They need the entire set of credits, especially when you get to
>> cases like http://www.comics.org/issue/85/cover/4/ where the cover is a
>> complete story, or ones where the cover is the first page of a story that
>> continues inside (I can't recall an example off the top of my head).
>
>Hah! The cover as a complete story is an awesome example. You may be
>interested in my proposal for enabling better markup of cover art at
>http://lists.w3.org/Archives/Public/public-schemabibex/2013Nov/0091.html
>(in short this would add a coverArt property to all CreativeWorks
>pointing at a full type to allow distinguishing between variants, for
>example, and in the case of cover-as-story you could apply the
>alternate type "ComicStory" to it as well).

This also makes sense.  It's not so much that the cover art is the same thing as a story; it's that we have one SQL table for all the things inside an issue, so we call them all "stories" even though some use the fields differently than others.  For a particularly unusual example, see Statement of Ownership, which is specific to US comics but is critical for research into business structure and circulation, even though you can't read the ownership information literally, as it's all a tax dodge in the early decades anyway.

>> Looks like multiple contributors of the same role should work fine here.
>> What about pen names?  Is this intended to record it as credited, by an
>> authoritative name, or both?  I'm assuming "Person" handles some notion of
>> name changes or nicknames/sobriquets.  I should probably go find the
>> definition of the Person type...
>
>Yeah... in an ideal world, you link to a Person which can in turn link
>to something like http://viaf.org/viaf/34481281/ or the ISNI
>equivalent so that you (or more accurately the search engine) can see
>that Alan Moore also published works as Curt Vile and Jill de Ray and
>do intelligent things with that.


Yup.  The GCD has a big proposal on how to handle creator identities, affiliation and other history.  No one has time to implement it, which is a shame because it's supposed to make Jerry Bails' Who's Who an updating resource again.  Here's the report from that committee: http://docs.comics.org/wiki/Report_WhosWho_Committee
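
For what it's worth, a minimal sketch of the Person linking described above, using the names and VIAF URL from the quoted message. I am assuming "alternateName" and "sameAs" are the right schema.org properties for pen names and authority links.

```html
<!-- Sketch only: assumes alternateName and sameAs are usable on Person. -->
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Alan Moore</span>,
  also published as
  <span property="alternateName">Curt Vile</span> and
  <span property="alternateName">Jill de Ray</span>
  <!-- Authority record that ties the names together -->
  <link property="sameAs" href="http://viaf.org/viaf/34481281/" />
</div>
```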

>> Is there any interest in capturing information about editors or other roles?
>
>Good news: because all of the Comic types are subclasses of
>http://schema.org/CreativeWork, we get all the roles defined there
>(such as "editor") for free! Of course there is always room for more
>roles.


Nice.
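
So an editor credit on an issue would presumably be something like this sketch (placeholder name; "editor" is the existing CreativeWork property):

```html
<!-- Sketch only: placeholder editor name. -->
<div vocab="http://schema.org/" typeof="ComicIssue">Issue #<span
property="issueNumber">1</span>
  <p>Editor: <span property="editor" typeof="Person"><span
property="name">Example Editor</span></span></p>
</div>
```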

>> I think it would be a good idea to allow job codes as a local ID space on
>> stories similar to distributor codes on issues.
>> Here's an illustration of INDUCKS' prominent use of job/story codes:
>> http://coa.inducks.org/index.php
>> AtlasTales http://atlastales.com/search and the GCD also allow searching by
>> job codes.
>
>There is a way of defining an external enumeration that would be more
>restrictive, but that external enumeration needs to exist and be
>openly available to point at. Alternately, you can define an
>enumeration within schema.org, but that causes more churn for
>schema.org and seems to be not working out all that well thus far. Are
>the comic-related job codes openly available and reasonably broadly
>used?


I think there's a misunderstanding here, or it might just be me :-)  Job codes are letters and numbers usually scrawled in some corner of a panel at the beginning or end of the story.  There's no enumeration; it's either visible on the story or it's not.  Some publishers always have them (at least for a particular span of time).  Others never used them.  See the right side of the story list for examples on a randomly chosen issue here: http://atlastales.com/issue/1242

>> Is the "format" field free-form?  That's given the GCD a lot of headache
>> over the years, although it depends on the attempted resolution.  If you're
>> just going for comic vs book vs album vs (I don't know what the Asian formats
>> are) you'll probably get reasonable data.
>
>Yes, at this point (following the original Comics proposal), format is
>free-form "Text". Again, if there's a good external enumeration we
>could point at that already exists and covers most use cases, that
>would be a great enhancement!


The GCD kicks this around every so often; maybe one day there will be an actual enumeration.  There have been some steps towards putting bounds on the problem, but I don't think there's a clear list at this point.


thanks,
-henry

Received on Tuesday, 10 December 2013 06:28:05 UTC