Re: Uniquely identifying series and issues from hha1@cornell.edu on 2015-10-06 (public-schemabibex@w3.org from October 2015)

From: <hha1@cornell.edu>
Date: Tue, 6 Oct 2015 05:34:24 +0000 (UTC)
To: Sean Petiya <spetiya1@kent.edu>
Cc: "public-schemabibex@w3.org" <public-schemabibex@w3.org>
Message-ID: <609134460.1071061.1444109664799.JavaMail.yahoo@mail.yahoo.com>
I expect to provide a RESTful API, including HATEOAS, linking/relations, etc.  I've done a good bit of work on RESTful API design before so I have some fairly concrete notions of how that works already.
We're using Django underneath, so the only oddity of the URLs is that the canonical form always ends with a trailing "/".  There's an argument to be made that resources should not have a trailing "/", as they are things, not directories.  I haven't decided if I want to drop the trailing "/" for REST URIs; if we were to use file extensions for content selection, we would want to drop it.  For the HTML web pages we'll want to leave them in place, as those URLs are used in a fair number of places around the web now.
I guess one question is whether to use the web site's URIs or the API's URIs.  I was not planning to enforce a direct correlation between the web site and the API.  They may match in some places but not in others.  The web pages canonically live at www.comics.org, and I was vaguely planning on hosting the API at api.comics.org (all of this is provisional, btw: there's a group of tech folks, and while I'm the only one working on the API, everything is subject to review by the group).
I am planning to support different content formats selected with headers.  I dislike using file extensions for content selection, partially because you can then have conflicting headers and extensions which annoys me a lot.  The argument I've usually heard in favor of extensions is about being able to look at output in a browser by typing in a simple URL, but there are debugging tools that let you set headers in order to do that kind of thing (and you can set up a reasonable default content type).  I'd be interested in hearing other counterarguments, though.  The last time I had that discussion was a couple of years ago, so maybe there are newer best practices to consider.
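As a rough illustration of header-based selection with a default, here is a minimal sketch in plain Python. The list of supported media types and the function names are assumptions for the example, not anything GCD has decided, and the parser is deliberately simplified: it honours q-values and wildcards but ignores the specificity rules of full HTTP content negotiation.

```python
# Hypothetical media types the API might offer; the first entry is the default.
SUPPORTED = ["application/json", "text/turtle", "text/html"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_media_type(header, supported=SUPPORTED):
    """Pick a supported media type for the given Accept header value."""
    if not header or not header.strip():
        return supported[0]  # no header: fall back to the default
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return supported[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in supported:
            return media_type
    return None  # the caller could answer 406 Not Acceptable
```

A request with no Accept header gets the default type, which covers the "type a simple URL into a browser" case without needing file extensions.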
Any commentary on the above, while perhaps outside the scope of this mailing list, is welcome.
thanks,
-henry
From: Sean Petiya <spetiya1@kent.edu>
To: hha1@cornell.edu
Cc: Dan Scott <denials@gmail.com>; "public-schemabibex@w3.org" <public-schemabibex@w3.org>
Sent: Monday, October 5, 2015 8:41 PM
Subject: Re: Uniquely identifying series and issues
   
Actually, I think GCD URLs are good candidates for identifiers. They are extensionless, and meet the technical criteria for a URI. I'm not familiar with the GCD webserver configuration, but depending on how you plan to set up your API, Henry, you could serve negotiable content in a variety of formats from these same base URLs (not sure what your specific plan is, or the technical requirements of your API; maybe it's RESTful...).
Here's just one basic approach---and an example from my comic book ontology---but you could pick almost any good Web vocabulary and do the same type of in-browser request for specific content types:
URI -> https://comicmeta.org/cbo/Comic
HTML -> https://comicmeta.org/cbo/Comic.html
Turtle -> https://comicmeta.org/cbo/Comic.ttl
JSON -> https://comicmeta.org/cbo/Comic.json
If you were to follow this approach, your URLs would look like:
URI -> http://www.comics.org/issue/899800
HTML -> http://www.comics.org/issue/899800.html
Turtle -> http://www.comics.org/issue/899800.ttl
JSON -> http://www.comics.org/issue/899800.json
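The suffix approach above could be handled roughly as follows. This is only a sketch in plain Python: the suffixes come from the examples in this message, the media types are the commonly registered ones, and the function name is made up for illustration. It recovers the extensionless identifier and the requested format from a suffixed URL.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Suffixes taken from the examples above, mapped to their usual media types.
SUFFIX_TYPES = {
    ".html": "text/html",
    ".ttl": "text/turtle",
    ".json": "application/json",
}


def split_format(url):
    """Return (identifier_url, media_type or None) for a possibly suffixed URL."""
    parts = urlsplit(url)
    base, ext = posixpath.splitext(parts.path)
    media_type = SUFFIX_TYPES.get(ext.lower())
    if media_type is None:
        # No known suffix: the URL is already the bare identifier.
        return url, None
    return urlunsplit(parts._replace(path=base)), media_type
```

For instance, `split_format("http://www.comics.org/issue/899800.ttl")` yields the bare issue URL together with `text/turtle`, while a URL without a known suffix comes back unchanged with `None`.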
Of course, even without the additional content types, the GCD URLs make good identifiers. I'd love to see library data referencing GCD identifiers so that we could query for relationships such as which specific comic issues and/or stories are contained in a collection of comics on a library shelf, such as an omnibus or trade paperback.
For example, relationships like:
<http://www.worldcat.org/oclc/714725942> schema:hasPart <http://www.comics.org/issue/44703/>
...are especially useful to comic book fans and readers (i.e., "I need to read the Amazing Spider-Man #302, where can I find it?").
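To make the relationship above concrete, here is a small sketch in plain Python that writes such statements as N-Triples lines, expanding the schema: prefix to http://schema.org/. The helper names are hypothetical; the two URLs are the ones from the example.

```python
SCHEMA = "http://schema.org/"


def ntriple(subject, predicate, obj):
    """Format one triple of IRIs as an N-Triples line."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


def has_part_triples(collection, issues):
    """Yield one schema:hasPart statement per issue in the collection."""
    for issue in issues:
        yield ntriple(collection, SCHEMA + "hasPart", issue)


lines = list(has_part_triples(
    "http://www.worldcat.org/oclc/714725942",
    ["http://www.comics.org/issue/44703/"],
))
print("\n".join(lines))
```

A collected edition with many issues would simply pass a longer list, giving one hasPart line per issue.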
I've fleshed out some of what I think this might look like in my thesis [1], and I have examples on GitHub [2] if you are interested. Although, fair warning, they are not schema.org specific or exclusive---but the basics I think would be applicable to your case.
Dan Scott also has a great set of HTML/RDFa schema.org examples for comics that (I think) use WorldCat identifiers, or if not, definitely use GCD URLs as identifiers, if I remember correctly. Unfortunately, I have lost the link, but maybe Dan can provide it?
Good luck, and I'm excited to hear more!
Sean Petiya
[1] http://rave.ohiolink.edu/etdc/view?acc_num=kent1416791055
[2] https://github.com/seanpetiya/thesis






On Mon, Oct 5, 2015 at 7:28 PM, <hha1@cornell.edu> wrote:

I think I answered this question (below) myself already: GCD URLs would be one source of URLs that could be used in the "sameAs" field from Thing, if I'm understanding that field's usage correctly now.  I had originally taken it to be "same as" in some sort of same-type sense rather than an identity-defining sense.
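A description that uses "sameAs" this way might look something like the sketch below, which builds a small JSON-LD document in Python. The ComicIssue type, the placeholder name, and the helper function are assumptions for illustration; only the GCD URL comes from this thread.

```python
import json


def issue_jsonld(name, same_as):
    """Build a minimal schema.org description whose identity is given by sameAs."""
    doc = {
        "@context": "http://schema.org",
        "@type": "ComicIssue",  # assumed type; any Thing can carry sameAs
        "name": name,           # placeholder title supplied by the caller
        "sameAs": same_as,      # URL that unambiguously identifies the item
    }
    return json.dumps(doc, indent=2)


print(issue_jsonld("Example issue", "http://www.comics.org/issue/44703/"))
```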
Learning curve...
cheers,
-henry
From: "hha1@cornell.edu" <hha1@cornell.edu>
When you use GCD URLs as examples here, are you thinking of people generally using our URLs for identification purposes, or that it would just be any URL (for instance from one of the other databases) and which source would depend on the user?
thanks,
-henry
Received on Tuesday, 6 October 2015 05:34:59 UTC