RE: Uniquely identifying series and issues from Young,Jeff (OR) on 2015-10-06 (public-schemabibex@w3.org from October 2015)

From: Young,Jeff (OR) <jyoung@oclc.org>
Date: Tue, 6 Oct 2015 14:04:06 +0000
To: "hha1@cornell.edu" <hha1@cornell.edu>, Sean Petiya <spetiya1@kent.edu>
CC: "public-schemabibex@w3.org" <public-schemabibex@w3.org>
Message-ID: <CY1PR0601MB15368B70BC84614EE707B48EAD370@CY1PR0601MB1536.namprd06.prod.outlook.>
I don’t know if Django can support 303 redirects, but the VIAF URI patterns might be a useful comparison. It uses this “Cool URI” pattern:

http://www.w3.org/TR/cooluris/#r303gendocument


[cid:image001.png@01D1001C.EF2FB320]

In effect, the “generic document” is a graph. This pattern is a bit more engineered than the others, but it has more flexibility, such as the ability to describe the graph (generic document) independently from the thing. Another nice thing about this “generic document” pattern is that you can piggyback hash URIs on the generic document URI and describe those things too, if you need to.

http://www.w3.org/TR/cooluris/#hashuri
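
As a rough illustration of the 303 "generic document" pattern described above (a sketch under assumptions, not VIAF's actual implementation), the request handling might look like the following. The /id/ and /doc/ path layout, the respond() helper, and the two media types are hypothetical, and the Accept header is treated as a single exact media type to keep the example short.

```python
# Hypothetical URI layout, for illustration only:
#   /id/<name>   identifies the thing itself
#   /doc/<name>  is the generic document (the graph) describing it
REPRESENTATIONS = {
    "text/html": ".html",
    "text/turtle": ".ttl",
}
DEFAULT_TYPE = "text/html"


def respond(path, accept=DEFAULT_TYPE):
    """Return (status, headers) for a GET request on `path`."""
    if path.startswith("/id/"):
        # A thing cannot be sent over HTTP, so answer 303 See Other
        # and point at the generic document that describes it.
        return 303, {"Location": "/doc/" + path[len("/id/"):]}
    if path.startswith("/doc/"):
        # The generic document negotiates a concrete representation
        # and names it in Content-Location.
        media_type = accept if accept in REPRESENTATIONS else DEFAULT_TYPE
        return 200, {
            "Content-Type": media_type,
            "Content-Location": path + REPRESENTATIONS[media_type],
            "Vary": "Accept",
        }
    return 404, {}


print(respond("/id/issue/899800"))
print(respond("/doc/issue/899800", "text/turtle"))
```

A hash URI built on the generic document (say, /doc/issue/899800#cover, again hypothetical) needs no extra server logic, because the client strips the fragment before sending the request.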


Jeff

From: hha1@cornell.edu [mailto:hha1@cornell.edu]
Sent: Tuesday, October 06, 2015 1:34 AM
To: Sean Petiya
Cc: public-schemabibex@w3.org
Subject: Re: Uniquely identifying series and issues

I expect to provide a RESTful API, including HATEOAS, linking/relations, etc.  I've done a good bit of work on RESTful API design before so I have some fairly concrete notions of how that works already.

We're using Django underneath, so the only oddity of the URLs is that the canonical form always ends with a trailing "/".  There's an argument that can be made that resources should not have a trailing "/" as they are things, not directories.  I haven't decided if I want to drop the trailing "/" for REST URIs; if we were to use file extensions for content selection, we would want to drop it.  For the HTML web pages we'll want to leave them in place, as those URLs are used in a fair number of places around the web now.

I guess one question is whether to use the web site URIs or the API's URIs.  I was not planning to enforce a direct correlation between the web site and the API.  It may match in some places but not in others.  The web pages canonically live at www.comics.org, and I was vaguely planning on hosting the API at api.comics.org (all of this is provisional, btw; there's a group of tech folks and while I'm the only one working on the API, everything is subject to review by the group).

I am planning to support different content formats selected with headers.  I dislike using file extensions for content selection, partially because you can then have conflicting headers and extensions which annoys me a lot.  The argument I've usually heard in favor of extensions is about being able to look at output in a browser by typing in a simple URL, but there are debugging tools that let you set headers in order to do that kind of thing (and you can set up a reasonable default content type).  I'd be interested in hearing other counterarguments, though.  The last time I had that discussion was a couple of years ago, so maybe there are newer best practices to consider.
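
A minimal sketch of header-based content selection with a default type, as described above. The SUPPORTED list, the JSON default, and the helper names are assumptions for illustration, and the matching is deliberately simplified: it orders entries by q-value only and ignores the specificity precedence rules of full HTTP content negotiation.

```python
SUPPORTED = ["application/json", "text/turtle", "text/html"]  # assumed formats
DEFAULT = "application/json"  # assumed default when the client states no preference


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        items.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(items, key=lambda item: item[1], reverse=True)


def choose(header, supported=SUPPORTED, default=DEFAULT):
    """Pick a media type to serve, or None if nothing acceptable is supported."""
    if not header:
        return default
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return default
        if media_type in supported:
            return media_type
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" becomes "text/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
    return None  # the caller would answer 406 Not Acceptable


print(choose("text/turtle;q=0.9, text/html;q=0.5"))
print(choose(""))
```

With a default in place, a browser request that sends no useful Accept header still gets a readable response, which covers much of the case usually made for file extensions.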

Any commentary on the above, while perhaps outside the scope of this mailing list, is welcome.

thanks,
-henry

________________________________
From: Sean Petiya <spetiya1@kent.edu>
To: hha1@cornell.edu
Cc: Dan Scott <denials@gmail.com>; "public-schemabibex@w3.org" <public-schemabibex@w3.org>
Sent: Monday, October 5, 2015 8:41 PM
Subject: Re: Uniquely identifying series and issues

Actually, I think GCD URLs are good candidates for identifiers. They are extensionless and meet the technical criteria for a URI. I'm not familiar with the GCD webserver configuration, but depending on how you plan to set up your API, Henry, you could serve negotiable content in a variety of formats from these same base URLs (not sure what your specific plan is, or the technical requirements of your API; maybe it's RESTful...).

Here's just one basic approach---and an example from my comic book ontology---but you could pick almost any good Web vocabulary and do the same type of in-browser request for specific content types:

URI -> https://comicmeta.org/cbo/Comic

HTML -> https://comicmeta.org/cbo/Comic.html

Turtle -> https://comicmeta.org/cbo/Comic.ttl

JSON -> https://comicmeta.org/cbo/Comic.json


If you were to follow this approach, your URLs would look like:

URI -> http://www.comics.org/issue/899800

HTML -> http://www.comics.org/issue/899800.html

Turtle -> http://www.comics.org/issue/899800.ttl

JSON -> http://www.comics.org/issue/899800.json
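
The mapping above can be sketched as a small helper. The function name and the EXTENSIONS table are hypothetical; stripping a trailing "/" before appending the extension is an assumption made so that canonical URLs ending in "/" produce the same result.

```python
# Format name -> file extension, following the list above.
EXTENSIONS = {"HTML": "html", "Turtle": "ttl", "JSON": "json"}


def format_urls(uri):
    """Derive one format-specific URL per extension from an extensionless URI."""
    base = uri.rstrip("/")  # "/issue/899800/" and "/issue/899800" give the same base
    return {name: f"{base}.{ext}" for name, ext in EXTENSIONS.items()}


for name, url in format_urls("http://www.comics.org/issue/899800").items():
    print(f"{name} -> {url}")
```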


Of course, even without the additional content types, the GCD URLs make good identifiers. I'd love to see library data referencing GCD identifiers so that we could query for relationships like what specific comic issues and/or stories are contained in a collection of comics on a library shelf, such as in an omnibus or trade paperback.

For example, relationships like:

<http://www.worldcat.org/oclc/714725942>
            schema:hasPart <http://www.comics.org/issue/44703/>

...are especially useful to comic book fans and readers (i.e., "I need to read the Amazing Spider-Man #302, where can I find it?").
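
To make the "where can I find it?" query concrete, here is a small sketch that holds hasPart statements as plain tuples, serializes them as N-Triples, and looks up which collections contain a given issue. The two URIs are the ones from the example above; the helper names and the in-memory list are assumptions, and a real setup would more likely use an RDF library or a SPARQL endpoint.

```python
HAS_PART = "http://schema.org/hasPart"

# (subject, predicate, object) statements; illustrative data only.
triples = [
    (
        "http://www.worldcat.org/oclc/714725942",
        HAS_PART,
        "http://www.comics.org/issue/44703/",
    ),
]


def to_ntriples(triple):
    """Serialize one statement whose three terms are all IRIs."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


def containers_of(issue_uri, statements):
    """Return every subject that states hasPart for the given issue URI."""
    return [s for s, p, o in statements if p == HAS_PART and o == issue_uri]


print(to_ntriples(triples[0]))
print(containers_of("http://www.comics.org/issue/44703/", triples))
```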

I've fleshed out some of what I think this might look like in my thesis [1], and I have examples on GitHub [2] if you are interested. Although, fair warning, they are not schema.org specific or exclusive---but the basics I think would be applicable to your case.

Dan Scott also has a great set of HTML/RDFa schema.org examples for comics that (I think) use WorldCat identifiers, and if not, they definitely used GCD URLs as identifiers, if I remember correctly. Unfortunately, I have lost the link---but maybe Dan can provide it?

Good luck, and I'm excited to hear more!

Sean Petiya

[1] http://rave.ohiolink.edu/etdc/view?acc_num=kent1416791055

[2] https://github.com/seanpetiya/thesis

On Mon, Oct 5, 2015 at 7:28 PM, <hha1@cornell.edu> wrote:
I think I answered this question (below) myself already: GCD URLs would be one source of URLs that could be used in the "sameAs" field from Thing, if I'm understanding that field's usage correctly now.  I had originally taken it to be "same as" in some sort of same-type sense rather than an identity-defining sense.
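
A minimal sketch of that usage, assuming a description published somewhere other than GCD: the local @id (an example.org URI) and the helper name are hypothetical, and the GCD URL is the one used as an example elsewhere in this thread.

```python
import json


def issue_description(local_id, gcd_url):
    """Build a JSON-LD description that points at a GCD URL via sameAs."""
    return {
        "@context": "http://schema.org/",
        "@type": "ComicIssue",
        "@id": local_id,    # hypothetical identifier in the publisher's own namespace
        "sameAs": gcd_url,  # identity-defining link to the GCD page for the same issue
    }


doc = issue_description(
    "http://example.org/comics/issue/1",
    "http://www.comics.org/issue/899800/",
)
print(json.dumps(doc, indent=2))
```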

Learning curve...

cheers,
-henry

________________________________
From: "hha1@cornell.edu" <hha1@cornell.edu>
When you use GCD URLs as examples here, are you thinking of people generally using our URLs for identification purposes, or that it would just be any URL (for instance from one of the other databases) and which source would depend on the user?

thanks,
-henry
Attachments

image/png attachment: image001.png
Received on Tuesday, 6 October 2015 14:04:42 UTC