- From: Young,Jeff (OR) <jyoung@oclc.org>
- Date: Fri, 12 Nov 2010 19:01:47 -0500
- To: "William Waites" <ww@eris.okfn.org>
- Cc: "public-lld" <public-lld@w3.org>, <richard.stirling@cabinet-office.x.gs.gov.uk>, <john.sheridan@nationalarchives.gsi.gov.uk>, <jeni@jenitennison.com>
I appreciate Jodi's comment that my document was actually "readable". Unfortunately, my clarity seems to be running on empty again. :-)

William, my comments are below:

> > (Real world Document)
> >
> > http://education.data.gov.uk/school/78
> >
> > (Generic) Document
> >
> > http://education.data.gov.uk/school/78/
>
> I don't quite understand what a "Generic Document" is and the
> difference between presence and absence of a slash is very slight
> and likely to lead to confusion and bugs for people using the
> data.

Note that the CTOC document specifies a distinction between "Document" and "Representation" resources. In the "Cool URIs for the Semantic Web" document, the terms "Generic Document" and "Web Document" are used instead:

http://www.w3.org/TR/cooluris/#r303gendocument
http://www.w3.org/TR/cooluris/#oldweb

The meaning of "Generic Document" is based on a W3C TAG decision known as genericResources-53 described here:

http://www.w3.org/2001/tag/issues.html#genericResources-53
http://www.w3.org/2001/tag/doc/alternatives-discovery

Basically, a Generic Document is an identifiable content-negotiable Web Document that has no representation of its own. (You could think of the trailing slash as an indication of this fact.) This abstraction allows servers to support browsers (HTML) and semantic agents (RDF) from the same HTTP URI. There is a one-to-many relationship between a generic document and its web documents, which fits naturally with the pattern I've suggested:

Generic Document: http://education.data.gov.uk/school/78/
Web Document: http://education.data.gov.uk/school/78/about.html (application/xhtml+xml)
Web Document: http://education.data.gov.uk/school/78/about.rdf (application/rdf+xml)
Web Document: http://education.data.gov.uk/school/78/about.json (application/json)
etc.

> > (Web Document) Representation
> >
> > http://education.data.gov.uk/school/78/doc.rdf
>
> Why not just 78.rdf, 78.html, etc?

There are a few reasons.
The most important is to accommodate use cases where multiple representations with the same "file extension" are possible. For example:

Generic Document: http://viaf.org/viaf/108389263/
Web Document: http://viaf.org/viaf/108389263/viaf.html
Web Document: http://viaf.org/viaf/108389263/viaf.xml
Web Document: http://viaf.org/viaf/108389263/marc21.html
Web Document: http://viaf.org/viaf/108389263/marc21.xml
Web Document: http://viaf.org/viaf/108389263/unimarc.html
Web Document: http://viaf.org/viaf/108389263/unimarc.xml

A more common example would be the potential for desktop and mobile versions of HTML, both negotiable from the same Generic Document URI:

Generic Document: http://example.org/person/alice/
Web Document: http://example.org/person/alice/default.html
Web Document: http://example.org/person/alice/mobile.html

> > Definition of the scheme concept
> >
> > http://education.data.gov.uk/ontology/education/#School
>
> The URI looks very strange. Obviously it is valid to have a
> # immediately following a / but it still looks very strange.

It's the functionality that's important here. As above, the trailing slash indicates a Generic Document. This means the URI can be used to support delivery of HTML and RDF representations:

Generic Document: http://education.data.gov.uk/ontology/education/
Web Document: http://education.data.gov.uk/ontology/education/about.html
Web Document: http://education.data.gov.uk/ontology/education/about.owl

In this pattern, the generic resource is intended to be an OWL ontology. By hanging the '#' off the generic document, the URI "works" as an owl:Class identifier AND as an HTML anchor. This allows the OWL URIs to be self-documenting. Here are two live examples from VIAF:

<http://viaf.org/ontology/1.1/#Heading> a owl:Class .
<http://viaf.org/ontology/1.1/#hasEstablishedForm> a owl:ObjectProperty .
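
To make the Generic Document idea a bit more concrete, here is a minimal Python sketch of how a server might map a Generic Document URI (or a hash URI hanging off one) to a Web Document based on the Accept header. The variant table, the `negotiate` and `parse_accept` names, the HTML fallback, and the simplified q-value handling are all assumptions for illustration; this is not how data.gov.uk or VIAF is actually configured.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical variant table: media type -> file name under the Generic Document.
VARIANTS = {
    "application/xhtml+xml": "about.html",
    "text/html": "about.html",
    "application/rdf+xml": "about.rdf",
    "application/json": "about.json",
}
DEFAULT_VARIANT = "about.html"  # assumed fallback when nothing matches


def parse_accept(header):
    """Return acceptable media types from an Accept header, highest q first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(uri, accept):
    """Map a Generic Document URI to the Web Document URI to serve."""
    # Clients strip the fragment before making the HTTP request, so
    # .../ontology/education/#School resolves via .../ontology/education/
    generic, _fragment = urldefrag(uri)
    if not generic.endswith("/"):
        raise ValueError("expected a Generic Document URI ending in '/'")
    for media_type in parse_accept(accept):
        if media_type in VARIANTS:
            return urljoin(generic, VARIANTS[media_type])
    return urljoin(generic, DEFAULT_VARIANT)


if __name__ == "__main__":
    print(negotiate("http://education.data.gov.uk/school/78/", "application/rdf+xml"))
    print(negotiate("http://education.data.gov.uk/ontology/education/#School", "text/html"))
```

A real deployment would more likely answer with a redirect or a Content-Location header than rewrite the URL internally; the sketch only shows the one-to-many mapping.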
> And I don't see why ontology/education wouldn't be the
> name of the ontology, with education.html being the human
> readable documentation, and education.rdf being the machine
> readable, and education#School being an identified fragment
> in those docs.

I want to keep the URI patterns and features for the model and meta-model as similar as possible. "Ontology" should be treated as a class just like "Road" with multiple content-negotiable representations.

There is a subtle difference that needs to be acknowledged, though. Unlike an individual "Road", every individual owl:Ontology IS a Web Document. If this is your perspective, then it would make sense to have the "real world" URI return 302 (Found) instead of 303 (See Other). IMO, a "real world" resource interpretation COULD be justified if you're like me and believe that UML and OWL are different ways to represent an abstract "model".

BTW, the CTOC document talks about "URI sets" "for 'Things' such as schools, roads, legislation, locations, projects, events, and so on." IMO, these should be modeled as OWL classes. The CTOC document also talks about an "organization of URI sets into 'sectors (e.g. education, transport, health)..." I think these should be modeled as ontologies.

I mentioned some reservations about education.data.gov.uk as a domain name, and the token redundancy in examples like http://education.data.gov.uk/ontology/education/ starts to illustrate why. Here are some URIs refactored based on my interpretation of sectors as ontologies:

<http://data.gov.uk/ontology/education/> a owl:Ontology .
<http://data.gov.uk/ontology/education/#School> a owl:Class .
<http://data.gov.uk/school/123> a <http://data.gov.uk/ontology/education/#School> .

<http://data.gov.uk/ontology/transport/> a owl:Ontology .
<http://data.gov.uk/ontology/transport/#Road> a owl:Class .
<http://data.gov.uk/road/M5> a <http://data.gov.uk/ontology/transport/#Road> .
etc.
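
The refactored URIs follow a regular pattern, which a short Python sketch can make explicit. The helper names (`ontology_uri`, `class_uri`, `instance_uri`, `describe`) and the convention of lowercasing the class name for the instance path are assumptions inferred from the two examples above, not something the CTOC document specifies.

```python
BASE = "http://data.gov.uk/"  # single domain, no per-sector host name


def ontology_uri(sector):
    """Generic Document URI for a sector modeled as an ontology."""
    return f"{BASE}ontology/{sector}/"


def class_uri(sector, class_name):
    """Hash URI for a class, hanging off the ontology's Generic Document."""
    return f"{ontology_uri(sector)}#{class_name}"


def instance_uri(class_name, token):
    """{class-name}/{instance-name} URI for an individual (assumed lowercase path)."""
    return f"{BASE}{class_name.lower()}/{token}"


def describe(sector, class_name, token):
    """Return the three Turtle-style statements for one individual."""
    cls = class_uri(sector, class_name)
    return [
        f"<{ontology_uri(sector)}> a owl:Ontology .",
        f"<{cls}> a owl:Class .",
        f"<{instance_uri(class_name, token)}> a <{cls}> .",
    ]


if __name__ == "__main__":
    for line in describe("transport", "Road", "M5"):
        print(line)
```

Note that the sector appears only in the ontology URI, so an individual's URI does not change if its class is later moved to a different sector ontology.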
If the UK is anything like OCLC, information and management of individuals in such classes is scattered far and wide and it's too idealistic to think they need to be identified and described in a new and improved silo. Maybe someday. If the scattered systems all use the same ontologies, though, this distributed information can be published with some consistency and systematically linked instead. This may be happening behind the scenes, but I'd be surprised.

> > List of scheme identifiers
> >
> > http://education.data.gov.uk/school/
> >
> > Set
> >
> > http://education.data.gov.uk/school
>
> Again, this is a very big semantic difference for the presence
> or absence of a / to signal.

Browser users will never notice. They're used to web servers adding a slash. There are two resources related 1-to-1 here and neither one can be avoided in Linked Data. If the difference with or without the slash is too subtle, then I would argue that shunting them off to a top-level path segment named "set" is too heavy-handed. This gets back to my concern about the idealism of a new and improved silo.

> The way most people would
> understand a trailing / is that it implies the string "index".
> I realise this isn't RDF semantics but it is the behaviour
> that everyone who has ever done any web development will
> expect.

I think there is good precedent for delivering default.html or index.html depending on the circumstances. I haven't looked lately, but Tomcat and Apache used to come preconfigured for both. (I don't mean to suggest that Linked Data boils down to an Apache Web Server, of course.) My pattern has two Generic Resources that both end with a trailing slash. It would be sensible to configure index.html as the default HTML representation for the "Set" and default.html as the default HTML representation for a "Document".

> So why not school/schemes and school/all or something?
> (along with schemes.html, schemes.rdf, all.html, all.rdf etc)?
We need to be careful about semantic collisions in the URI hierarchy. The CTOC and my patterns are consistent in the {concept}/{reference} pattern (what I would call {class-name}/{instance-name}). Reserving tokens from the set of possible instance tokens is problematic, which is why the CTOC and my patterns avoid it.

> > Also note that their URI pattern recognition for "(Web Document)
> > Representation" depends on the trailing path segment starting with the
> > letters "doc.". This is a serious limitation, IMO, caused by their
> > willingness to stack concept/reference pairs in their URI. This
> > limitation could be avoided by coining a formulaic or opaque token for
> > the individual instead. (Roads and junctions have a nasty habit of
> > changing "names" over time, so maybe opaque tokens would be better in
> > these cases.)
>
> This is clearly not a problem that is unique to DGU. The
> problem with opaque identifiers is they don't make sense to
> humans.
>
> http://ckan.net/package/statistics-data-gov-uk
>
> is a lot better than
>
> http://ckan.net/package/b37a8465-e94f-4c84-95b9-dc3c2b2e1066
>
> but the former as you rightly point out may change.

I'm arguing for a URI pattern, so I think that transparent vs. opaque needs to be considered class-by-class in the context of use cases. I like transparent tokens if they can be mapped to an ontology or some other explainable principle. UUIDs are OK if you have to pull an instance identifier out of thin air, but in general I would prefer sequential numbers if nothing better will do. People often need to type these things in by hand.

> > Their stacked (Real world) Identifier:
> > http://education.data.gov.uk/id/road/M5/junction/24
> >
> > Formulaic alternative: http://education.data.gov.uk/junction/M5-24
> > (s/education/transport/)
>
> I agree your alternative here is more succinct and better for that
> reason, but I'm not sure it solves the opaque and unreadable vs.
> plastic and memorable problem.
I think hackability is important, which is why I think it's important for these "URI Types" to be structured hierarchically based on generalized principles rather than scattered across different top-level path segments. I wouldn't be offended if somebody stepped in and overrode a token here or there, but IMO this should be the exception rather than the rule. We have too much real work to do. ;-)

Jeff

> As I said, I think our approaches are very similar (modulo the
> bit about the trailing /).
>
> Cheers,
> -w
Received on Saturday, 13 November 2010 00:08:23 UTC