Re: URI scheme advice for an RDF schema

Hi Jon,

Very useful, thanks.

Comments inline.

John

> On July 23, 2014 at 7:05 PM Jon Phipps <jphipps@madcreek.com> wrote:
>
>
> Hi John,
>
> Our 'RDF Schema' approach, based on many years of multilingual
> vocabulary development and exemplified by the RDA Vocabularies:
> http://www.rdaregistry.info/
> …might be helpful. Some more inline...
>
> On 23 Jul 2014, at 7:22, john.walker wrote:
>
> > Hi There,
> > 
> > There is plenty of advice/help out there regarding URI schemes for
> > instance
> > data, for example the EC study on persistent URIs [1].
> > 
> > I was wondering if there are any similar studies or guidelines about
> > URI schemes
> > for RDF schema (using this as catch all term for vocabulary, data
> > dictionary,
> > schema, ontology).
> > 
> > The particular use case I have is an ISO 13584 compliant data
> > dictionary with a
> > few hundred classes and over 1000 properties which I'd like to convert
> > to RDF.
> > Everything in the dictionary (including the dictionary itself) is
> > identified
> > with an IRDI [2].
> > 
> > Points to consider:
> >
> > 1. (I'll get this one out of the way first :) ) Hash vs. slash URIs:
> > What's the
> > latest advice/pros/cons? Currently I am leaning towards slash URIs so
> > the user
> > is not forced to download the entire schema in one file (of course we
> > can always
> > provide a dump for those who want it). Any best practices here?
> >
>
> I can't say that it's a best practice, but we strongly prefer slash URIs
> even though it presents some management challenges wrt content
> negotiation.
>
> > 2. URN or HTTP URI: A URN scheme for IRDIs has previously been mooted,
> > but there seems
> > to be a distinct lack of progress. Following linked data principles I was
> > planning to
> > use HTTP URIs instead. Would there be any advantage to use URNs
> > instead?
> >
>
> The main disadvantage to a non-HTTP URN is the need to maintain some
> form of URN resolution service over time, assuming that you want/need
> the URNs to resolve. It also limits public/global reuse and mapping of
> your vocabularies, which hopefully isn't desirable.
>
> > 3. Human-readable URIs: Many widely used schemas (e.g. Schema.org,
> > FOAF) have a
> > human-readable component in the URI, typically a URI-friendly version
> > of the
> > label. I can see this makes things a lot easier for human consumers
> > when reading
> > raw Turtle or writing a SPARQL query. However the labels are subject
> > to change
> > over time, are in multiple languages and are not unique. It is simple
> > to define
> > a mapping from IRDI to URI, but this does not give a meaningful URI
> > (e.g.
> > http://example.com/myDictionary/c_abc123), but would guarantee
> > uniqueness and
> > persistence. Given the opacity axiom [3] does this really matter? I
> > could
> > imagine that one could allow the editor of the dictionary to define
> > slugs that
> > would be used to build the URI rather than generating it from the IRDI. These
> > could be
> > optional and you might only define such a slug for the most commonly
> > used terms.
> > Alternatively one could define these as aliases with additional
> > statements
> > defining some equivalence links (perhaps using owl:sameAs,
> > owl:equivalentClass
> > and owl:equivalentProperty).
> >
> > <http://example.com/myDictionary/c_abc123> owl:equivalentClass
> > <http://example.com/myDictionary/Person> .
> >
> > Has anyone ever tried such an approach?
>
> The RDA developers are using this approach:
> http://www.rdaregistry.info/rgFAQ.html
> and
> http://www.slideshare.net/jonphipps1/ala-presentation-36888593, slides
> 11-12
>
> We've coined a reg:lexicalAlias (intended to be a more semantically
> specific subproperty of owl:sameAs) attribute to describe the
> relationship between a mutable, language-specific, label-based URI and a
> canonical, language-independent, 'opaque' URI. We're returning an HTTP
> 308 header (newly redefined) when a lexical URI is resolved to a
> canonical URI.
> See http://tools.ietf.org/html/rfc7238
>

OK.

As we also plan to publish data about instances using the schema, the next
question is which URI to use there, the canonical URI or the "vanity" URI... or
both.

In some ways it doesn't matter, but using both could lead to substantial bloat
of the data if we repeat ourselves.

My experience is that in many cases the identifier becomes the name given enough
time.
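
To make the canonical-vs-vanity question concrete, here is a minimal sketch of
one option: mint the opaque URI from the IRDI (dropping the version
identifier), keep a small alias table, and rewrite instance data to the
canonical URI before publishing. Everything in it is an assumption for
illustration: the RAI#DI#VI shape of the IRDI, the sample IRDI, the "c_"
prefix, the alias table and the function names are all hypothetical, and only
the example.com base comes from my original post.

```python
from urllib.parse import quote

# Example base from the original post; not a real dictionary.
BASE = "http://example.com/myDictionary/"

# Hypothetical alias table: vanity (label-based) URI -> canonical opaque URI.
ALIASES = {BASE + "Person": BASE + "c_abc123"}


def irdi_to_uri(irdi, prefix="c_"):
    """Build an opaque URI from an IRDI, leaving out the version identifier.

    Assumes the IRDI looks like RAI#DI or RAI#DI#VI.
    """
    parts = irdi.split("#")
    if len(parts) not in (2, 3):
        raise ValueError("unexpected IRDI shape: %r" % irdi)
    rai, di = parts[0], parts[1]
    # Percent-encode anything that is not URI-safe (e.g. "/" in the RAI).
    local = quote(rai + "_" + di, safe="")
    return BASE + prefix + local


def canonical(term, aliases=ALIASES):
    """Follow alias links until a term with no further alias is reached."""
    seen = set()
    while term in aliases:
        if term in seen:
            raise ValueError("alias cycle at %r" % term)
        seen.add(term)
        term = aliases[term]
    return term


def normalise(triples, aliases=ALIASES):
    """Rewrite every term of every triple to its canonical form.

    Terms that are not in the alias table (including literals) pass through
    unchanged.
    """
    return [tuple(canonical(t, aliases) for t in triple) for triple in triples]


if __name__ == "__main__":
    print(irdi_to_uri("0173-1#01-AAA123#001"))  # made-up IRDI
    print(normalise([("http://example.com/id/1", "rdf:type", BASE + "Person")]))
```

Publishing only the canonical URI in instance data would avoid the bloat, at
the cost of less readable raw Turtle; the alias table could still be used to
accept vanity URIs on input.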

> >
> > 4. Versioning: The IRDI includes a version identifier where there are
> > clearly
> > defined rules about what type of change can be done within a version
> > (e.g.
> > editorial changes), what can be done as a version change (e.g.
> > upward-compatible
> > change) and what requires a new identifier (breaking change). I was
> > thinking to
> > exclude this version identifier from the URI, but perhaps (if needed)
> > expose the
> > different versions/states of the resource using Memento [4]. Any
> > experiences
> > with using such an approach?
> >
>
> We prefer to have URI resolution always be to the most current version
> and aren't planning to offer versioned resolution anytime soon. That
> said, we recognize that public linked data that absolutely depends on
> stable semantics defined by a specific version of the vocabularies will
> need to be able to dynamically reference that specific version, and
> probably as part of the URI -- it's unlikely (although possible) that
> linked-data-based systems will be able to effectively utilize any of the
> other non-URI-based versioning methods. When we do implement support for
> specific version declarations it may be something like Memento, but it's
> more likely to be something like:
> https://www.npmjs.org/doc/package.json.html#version
> or
> https://getcomposer.org/doc/01-basic-usage.md#package-versions
> or
> http://guides.rubygems.org/patterns/#declaring-dependencies
>

So similar to declaring dependencies using POM files in Maven.

> As an interim alternative, we make each version of the vocabularies
> available as a download:
> http://www.rdaregistry.info/rgAbout/versions.html
> …and this can be loaded into a triple store along with its dependent
> linked data, eliminating the need for dynamic resolution, although
> there's currently no broadly accepted best practice around defining the
> requirement for a specific vocabulary version, and its download
> location, that I'm aware of.
>
> > 5. Serving representations: Maybe this is a moot point, but I would
> > consider the
> > 'things' described in the dictionary to be abstract entities and, as
> > such, to
> > give a 303 response if used with slash URIs. The response would then
> > include a
> > redirect to the information resource that would use conneg to serve
> > the
> > different representations/states of that resource. However I do not
> > see this
> > practice widely used for other RDF schemas. Any reason why?
>
> Not that I'm personally aware of. It's the practice we generally follow:
>
> $ curl -I http://rdaregistry.info/Elements/a/P50026
> HTTP/1.1 303 See Other
> Location: http://rdaregistry.info/Elements/a/P50026.n3
> HTTP/1.1 303 See Other
> Location: http://rdaregistry.info/Elements/a.n3
> (n3 is the default, but the above URI redirects again to the full
> vocabulary because, at the moment, only the jsonld representations
> serve individual elements, owing to server issues)
>
> $ curl -I -H "Accept:text/html" http://rdaregistry.info/Elements/a/P50026
> HTTP/1.1 303 See Other
> Location: http://www.rdaregistry.info/Elements/a/#P50026
> (note that the HTML document is a single resource with IDs for each
> vocabulary element)
>
> $ curl -I -H "Accept:application/ld+json" http://rdaregistry.info/Elements/a/P50026
> HTTP/1.1 303 See Other
> Location: http://rdaregistry.info/Elements/a/P50026.jsonld
>
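
For my own notes, a minimal sketch of the redirect decision the curl output
above seems to show: always answer 303, and pick the Location from the Accept
header. This is only my reading of the responses, not the registry's actual
implementation. The function name, the text/n3 media type and the choice to
ignore q-values are assumptions; the URL patterns are copied from the output
above.

```python
# URL patterns taken from the curl output quoted above.
DATA_BASE = "http://rdaregistry.info/Elements/a/"
HTML_BASE = "http://www.rdaregistry.info/Elements/a/"

# Assumed mapping from media type to file suffix.
SUFFIX_BY_TYPE = {
    "application/ld+json": ".jsonld",
    "text/n3": ".n3",
}


def redirect_for(element_id, accept=""):
    """Return (status, location) for a request for one vocabulary element.

    The first recognised media type in the Accept header wins; q-values are
    ignored to keep the sketch short. Anything unrecognised falls back to n3.
    """
    media_types = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    for media_type in media_types:
        if media_type == "text/html":
            # The HTML document is a single page with one anchor per element.
            return 303, HTML_BASE + "#" + element_id
        if media_type in SUFFIX_BY_TYPE:
            return 303, DATA_BASE + element_id + SUFFIX_BY_TYPE[media_type]
    return 303, DATA_BASE + element_id + ".n3"


if __name__ == "__main__":
    print(redirect_for("P50026"))
    print(redirect_for("P50026", "text/html"))
    print(redirect_for("P50026", "application/ld+json"))
```

A real server would need proper q-value handling and a Vary: Accept header on
the 303 response, which this leaves out.
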
> Hope this helps some,
> Jon Phipps
> http://metadataregistry.org/
> http://managemetadata.com/
>
> > 
> > [1] http://philarcher.org/diary/2013/uripersistence/
> > [2] http://wiki.eclass.eu/wiki/IRDI
> > [3] http://www.w3.org/DesignIssues/Axioms.html#opaque
> > [4] http://mementoweb.org/
> > 
> > Regards,
> >
> > John Walker

Received on Wednesday, 23 July 2014 21:34:16 UTC