Naming Conventions for URIs

From: Paul Houle <ontology2@gmail.com>
Date: Thu, 20 Aug 2015 11:36:29 -0400
Message-ID: <CAE__kdQnTKVtECJ9Bj=Xh7Gid7Kx4AMv=TBpybPStOUzyCK0KQ@mail.gmail.com>
To: "semantic-web@w3.org" <semantic-web@w3.org>, "Discussion list for the Wikidata project." <wikidata-l@lists.wikimedia.org>
Tell me if I am right or wrong about this.

If I am coining a URI for something that has an identifier in an outside
system, it is straightforward to append the identifier (possibly modified a
little) to a prefix, such as

http://dbpedia.org/resource/Stellarator

Then you can write

@prefix dbpedia: <http://dbpedia.org/resource/> .

and then refer to the concept (in either Turtle or SPARQL) as
dbpedia:Stellarator.
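
As a rough illustration of what the prefix declaration buys you, the Python sketch below expands a prefixed name by plain string concatenation. The PREFIXES table and the expand helper are hypothetical names used only for this example; a real Turtle or SPARQL parser also handles escapes and validation.

```python
# Hypothetical prefix table mirroring the @prefix declaration above.
PREFIXES = {"dbpedia": "http://dbpedia.org/resource/"}


def expand(prefixed_name: str, prefixes: dict = PREFIXES) -> str:
    """Expand 'prefix:local' into a full URI by simple concatenation."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


print(expand("dbpedia:Stellarator"))
```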

I will go one step further than this and say that for pedagogical and
other coding situations, the extra length of prefix declarations is an
additional cognitive load on top of all the other cognitive loads of
dealing with the system, so in the name of concision you can do something
like

@base <http://dbpedia.org/resource/> .
@prefix : <http://dbpedia.org/ontology/> .

and then you can write :someProperty and <Stellarator>, and your queries
look very simple.

The local part of a QName cannot begin with a digit, so it is not correct
to write something like

dbpedia:100

or expect to have the full URI squashed to that.  This kind of gotcha will
drive newbies nuts,  and the realization of RDF as a universal solvent
requires squashing many of them.

Another example is

isbn:9971-5-0210-0
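
A quick way to see which identifiers trip over this rule is to test them against an XML-style name pattern. The sketch below is a simplified, ASCII-only approximation (the real productions also allow many Unicode characters), and is_strict_local_name is a hypothetical helper. Some grammars are more permissive: the Turtle 1.1 and SPARQL 1.1 local-name productions allow a leading digit, so this models only the strict rule described here.

```python
import re

# Simplified, ASCII-only approximation of an XML-style local name:
# a letter or underscore first, then letters, digits, '_', '-' or '.'.
LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_strict_local_name(local: str) -> bool:
    """Return True if 'local' could follow the colon under the strict rule."""
    return LOCAL_NAME.fullmatch(local) is not None


for candidate in ["Stellarator", "100", "9971-5-0210-0"]:
    print(candidate, is_strict_local_name(candidate))
```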

If you look at the @base declaration above,  you see a way to get around
this,  because with the base above you can write

<100> which works just fine in the dbpedia case.
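
The @base mechanism is ordinary relative-reference resolution, so it can be checked outside any RDF toolkit. This sketch uses urljoin from the Python standard library as a stand-in for what a Turtle parser is expected to do with a relative IRI in angle brackets; it is an illustration, not a claim about any particular parser's internals.

```python
from urllib.parse import urljoin

# Same base as in the @base declaration above.
BASE = "http://dbpedia.org/resource/"

# Relative references resolve against the base, digits or not.
print(urljoin(BASE, "Stellarator"))
print(urljoin(BASE, "100"))
```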

I like what Wikidata did with using fairly dense sequential integers for
the ids, so a Wikidata entity URI looks like

http://www.wikidata.org/entity/Q4876286

which is always a QName,  so you can write

@base <http://www.wikidata.org/entity/> .
@prefix wd: <http://www.wikidata.org/entity/> .

and then you can write

wd:Q4876286
<Q4876286>

and it is all fine, because Wikidata (i) added the alpha prefix, (ii)
started with it from the beginning, and (iii) made up a plausible
explanation for why it is that way. Freebase mids have the same property, so
:BaseKB has it too.

I think customers would expect to be able to give us

isbn:0884049582

and have it just work, but because a leading digit is never valid in a QName,
you can instead encode the URI like this:

http://isbn.example.com/I0884049582

and then write

isbn:I0884049582
<I0884049582>

which is not too bad.  Note,  however,  if you want to write

<0884049582> you have to encode as

http://isbn.example.com/0884049582

because,  at least with the Jena framework,  the same thing happens if you
write

@base <http://isbn.example.com/I> .

or

@base <http://isbn.example.com/> .

so you can't choose a single representation that supports both that mode of
expression and the prefix:name mode.
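
The conflict can be reproduced with the Python standard library alone. In the sketch below, mint_isbn_uri is a hypothetical helper for the letter-prefixed encoding (dropping hyphens is an assumption of this example, not something prescribed above), and urljoin stands in for the relative resolution a Turtle parser performs against @base.

```python
from urllib.parse import urljoin

NAMESPACE = "http://isbn.example.com/"


def mint_isbn_uri(isbn: str) -> str:
    """Prepend 'I' so the local name starts with a letter.

    Dropping hyphens is an assumption made for this sketch.
    """
    return NAMESPACE + "I" + isbn.replace("-", "")


print(mint_isbn_uri("0884049582"))

# Relative resolution discards the last path segment of the base, so the
# trailing "I" in the first base is lost and both bases give the same URI.
for base in (NAMESPACE + "I", NAMESPACE):
    print(base, "->", urljoin(base, "0884049582"))
```

Neither base yields the letter-prefixed URI from a bare <0884049582>, which is why the two modes of expression cannot share one encoding.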

Now what bugs me is what to do in the case of something which "might or
might not be numeric". What internal prefix would end users find
acceptable?


-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   ontology2@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
Received on Thursday, 20 August 2015 15:36:56 UTC