Re: National Identification Number URIs ( NIN URIs ) from Peter Ansell on 2010-03-09 (public-lod@w3.org from March 2010)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Wed, 10 Mar 2010 08:37:23 +1000
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: Bernhard Schandl <bernhard.schandl@univie.ac.at>, Kingsley Idehen <kidehen@openlinksw.com>, Aldo Bucchi <aldo.bucchi@gmail.com>, Linked Data community <public-lod@w3.org>
Message-ID: <a1be7e0e1003091437y55ec1783x5fe89df5081a5b75@mail.gmail.com>

Bernhard and Hugh,

On 9 March 2010 21:37, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
> I have found this a very interesting discussion, thinking about the Linked
> Data World at large as well as what others think - thanks.
> Sorry this moved away from the important discussion about how to identify
> people, both as a technical and a socio issue - my fault.

Although we are currently discussing the ISBN example, I think it is
still relevant to the thread. Identifiers for people are possibly
better shared as URNs, since creating standardised HTTP URIs might
imply that a particular organisation has taken over that person's
identity, which may not fit so well with the privacy issues.

> On 09/03/2010 09:12, "Bernhard Schandl" <bernhard.schandl@univie.ac.at>
> wrote:
>
>> Peter,
>>
>>> It is a good thing that the subject URI is an HTTP URI available from
>>> your server but that is only the start of the story. The rest of the
>>> story needs other servers to give your data more context.
>>>
>>>>> In your example the fact that there
>>>>> is a link can only be figured out using some external service that
>>>>> knows about both data sources.
>>>>
>>>> Sure. Before I can add a link to any data set, I have to detect it using
>>>> some heuristics. Shared URN/DOI/... identifiers seem a valid approach for
>>>> this -- think of ISBN numbers.
>>>
>>> Sharing identifiers is a good idea, but it isn't Linked Data as yet...
>>
>> I'm talking of the *preconditions* for linking data, based on shared
>> identifiers. And once I have these identifiers, why not publish them alongside
>> the dereferenceable URIs.

I think this one is partly a precondition and partly a useful outcome
in semantic terms. As a precondition, you can get better access to the
item if there are third parties or generic query interfaces available.
As an outcome, you also portray your data in terms of the
community-defined URN scheme, similar to putting your data in the
context of an ontology that is understood by many people.

> Being able to work out what a dereferenceable URI means is indeed a
> pre-condition for linking data, and also in the Linked Data, this is
> achieved by dereferencing and examining the RDF returned.
> And finding a URN, doi, isbn, mailto, etc. is a very good way of
> communicating that information.
> However, for me in the Linked Data world, such URIs are no more an
> *identifier* than "Hugh Glaser", or the title of a book, (or even the URL of
> one of my homepages) simply because the access mechanism is unclear, and
> even if I do try to look it up I am unlikely to get RDF (at least at
> present).
> They are more useful in general, of course, because they are less likely to
> be ambiguous, but it is only a matter of degree.
>>
>>>>> If your server was Linked Data and not
>>>>> just an HTTP URI based RDF database then it would link out using HTTP
>>>>> URI's and both servers could be directly explored without some
>>>>> external service.
>>>>
>>>> Once the link has been detected, I can of course add it to both data sets.
>>>> Well, the owner of the datasets can.
>>>
>>> This is Linked Data, when the dataset owners discover the mutual
>>> references and link out from their HTTP URI's to the other datasets
>>> HTTP URI's.
>>
>> Why only the dataset owners? A third party that is aware of both data sets is
>> enabled to discover these links, too.
> I agree entirely, although the dataset owner is in a prime position to seed
> the activity, and also may have other implicit knowledge that is useful to
> help to get the links right.

If other users identify the links, they have to communicate their
knowledge back to the dataset owners to complete the Linked Data
circle, unless you include searching on indices as part of the Linked
Data circle. If the third party publishes the information, it may not
be Linked Data, as they may choose to make up a triple that doesn't
contain any URIs that resolve to their data... Resolving the
following triple on <http://thirdparty.com/mypage> is not Linked Data
IMO, but it could be if they make up a different structure to
represent the link.

<http://myserver.com/urn:isbn:1232-132132-1> <linksto>
<http://otherserver.com/urn:isbn:1232-132132-1> .
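
One rough way to read that objection is as a check on whether the
subject or object of a triple lives on the same host as the document
that serves it. The sketch below is only an illustration of that
reading, not a definition of Linked Data; the function name
refers_to_publisher is made up for this example, and the URIs are the
placeholder ones from the triple above.

```python
from urllib.parse import urlsplit


def refers_to_publisher(document_url, node_uris):
    """Return True if any of the given URIs shares a host with the document.

    This is a simplification: it only compares host names and does not
    dereference anything.
    """
    host = urlsplit(document_url).hostname
    return any(urlsplit(uri).hostname == host for uri in node_uris)


# Subject and object of the example triple (placeholder URIs).
subject = "http://myserver.com/urn:isbn:1232-132132-1"
obj = "http://otherserver.com/urn:isbn:1232-132132-1"

# Served from the third party's page, neither end points back at them.
third_party_ok = refers_to_publisher("http://thirdparty.com/mypage", [subject, obj])

# Served from the subject's own server, the subject does.
owner_ok = refers_to_publisher(subject, [subject, obj])
```

Under this reading the third party would need a subject on their own
server (for example a resource describing the link itself) for the
check to pass.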

>>> It was enabled by sharing the property, and then having
>>> others discover it. Just sharing the URN property isn't Linked Data as
>>> people have no way of resolving the URN that is referenced to more
>>> information.
>>
>> Again, it's a precondition to link data.
>>
>>> It could also have been shared in another way using Inverse Functional
>>> Properties (IFP) so that the URN scheme need not have been created.
>>
>> The URN schema for ISBN already exists [1], and several others exist (e.g.,
>> SWIFT [2]), why should we throw them away?
>>
>> [1] <http://www.faqs.org/rfcs/rfc3187.html>
>> [2] <http://www.faqs.org/rfcs/rfc3615.html>
>>
>>> There is no automatic HTTP based way of knowing which datasets may
>>> have relevant links in either case,
>>
>> One could use indices to find other occurrences of the same URN. When they are
>> linked via owl:sameAs, the linking can be fully automatized.
>>
>>> so serving up the statements on
>>> your dataset is very useful for discovery, I wasn't meaning to say
>>> that was a bad thing. Just emphasising the full story for Linked Data.
>>
>> I got that :-)
>>
>> My point is simply that not *every* URI in a Linked Data context needs to be
>> dereferenceable. When there are established URN schemes in place (like it is
>> the case for ISBN numbers), why not reuse them instead of packing them in a
>> literal (is there a datatype for ISBN numbers?) and publish them to simplify
>> linking for others? This seems to make more sense to me than only relying on
>> URN-to-HTTPURI mappings, which I can still do, as long as I publish the
>> "original" identifier in its "native" URN form.

If the dataset owner doesn't actually publish statements using the URN
as the subject then it should be okay. In Linked Data the subject
should be the HTTP URI that is being resolved, and the link to the URN
should be in some other fashion such as owl:sameAs, so people can
rediscover the entire definition knowing only one triple or one Linked
Data URI.

Choosing to publish the URN as an RDF resource (as opposed to literal)
is possibly better, but I wouldn't say that publishing it as a string
with the appropriate predicate is necessarily locking it away. I have
had different views about this at different times, so both are
probably useful, and even publishing both together may be useful.
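
To make the two options concrete, here is a minimal sketch that writes
out both forms as N-Triples lines, with the HTTP URI as the subject in
each case. The owl:sameAs URI is the real one; the ISBN_PREDICATE URI
is a hypothetical placeholder (I am not pointing at any particular
vocabulary), and the literal form assumes the URN contains no
characters that need escaping.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical predicate, used only for illustration.
ISBN_PREDICATE = "http://example.org/vocab#isbn"


def urn_as_resource(http_uri, urn):
    """Link the HTTP URI to the URN as an RDF resource via owl:sameAs."""
    return f"<{http_uri}> <{OWL_SAME_AS}> <{urn}> ."


def urn_as_literal(http_uri, urn):
    """Attach the URN to the HTTP URI as a plain string literal."""
    return f'<{http_uri}> <{ISBN_PREDICATE}> "{urn}" .'


subject = "http://myserver.com/urn:isbn:1232-132132-1"
urn = "urn:isbn:1232-132132-1"

# Publishing both forms side by side.
lines = [urn_as_resource(subject, urn), urn_as_literal(subject, urn)]
print("\n".join(lines))
```

Either line on its own lets someone who knows the URN match it against
the HTTP URI; the difference is only whether the match is on a
resource or on a string.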

> I have a feeling that the issue here may be the same as how to represent the
> address of someone's pure html home page in RDF.
> It is a URL and hence a URI. But it is not dereferenceable to RDF.
> A purist might say that it is not a Linked Data URI (doesn't return RDF),
> and so should be a string, hopefully with a useful type on it.
> But for others it is a resource, and so can comfortably be a URI in RDF.
> And having it as a resource enables it to be used in a more convenient way
> for the sort of thing that we are discussing.

In my mind it gets messy if you do decide to publish data using
traditional web page URLs as resources in RDF. If you aren't able to
change the way the page is served to include a link tag or link
header, and you can't add RDFa to the page, then you are merely
linking to something as a resource just because it happens to match
the way the current data format works.

If in future Linked Data is produced using non-URIs (say a successor
to RDF comes along that doesn't use URIs) or the HTTP web moves away
from URLs for some reason, then there would be no difference to
people who publish links to the traditional web as plain or typed
literal strings, but others would suddenly find themselves in a
quandary... Just a thought, and not necessarily relevant as it is very
hypothetical.

> So dereferencing one of your Linked Data URIs will return some RDF that has
> resources (URIs) that are not dereferenceable to RDF.
> And these will be very helpful to people/agents who are trying to add
> linking to the world.

They are very helpful for aggregators. There hasn't been very much
discussion about how to actually find entry points into Linked Data,
so people generally defer to SPARQL endpoints (which may not always be
public or even implemented) or search engines. I think we could design
a better way to do entry level search than relying on search engines
to be up to date with information. Whatever the method is, it should
support searches for URNs because, although they are not Linked Data,
they are very useful long-term identifiers.
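
As a very small sketch of what URN search could look like, an
aggregator might keep an index from each URN to the HTTP URIs that
mention it. This is just an in-memory illustration under my own
assumptions: the function name build_urn_index is invented here,
triples are plain (subject, predicate, object) string tuples, and URNs
are matched exactly as written with no normalisation.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def build_urn_index(triples):
    """Map each URN found in object position to the HTTP subjects using it."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        is_urn = obj.lower().startswith("urn:")
        is_http = subject.startswith(("http://", "https://"))
        if is_urn and is_http:
            index[obj].add(subject)
    return index


# Placeholder data reusing the example URIs from earlier in the thread.
triples = [
    ("http://myserver.com/urn:isbn:1232-132132-1", OWL_SAME_AS,
     "urn:isbn:1232-132132-1"),
    ("http://otherserver.com/urn:isbn:1232-132132-1", OWL_SAME_AS,
     "urn:isbn:1232-132132-1"),
    ("http://myserver.com/about", OWL_SAME_AS,
     "http://otherserver.com/about"),
]

index = build_urn_index(triples)
matches = sorted(index["urn:isbn:1232-132132-1"])
```

Someone who only knows the URN can then find both Linked Data entry
points, without either dataset owner having linked to the other yet.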

> Hopefully that is sufficiently closely related to your comments to make
> sense?
> And I am pleased to agree, although I might lean more to the purist side :-)
>
> By the way, in the original question, there seemed to be a suggestion which
> I guess I misunderstood, that an RDF store that effectively only published
> non-dereferenceable URIs, and accessed as a query service, was in some sense
> doing Linked Data. I would have found that very hard to agree with.
>
> Best
> Hugh
>>
>> Best
>> Bernhard
>>
>
>
Received on Tuesday, 9 March 2010 22:37:56 UTC