Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval from Peter Ansell on 2009-08-12 (public-lod@w3.org from August 2009)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Wed, 12 Aug 2009 11:29:27 +1000
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: "public-lod@w3.org" <public-lod@w3.org>, "dbpedia-discussion@lists.sourceforge.net" <dbpedia-discussion@lists.sourceforge.net>
Message-ID: <a1be7e0e0908111829q236fb209gffb1821a744c8d28@mail.gmail.com>
2009/8/12 Hugh Glaser <hg@ecs.soton.ac.uk>:
> Dear Peter,
> Thank you for your comments, which I think raise the main issues.
>
> On 12/08/2009 01:11, "Peter Ansell" <ansell.peter@gmail.com> wrote:
>
>> 2009/8/12 Hugh Glaser <hg@ecs.soton.ac.uk>:
>>> Are you saying that the only way to access Linked Data is via SPARQL?
>>
>> That is going a bit far, but in the end if you want to allow people to
>> extend the model it has to be done using SPARQL. If the extension is
>> taken well by users then it could be included in what is resolved for
>> the URI but that doesn't mean it is not Linked Data up until the point
>> it is included.
> My view is that if you need to extend (I would say "step outside") the
> model, then something is broken. Or at least it is broken until the model
> includes the extension, as you suggest. So we need to work out how to
> include such extensions in the model, if such a thing is desirable.

By "extend" I meant extending the information pool, not necessarily
the protocol, which should still work with the suggestions I make
below.

I definitely think extensions are useful, although they may need to
appear under different URIs from the accepted set of information pieces
that have been published and recognised as the minimal set by the
original author.

> Did I go too far?
> I'm not sure. I have a sense that the suggested solution to any problem I
> raise is "Oh don't worry, just use a Named Graph".
> But "How to Publish Linked Data on the Web"
> (http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/),
> which really is an excellent description of what I think should be
> happening, makes no real mention of the idea that a SPARQL endpoint might be
> associated with Linked Data.
> In fact, it says that if you have a SPARQL endpoint (for example using D2R),
> you might use Pubby "as a Linked Data interface in front of your SPARQL
> endpoint."
> And pubby says:
> "Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server."
> I infer from this that SPARQL endpoints are optional extras when publishing
> Linked Data. So any solutions to problems must work simply by resolving
> URIs.

I have a very similar approach to this with the Bio2RDF server, but I
am using multiple SPARQL endpoints to provide resolution for URIs.

I use the ability to get information by either URI resolution or
SPARQL endpoints to create extended versions. SPARQL endpoints should
be optional but encouraged, IMO, so that people who want to optimise
their applications can pick and choose without having to transfer
everything across the wire every time they access the information.

>>
>> I for one loved the recent addition of the Page Links set in a
>> separate Named Graph, and I don't see how this is different.
> That's great.
> I'd be interested to know how you make use of them?
> We find it very hard to make use of Named Graph data.
> All we start with is a URI for a NIR; so all we can do is resolve it.
> We cache the resulting RDF and then use it for analysis and fresnel
> rendering.
> It is pretty hard to build in anything that takes any notice of Named Graphs
> at arbitrary Linked Data sites. We would need to be able to find the SPARQL
> endpoint from a URI so that we can do the DESCRIBE, and then also be able to
> specify a Named Graph to go with it. In fact, how would I do that from
> http://dbpedia.org/resource/London ?

In short it is difficult, but not impossible if you are aware that
there is some extra information that you want to include for your
users that doesn't come from the URI resolution.
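
As a rough illustration of the extra step involved: once you already
know which endpoint and Named Graph go with a URI (which resolving the
URI alone does not tell you), the request can be built from the
standard SPARQL protocol parameters "query" and "default-graph-uri".
This is only a minimal Python sketch. The function name describe_url is
made up for the example, and whether a given endpoint honours
default-graph-uri for DESCRIBE is an assumption.

```python
from urllib.parse import urlencode


def describe_url(endpoint, resource, graph):
    """Build a SPARQL protocol GET URL that DESCRIBEs `resource`,
    asking the endpoint to use `graph` as the default graph."""
    params = {
        "query": "DESCRIBE <%s>" % resource,
        # Standard SPARQL protocol parameter; endpoint support is assumed.
        "default-graph-uri": graph,
    }
    return endpoint + "?" + urlencode(params)


# Endpoint and graph name are the ones mentioned in this thread.
url = describe_url(
    "http://dbpedia.org/sparql",
    "http://dbpedia.org/resource/London",
    "http://dbpedia.org",
)
print(url)
```

The sketch only builds the request URL. Discovering the endpoint and
graph from the resource URI is the part that remains difficult.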

I have been working on a system that can take notice of Named Graphs,
but it doesn't work with arbitrary URIs, as it requires the URIs to
be normalised to some scheme that the software recognises. For
instance, the normalised form of http://dbpedia.org/resource/London in
my system is "http://domain.name/dbpedia:London", with the domain.name
being specified by the user. By design it doesn't fit with the notion
that URIs are opaque and shouldn't be modified, but it is hard to
deny that it works. Resolving http://qut.bio2rdf.org/dbpedia:London
for instance will include the PageLinks set along with any extensions
that Matthias Samwald has included to link OBO to DBpedia (although in
this case it is unlikely any would exist in this set) and some links
that the DrugBank LODD project provide using their dataset in relation
to DBpedia resources. If you want to know exactly which datasets would
be resolved there is a URI for that...
http://qut.bio2rdf.org/queryplan/dbpedia:London
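
The normalisation described above can be sketched roughly as follows.
This is not the Bio2RDF code: the PREFIXES table and the function names
are hypothetical, and only the two URI patterns (the normalised form
and the queryplan form) are taken from the examples in this message.

```python
# Hypothetical table mapping a short prefix to the URI base it replaces.
PREFIXES = {"dbpedia": "http://dbpedia.org/resource/"}


def to_prefixed(uri):
    """Return 'prefix:identifier' for a recognised URI, else None."""
    for prefix, base in PREFIXES.items():
        if uri.startswith(base):
            return "%s:%s" % (prefix, uri[len(base):])
    return None  # arbitrary URIs are not handled


def normalise(uri, host):
    """Normalised form of `uri` on the user-chosen `host`."""
    local = to_prefixed(uri)
    return None if local is None else "http://%s/%s" % (host, local)


def query_plan(uri, host):
    """URI listing which datasets would be consulted for `uri`."""
    local = to_prefixed(uri)
    return None if local is None else "http://%s/queryplan/%s" % (host, local)


london = "http://dbpedia.org/resource/London"
print(normalise(london, "qut.bio2rdf.org"))
print(query_plan(london, "qut.bio2rdf.org"))
```

A URI whose base is not in the table returns None, which mirrors the
limitation that the approach does not work with arbitrary URIs.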

In some ways it isn't really typical Linked Data, but it allows the
distributed extensions that I think people really want access to in
some cases.

> I'm afraid I find Linked Data (by resolving URIs) really beautiful, and
> think I can understand how I and others might use it. So when it is
> suggested that the way to solve an issue with how it works is to step
> outside the RDFramework, I think it needs to be challenged or brought into
> the Framework.

One way you could do it would be to include links to the extended
versions in the data resolved for the original URI. For example:

<http://dbpedia.org/resource/London>
<http://purl.org/ontology/lodextensions#hasExtendedVersion>
<http://bio2rdf.org/dbpedia:London>

The hasExtendedVersion could be derived from seeAlso. If you extend
the term then it will only be applicable to user agents who are either
doing reasoning for seeAlso related properties and know they want
seeAlso references included, or know they want an actual extended
version which might not all be derived from the original source (in
this case the original source is just the pieces of information
available in the background in the named graph "http://dbpedia.org" in
the http://dbpedia.org/sparql endpoint).
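
To make the idea concrete, here is a small Python sketch of how a user
agent might emit and pick out such links. It assumes that "derived from
seeAlso" would be expressed as rdfs:subPropertyOf rdfs:seeAlso, which
is only one possible reading. The hasExtendedVersion URI is the
illustrative one from above, not a published vocabulary term, and the
parser only copes with simple URI-only N-Triples lines.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Illustrative property URI from this message, not a published term.
HAS_EXTENDED = "http://purl.org/ontology/lodextensions#hasExtendedVersion"


def triple(s, p, o):
    """Format one N-Triples line whose three terms are all URIs."""
    return "<%s> <%s> <%s> ." % (s, p, o)


def extended_versions(ntriples, subject):
    """Return the objects of hasExtendedVersion statements about `subject`."""
    found = []
    for line in ntriples.splitlines():
        parts = line.split()
        if (len(parts) == 4
                and parts[0] == "<%s>" % subject
                and parts[1] == "<%s>" % HAS_EXTENDED):
            found.append(parts[2].strip("<>"))
    return found


doc = "\n".join([
    # Assumed reading of "derived from seeAlso".
    triple(HAS_EXTENDED, RDFS + "subPropertyOf", RDFS + "seeAlso"),
    triple("http://dbpedia.org/resource/London", HAS_EXTENDED,
           "http://bio2rdf.org/dbpedia:London"),
])
print(extended_versions(doc, "http://dbpedia.org/resource/London"))
```

An agent that does not know the property simply ignores the statement,
while one that reasons over seeAlso sub-properties can still follow it.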

I don't have it perfect I admit. There are still some qualms from the
other Bio2RDF guys about my including too many extensions and slowing
down the process of URI resolution, but the possibility of extended
versions is there for people who want to experiment with them. (This
has led to a difference in the information that the two Bio2RDF
mirrors provide, which is definitely broken, but should be fixed in
future.)

> Hope that helps to show where I come from.

It does put it pretty well. I just fear that the ability to extend
won't be accommodated in what is a really cool, but contextually
limited, "single URI for every piece of information" method unless we
explore the other possibilities for modularisation and extension by
third parties, such as inserting extra RDF statements that keep
LOD active but make it extensible as well.

Maybe there is a reason for saying that the LOD model is broken with
respect to context: URIs have no context until they are resolved,
and when you resolve one, LOD currently expects that every
known piece of information comes back, although there are already
extensions available that can be used to make more useful documents in
some contexts (such as PageLinks, the subject of this thread,
DrugBank, OBO2DBpedia mappings etc.).

Cheers,

Peter
Received on Wednesday, 12 August 2009 01:30:09 UTC