Re: Can we lower the LD entry cost please (part 1)? from Yves Raimond on 2009-02-07 (public-lod@w3.org from February 2009)

From: Yves Raimond <yves.raimond@gmail.com>
Date: Sat, 7 Feb 2009 15:18:18 +0000
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <82593ac00902070718r7873122alf657e81298027d3c@mail.gmail.com>
Hello!

On Sat, Feb 7, 2009 at 2:31 PM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
> Hi Yves,
> Thank you for the response.
> Yes, you are right - when we have taken over the world, there will be powerful systems to help us do this, and I can be a happy little data provider, while others provide my search and linkage.
> But when we try to tell people that we have this wonderful resource called Musicbrainz, which is part of the amazing LOD cloud, (I think I saw evidence of such a talk recently), what experience do the excited listeners get when they go away and try to join?
> After quite a lot of work they will have concluded, at best, that this is system infrastructure for gurus, and so they can do a bit of browsing a bit like wikipedia but not as pleasant, and it is not relevant to them.
> I have just failed to find Telemann on Musicbrainz, I'm afraid, (musicbrainz.org or Sindice) although I only spent a few minutes - but why so hard?

Just typing "telemann musicbrainz" in Google led me directly to:
http://musicbrainz.org/artist/8f831f50-e409-47c3-8598-71a61bc8cfb3
I don't consider that as particularly hard!


> Perhaps all I wanted to do was use his URI to identify him unambiguously, using a little tool that lets me say I (dis)like his music, but it is just so hard.
> OK, maybe my sort of use case is not what the community cares about - so be it, but I think I should be able to do it, and do it now.
> These sorts of links are really valuable - there might not be so many of them, but they can carry a lot of information.
> I can tell you we have over 1M links to the dblp world from rkbexplorer, but since the data is substantially the same, I don't consider them as valuable.
> On the other hand, we have 174 links from nsf to cordis and 183 the other way - now that is value. How did we create them? By a lot of work, and the ability to search.
>
> So I agree in principle with your view of separating out these things.
> But I don't think we have the time, and while we fail to deliver this, possible recruits are turning away.
> Is all this publishing work to founder because the Sindice team is not big enough to cope, or no-one seems to be building the linkage systems, all because the data providers do not want to offer a simple search facility?
>

On a side note, there are at least three interlinkage systems I know
of (Georgi's, LinkedMDB's and mine). Most datasets provide a SPARQL
endpoint, which makes such dataset-to-dataset linkage easy enough.
Having a SPARQL interface makes interlinking *much* more reliable,
because you know exactly what happens. If you provide me with a
simple text search, I won't have any clue how your inner searching
process works (are you retrieving all resources whose label matches
the search term? are you building an index on neighboring
literals?), and I won't be able to draw satisfying interlinking
conclusions.
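
To illustrate what "knowing exactly what happens" could look like, here
is a minimal sketch in Python of a helper that builds an explicit
label-matching SPARQL query. The function name, the choice of
rdfs:label as the matched property, and the exact-string matching rule
are all assumptions made for illustration; a real dataset may expose
names through a different property, and nothing here is taken from any
particular endpoint.

```python
def build_label_lookup_query(label: str, limit: int = 10) -> str:
    """Build a SPARQL query for resources whose rdfs:label equals `label`.

    Hypothetical helper: the matching rule (exact string equality on
    rdfs:label) is spelled out in the query itself, so whoever runs it
    knows precisely which resources can come back.
    """
    # Escape characters that would otherwise end or break the string literal.
    escaped = (
        label.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        "  ?resource rdfs:label ?label .\n"
        f'  FILTER (str(?label) = "{escaped}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Print the query text; sending it to an endpoint is left out on purpose.
    print(build_label_lookup_query("Georg Philipp Telemann"))
```

The query text is the whole matching policy: there is no hidden index
or ranking step, which is the property a plain text-search box cannot
guarantee to the person doing the interlinking.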

Best,
y

> Best
> Hugh
>
> By the way, I am not suggesting that any identifiers such as GUIDs or PIDs should be read by humans - more the opposite. My agent should be able to find them easily and then ask me if that was what I meant, using words.
>
> On 07/02/2009 13:39, "Yves Raimond" <yves.raimond@gmail.com> wrote:
>
>
> I think this is a really dangerous idea. Most "web-scale" identifiers,
> e.g. Musicbrainz GUIDs and BBC PIDs, are not human readable (for a lot
> of reasons, mainly because human-readable identifiers are not unique
> enough!!), but both provide really easy-to-use lookup services.
> Such lookups, for other sites, can be provided by semantic web search
> engines. It is exactly as in the document web: web identifiers are
> mostly opaque, but search engines are here to provide the help needed.
>
> So my proposal is: let's not confuse everything. Some people's job is
> to make datasets available out there and as linked as possible to
> others. Some other people make lookup services (e.g. Sindice), and I
> think this separation of concerns works quite well.
>
> Best,
> y
>
>
>
Received on Saturday, 7 February 2009 15:18:52 UTC