Re: A VoCamp Galway 2008 success story from Hugh Glaser on 2008-12-04 (public-lod@w3.org from December 2008)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Thu, 4 Dec 2008 01:17:20 +0000
To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
CC: "public-lod@w3.org" <public-lod@w3.org>, Les Carr <lac@ecs.soton.ac.uk>
Message-ID: <C55CE020.282CC%hg@ecs.soton.ac.uk>
Hi.
Good stuff.
And yes, the rkbexplorer world is an exception, as there is an
infrastructure that is based around such external link sets (as you call
them) for doing the linkage. In fact, this sits uneasily in the current
LOD world: for example, rkbexplorer is only one circle in the LOD picture,
despite being possibly the most heavily interlinked set of KBs around.

The points you make about the metadata of the link sets are good. However, I
am not sure that they are any different from metadata about any store. Link
sets are knowledge bases that are not so different from any others; we need
to solve the problems for all KBs (and voiD is heading that way I hope).

Despite my current commitment to the ability to have separate link sets, I
feel the need to raise a word of caution.
I believe that it is the right thing to do, for many of the reasons that you
argue, but similar arguments were put forward long ago about not embedding
links in web pages, for similarly good reasons (e.g. [1] and [2]).
Then the mass of web pages just swamped the system, and the norm became to
embed URLs. I happen to think that the LD world is qualitatively different,
as it is essentially as much about linkage as data.
Or is it the case that the vast majority of linkage can happen in the
associated dataset, and people who worry about the issues you describe are
worrying about a tiny fraction that will become essentially irrelevant, and
would require a more complex architecture?
Those who do not learn from history are condemned to repeat it.

Best
Hugh


[1] Carr, L. A., Hall, W., Davis, H. C., DeRoure, D. C. and Hollom, R.
(1994) The Microcosm Link Service and its Application to the World Wide Web.
In: Proceedings of the First WWW Conference, Geneva.
http://eprints.ecs.soton.ac.uk/860/
[2] Carr, L. A., DeRoure, D. C., Hall, W. and Hill, G. J. (1995) The
Distributed Link Service: A Tool for Publishers, Authors and Readers. In:
Fourth International World Wide Web Conference: The Web Revolution, (Boston,
Massachusetts, USA).
http://eprints.ecs.soton.ac.uk/861/

On 03/12/2008 14:59, "Ted Thibodeau Jr" <tthibodeau@openlinksw.com> wrote:

>
>
> * On Dec 1, 2008, at 09:31 AM, Richard Cyganiak wrote:
>> We chose the current model (ds1 -> containsLinks -> ls -> target ->
>> ds2) because we want to record which dataset contains the links. We
>> have some use cases that require this. Your proposal (ds1 <- target
>> <- ls -> target -> ds2) doesn't capture that bit of information.
>
> It seems to me that François' proposal *does* capture that bit of
> information, because the data set which contains the links is distinct
> from both ds1 and ds2 -- it is an invisible and enclosing *ds3*.
>
>
>> Note that *all* the links in the LOD cloud are published as part of
>> one of the datasets.
>
> This seems to me to be an error of early practice.
>
> Consider -- I have a data set, ds1, and I *think* that my entities
> are owl:sameAs entities in *your* data set, ds2.  So I create a lot
> of owl:sameAs triples.  But I'm wrong.
>
> How do you easily and cheaply exclude those triples from your queries,
> when the *rest* of the data in my data set is valid and useful?
>
> Consider the next step in the sequence -- *you* have a bunch of
> owl:sameAs triples in *your* data set, pointing to entities in ds3.
> *Your* owl:sameAs statements are correct -- but now ds1 entities are
> incorrectly inferred to be owl:sameAs ds3 entities.  And so on.
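
The propagation described above can be sketched in a few lines. This is a minimal illustration, not anyone's actual reasoner: it treats owl:sameAs as an equivalence relation and groups URIs with a small union-find. The URIs and the function name `same_as_closure` are hypothetical, invented for the example.

```python
from collections import defaultdict


def same_as_closure(links):
    """Group URIs into the equivalence classes implied by owl:sameAs pairs."""
    parent = {}

    def find(x):
        # Register unseen URIs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = defaultdict(set)
    for uri in list(parent):
        classes[find(uri)].add(uri)
    return [members for members in classes.values() if len(members) > 1]


# Hypothetical URIs: the ds1 link is the mistaken one, the ds2 link is correct.
ds1_links = [("ds1:Paris_Texas", "ds2:Paris")]
ds2_links = [("ds2:Paris", "ds3:Paris_France")]

merged = same_as_closure(ds1_links + ds2_links)
without_ds1_links = same_as_closure(ds2_links)
```

With both sets of links loaded, the ds1 entity lands in the same class as the ds3 entity, even though nobody asserted that directly. Leaving out the ds1 links removes the bad inference, which is only cheap if those links can be separated from the rest of ds1.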
>
> This is just as troublesome -- if not more so -- in ontology mapping
> as in instance data mapping.
>
> It seems clear to me that interlink data sets (or "link sets" in what
> is becoming common parlance) should be entirely distinct from instance
> data sets (or "data sets" in now-common parlance).
>
>
>> I'm also not sure if there is a clear understanding about how to
>> publish linksets independently from the datasets on the Web. I don't
>> see it being done in practice.
>
> Surprisingly enough, we're still in early days of doing such things --
> and the lack of implementation is not an argument in either direction
> about validity of such practice.
>
> How do you publish a link set independently of a data set?
>
> You create a new data set, which is composed entirely of link
> statements.  Best case, a distinct link set would be created for each
> ds1-to-ds2 pairing, but it might be sufficient to create
> ted's-ds1-to-DBpediaLOD (which could then be ignored when/if a more
> accurate joe's-ds1-to-DBpediaLOD is released).
>
>
>> François, can you point us to some examples of linksets that are
>> published independently from any of the linked datasets?
>
> There may well be none, at this moment.  However, I say again, that
> is not evidence of whether there *should* be any.
>
>
>> Also, can you present us with your use case that requires exchanging
>> descriptions of such linksets? If there is enough interest, we will
>> consider a modelling that can be used for both scenarios.
>
>
> See above.  More details below...
>
> I publish my link set (ds3, also known as ls1) today, based on my
> incorrect understanding of ds2's entities relative to ds1's entities.
>
> Tomorrow, I get hit by a bus, and cannot change my link set based on
> the explanations sent to me by both data set creators which would have
> corrected my understanding.
>
> Next week, someone publishes a new link set (ds4 or ls2), with correct
> linkages (say, rather than owl:sameAs, valid rdfs:subPropertyOf).
>
> When someone wants to work with these two data sets (ds1 and ds2),
> how do they know which link set (ds3 or ds4) is more valid?
>
> One hopes voiD allows description of the new link set, which can say
> "ds4 was created after ds3, to correct incorrect assertions made in
> ds3" or similar.
>
> Now ... the creator of ds4 might have their own misunderstandings.
> Might be creating a link set without consultation with either ds1 or
> ds2 creators -- and even without knowing about ds3.  Perhaps ds3 *is*
> correct, and it's the newer ds4 which is incorrect.
>
> Perhaps I know the creators of ds3 and ds4, and I know that the latter
> tends to go off half-cocked, while the former carefully researches and
> considers what they publish.  Perhaps I want to trust ds3 -- without
> regard for whatever anyone else may say about it -- and disregard ds4.
>
> Does voiD allow for this?
>
> Apparently not if the links which comprise ds3 or ds4 are included in
> ds1 or ds2 -- and therein lies a problem.
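
As a rough sketch of the trust choice described above, assume each data set and each link set is held as its own named collection of triples. The graph names follow the ds1..ds4 labels in the message; the triples and the helper `working_set` are hypothetical. Choosing which link set to believe then amounts to choosing which names to load.

```python
# Hypothetical contents: two instance data sets and two competing link sets.
graphs = {
    "ds1": {("ds1:a", "rdfs:label", "A")},
    "ds2": {("ds2:a", "rdfs:label", "A-prime")},
    "ds3": {("ds1:a", "owl:sameAs", "ds2:a")},  # link set I choose to trust
    "ds4": {("ds1:a", "owl:sameAs", "ds2:b")},  # link set I choose to ignore
}


def working_set(all_graphs, trusted):
    """Return the union of triples from the graphs named in `trusted`."""
    triples = set()
    for name in trusted:
        triples |= all_graphs[name]
    return triples


# Load both data sets, but only the link set from the publisher I trust.
view = working_set(graphs, ["ds1", "ds2", "ds3"])
```

If the ds4 statements were embedded in ds1 instead, this selection by name would not be available: excluding them would mean either dropping ds1 entirely or filtering individual triples by some other means.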
>
> Be seeing you,
>
> Ted
>
>
>
> --
> A: Yes.                      http://www.guckes.net/faq/attribution.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>                                   http://www.openlinksw.com/weblogs/uda/
> OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
>                                 http://www.openlinksw.com/blog/~kidehen/
>      Universal Data Access and Virtual Database Technology Providers
Received on Thursday, 4 December 2008 01:18:19 UTC