- From: Hugh Glaser <hg@ecs.soton.ac.uk>
- Date: Tue, 27 Aug 2013 16:14:18 +0000
- To: "<Csarasua@uni-koblenz.de> " <Csarasua@uni-koblenz.de>
- CC: Linked Data community <public-lod@w3.org>
Hi,
Thanks.
Just one comment, relating to the cities example you use.
The paper you cite mentions cities and says:
"For example, the city of Paris is referenced in a number of different Linked Data-sets: ranging from OpenCyc to the New York Times. In DBPedia, a Linked Data export of Wikipedia, these data-sets are connected by owl:sameAs. In particular, dbpedia:Paris is owl:sameAs as both the opencyc:CityOfParisFrance and opencyc:Paris DepartmentFrance, as OpenCyc distinguishes that “the department of Paris. Paris DepartmentFrance is a distinct geopolitical entity from CityOfParisFrance, despite the fact that both share the same territory, while Wikipedia does not make this distinction."

So even cities (actually especially cities and other geo things) have significant challenges here.
Geo-political v. geographic v. the geo-extent v. the nounSynset etc.
And we haven't even mentioned temporal aspects.

So I do worry about all this.
If the dataset is simple enough that you can ignore the problems, then the question is whether the exercise tells you anything useful.
If the dataset is more complicated, for example having both geo-political and geographic and wanting to keep them separate, then it is also a question of whether the exercise tells you anything useful!

But if something is hard and challenging it is more reason to do it, I guess.

Good luck.
Hugh

On 27 Aug 2013, at 16:57, <Csarasua@uni-koblenz.de> wrote:

> Hi Hugh,
>
>> Hi Cristina,
>> Some interesting issues you raise.
>> One of them is how people publish links (which enables your analysis).
>> There are two ways this happens.
>> 1) People add triples to their dataset that have an equivalence predicate (owl:sameAs, skos:exactMatch, skos:closeMatch, etc.)
>> 2) People use a "foreign" URI (very commonly a dbpedia URI), because when turning their data into RDF they have decided that the entity they are concerned with is the same as the dbpedia one.
>> The second paragraph of Tom's message describes such a linkage, I think.
>> I think these distinctions are behind the comments of Milorad, where he is assuming the type (2) way.
>> Either of these methods should be fodder for you, and you may well find that the type (2) way is used by a dataset that is useful to you.
>>
> I agree, it is important to distinguish between different types of links.
> When I refer to interlinking I have in mind triples (s, p, o), where "s" and "o" are resources from different data sets, and "p" is either a property like owl:sameAs or a domain-specific property like foaf:knows. I think this corresponds to what you specified in 1) and 2). I would like to have both kinds of links in my evaluation (if possible).
>
>> It may be harder for you to process, as the linkage is not so explicit because there is no distinct URI for the resource in the database, different from the "foreign" one. But any "foreign" URI is in fact a link.
>> You will find that people have tended towards type (2) linkage because they can shy away from having lots of equivalence predicates in their datasets, not least because there was a time when RDF stores did not comfortably do owl:sameAs inference, and so they do the linking at RDF conversion time, and use "foreign" URIs.
>>
>> Another interesting issue is more fundamental to your work.
>> You seem to think that there must be a "gold standard or reference interlinking" for equivalence.
>> As long-time readers of this list will have seen discussed many times (!), it is not a simple matter.
>>
>> It is a complex matter to have such a thing, which is a necessity for you to do your precision/recall statistics.
>> At its most basic, for example, am I as a private citizen the same as me as a member of my University or me as a member of my company?
>> The answer is, of course yes and no.
>> Another field that has spent a lot of time on this is the FRBR world (http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records).
>> If I have a book of the Semantic Web, is it the same as your book of the same name?
>> Perhaps. What if it is a different (corrected) edition? An electronic version?
>> Certainly a library will usually consider each book a different thing, but if you are asking how many books the author has published, you want to treat all the books as the same resource.
>>
> I understand the point, and I find it very interesting, indeed. I guess that it might depend on the context where the data was created / will be used. This reminds me of the paper about the analysis of identity links (http://www.w3.org/2009/12/rdf-ws/papers/ws21). However, I think that it is possible to evaluate different interlinking techniques, establishing some gold standard (e.g. the links between the cities of a data set describing the population of European cities and a data set describing the cities as tourist attractions), to be able to analyse the results in terms of precision and recall, and say that one tool is able to do certain things, while the other is not.
>
> Regarding the humans behind the manual definition or the reviewing process of a reference interlinking, I would expect them to be knowledgeable (i.e. domain experts should have been part of the process at least).
>
>> So in asking for a "gold standard or reference interlinking", I think you are chasing a chimera.
>> What you can do is choose datasets and then you will need to find out what the policies of the equivalence creators are; and then you will need to build your system so that it implements the same policies.
>> By the way, policies usually relate to the way in which the dataset will be used, rather than the wishes of the publisher of the data - there is no absolute truth in this.
>> Some would argue there is never any equivalence: "One cannot step once (sic) into the same stream" (http://en.wikipedia.org/wiki/Cratylus)
>>
> Thanks for the suggestion.
>
>> It's great you have asked the question - convincing research in this field is very challenging!
>>
>> Best
>> Hugh
>>
> Best,
> Cristina
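
A rough illustration of two ideas from the thread, offered only as a sketch: separating cross-dataset links into the two publishing styles discussed (an equivalence predicate versus reuse of a "foreign" URI), and computing precision/recall against a reference linkset. The helper names (`classify_link`, `precision_recall`), the labels, and the rule that a resource belongs to a dataset when its URI starts with that dataset's namespace are all assumptions made for this example, not anything proposed by the correspondents. As the thread stresses, the reference linkset only encodes one equivalence policy, so the numbers are meaningful only relative to that policy.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]   # (subject URI, predicate URI, object URI)
Link = Tuple[str, str]          # (local resource URI, foreign resource URI)

# Equivalence predicates named in the thread, written as full IRIs.
EQUIVALENCE_PREDICATES = {
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2004/02/skos/core#exactMatch",
    "http://www.w3.org/2004/02/skos/core#closeMatch",
}


def classify_link(triple: Triple, local_ns: str) -> Optional[str]:
    """Label a triple as an 'equivalence' link, a 'foreign-uri' link, or None.

    Assumption: a URI is "local" when it starts with local_ns, and objects
    are URIs (literals are not handled in this sketch).
    """
    s, p, o = triple
    s_local = s.startswith(local_ns)
    o_local = o.startswith(local_ns)
    if s_local == o_local:
        # Both local or both foreign: not a link out of this dataset.
        return None
    if p in EQUIVALENCE_PREDICATES:
        return "equivalence"
    # Any other predicate connecting a local and a foreign resource.
    return "foreign-uri"


def precision_recall(found: Set[Link], reference: Set[Link]) -> Tuple[float, float]:
    """Precision and recall of found links against a reference linkset."""
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    ns = "http://example.org/"  # hypothetical local namespace
    t = (ns + "city/Paris",
         "http://www.w3.org/2002/07/owl#sameAs",
         "http://dbpedia.org/resource/Paris")
    print(classify_link(t, ns))
```

Swapping in a different reference linkset (for example one that keeps geo-political and geographic entities apart) would change the scores without any change to the tool being evaluated.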
Received on Tuesday, 27 August 2013 16:15:13 UTC