- From: Bernard Vatant <bernard.vatant@mondeca.com>
- Date: Mon, 09 Jul 2007 10:17:40 +0200
- To: Linking Open Data <linking-open-data@simile.mit.edu>
- Cc: semantic-web@w3.org
Hi Chris

> Here is the problem statement together with an example: Within the Linking
> Open Data community project [2] different data sources (URI owners) publish
> information about Tim Berners-Lee ...

There are strong implicit assumptions underlying this. Making them explicit may help answer your questions.

1. Uniqueness of the thing/subject you speak about. You assume there is one Tim Berners-Lee. This includes, but goes beyond, simple homonymy issues: one might want to distinguish "Tim-Berners-Lee-the-private-person" from "Tim-Berners-Lee-the-public-person". So if the uniqueness of Tim Berners-Lee is taken for granted, you should say so explicitly.

2. To put together the following URIs, you have set rules, or heuristics of some kind, to discover that two URIs are "aliases" of the same non-information resource. It seems to me that those rules/heuristics should be exposed explicitly. We are in the Linking Open Data process. The data are open; so should be the rules used to link them. Open means the rules are explicit and exposed, so that anyone can reproduce their behaviour and decide whether or not to play by those rules.

The complete opposite is, e.g., Google News, where resources are gathered and displayed as "related to the same event" without any explicit statement of how this event is identified (let alone how it is selected to appear or not), or of how a resource is considered to be "about this event". Of course, Google does not expose its smart algorithms, but at least it is clear that they exist and are implemented somehow.

When you apply such rules to structured data, they could, and should, be expressed formally in whatever language is relevant, e.g., as SPARQL CONSTRUCT queries if all the data are RDF. The conditions under which you consider you can safely declare a:foo owl:sameAs b:bar certainly rely on elements of the descriptions of a:foo and b:bar, e.g., equality of type (Person) + first name + family name + birth date + birth place.
These elements may appear in the two descriptions under the same properties, or under different properties that your heuristics assume to be equivalent.

When we set up, for Geonames, the owl:sameAs assertions between Geonames URIs and INSEE URIs for administrative entities in France, the heuristic was based on such matching of typing properties on both sides (INSEE class Region <=> Geonames fcode ADM1, INSEE class Departement <=> Geonames fcode ADM2, etc.), then on matching of names (including handling of case and special-character issues), and finally on resolution of homonymy cases based on the administrative hierarchy. Granted, such rules are no more explicit on Geonames than the rules used to match the aliases of Tim Berners-Lee on DBpedia. But I think they could and should be, in both cases.

> using different HTTP URIs:
>
> 1. DBpedia: http://dbpedia.org/resource/Tim_Berners-Lee
> 2. Hannover DBLP Server:
> http://dblp.l3s.de/d2r/resource/authors/Tim_Berners-Lee
> 3. Berlin DBLP Server:
> http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007
> 4. RDF Book Mashup:
> http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee
>
> ...
>
> 5. Tim also publishes a FOAF profile in which he assigns the URI
> http://www.w3.org/People/Berners-Lee/card#i to himself.
>
> Question 1: According to the terminology of the Architecture of the WWW
> document [4] are all these URIs aliases for the same non-information
> resource (our current view) or are they referring to different resources?
>

As said above, it is up to the publisher of the owl:sameAs assertions to make the rules explicit. If you consider those URIs to be aliases of the same resource, be bold :-) and say so, but say why.

> Does the TAG finding "On Linking Alternative Representations To Enable
> Discovery And Publishing" [5] about generic and specific resources apply
> here, meaning that the URIs 1,2,3,5 refer to different specific
> non-information resources that are related to one generic non-information
> resource?
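To illustrate what "saying why" could look like, the Geonames/INSEE heuristic described earlier in this message (typing correspondence, name normalization, homonymy resolution through the administrative hierarchy) can be written down as a small, reproducible program. The Python below is only a minimal sketch under stated assumptions, not the procedure actually used for Geonames: the record layout (dicts with uri/type/fcode/name/parent keys), the example.org URIs and the fictional "Saint-Exemple" homonyms are all hypothetical.

```python
import re
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Typing correspondences from the message: INSEE class Region <=> Geonames
# fcode ADM1, INSEE class Departement <=> Geonames fcode ADM2.
# Dict order matters: parents (regions) are matched before children.
TYPE_MAP = {"Region": "ADM1", "Departement": "ADM2"}


def normalize_name(name):
    """Drop accents, unify hyphens/apostrophes/spaces, and case-fold."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[\s\-'\u2019]+", " ", no_accents).strip().casefold()


def match(insee, geonames):
    """Return {insee_uri: geonames_uri} for pairs the rule judges identical."""
    same_as = {}
    for insee_type, fcode in TYPE_MAP.items():
        for a in (e for e in insee if e["type"] == insee_type):
            key = normalize_name(a["name"])
            candidates = [g for g in geonames
                          if g["fcode"] == fcode and normalize_name(g["name"]) == key]
            if len(candidates) > 1:
                # Homonymy: keep only candidates whose Geonames parent is the
                # counterpart already found for this entity's INSEE parent.
                parent = same_as.get(a.get("parent"))
                candidates = [g for g in candidates if g.get("parent") == parent]
            if len(candidates) == 1:  # assert only when the rule is unambiguous
                same_as[a["uri"]] = candidates[0]["uri"]
    return same_as


def to_ntriples(same_as):
    """Serialize the matches as owl:sameAs statements in N-Triples syntax."""
    return [f"<{a}> <{OWL_SAME_AS}> <{b}> ." for a, b in sorted(same_as.items())]


# Hypothetical sample records; the URIs and "Saint-Exemple" are made up.
insee = [
    {"uri": "http://example.org/insee/region/A", "type": "Region", "name": "Île-de-France"},
    {"uri": "http://example.org/insee/region/B", "type": "Region", "name": "Bretagne"},
    {"uri": "http://example.org/insee/dep/X", "type": "Departement",
     "name": "Saint-Exemple", "parent": "http://example.org/insee/region/B"},
]
geonames = [
    {"uri": "http://example.org/geonames/1", "fcode": "ADM1", "name": "Ile-de-France"},
    {"uri": "http://example.org/geonames/2", "fcode": "ADM1", "name": "Bretagne"},
    {"uri": "http://example.org/geonames/3", "fcode": "ADM2", "name": "Saint Exemple",
     "parent": "http://example.org/geonames/1"},
    {"uri": "http://example.org/geonames/4", "fcode": "ADM2", "name": "Saint-Exemple",
     "parent": "http://example.org/geonames/2"},
]

for line in to_ntriples(match(insee, geonames)):
    print(line)
```

In this toy data the departement "Saint-Exemple" has two same-named Geonames candidates, and the rule picks geonames/4 only because its parent is the counterpart of the INSEE parent region. Because the rule is explicit, anyone can rerun it, see why a given owl:sameAs was asserted, and reject the rule if they disagree with it; when both sides are RDF, the same conditions could equally be expressed as SPARQL CONSTRUCT queries.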
>
> Question 2: When the URIs are dereferenced they provide quite different
> information about Tim, which reflects the knowledge and the opinion of the
> specific URI owner about him. Within our tutorial we need to talk about this
> information and therefore need a term to refer to a concept that can be
> described as "information provided by a specific URI owner about a
> non-information resource", for example Tim. Depending on the answer to
> question 1, what would be the correct Web Architecture term to refer to this
> concept? Or is such a term missing?
>

Isn't such information called a "Description", with a "D" as in RDF? Or am I missing something subtler?

> Question 3: Depending on the answer to question 1, is it correct to use
> owl:sameAs [6] to state that http://www.w3.org/People/Berners-Lee/card#i and
> http://dbpedia.org/resource/Tim_Berners-Lee refer to the same thing as it is
> done in Tim's profile.
>

See above ...

--
Bernard Vatant
Knowledge Engineering
----------------------------------------------------
Mondeca
3, cité Nollez
75018 Paris France
Web: www.mondeca.com
----------------------------------------------------
Tel: +33 (0) 871 488 459
Mail: bernard.vatant@mondeca.com
Blog: Leçons de Choses <http://mondeca.wordpress.com/>
Received on Monday, 9 July 2007 08:17:56 UTC