- From: Chris Richard <chris.richard@gmail.com>
- Date: Sat, 4 Aug 2007 11:18:43 -0700
- To: "Richard Cyganiak" <richard@cyganiak.de>, "Kingsley Idehen" <kidehen@openlinksw.com>, "Chris Bizer" <chris@bizer.de>
- Cc: semantic-web@w3.org
- Message-ID: <4c7b59910708041118o3199ecdfq5eff22003288fb54@mail.gmail.com>
Hi all,

While browsing dbpedia.org I recognized your names from the sem-web mailing list and wanted to send along a question. Have you done any thinking about extracting disambiguation information from disambiguation pages?

I was working on a similar project to extract structured info from wikipedia.org to be used as the basis for a sem-web project (until I came across dbpedia.org), and this is one thing I was targeting that I couldn't find any mention of on dbpedia.org. I extract all the list items from a particular disambiguation page and perform some basic processing to try to determine the disambiguated article/concept. The Apple disambiguation page <http://en.wikipedia.org/wiki/Apple_%28disambiguation%29> is a good example of some of the different styles of information you get:

1. Apple Brook <http://en.wikipedia.org/wiki/Apple_Brook>, a British actress

   Simple to extract a mapping between the ambiguous "Apple" and Apple Brook, along with a potentially useful single-sentence abstract.

2. *Apple* (album) <http://en.wikipedia.org/wiki/Apple_%28album%29>, an album by Mother Love Bone <http://en.wikipedia.org/wiki/Mother_Love_Bone>
   or
   Ariane Passenger Payload Experiment <http://en.wikipedia.org/wiki/Ariane_Passenger_Payload_Experiment>, an Indian experimental communication satellite with a C-Band transponder <http://en.wikipedia.org/wiki/Transponder> launched in 1981

   Multiple links, so it's not immediately obvious which one is the disambiguated concept, but you can imagine heuristics to make these connections.

3. any of the *computers* made by Apple Inc. <http://en.wikipedia.org/wiki/Apple_Inc.> since 1976 <http://en.wikipedia.org/wiki/1976>, notably the Apple Macintosh <http://en.wikipedia.org/wiki/Apple_Macintosh>

   Somewhat unclear disambiguation; the correct relationship is potentially difficult to extract.
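To make the "basic processing" of disambiguation list items concrete, here is a minimal sketch in Python. It assumes you have the raw wikitext of the page and that each entry is a `*` bullet containing `[[...]]` links. The function names, the sample wikitext and the heuristic (prefer the first link whose title contains the ambiguous term, otherwise take the first link) are all hypothetical illustrations, not dbpedia's extractor or my actual code.

```python
import re

# [[Target]] or [[Target|display text]]; group 1 = target, group 2 = display.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def plain_text(wikitext):
    """Replace links with their display text and drop ''italic''/'''bold''' quotes."""
    text = LINK_RE.sub(lambda m: m.group(2) or m.group(1), wikitext)
    return text.replace("'''", "").replace("''", "").strip()


def parse_entry(line, term):
    """Guess (target_article, one_line_abstract) for one list item.

    Hypothetical heuristic: prefer the first link whose title contains the
    ambiguous term, else fall back to the first link. Returns None for items
    without any links.
    """
    targets = [m.group(1).strip() for m in LINK_RE.finditer(line)]
    if not targets:
        return None
    matching = [t for t in targets if term.lower() in t.lower()]
    target = matching[0] if matching else targets[0]
    return target, plain_text(line)


def extract_entries(page_wikitext, term):
    """Collect a (target, abstract) guess for every '*' bullet on the page."""
    results = []
    for raw in page_wikitext.splitlines():
        line = raw.strip()
        if not line.startswith("*"):
            continue  # skip headings, intro text, templates, etc.
        entry = parse_entry(line.lstrip("*").strip(), term)
        if entry is not None:
            results.append(entry)
    return results


if __name__ == "__main__":
    # Simplified, hypothetical wikitext modelled on the Apple examples above.
    SAMPLE = """
* [[Apple Brook]], a British actress
* ''[[Apple (album)|Apple]]'' (album), an album by [[Mother Love Bone]]
* any of the ''computers'' made by [[Apple Inc.]] since [[1976]], notably the [[Apple Macintosh]]
"""
    for target, abstract in extract_entries(SAMPLE, "Apple"):
        print(f"Apple -> {target}: {abstract}")
```

Note that for the third entry this heuristic picks Apple Inc., whereas the entry is really about the computers (e.g. the Apple Macintosh). That is exactly the kind of unclear case where a simple link-based rule breaks down.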
I haven't done a lot of thinking about the proper way to represent these relationships in RDF; for now I've just been writing them to a custom DB schema, but I think the information is highly valuable.

Also, similar to this but easier to extract, is the synonym information stored in the redirect links. Are you currently extracting multiple rdfs:label values based on these redirects?

If you have a minute, let me know your thoughts on this.

Chris
Received on Saturday, 4 August 2007 18:18:46 UTC