- From: John Erickson <olyerickson@gmail.com>
- Date: Mon, 13 May 2013 16:31:09 -0400
- To: Gannon Dick <gannon_dick@yahoo.com>
- Cc: Sam Kuper <sam.kuper@uclmail.net>, public-lod <public-lod@w3.org>
First of all, keep in mind that while DBpedia is amazing and astounding, it (a) doesn't have all the data of the known universe, (b) is not 100% consistent in its use of vocabulary to describe the universe it does know, and (c) sometimes the vocabulary changes. In short, your mileage may vary...

Now, a simple examination shows that you can get lucky. For example, the following query will get you a table of the website URLs for all the entities it thinks are "universities" (at least those that have a dbpedia2:website value):

    PREFIX dbpedia2: <http://dbpedia.org/property/>
    SELECT ?university ?website
    WHERE {
      ?university <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/University> .
      ?university dbpedia2:website ?website .
    }

(or see http://bit.ly/BigAssSparqlQuery)

You could use such a query as the basis for creating an "instance hub" style service for disambiguating university URIs and for aggregating linked data related to these universities, including their websites...

On Mon, May 13, 2013 at 2:59 PM, Gannon Dick <gannon_dick@yahoo.com> wrote:
> Hi Sam,
>
> The problem is already solved in fine detail, but the parameter names may be
> a little difficult to relate to LOD usage.
>
> http://www.ncbi.nlm.nih.gov/books/NBK25497/
>
> Good luck :-)
>
> ________________________________
> From: Sam Kuper <sam.kuper@uclmail.net>
> To: public-lod <public-lod@w3.org>
> Sent: Monday, May 13, 2013 1:39 PM
> Subject: Given a university's name, retrieve URL for university's home page.
>
> Dear all,
>
> As I am something of an LOD noob, please feel free to point me in the
> direction of other mailing lists or sources of advice if you feel they
> are more appropriate than public-lod is for my request below.
>
> I wish to solve the following problem: given a string that represents
> one of perhaps several common orthographic representations of a
> university's name (e.g. "Cambridge University" might be given, instead
> of "University of Cambridge"), retrieve the URL of that university's
> home page on the WWW.
>
> My first attempt at a solution is a two-step process.
> The first step is to query the Wikipedia API in order to obtain, with
> any luck, the title for the university's article in Wikipedia, e.g.:
> http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Cambridge%20University
> yields
> {"query-continue":{"search":{"sroffset":1}},"query":{"searchinfo":{"totalhits":86254},"search":[{"ns":0,"title":"University of Cambridge"}]}}
>
> The second step is to use that title to submit a SPARQL query to
> DBpedia in the hope of obtaining the university's website's URL, e.g.
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_Cambridge%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
> yields an HTML table containing the desired result.
>
> This attempt suffers from several shortcomings:
>
> (1) Step 1 does not reliably yield a result unless the string is
> varied slightly and resubmitted, e.g.
> http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Pennsylvania%20State%20University%20-%20University%20Park
> does not yield an article title, but
> http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Pennsylvania%20State%20University-University%20Park
> does.
>
> (2) Step 2 does not reliably yield a result, even if step 1 is
> successful and Wikipedia has a record of the university's website,
> e.g.
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FHarvard_University%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
> yields no URL.
>
> (3) In step 2, I am using HTML output from the SPARQL query only
> because the JSON output seems to be unreliable.
> For example,
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_California,_Los_Angeles%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
> yields the desired URL in the output but
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_California,_Los_Angeles%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=json&timeout=0
> does not.
>
> I therefore suspect that there are better approaches, e.g.: better
> ways for me to use the APIs of the resources I am querying (i.e.
> Wikipedia and DBpedia), or better resources to query, or some
> combination of the two. If you can suggest any such improvements (or,
> as I mentioned above, more appropriate sources of advice), I would be
> grateful.
>
> Many thanks in advance,
>
> Sam

--
John S. Erickson, Ph.D.
Director, Web Science Operations
Tetherless World Constellation (RPI)
<http://tw.rpi.edu>
<olyerickson@gmail.com>
Twitter & Skype: olyerickson
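Sam's two-step pipeline, together with possible responses to his shortcomings (2) and (3), might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not something proposed in the thread: all function names are hypothetical; the title-to-URI rule only handles spaces; the UNION over `dbpprop:website` and `foaf:homepage` assumes that resources lacking the former may carry the latter (worth checking against the live data); and asking for `application/sparql-results+json` through the `format` parameter is an assumption about the endpoint. Parsing the standard SPARQL JSON results structure avoids scraping the HTML table.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

WIKIPEDIA_API = "http://en.wikipedia.org/w/api.php"
DBPEDIA_SPARQL = "http://dbpedia.org/sparql"


def wikipedia_search_url(name: str) -> str:
    """Step 1: the search request from Sam's message, with the name URL-encoded."""
    params = {
        "action": "query",
        "list": "search",
        "srprop": "score",
        "srredirects": "true",
        "srlimit": "1",
        "format": "json",
        "srsearch": name,
    }
    return WIKIPEDIA_API + "?" + urlencode(params)


def top_title(response: dict) -> Optional[str]:
    """Title of the first search hit, or None if the search found nothing."""
    hits = response.get("query", {}).get("search", [])
    return hits[0]["title"] if hits else None


def dbpedia_resource(title: str) -> str:
    """Assumed mapping: DBpedia resource URI = title with spaces as underscores."""
    return "http://dbpedia.org/resource/" + title.replace(" ", "_")


def website_query(resource: str) -> str:
    """Step 2: accept dbpprop:website or, as an assumed fallback, foaf:homepage."""
    return (
        "SELECT DISTINCT ?website WHERE {\n"
        f"  {{ <{resource}> <http://dbpedia.org/property/website> ?website }}\n"
        "  UNION\n"
        f"  {{ <{resource}> <http://xmlns.com/foaf/0.1/homepage> ?website }}\n"
        "}"
    )


def sparql_url(query: str) -> str:
    """GET URL requesting standard SPARQL JSON results (format value assumed)."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_SPARQL + "?" + urlencode(params)


def websites(results: dict) -> List[str]:
    """Collect ?website values from a SPARQL 1.1 JSON results document."""
    return [
        row["website"]["value"]
        for row in results["results"]["bindings"]
        if "website" in row
    ]


def fetch_json(url: str) -> dict:
    """Perform the HTTP GET and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Example pipeline (commented out because it performs live HTTP requests):
# title = top_title(fetch_json(wikipedia_search_url("Cambridge University")))
# if title:
#     print(websites(fetch_json(sparql_url(website_query(dbpedia_resource(title))))))
```

Given the search response quoted in Sam's message, `top_title` returns `"University of Cambridge"`, which `dbpedia_resource` turns into `http://dbpedia.org/resource/University_of_Cambridge`. Titles containing characters other than spaces may be encoded differently in DBpedia URIs, so that mapping is the weakest link in the sketch.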
Received on Monday, 13 May 2013 20:31:37 UTC
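John's "instance hub" suggestion, combined with the orthographic variation in Sam's shortcoming (1), might be sketched as a local index over the (university URI, website) rows returned by the bulk SPARQL query in John's reply, looked up with a few normalised name variants instead of a free-text search. Everything here is a hypothetical illustration: the function names, deriving names from the resource URI's local part (an `rdfs:label` would be a more principled source), and the two normalisation rules (hyphen/en-dash spacing and the "X University" / "University of X" swap) are assumptions, not a tested disambiguation method.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import unquote

Row = Tuple[str, str]  # (university resource URI, website URL)


def name_from_uri(uri: str) -> str:
    """Hypothetical: derive a display name from a DBpedia resource URI."""
    local = uri.rsplit("/", 1)[-1]
    return unquote(local).replace("_", " ")


def normalise(name: str) -> str:
    """Hypothetical rules: ignore case, hyphen spacing and most punctuation."""
    s = name.lower()
    s = re.sub(r"\s*[-\u2013]\s*", "-", s)  # "A - B", "A-B", "A – B" all become "a-b"
    s = re.sub(r"[^\w\s-]", "", s)          # drop commas, periods, etc.
    return re.sub(r"\s+", " ", s).strip()


def variants(name: str) -> List[str]:
    """Normalised keys, adding the 'X University' <-> 'University of X' swap."""
    base = normalise(name)
    keys = [base]
    m = re.fullmatch(r"university of (.+)", base)
    if m:
        keys.append(f"{m.group(1)} university")
    m = re.fullmatch(r"(.+) university", base)
    if m:
        keys.append(f"university of {m.group(1)}")
    return keys


def build_hub(rows: Iterable[Row]) -> Dict[str, Row]:
    """Index query rows by every name variant; the first row for a key wins."""
    hub: Dict[str, Row] = {}
    for uri, website in rows:
        for key in variants(name_from_uri(uri)):
            hub.setdefault(key, (uri, website))
    return hub


def lookup(hub: Dict[str, Row], name: str) -> Optional[Row]:
    """Return the first row matching any variant of the name, else None."""
    for key in variants(name):
        if key in hub:
            return hub[key]
    return None
```

An exact-key lookup like this returns `None` rather than guessing, which may be preferable to trusting the top hit of a search engine; fuzzier matching could be layered on top if needed.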