- From: Sam Kuper <sam.kuper@uclmail.net>
- Date: Mon, 13 May 2013 19:39:21 +0100
- To: public-lod <public-lod@w3.org>
Dear all,

As I am something of an LOD noob, please feel free to point me towards other mailing lists or sources of advice if you feel they are more appropriate than public-lod for my request below.

I wish to solve the following problem: given a string that is one of perhaps several common orthographic representations of a university's name (e.g. "Cambridge University" might be given instead of "University of Cambridge"), retrieve the URL of that university's home page on the WWW.

My first attempt at a solution is a two-step process.

The first step is to query the Wikipedia API in order to obtain, with any luck, the title of the university's article in Wikipedia, e.g.:

http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Cambridge%20University

yields

{"query-continue":{"search":{"sroffset":1}},"query":{"searchinfo":{"totalhits":86254},"search":[{"ns":0,"title":"University of Cambridge"}]}}

The second step is to use that title to submit a SPARQL query to DBpedia in the hope of obtaining the URL of the university's website, e.g.:

http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_Cambridge%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0

yields an HTML table containing the desired result.

This attempt suffers from several shortcomings:

(1) Step 1 does not reliably yield a result unless the string is varied slightly and resubmitted, e.g.

http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Pennsylvania%20State%20University%20-%20University%20Park

does not yield an article title, but

http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Pennsylvania%20State%20University-University%20Park

does.

(2) Step 2 does not reliably yield a result, even if step 1 is successful and Wikipedia has a record of the university's website, e.g.

http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FHarvard_University%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0

yields no URL.

(3) In step 2, I am using HTML output from the SPARQL query only because the JSON output seems to be unreliable. For example,

http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_California,_Los_Angeles%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0

yields the desired URL in the output, but

http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_California,_Los_Angeles%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=json&timeout=0

does not.

I therefore suspect that there are better approaches: better ways for me to use the APIs of the resources I am querying (i.e. Wikipedia and DBpedia), better resources to query, or some combination of the two. If you can suggest any such improvements (or, as I mentioned above, more appropriate sources of advice), I would be grateful.

Many thanks in advance,

Sam
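
For concreteness, the two-step process described above could be sketched in Python roughly as follows. This is only an illustrative sketch, not a tested client: all function names are hypothetical, the `application/sparql-results+json` value for `format` is an assumption about what the DBpedia endpoint accepts, the `dbpprop:` prefix is assumed to be predefined by the endpoint (as in the queries above), and the response layout parsed in `websites` is the one defined by the W3C SPARQL JSON results format. Titles are turned into resource IRIs naively (spaces to underscores), which will not be right for every title.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WIKIPEDIA_API = "http://en.wikipedia.org/w/api.php"
DBPEDIA_SPARQL = "http://dbpedia.org/sparql"


def search_url(name):
    """Step 1 URL: Wikipedia search for the given university name."""
    params = {
        "action": "query",
        "list": "search",
        "srprop": "score",
        "srredirects": "true",
        "srlimit": 1,
        "format": "json",
        "srsearch": name,
    }
    return WIKIPEDIA_API + "?" + urlencode(params)


def first_title(search_response):
    """Return the first article title in a parsed search response, or None."""
    hits = search_response.get("query", {}).get("search", [])
    return hits[0]["title"] if hits else None


def sparql_url(title):
    """Step 2 URL: ask DBpedia for the website of the resource named by title."""
    # Naive title-to-IRI mapping (assumption): spaces become underscores.
    resource = "http://dbpedia.org/resource/" + title.replace(" ", "_")
    query = "SELECT ?website WHERE { <%s> dbpprop:website ?website . }" % resource
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        # Assumed format value; the message above used text/html and json.
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_SPARQL + "?" + urlencode(params)


def websites(sparql_response):
    """Extract every ?website value from parsed SPARQL JSON results."""
    bindings = sparql_response.get("results", {}).get("bindings", [])
    return [b["website"]["value"] for b in bindings if "website" in b]


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


def homepage(name):
    """Run both steps; return the first website found, or None."""
    title = first_title(fetch_json(search_url(name)))
    if title is None:
        return None
    found = websites(fetch_json(sparql_url(title)))
    return found[0] if found else None
```

Calling `homepage("Cambridge University")` would issue both requests in turn. Keeping the URL construction and response parsing separate from `fetch_json` means each step can be checked against saved responses, which may help in narrowing down which of the shortcomings listed above is responsible for a given failure.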
Received on Monday, 13 May 2013 18:39:48 UTC