- From: Lushan Han <lushan1@umbc.edu>
- Date: Tue, 14 May 2013 08:42:14 -0400
- To: Sam Kuper <sam.kuper@uclmail.net>
- Cc: public-lod <public-lod@w3.org>
- Message-ID: <CAOyMU3gfStOQaZVHL1QzEDoE1-vs4jW2MjC=3FuACZ1UMna0Sg@mail.gmail.com>
Hi Sam,

Why don't you use Google or Bing? Typically, the first or second result that is not from the Wikipedia site is what you want. I have done this before, and it worked pretty well.

Best regards,
Lushan Han

On Mon, May 13, 2013 at 2:39 PM, Sam Kuper <sam.kuper@uclmail.net> wrote:
> Dear all,
>
> As I am something of an LOD noob, please feel free to point me in the
> direction of other mailing lists or sources of advice if you feel they
> are more appropriate than public-lod is for my request below.
>
> I wish to solve the following problem: given a string that represents
> one of perhaps several common orthographic representations of a
> university's name (e.g. "Cambridge University" might be given, instead
> of "University of Cambridge"), retrieve the URL of that university's
> home page on the WWW.
>
> My first attempt at a solution is a two-step process. It is to query
> the Wikipedia API in order to obtain, with any luck, the title for the
> university's article in Wikipedia, e.g.:
>
> http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Cambridge%20University
> yields
> {"query-continue":{"search":{"sroffset":1}},"query":{"searchinfo":{"totalhits":86254},"search":[{"ns":0,"title":"University of Cambridge"}]}}
>
> The second step is to use that title to submit a SPARQL query to
> DBpedia in the hope of obtaining the university's website's URL, e.g.
>
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_Cambridge%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
> yields an HTML table containing the desired result.
>
> This attempt suffers from several shortcomings:
>
> (1) Step 1 does not reliably yield a result unless the string is
> varied slightly and resubmitted, e.g.
>
> http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Pennsylvania%20State%20University%20-%20University%20Park
> does not yield an article title, but
>
> http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Pennsylvania%20State%20University-University%20Park
> does.
>
> (2) Step 2 does not reliably yield a result, even if step 1 is
> successful and Wikipedia has a record of the university's website,
> e.g.
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FHarvard_University%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
> yields no URL.
>
> (3) In step 2, I am using HTML output from the SPARQL query only
> because the JSON output seems to be unreliable. For example,
>
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_California,_Los_Angeles%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
> yields the desired URL in the output but
>
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_California,_Los_Angeles%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=json&timeout=0
> does not.
>
> I therefore suspect that there are better approaches, e.g.: better
> ways for me to use the APIs of the resources I am querying (i.e.
> Wikipedia and DBpedia), or better resources to query, or some
> combination of the two. If you can suggest any such improvements (or,
> as I mentioned above, more appropriate sources of advice), I would be
> grateful.
>
> Many thanks in advance,
>
> Sam
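For anyone wanting to reproduce Sam's two-step lookup in code, here is a minimal Python sketch, standard library only. All helper names (`wikipedia_search_url`, `find_homepage_candidates`, etc.) are hypothetical. The request parameters are copied from the message and may no longer be honoured by current MediaWiki (e.g. `srprop=score`, `srredirects`). The `dbpprop` prefix is declared explicitly rather than relying on the endpoint's built-in prefixes. Asking for results with `format=application/sparql-results+json` is an assumption about DBpedia's Virtuoso endpoint, and the sketch does not by itself fix shortcomings (1) to (3): a resource with no `dbpprop:website` triple still yields an empty list.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoints as used in the message; both may have changed (e.g. to https) since 2013.
WIKIPEDIA_API = "http://en.wikipedia.org/w/api.php"
DBPEDIA_SPARQL = "http://dbpedia.org/sparql"


def wikipedia_search_url(name):
    """Step 1: the Wikipedia full-text search request from the message."""
    params = {
        "action": "query",
        "list": "search",
        "srprop": "score",
        "srredirects": "true",
        "srlimit": "1",
        "format": "json",
        "srsearch": name,
    }
    # urlencode uses form encoding, so spaces become "+" rather than "%20".
    return WIKIPEDIA_API + "?" + urlencode(params)


def title_from_search_json(text):
    """Return the top hit's article title, or None when the search found nothing."""
    hits = json.loads(text).get("query", {}).get("search", [])
    return hits[0]["title"] if hits else None


def dbpedia_resource(title):
    """Map a Wikipedia title to a DBpedia resource IRI (spaces -> underscores).

    Assumes the title holds no characters that are illegal inside a SPARQL <IRI>.
    """
    return "http://dbpedia.org/resource/" + title.replace(" ", "_")


def website_query(title):
    """Step 2: ask DBpedia for the dbpprop:website value(s) of that resource."""
    return (
        "PREFIX dbpprop: <http://dbpedia.org/property/>\n"
        "SELECT ?website WHERE { <%s> dbpprop:website ?website . }"
        % dbpedia_resource(title)
    )


def sparql_url(query):
    """GET request for SPARQL JSON results; the format value is an assumption."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": "application/sparql-results+json",
        "timeout": "0",
    }
    return DBPEDIA_SPARQL + "?" + urlencode(params)


def websites_from_sparql_json(text):
    """Collect every ?website value from a SPARQL 1.1 JSON results document."""
    bindings = json.loads(text).get("results", {}).get("bindings", [])
    return [row["website"]["value"] for row in bindings if "website" in row]


def fetch(url):
    """Plain HTTP GET returning the body as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def find_homepage_candidates(name):
    """Hypothetical glue for both steps; returns [] when either step finds nothing."""
    title = title_from_search_json(fetch(wikipedia_search_url(name)))
    if title is None:
        return []
    return websites_from_sparql_json(fetch(sparql_url(website_query(title))))


if __name__ == "__main__":
    # Offline check against the step-1 response quoted in the message.
    sample = ('{"query-continue":{"search":{"sroffset":1}},"query":{"searchinfo":'
              '{"totalhits":86254},"search":[{"ns":0,"title":"University of Cambridge"}]}}')
    title = title_from_search_json(sample)
    print(title)                    # University of Cambridge
    print(dbpedia_resource(title))  # http://dbpedia.org/resource/University_of_Cambridge
```

Keeping URL construction and response parsing separate from `fetch` lets the parsing be checked offline against the responses quoted in the message. `find_homepage_candidates` returns a list because a resource may carry zero or more `dbpprop:website` values.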
Received on Tuesday, 14 May 2013 12:42:45 UTC