Re: Given a university's name, retrieve URL for university's home page.

Hi Sam,

Why don't you use Google or Bing? Typically the first or second result
that is not from the Wikipedia site is what you want. I have done this
before and it worked pretty well.
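
A minimal sketch of the filtering step, assuming the search results have
already been fetched as an ordered list of URLs by whatever search API or
client you use. The function name first_non_wikipedia and the
excluded_domain parameter are made up for illustration; they are not part
of any Google or Bing API.

```python
from urllib.parse import urlparse


def first_non_wikipedia(result_urls, excluded_domain="wikipedia.org"):
    """Return the first result URL whose host is not on the excluded domain.

    result_urls is assumed to be ordered by search rank, best first.
    Returns None when every result is excluded or has no recognisable host.
    """
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if not host:
            # Skip entries without a parseable host (e.g. missing scheme).
            continue
        if host == excluded_domain or host.endswith("." + excluded_domain):
            # Skip en.wikipedia.org, de.wikipedia.org, and so on.
            continue
        return url
    return None
```

For a result list that starts with the Wikipedia article and is followed
by the university's own site, this returns the latter. It is only a
heuristic: a news article or a ranking site can outrank the home page, so
spot-checking the output is still worthwhile.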

Best regards,

Lushan Han


On Mon, May 13, 2013 at 2:39 PM, Sam Kuper <sam.kuper@uclmail.net> wrote:

> Dear all,
>
> As I am something of an LOD noob, please feel free to point me in the
> direction of other mailing lists or sources of advice if you feel they
> are more appropriate than public-lod is for my request below.
>
> I wish to solve the following problem: given a string that represents
> one of perhaps several common orthographic representations of a
> university's name (e.g. "Cambridge University" might be given, instead
> of "University of Cambridge"), retrieve the URL of that university's
> home page on the WWW.
>
> My first attempt at a solution is a two-step process. It is to query
> the Wikipedia API in order to obtain, with any luck, the title for the
> university's article in Wikipedia, e.g.:
>
> http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Cambridge%20University
> yields
> {"query-continue":{"search":{"sroffset":1}},"query":{"searchinfo":{"totalhits":86254},"search":[{"ns":0,"title":"University of Cambridge"}]}}
>
> The second step is to use that title to submit a SPARQL query to
> DBpedia in the hope of obtaining the university's website's URL, e.g.
>
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_Cambridge%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
> yields an HTML table containing the desired result.
>
> This attempt suffers from several shortcomings:
>
> (1) Step 1 does not reliably yield a result unless the string is
> varied slightly and resubmitted, e.g.
>
> http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Pennsylvania%20State%20University%20-%20University%20Park
> does not yield an article title, but
>
> http://en.wikipedia.org/w/api.php?action=query&list=search&srprop=score&srredirects=true&srlimit=1&format=json&srsearch=Pennsylvania%20State%20University-University%20Park
> does.
>
> (2) Step 2 does not reliably yield a result, even if step 1 is
> successful and Wikipedia has a record of the university's website,
> e.g.
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FHarvard_University%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
> yields no URL.
>
> (3) In step 2, I am using HTML output from the SPARQL query only
> because the JSON output seems to be unreliable. For example,
>
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_California,_Los_Angeles%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
> yields the desired URL in the output but
>
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FUniversity_of_California,_Los_Angeles%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=json&timeout=0
> does not.
>
> I therefore suspect that there are better approaches, e.g.: better
> ways for me to use the APIs of the resources I am querying (i.e.
> Wikipedia and DBpedia), or better resources to query, or some
> combination of the two. If you can suggest any such improvements (or,
> as I mentioned above, more appropriate sources of advice), I would be
> grateful.
>
> Many thanks in advance,
>
> Sam
>
>
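
For reference, the two-step lookup described in the quoted message can be
sketched roughly as follows. This only builds the request URLs and parses
a search response; it makes no network calls. The function names are
hypothetical, the full property URI http://dbpedia.org/property/website
is used in place of the dbpprop: prefix, and turning a title into a
DBpedia resource by replacing spaces with underscores is a simplifying
assumption that may not hold for every title.

```python
import json
from urllib.parse import quote, urlencode

WIKIPEDIA_API = "http://en.wikipedia.org/w/api.php"
DBPEDIA_SPARQL = "http://dbpedia.org/sparql"


def wikipedia_search_url(name):
    """Build the step 1 URL: search Wikipedia for the given name."""
    params = {
        "action": "query",
        "list": "search",
        "srprop": "score",
        "srredirects": "true",
        "srlimit": 1,
        "format": "json",
        "srsearch": name,
    }
    # quote_via=quote encodes spaces as %20, as in the URLs quoted above.
    return WIKIPEDIA_API + "?" + urlencode(params, quote_via=quote)


def first_title(response_text):
    """Return the title of the first search hit, or None if there is none."""
    data = json.loads(response_text)
    hits = data.get("query", {}).get("search", [])
    return hits[0]["title"] if hits else None


def dbpedia_website_url(title, result_format="text/html"):
    """Build the step 2 URL: ask DBpedia for the resource's website."""
    # Assumption: the resource name is the title with spaces as underscores.
    resource = "http://dbpedia.org/resource/" + title.replace(" ", "_")
    query = (
        "SELECT ?website WHERE { <%s> "
        "<http://dbpedia.org/property/website> ?website . }" % resource
    )
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": result_format,
    }
    return DBPEDIA_SPARQL + "?" + urlencode(params)
```

Parsing the response with a JSON library and building the query string
with urlencode avoids hand-escaping, but it does not address the
reliability problems listed in (1) and (2).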

Received on Tuesday, 14 May 2013 12:42:45 UTC