
Given a university's name, retrieve URL for university's home page.

From: Sam Kuper <sam.kuper@uclmail.net>
Date: Mon, 13 May 2013 19:39:21 +0100
Message-ID: <CAD-Jur+BcMO3iEpJdB9xkNn-6TesMqkVh=Htga1L3VNK0BxemA@mail.gmail.com>
To: public-lod <public-lod@w3.org>

Dear all,

As I am something of an LOD noob, please feel free to point me in the
direction of other mailing lists or sources of advice if you feel they
are more appropriate than public-lod is for my request below.

I wish to solve the following problem: given a string that represents
one of perhaps several common orthographic representations of a
university's name (e.g. "Cambridge University" might be given, instead
of "University of Cambridge"), retrieve the URL of that university's
home page on the WWW.

My first attempt at a solution is a two-step process. It is to query
the Wikipedia API in order to obtain, with any luck, the title for the
university's article in Wikipedia, e.g.:
yields {"query-continue":{"search":{"sroffset":1}},"query":{"searchinfo":{"totalhits":86254},"search":[{"ns":0,"title":"University of Cambridge"}]}}
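
A minimal Python sketch of step 1 follows. It only builds the request URL
and reads the first title out of a response shaped like the one above; it
does not perform the HTTP request. The endpoint URL and the function names
build_search_url and first_title are my own assumptions for illustration.

```python
import json
from urllib.parse import urlencode

# Assumption: the English Wikipedia MediaWiki API endpoint.
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"


def build_search_url(name, limit=1):
    """Build a MediaWiki full-text search URL for a university name."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": name,
        "srlimit": limit,
        "format": "json",
    }
    return WIKIPEDIA_API + "?" + urlencode(params)


def first_title(response_text):
    """Return the title of the first search hit, or None if there is none."""
    data = json.loads(response_text)
    hits = data.get("query", {}).get("search", [])
    return hits[0]["title"] if hits else None
```

Fetching build_search_url("Cambridge University") and passing the body to
first_title would then give the article title, if the search returns one.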

The second step is to use that title to submit a SPARQL query to
DBpedia in the hope of obtaining the university's website's URL, e.g.
yields an HTML table containing the desired result.
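
Step 2 could be sketched in Python roughly as below. This is only an
illustration under stated assumptions: that the DBpedia resource name is
the Wikipedia title with spaces replaced by underscores (titles containing
other special characters are not handled), and that the property of
interest is http://dbpedia.org/property/website, which I write in full
rather than relying on the dbpprop: prefix. The function names are
hypothetical, and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode

DBPEDIA_SPARQL = "http://dbpedia.org/sparql"


def resource_uri(title):
    """Map a Wikipedia article title to a DBpedia resource URI.

    Assumption: spaces become underscores and nothing else needs escaping.
    """
    return "http://dbpedia.org/resource/" + title.replace(" ", "_")


def website_query(title):
    """Build a SPARQL query asking for the resource's website property."""
    return (
        "SELECT ?website WHERE { <%s> "
        "<http://dbpedia.org/property/website> ?website . }"
        % resource_uri(title)
    )


def build_sparql_url(title, fmt="application/sparql-results+json"):
    """Build the GET URL for the DBpedia SPARQL endpoint."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": website_query(title),
        "format": fmt,
    }
    return DBPEDIA_SPARQL + "?" + urlencode(params)
```

Passing fmt="text/html" instead would request the HTML table output.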

This attempt suffers from several shortcomings:

(1) Step 1 does not reliably yield a result unless the string is
varied slightly and resubmitted, e.g.
does not yield an article title, but

(2) Step 2 does not reliably yield a result, even if step 1 is
successful and Wikipedia has a record of the university's website,
e.g. http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwebsite%0D%0AWHERE++{+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FHarvard_University%3E+dbpprop%3Awebsite+%3Fwebsite+.+}&format=text%2Fhtml&timeout=0
yields no URL.

(3) In step 2, I am using HTML output from the SPARQL query only
because the JSON output seems to be unreliable. For example,
yields the desired URL in the output but
does not.
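
For reference, when the JSON output does arrive, I would expect it to
follow the standard SPARQL JSON results layout (a "results" object holding
a list of "bindings"). A minimal Python sketch for reading it is below.
The function name is hypothetical, and this does not explain or work
around the unreliability I describe; it only shows how I would read a
well-formed response.

```python
import json


def website_values(response_text):
    """Return every ?website value from a SPARQL JSON results document."""
    data = json.loads(response_text)
    bindings = data.get("results", {}).get("bindings", [])
    # Each binding maps a variable name to {"type": ..., "value": ...}.
    return [b["website"]["value"] for b in bindings if "website" in b]
```

An empty list would mean the query ran but matched nothing, as in the
Harvard case in (2).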

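Regarding shortcoming (1), the "vary slightly and resubmit" workaround
could be automated along the following lines. This is a rough Python
sketch: the helper name and the two rewriting rules are purely my own
assumptions, covering only the "X University" / "University of X" pattern
from my example, and some generated candidates will not be real names.

```python
import re


def name_variants(name):
    """Return the name itself plus simple reorderings to try in turn."""
    name = " ".join(name.split())  # normalise whitespace
    variants = [name]
    match = re.fullmatch(r"(.+) University", name)
    if match:
        variants.append("University of " + match.group(1))
    match = re.fullmatch(r"University of (.+)", name)
    if match:
        variants.append(match.group(1) + " University")
    return variants
```

Each candidate would be submitted to the search in step 1 until one of
them yields an article title.
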
I therefore suspect that there are better approaches, e.g.: better
ways for me to use the APIs of the resources I am querying (i.e.
Wikipedia and DBpedia), or better resources to query, or some
combination of the two. If you can suggest any such improvements (or,
as I mentioned above, more appropriate sources of advice), I would be
grateful.

Many thanks in advance,

Received on Monday, 13 May 2013 18:39:48 UTC
