Re: Wikipedia and Geonames. was: AW: ANN: RDF Book Mashup - Integrating Web 2.0 data sources like Amazon and Google into the Semantic Web

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 4 Dec 2006 00:23:11 +0100
Message-Id: <EFC5CBE5-C205-48E0-BD7B-C99C305F3B5E@cyganiak.de>
Cc: Chris Bizer <chris@bizer.de>, Semantic Web <semantic-web@w3.org>
To: Marc <marc@geonames.org>

Marc,

On 3 Dec 2006, at 23:04, Marc wrote:
> We are already working on linking wikipedia articles with geonames
> places. See also this thread in October:
> http://lists.w3.org/Archives/Public/semantic-web/2006Oct/0148.html
>
> Now that you are asking for it, I have released today a first  
> version, which includes the following wikipedia information about  
> Embrun :
>
> <wikipediaArticle>http://fr.wikipedia.org/wiki/Embrun_%28Hautes-Alpes%29</wikipediaArticle>
> <wikipediaArticle>http://pl.wikipedia.org/wiki/Embrun</wikipediaArticle>
> <wikipediaArticle>http://de.wikipedia.org/wiki/Embrun</wikipediaArticle>
> <wikipediaArticle>http://en.wikipedia.org/wiki/Embrun%2C_Hautes-Alpes</wikipediaArticle>
> <wikipediaArticle>http://it.wikipedia.org/wiki/Embrun</wikipediaArticle>
> <wikipediaArticle>http://nl.wikipedia.org/wiki/Embrun</wikipediaArticle>
>
> Around 100,000 geonames place names now have wikipedia links.

Very cool. How do you link the articles? It can't be simple word
matching, can it?

One detail: You should use the URI syntax (rdf:resource), not the
literal syntax, so that RDF consumers see the value as a link rather
than as a plain string. That is, instead of this:

   <wikipediaArticle>http://nl.wikipedia.org/wiki/Embrun</wikipediaArticle>

it should look like this:

   <wikipediaArticle rdf:resource="http://nl.wikipedia.org/wiki/Embrun"/>

Just like the other links that are already in the RDF data, e.g.  
nearbyFeatures.
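
In case it helps, here is a rough Python sketch of the conversion, using only the standard library's xml.etree.ElementTree. It is an illustration under assumptions, not a description of how the Geonames export works: the ex: namespace, the example.org place URI, and the helper name literal_to_resource are all made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace, for illustration only; not taken from the Geonames data.
EX_NS = "http://example.org/geo#"


def literal_to_resource(root, tag):
    """Rewrite <tag>URI</tag> elements in place as <tag rdf:resource="URI"/>."""
    resource_attr = f"{{{RDF_NS}}}resource"
    for elem in root.iter(tag):
        uri = (elem.text or "").strip()
        # Only touch leaf elements that carry a literal and no rdf:resource yet.
        if uri and len(elem) == 0 and elem.get(resource_attr) is None:
            elem.set(resource_attr, uri)
            elem.text = None
    return root


# Made-up sample document using the literal syntax.
sample = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="http://example.org/place/embrun">
    <ex:wikipediaArticle>http://nl.wikipedia.org/wiki/Embrun</ex:wikipediaArticle>
  </rdf:Description>
</rdf:RDF>"""

# Keep readable prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)

root = literal_to_resource(ET.fromstring(sample), f"{{{EX_NS}}}wikipediaArticle")
print(ET.tostring(root, encoding="unicode"))
```

Elements that already carry rdf:resource, or that have child elements, are left alone, so running it twice over the same data should be harmless.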

Best,
Richard


>
>> I once read about some pretty sophisticated screen-scraping  
>> frameworks
> As far as I know crawling and screen scraping wikipedia is not  
> considered fair use. Wikimedia software is rather resource  
> intensive and the preferred way is to use the xml download files :  
> http://download.wikipedia.org/
>
> Cheers
>
> Marc
>
>
Received on Sunday, 3 December 2006 23:23:20 GMT
