- From: Karl Dubost <karl@la-grange.net>
- Date: Sat, 2 Feb 2013 10:15:06 -0500
- To: "public-lod@w3.org community" <public-lod@w3.org>
I wanted to access the infobox data on Wikipedia.
For example, the Guy Debord Web page in French.
https://fr.wikipedia.org/wiki/Guy_Debord
I could scrape it with lxml and a bit of Python, but I thought there might be a better way. I was expecting something like:
→ curl -H "Accept: text/html+infobox" http://fr.wikipedia.org/wiki/Guy_Debord
<!DOCTYPE html>
but that didn't work; it returned the full HTML document. So I searched a bit and remembered DBpedia.
→ curl http://dbpedia.org/data/Guy_Debord
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
It returned an RDF version of the document in English. Hmm ok. Let's try to force French.
→ curl -H "Accept-Language: fr" http://dbpedia.org/data/Guy_Debord
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
That returned the same version in English. But I have seen in the markup that there is a link to a French version.
<owl:sameAs rdf:resource="http://fr.dbpedia.org/resource/Guy_Debord" />
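The owl:sameAs link can also be pulled out of the RDF/XML programmatically rather than spotted by eye. A minimal sketch with Python's standard library follows. The sample document is a hypothetical, trimmed-down stand-in shaped like the output above (the real response is much longer), and the function name is just an illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Hypothetical sample: only the owl:sameAs line is taken from the real output.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Guy_Debord">
    <owl:sameAs rdf:resource="http://fr.dbpedia.org/resource/Guy_Debord" />
  </rdf:Description>
</rdf:RDF>
"""


def same_as_links(rdf_xml):
    """Return the rdf:resource URI of every owl:sameAs element, in document order."""
    root = ET.fromstring(rdf_xml)
    links = []
    for element in root.iter("{%s}sameAs" % OWL_NS):
        target = element.get("{%s}resource" % RDF_NS)
        if target is not None:
            links.append(target)
    return links


if __name__ == "__main__":
    for link in same_as_links(SAMPLE):
        print(link)
```

Filtering the returned list on the "http://fr.dbpedia.org/" prefix would then pick the French resource.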
Let's hardcode the French host then.
→ curl http://fr.dbpedia.org/data/Guy_Debord
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
This time I get the RDF/XML version. Are there other versions, such as RDF N3?
→ curl -H "Accept: text/rdf+n3" http://fr.dbpedia.org/data/Guy_Debord
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpedia-fr: <http://fr.dbpedia.org/resource/> .
dbpedia-fr:Anarchisme dbpedia-owl:wikiPageWikiLink dbpedia-fr:Guy_Debord .
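To read the N3 output, the prefixed names have to be expanded against the @prefix lines. Here is a rough sketch of that expansion for the three lines shown above. It is deliberately not a real N3 parser (no literals, no ';' or ',' shorthand, no blank nodes), and the function names are made up for the example.

```python
import re

# The three lines shown in the N3 output above.
N3_SAMPLE = """\
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpedia-fr: <http://fr.dbpedia.org/resource/> .
dbpedia-fr:Anarchisme dbpedia-owl:wikiPageWikiLink dbpedia-fr:Guy_Debord .
"""

PREFIX_RE = re.compile(r"@prefix\s+([\w-]*):\s*<([^>]*)>\s*\.")


def expand(term, prefixes):
    """Turn <uri> or prefix:local into a full URI string."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    prefix, _, local = term.partition(":")
    return prefixes[prefix] + local


def expand_simple_n3(text):
    """Expand one-line 'subject predicate object .' statements into URI triples."""
    prefixes = {}
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = PREFIX_RE.fullmatch(line)
        if match:
            prefixes[match.group(1)] = match.group(2)
            continue
        terms = line.rstrip(".").split()
        if len(terms) != 3:
            continue  # skip anything this sketch does not understand
        triples.append(tuple(expand(term, prefixes) for term in terms))
    return triples


if __name__ == "__main__":
    for triple in expand_simple_n3(N3_SAMPLE):
        print(" ".join(triple))
```

For anything beyond a quick look, a proper RDF library would be the safer choice.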
The N3 request worked. Then I tried text/turtle:
→ curl -H "Accept: text/turtle" http://fr.dbpedia.org/data/Guy_Debord
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>406 Not Acceptable</title>
</head><body>
<h1>406 Not Acceptable</h1>
<p>An appropriate representation of the requested resource Guy_Debord could not be found on this server.</p>
Available variant(s):
<ul>
<li><a href="Guy_Debord">Guy_Debord</a> , type application/rdf+xml, charset UTF-8</li>
</ul>
</body></html>
It didn't work. :/ What about JSON?
→ curl -H "Accept: application/json" http://fr.dbpedia.org/data/Guy_Debord
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>406 Not Acceptable</title>
</head><body>
<h1>406 Not Acceptable</h1>
<p>An appropriate representation of the requested resource Guy_Debord could not be found on this server.</p>
Available variant(s):
<ul>
<li><a href="Guy_Debord">Guy_Debord</a> , type application/rdf+xml, charset UTF-8</li>
</ul>
</body></html>
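Both 406 bodies list the variants the server says it can serve, so a client could read that list before retrying. A small sketch follows. The regular expression assumes the exact layout of the body shown above and nothing more, so treat it as an illustration rather than a general parser for 406 pages.

```python
import re

# Body of the 406 response shown above.
BODY_406 = """<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>406 Not Acceptable</title>
</head><body>
<h1>406 Not Acceptable</h1>
<p>An appropriate representation of the requested resource Guy_Debord could not be found on this server.</p>
Available variant(s):
<ul>
<li><a href="Guy_Debord">Guy_Debord</a> , type application/rdf+xml, charset UTF-8</li>
</ul>
</body></html>
"""

# Assumed layout: <li><a href="...">...</a> , type <media type>, charset <charset></li>
VARIANT_RE = re.compile(
    r'<li><a href="([^"]*)">[^<]*</a>\s*,\s*type\s+([^,<\s]+)'
    r'(?:,\s*charset\s+([^<\s]+))?'
)


def available_variants(body):
    """Return (href, media_type, charset) for each variant listed in the body."""
    return [match.groups() for match in VARIANT_RE.finditer(body)]


if __name__ == "__main__":
    for href, media_type, charset in available_variants(BODY_406):
        print(href, media_type, charset)
```

Note that the list only advertises application/rdf+xml, although the text/rdf+n3 request was answered as well.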
But JSON worked when hardcoding the extension in the URI.
→ curl http://fr.dbpedia.org/data/Guy_Debord.json
{
"http://fr.dbpedia.org/resource/Anarchisme" : { "http://dbpedia.org/ontology/wikiPageWikiLink" : [ { "type" : "uri", "value" : "http://fr.dbpedia.org/resource/Guy_Debord" } ] } ,
"http://fr.dbpedia.org/resource/Id\u00E9ologie" : { "http://dbpedia.org/ontology/wikiPageWikiLink" : [ { "type" : "uri", "value" : "http://fr.dbpedia.org/resource/Guy_Debord" } ] } ,
But I see that this has been discussed already :)
http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg03582.html
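For the JSON returned by http://fr.dbpedia.org/data/Guy_Debord.json, the excerpt suggests a nested layout: subject URI, then predicate URI, then a list of objects with "type" and "value". Assuming that layout holds for the whole document, a sketch for flattening it into triples could look like this. The sample is the two entries shown earlier, closed with a final brace so that it parses.

```python
import json

# The two entries from the output, with the trailing comma dropped and a
# closing brace added. The raw string keeps the \u00E9 escape for json.loads.
SAMPLE = r"""{
"http://fr.dbpedia.org/resource/Anarchisme" : { "http://dbpedia.org/ontology/wikiPageWikiLink" : [ { "type" : "uri", "value" : "http://fr.dbpedia.org/resource/Guy_Debord" } ] } ,
"http://fr.dbpedia.org/resource/Id\u00E9ologie" : { "http://dbpedia.org/ontology/wikiPageWikiLink" : [ { "type" : "uri", "value" : "http://fr.dbpedia.org/resource/Guy_Debord" } ] }
}"""


def iter_triples(doc):
    """Yield (subject, predicate, object value) from the nested JSON layout."""
    for subject, predicates in doc.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                yield subject, predicate, obj["value"]


if __name__ == "__main__":
    for subject, predicate, value in iter_triples(json.loads(SAMPLE)):
        print(subject, predicate, value)
```

In this excerpt Guy_Debord only appears as the object, as the target of wikiPageWikiLink from other pages. The infobox properties would presumably be found under the subject http://fr.dbpedia.org/resource/Guy_Debord itself.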
--
Karl Dubost, a Web opener to hire
http://www.la-grange.net/karl/
Received on Saturday, 2 February 2013 15:15:08 UTC