
Re: Language options in rdf.rb

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Mon, 28 Feb 2011 12:14:43 -0500
To: Alex Kremer <alex@entitylab.com>
CC: "public-rdf-ruby@w3.org" <public-rdf-ruby@w3.org>
Message-ID: <99B6F2C0-7D40-4A6F-85B4-39414B3636BE@kellogg-assoc.com>
Looking at the source, the page is returned as text/html, which means it is being parsed by the RDF::RDFa parser. The source is, indeed, in RDFa 1.0 format. In this format, a literal takes its language from the xml:lang or lang attribute of the element containing it, or of any element in its ancestry. In this case, the html element contains xml:lang="en". That's why the literal has a language tag of :en. It seems that DBPedia, in this case anyway, isn't properly marking the language of individual literals within the page.
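
To make the inheritance rule concrete, here is a minimal sketch in plain Ruby. It is not how RDF::RDFa is implemented; Node and effective_lang are hypothetical names used only for illustration, and the element tree is a hand-built stand-in for the DBPedia page rather than its actual markup.

```ruby
# Minimal stand-in for an element tree. Node and effective_lang are
# hypothetical names for illustration, not part of rdf.rb or RDF::RDFa.
Node = Struct.new(:name, :attrs, :parent)

# Return the language in scope for a node: its own xml:lang or lang
# attribute if present, otherwise that of the nearest ancestor that has one.
# An empty attribute value clears the language instead of inheriting it.
def effective_lang(node)
  while node
    lang = node.attrs["xml:lang"] || node.attrs["lang"]
    return (lang.empty? ? nil : lang.downcase.to_sym) if lang
    node = node.parent
  end
  nil
end

html = Node.new("html", { "xml:lang" => "en" }, nil)
body = Node.new("body", {}, html)

# The situation described above: the element holding the German abstract
# has no language attribute of its own, so it inherits "en" from <html>.
untagged = Node.new("span", {}, body)

# What the page would need for the literal to be tagged as German.
tagged = Node.new("span", { "xml:lang" => "de" }, body)

p effective_lang(untagged)  # :en
p effective_lang(tagged)    # :de
```

With no language attribute on the element itself, the lookup walks up to the html element and returns :en, which matches the @en tag seen on the German abstract.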

The RDF/XML version of the page (http://dbpedia.org/data/Vienna) does set language tags properly, so if you load that instead, each literal will be assigned the correct language tag.

It seems that DBPedia isn't properly setting xml:lang attributes on nodes when publishing the RDFa content. It would certainly be a good idea to file this as a bug with DBPedia. In the meantime, it's best to use the RDF/XML feed.


On Feb 27, 2011, at 8:11 AM, Alex Kremer wrote:

> Hi,
> Apologies if the following seems very elementary, but here goes:
> I'm trying to retrieve an abstract from a DBPedia page in English. The problem is that rdf.rb seems to think every result it gets back is English, even results in foreign languages:
> graph = RDF::Graph.load("http://dbpedia.org/page/Vienna")
> dbp = RDF::Vocabulary.new("http://dbpedia.org/ontology/")
> query = RDF::Query.new(:article => {dbp.abstract => :abstract})
> => #<RDF::Query:0x1094812c8 @solutions=[], @options={}, @variables={}, @patterns=[#<RDF::Query::Pattern:0x84a40770(?article <http://dbpedia.org/ontology/abstract> ?abstract .)>]>
> a = query.execute(graph)
> a.first
> <RDF::Query::Solution:0x84ba74d8({:abstract=>#<RDF::Literal:0x812bc7b8("Wien ist die Bundeshauptstadt der Republik \u00D6sterreich und zugleich eines der neun \u00F6sterreichischen (...shortened for brevity...) gefolgt von Z\u00FCrich und Genf an zweiter und dritter Stelle."@en)>, :article=>#<RDF::URI:0x81724044(http://dbpedia.org/resource/Vienna)>})>
> As you can see, rdf.rb seems to think the language for the first abstract is English, when in fact it's German. If I query DBPedia via their SPARQL endpoint I do get correct results, so I am sure their data isn't the problem here. I tried to filter the solutions by language per http://rdf.rubyforge.org/RDF/Query/Solutions.html but since they're all tagged with @en, they all come back when I ask for English.
> Does anyone have any idea what could be causing this or how to solve it? Am I querying wrong? If so, how would I structure the query to get the proper language result? 
> Thanks in advance!
> -Alex
Received on Monday, 28 February 2011 17:16:24 UTC
