Re: XML problems with percent encoding

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Wed, 18 Nov 2009 15:17:12 +0100
Message-ID: <4B0401E8.5020504@informatik.uni-leipzig.de>
To: Toby Inkster <tai@g5n.co.uk>
CC: Damian Steer <pldms@mac.com>, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, semantic-web@w3.org, Michael Martin <martin@informatik.uni-leipzig.de>, Christopher Jona Sahnwaldt <christopher@sahnwaldt.de>, Matthias Weidl <matthias.weidl@googlemail.com>, Anja Jentzsch <anja@anjeve.de>, Richard Cyganiak <richard@cyganiak.de>, Robert Isele <robertisele@gmail.com>
Toby Inkster wrote:
> On Tue, 2009-11-17 at 10:13 +0000, Damian Steer wrote:
>> RDF/XML remains the only recommended rdf serialisation, but it can't
>> serialise every rdf graph.
>> Not a happy situation. 
> Actually XHTML+RDFa is a W3C Recommendation, with the same (de jure)
> status as RDF/XML. It's capable of representing almost every RDF graph.
> (With the exception of literals containing certain Unicode control
> characters which are completely illegal in XML.)
> XHTML+RDFa uses CURIEs rather than QNames. CURIEs are a superset of
> QNames and allow a much wider set of characters to be used.
> For example, <http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90> can be
> serialised as:
> 	<div xmlns:dbp-ko="http://ko.dbpedia.org/property/"
> 	     property="dbp-ko:%EA%B4%91%EC%9E%90">
> As it happens, some properties containing lots of percent-encoding can
> be represented fine in RDF/XML, e.g.
> <http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%8F>, which can be:
> <foo:F xmlns:foo="http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%8">
> The problems arise when neither hex digit of the last percent-encoded
> octet is in the range A-F.
So basically XHTML+RDFa is incompatible with RDF/XML in this respect. Let's
say the original data is kept in XHTML+RDFa. If it spreads across the Web of
Data from host to host and somebody tries to serialize it as RDF/XML, their
parser/serializer is bound to fail. This still does not sound optimal.
I will discuss this issue with the rest of the DBpedia team. Maybe we will
just skip the underscore workaround and produce a clean solution that is
no longer compatible with RDF/XML, but still works with Turtle and XHTML+RDFa.
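
To make the failure case concrete, here is a minimal sketch of how a
serializer might look for a namespace/local-name split of a property URI.
The function name split_qname is hypothetical, and the pattern is a
simplified, ASCII-only stand-in for the XML NCName production, so treat it as
an illustration rather than what any particular RDF library does.

```python
import re

# Simplified NCName suffix: ASCII only. The real XML production also admits
# many non-ASCII letters, but '%' is never allowed, which is what matters here.
_NCNAME_SUFFIX = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")


def split_qname(uri):
    """Return (namespace, local_name), or None if no valid split exists.

    The local name is the longest suffix of the URI that looks like an
    NCName; everything before it becomes the namespace URI.
    """
    match = _NCNAME_SUFFIX.search(uri)
    if match is None:
        return None
    return uri[:match.start()], match.group()


if __name__ == "__main__":
    # Ends in %8F: the trailing 'F' can serve as the local name.
    print(split_qname("http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%8F"))
    # Ends in %90: neither hex digit is a letter, so no split is possible.
    print(split_qname("http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90"))
```

Under these assumptions the first URI splits into a namespace ending in "%8"
and the local name "F", as in Toby's example, while the second yields no split
at all, which is the case where an RDF/XML serializer has to give up.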

Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Wednesday, 18 November 2009 14:18:03 UTC