
Re: XML problems with percent encoding

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Wed, 18 Nov 2009 15:17:12 +0100
Message-ID: <4B0401E8.5020504@informatik.uni-leipzig.de>
To: Toby Inkster <tai@g5n.co.uk>
CC: Damian Steer <pldms@mac.com>, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, semantic-web@w3.org, Michael Martin <martin@informatik.uni-leipzig.de>, Christopher Jona Sahnwaldt <christopher@sahnwaldt.de>, Matthias Weidl <matthias.weidl@googlemail.com>, Anja Jentzsch <anja@anjeve.de>, Richard Cyganiak <richard@cyganiak.de>, Robert Isele <robertisele@gmail.com>
Toby Inkster wrote:
> On Tue, 2009-11-17 at 10:13 +0000, Damian Steer wrote:
>   
>> RDF/XML remains the only recommended rdf serialisation, but it can't
>> serialise every rdf graph.
>>
>> Not a happy situation. 
>>     
>
> Actually XHTML+RDFa is a W3C Recommendation, with the same (de jure)
> status as RDF/XML. It's capable of representing almost every RDF graph.
> (With the exception of literals containing certain Unicode control
> characters which are completely illegal in XML.)
>
> XHTML+RDFa uses CURIEs rather than QNames. CURIEs are a superset of
> QNames and allow a much wider set of characters to be used.
>
> For example, <http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90> can be
> serialised as:
>
> 	<div xmlns:dbp-ko="http://ko.dbpedia.org/property/"
> 	     property="dbp-ko:%EA%B4%91%EC%9E%90">
>
> As it happens, some properties containing lots of percent-encoding can
> be represented fine in RDF/XML, e.g.
> <http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%8F>, which can be:
>
> <foo:F xmlns:foo="http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%8">
>
> The problems arise when neither hex digit of the last character is in
> the range A-F.
>   
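To make the rule quoted above concrete, here is a minimal Python sketch of
the check an RDF/XML serializer has to perform: find a split of the property
URI into a namespace and a local name that is a legal XML name. The function
name split_for_rdfxml is hypothetical, and the name pattern is a simplified,
ASCII-only approximation of the NCName production, not the full XML rule.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production
# (an assumption for illustration): a letter or underscore, followed by
# letters, digits, '.', '-' or '_'. Note that '%' is never allowed.
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def split_for_rdfxml(uri):
    """Return (namespace, local_name), or None if no usable split exists."""
    # Scan split points from left to right, so the first hit is the
    # longest suffix that is a legal local name.
    for i in range(1, len(uri)):
        local = uri[i:]
        if _NCNAME.fullmatch(local):
            return uri[:i], local
    return None


# Last octet is %8F: the trailing "F" can serve as the element name.
print(split_for_rdfxml("http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%8F"))

# Last octet is %90: neither "90" nor "0" starts with a letter, so no split.
print(split_for_rdfxml("http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90"))
```

The first call yields the same split as the foo:F example, the second
returns None, which is the case where a serializer has to give up.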
So basically XHTML+RDFa is incompatible with RDF/XML in this respect. Say
the original data is kept in XHTML+RDFa. If it spreads through the Web of Data
from host to host and somebody tries to serialize it as RDF/XML, their
serializer is bound to fail. This still does not sound optimal.
I will discuss this issue with the rest of the DBpedia team. Maybe we will
just drop the underscore workaround and produce a clean solution that is no
longer compatible with RDF/XML, but still works with Turtle and XHTML+RDFa.
Regards,
Sebastian


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Wednesday, 18 November 2009 14:18:03 GMT
