Re: XML problems with percent encoding

Toby Inkster wrote:
> On Tue, 2009-11-17 at 10:13 +0000, Damian Steer wrote:
>   
>> RDF/XML remains the only recommended rdf serialisation, but it can't
>> serialise every rdf graph.
>>
>> Not a happy situation. 
>>     
>
> Actually XHTML+RDFa is a W3C Recommendation, with the same (de jure)
> status as RDF/XML. It's capable of representing almost every RDF graph.
> (With the exception of literals containing certain Unicode control
> characters which are completely illegal in XML.)
>
> XHTML+RDFa uses CURIEs rather than QNames. CURIEs are a superset of
> QNames and allow a much wider set of characters to be used.
>
> For example, <http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90> can be
> serialised as:
>
>  <div xmlns:dbp-ko="http://ko.dbpedia.org/property/"
>       property="dbp-ko:%EA%B4%91%EC%9E%90">
>
> As it happens, some properties containing lots of percent-encoding can
> be represented fine in RDF/XML, e.g.
> <http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%8F>, which can be:
>
> <foo:F xmlns:foo="http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%8">
>
> The problems arise when neither hex digit of the last character is in
> the range A-F.
>   
So XHTML+RDFa is incompatible with RDF/XML in this respect. Suppose the
original data is kept in XHTML+RDFa. If it spreads through the Web of Data
from host to host and somebody tries to serialize it as RDF/XML, their
parser/serializer is bound to fail. This still does not sound optimal.

I will discuss this issue with the rest of the DBpedia team. Maybe we will
just skip the underscore workaround and produce a clean solution that is
no longer compatible with RDF/XML but still works with Turtle and
XHTML+RDFa.
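
To make the failure case concrete, here is a minimal sketch of the check an
RDF/XML serializer has to pass before it can write a property as an element
name. The function name split_for_rdfxml is hypothetical, and the name
pattern is deliberately restricted to ASCII, which is an assumption (XML
allows many more name characters). Real serializers may choose their split
points differently.

```python
import re

# Simplified XML NCName pattern (assumption: ASCII only).
# A name must start with a letter or underscore; '%', '/' and ':' are
# never allowed, and a digit may not come first.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def split_for_rdfxml(uri):
    """Return (namespace, local_name) if some suffix of the URI is a
    valid element name, otherwise None."""
    # Try the longest candidate local name first.
    for i in range(1, len(uri)):
        local = uri[i:]
        if NCNAME.fullmatch(local):
            return uri[:i], local
    return None


ok = split_for_rdfxml("http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%8F")
bad = split_for_rdfxml("http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90")
print(ok)   # namespace ends in "%8", local name is "F"
print(bad)  # None: "90" and "0" both start with a digit
```

The first URI splits the way Toby describes; the second has no usable split
at all, which is exactly the case where the serializer has to give up.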
Regards,
Sebastian


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

Received on Wednesday, 18 November 2009 14:18:03 UTC