Re: XML problems with percent encoding

From: Jeremy Carroll <jeremy@topquadrant.com>
Date: Wed, 18 Nov 2009 07:14:22 -0800
Message-ID: <4B040F4E.8090508@topquadrant.com>
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
CC: semantic-web@w3.org, Michael Martin <martin@informatik.uni-leipzig.de>, Christopher Jona Sahnwaldt <christopher@sahnwaldt.de>, Matthias Weidl <matthias.weidl@googlemail.com>, Anja Jentzsch <anja@anjeve.de>, Richard Cyganiak <richard@cyganiak.de>, Robert Isele <robertisele@gmail.com>
Sebastian Hellmann wrote:
> Dear all,
> we (especially Matthias Weidl @ KAIST)  are currently working on 
> producing a Korean DBpedia.
> We have again encountered a problem that we cannot really solve
> and can only work around. The Korean property URIs consist
> entirely of special characters. If we try to URL-encode them,
> serialisation in RDF/XML is bound to fail.
>
> For a property like:
> http://dbpedia.org/property/l%E3%A4ngengrad
> Jena produces the following:
> <ns0:ngengrad xmlns:ns0="http://dbpedia.org/property/l%E3%A4">
> because % is not a valid character in an XML tag.
> But if the property name contains only special characters, this no
> longer works:
> http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90
>
> In DBpedia we created a workaround for this, replacing % with _percent_,
> but it is clearly not a satisfactory solution.
>
> How shall we resolve this matter?
> Is XML conformity still necessary or is there a motion to only use 
> turtle in the future?
>
>

Sorry I am late to this thread.
Why are you percent-encoding the special characters? Why not just leave
them in Korean?
Semantic Web standards are based on IRIs, which allow all of these characters.
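
A minimal sketch of this suggestion, using only the Python standard library rather than Jena: it decodes the percent escapes in the Korean property URI quoted above and uses the resulting Korean text directly as an XML element name. Splitting the IRI at the last "/" is an assumption made for illustration, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Percent-encoded property URI taken from the quoted message.
encoded_uri = "http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90"

# Decode the UTF-8 percent escapes to get the IRI form with Korean characters.
iri = unquote(encoded_uri)

# Assumption for illustration: the namespace ends at the last "/",
# and everything after it is the local name.
namespace, _, local_name = iri.rpartition("/")
namespace += "/"

# Hangul syllables are legal XML name characters, so the local name
# can be used as an element name without any escaping.
element = ET.Element(f"{{{namespace}}}{local_name}")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)

# Round trip: the serialised XML parses back to the same qualified name.
parsed = ET.fromstring(xml_text)
print(parsed.tag == f"{{{namespace}}}{local_name}")
```

Because the decoded local name contains no "%", the element name is well-formed and no replacement such as _percent_ is needed.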

Jeremy


Received on Wednesday, 18 November 2009 15:14:56 UTC
