Re: XML problems with percent encoding

Sebastian Hellmann wrote:
> Dear all,
> we (especially Matthias Weidl @ KAIST)  are currently working on
> producing a Korean DBpedia.
> We have again encountered a problem that we cannot really solve, only
> work around. The property URIs in Korean consist entirely of special
> characters. If we try to URL-encode them, serialisation in
> RDF/XML is bound to fail.
> 
> For a property like:
> http://dbpedia.org/property/l%E3%A4ngengrad
> Jena produces the following:
> <ns0:ngengrad xmlns:ns0="http://dbpedia.org/property/l%E3%A4">
> because % is not a valid character in an XML tag.
> But if the property contains only special characters, this no longer
> works:
> http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90
> 
> In DBpedia we created a workaround for this, replacing % with _percent_,
> but it is clearly not a satisfactory solution.
> 
> How shall we resolve this matter?
> Is XML conformity still necessary, or is there a motion to use only
> Turtle in the future?

RDF/XML remains the only recommended RDF serialisation, but it can't
serialise every RDF graph.

Not a happy situation.

You could look at adding an _
(http://ko.dbpedia.org/property/_%EA%B4%91%EC%9E%90) for the problem
cases, or adding a fragment
(http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90#it). The latter has
its own issues, so I'd try the former.

Damian

Received on Tuesday, 17 November 2009 10:13:31 UTC