- From: Christopher Jona Sahnwaldt <christopher@sahnwaldt.de>
- Date: Wed, 18 Nov 2009 18:21:54 +0100
- To: Jeremy Carroll <jeremy@topquadrant.com>
- Cc: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, semantic-web@w3.org, Michael Martin <martin@informatik.uni-leipzig.de>, Matthias Weidl <matthias.weidl@googlemail.com>, Anja Jentzsch <anja@anjeve.de>, Richard Cyganiak <richard@cyganiak.de>, Robert Isele <robertisele@gmail.com>
On Wed, Nov 18, 2009 at 16:14, Jeremy Carroll <jeremy@topquadrant.com> wrote:
> Sebastian Hellmann wrote:
>>
>> Dear all,
>> we (especially Matthias Weidl @ KAIST) are currently working on producing
>> a Korean DBpedia.
>> We encountered a problem again that we are not really able to solve but
>> can only produce a workaround. The property URIs in korean completely have
>> special Characters. If we try to URL encode them, serialisation in RDF/XML
>> is bound to fail.
>>
>> For a property like:
>> http://dbpedia.org/property/l%E3%A4ngengrad
>> Jena produces the following:
>> <ns0:ngengrad xmlns:ns0="http://dbpedia.org/property/l%E3%A4">
>> because % is not a valid character in an XML tag.
>> But if the property only contains special characters, it can not work any
>> more:
>> http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90
>>
>> In DBpedia we created a work around for this, replacing % with _percent_
>> but it is clearly not a satisfactory solution.
>>
>> How shall we resolve this matter?
>> Is XML conformity still necessary or is there a motion to only use turtle
>> in the future?
>>
>>
>
> Sorry I am late to this thread.
> Why are you percent encoding the special chars. Why not just leave them in
> Korean?
> Semantic Web standards are based on IRIs that allow all this chars
>
> Jeremy

Hi,

I don't know the details of IRIs, but that sounds like a good idea.
For a moment I thought that N-Triples wouldn't allow IRIs, but then
I realized that there's a difference between 'URI' and 'URI reference'.
If I understand [1] and [2] correctly, we could (and probably should)
generate N-Triples like the following:

<http://dbpedia.org/resource/Glinde%2C_Schleswig-Holstein> <http://dbpedia.org/property/l\u00E4ngengrad> "10/12/40/E"@de .

instead of

<http://dbpedia.org/resource/Glinde%2C_Schleswig-Holstein> <http://dbpedia.org/property/l%C3%A4ngengrad> "10/12/40/E"@de .

(Besides, the encoded property URIs we currently use are broken - E3 A4
is not even a valid UTF-8 byte sequence. The correct UTF-8 encoding
of 'ä' is C3 A4.)

Christopher

[1] http://www.w3.org/TR/rdf-testcases/#sec-uri-encoding
[2] http://www.w3.org/TR/rdf-concepts/#dfn-URI-reference
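
PS: To make the suggestion concrete, here is a minimal Python sketch of the conversion, not the actual DBpedia extraction code. The function name percent_to_ntriples_escapes is made up for illustration. It assumes that percent-encoded octets are meant to be UTF-8, leaves ASCII escapes such as %2C untouched, and writes every non-ASCII character as an N-Triples \uXXXX (or \UXXXXXXXX) escape. Decoding is strict, so a broken sequence like E3 A4 raises an error instead of slipping through.

```python
import re
from urllib.parse import unquote_to_bytes

# One or more consecutive percent-encoded octets, e.g. "%C3%A4".
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _escape_run(match):
    """Turn one run of percent-encoded octets into N-Triples escapes."""
    raw = unquote_to_bytes(match.group(0))
    # Strict decoding: raises UnicodeDecodeError for invalid UTF-8 (e.g. E3 A4).
    text = raw.decode("utf-8")
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # ASCII stays percent-encoded (assumption: it was encoded on purpose).
            out.append("%{:02X}".format(cp))
        elif cp <= 0xFFFF:
            out.append("\\u{:04X}".format(cp))
        else:
            out.append("\\U{:08X}".format(cp))
    return "".join(out)


def percent_to_ntriples_escapes(uri):
    """Hypothetical helper: rewrite percent-encoded UTF-8 as \\u escapes."""
    return _PCT_RUN.sub(_escape_run, uri)


if __name__ == "__main__":
    print(percent_to_ntriples_escapes(
        "http://dbpedia.org/property/l%C3%A4ngengrad"))
    print(percent_to_ntriples_escapes(
        "http://ko.dbpedia.org/property/%EA%B4%91%EC%9E%90"))
```

With the currently used l%E3%A4ngengrad as input, the same call fails with a UnicodeDecodeError, which is one way to detect the broken property URIs before they are written out.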
Received on Wednesday, 18 November 2009 17:30:30 UTC