Re: XML problems with percent encoding from Damian Steer on 2009-11-17 (semantic-web@w3.org from November 2009)

From: Damian Steer <pldms@mac.com>
Date: Tue, 17 Nov 2009 12:43:47 +0000
To: Richard Cyganiak <richard@cyganiak.de>
CC: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, semantic-web@w3.org, Michael Martin <martin@informatik.uni-leipzig.de>, Christopher Jona Sahnwaldt <christopher@sahnwaldt.de>, Matthias Weidl <matthias.weidl@googlemail.com>, Anja Jentzsch <anja@anjeve.de>, Robert Isele <robertisele@gmail.com>
Message-ID: <4B029A83.4060104@mac.com>

Richard Cyganiak wrote:
> Just throwing in one comment: I *believe* (but am not 100% sure) that
> XML element names may contain characters outside of the US-ASCII range,
> such as “ä”. So, <dbpedia:längengrad> might actually be a valid XML
> element. 

This is true for both XML 1.0 and 1.1, see [1] and [2]. IIRC, once you're
beyond ASCII the exclusions are mostly obvious, non-printable things.
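To illustrate the rule cited in [1], here is a minimal Python sketch of an NCName check (an XML Name that contains no ":"). It uses the character ranges from the XML 1.0 Fifth Edition NameStartChar/NameChar productions. The function name is_ncname is hypothetical, and the sketch only checks characters. It does not validate prefixes, namespace bindings or whole documents.

```python
# Ranges from the XML 1.0 (Fifth Edition) NameStartChar production,
# minus ":" because NCNames (Namespaces in XML) may not contain a colon.
_NCNAME_START_RANGES = [
    (ord("A"), ord("Z")), (ord("_"), ord("_")), (ord("a"), ord("z")),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional characters allowed after the first position (NameChar).
_NCNAME_EXTRA_RANGES = [
    (ord("-"), ord("-")), (ord("."), ord(".")), (ord("0"), ord("9")),
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch: str, ranges) -> bool:
    """Return True if the code point of ch lies in any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_ncname(name: str) -> bool:
    """Check a string against the NCName character rules (sketch)."""
    if not name:
        return False
    if not _in_ranges(name[0], _NCNAME_START_RANGES):
        return False
    rest = _NCNAME_START_RANGES + _NCNAME_EXTRA_RANGES
    return all(_in_ranges(ch, rest) for ch in name[1:])


print(is_ncname("längengrad"))  # True: "ä" (U+00E4) is inside #xD8-#xF6
print(is_ncname("1grad"))       # False: a digit cannot start a name
print(is_ncname("grad×2"))      # False: U+00D7 is outside every range
```

A prefixed name such as dbpedia:längengrad is checked as two NCNames, the prefix and the local part, so is_ncname("längengrad") is the relevant test for the element in question.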

> I have no idea if this also applies to RDF/XML. The
> relationships between URIs, IRIs, RDF, XML, UTF-8 etc are incredibly
> complex... But it might be worth trying if a URI such as
> <http://de.dbpedia.org/property/längengrad> is actually somehow allowed
> *in the RDF data model*, and how it would be serialized in the different
> RDF surface syntaxes.

Things will probably go pear-shaped once the IRI has been converted to a
URI, particularly when comparing identifiers. I doubt any libraries are
routinely normalising incoming IRIs.
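A small Python sketch of why comparisons are fragile. The helper iri_to_uri is hypothetical, written here for illustration; it loosely follows the IRI-to-URI mapping in RFC 3987 (percent-encode the UTF-8 octets of each non-ASCII character) and ignores details such as internationalised host names. The same visible IRI in two Unicode normalization forms yields two different URIs, and neither URI string equals the original IRI string.

```python
import unicodedata
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Percent-encode each non-ASCII character of an IRI as UTF-8 octets.

    ASCII characters (including reserved ones and any existing '%') are
    copied unchanged. Simplified sketch: no IDNA handling for host names.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)


IRI = "http://de.dbpedia.org/property/längengrad"

# The same visible string in two Unicode normalization forms.
iri_nfc = unicodedata.normalize("NFC", IRI)  # "ä" as U+00E4
iri_nfd = unicodedata.normalize("NFD", IRI)  # "a" + U+0308 COMBINING DIAERESIS

uri_nfc = iri_to_uri(iri_nfc)
uri_nfd = iri_to_uri(iri_nfd)

print(uri_nfc)                      # http://de.dbpedia.org/property/l%C3%A4ngengrad
print(uri_nfd)                      # http://de.dbpedia.org/property/la%CC%88ngengrad
print(iri_nfc == uri_nfc)           # False: IRI and its URI form differ as strings
print(uri_nfc == uri_nfd)           # False: unnormalised input gives another URI
print(unquote(uri_nfc) == iri_nfc)  # True: decoding recovers the NFC IRI
```

Unless every tool normalises to the same form (NFC, say) before converting, two identifiers that look identical to a human can end up as distinct resources.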

Damian

[1] <http://www.w3.org/TR/xml-names/#NT-NCNameStartChar>
[2] <http://www.w3.org/TR/xml11/#IDAKUDS>

Received on Tuesday, 17 November 2009 12:43:23 UTC