Re: XML problems with percent encoding

Richard Cyganiak wrote:
> Just throwing in one comment: I *believe* (but am not 100% sure) that
> XML element names may contain characters outside of the US-ASCII range,
> such as “ä”. So, <dbpedia:längengrad> might actually be a valid XML
> element. 

This is true for both XML 1.0 and 1.1, see [1] and [2]. IIRC the
exclusions are mostly obvious, non-printable things once you're beyond
ASCII.
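
A quick way to check this, as a minimal sketch: feed such an element to a
namespace-aware parser and see whether it is accepted. The snippet below
uses Python's standard xml.etree.ElementTree. The namespace URI is taken
from the example URI in this thread, but the one-element document and its
content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; only the element name matters here.
doc = (
    '<dbpedia:längengrad xmlns:dbpedia="http://de.dbpedia.org/property/">'
    "8.4"
    "</dbpedia:längengrad>"
)

# fromstring() raises xml.etree.ElementTree.ParseError if the document
# is not well-formed, so reaching the next line means the name was accepted.
elem = ET.fromstring(doc)

# ElementTree reports qualified names as "{namespace-uri}local-name".
print(elem.tag)
```

This only shows that one parser accepts the name as XML. It says nothing
about how an RDF/XML toolkit will treat the resulting property URI.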

> I have no idea if this also applies to RDF/XML. The
> relationships between URIs, IRIs, RDF, XML, UTF-8 etc are incredibly
> complex... But it might be worth trying if a URI such as
> <http://de.dbpedia.org/property/längengrad> is actually somehow allowed
> *in the RDF data model*, and how it would be serialized in the different
> RDF surface syntaxes.

Things will probably go pear-shaped once the IRI has been converted to a
URI, particularly for comparisons. I doubt any libraries routinely
normalise incoming IRIs.
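
To illustrate the comparison problem, here is a rough sketch in Python.
The helper iri_to_uri is hypothetical and deliberately simplified: it
only percent-encodes the UTF-8 octets of non-ASCII characters and ignores
things like internationalised host names. The point is that two IRIs
that look identical but differ in Unicode normalisation form end up as
different URIs, and that the IRI and its URI form are different strings.

```python
import unicodedata
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Simplified, hypothetical IRI-to-URI mapping (not a full implementation).

    Non-ASCII characters are percent-encoded as UTF-8 octets; URI
    punctuation and existing percent-escapes are left untouched.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


iri = "http://de.dbpedia.org/property/längengrad"

# Same visible text, two normalisation forms:
# NFC uses one code point for "ä", NFD uses "a" plus a combining diaeresis.
iri_nfc = unicodedata.normalize("NFC", iri)
iri_nfd = unicodedata.normalize("NFD", iri)

uri_nfc = iri_to_uri(iri_nfc)
uri_nfd = iri_to_uri(iri_nfd)

print(uri_nfc)
print(uri_nfd)
print(uri_nfc == uri_nfd)   # plain string comparison of the two URIs
print(uri_nfc == iri_nfc)   # URI form versus the original IRI
```

A library that compares identifiers character by character will treat
all of these as distinct unless it normalises on the way in.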

Damian

[1] <http://www.w3.org/TR/xml-names/#NT-NCNameStartChar>
[2] <http://www.w3.org/TR/xml11/#IDAKUDS>

Received on Tuesday, 17 November 2009 12:43:23 UTC