- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 18 Apr 2007 15:03:19 -0400
- To: John Cowan <cowan@ccil.org>
- Cc: Jeremy Carroll <jjc@hpl.hp.com>, semantic-web@w3.org, www-international@w3.org
> Jeremy Carroll scripsit:
>
> > Sandro Hawke wrote:
> > >Of course, if you *want* the base to end with "résumé" you're out of luck,
> > >since XML Base [1] says you can only use a URI. But at least you've
> > >avoided the dilemma.
> >
> > Yes I like using xml:base as much as possible.
> > (And I think xml:base does allow non-ASCII chars since it tells
> > applications how to % encode them)
>
> There are two different questions here: what characters can appear
> in a [base URI] Infoset property, and what characters can appear
> in an xml:base attribute value?
>
> The [base URI] property of a document, element, or PI is a URI;
> as such, it can only make use of a limited repertoire, a subset
> of ASCII characters.
>
> The value of an xml:base attribute is not so limited: it can contain
> (almost) arbitrary Unicode, which is %-escaped before being used
> to alter the base URI property of the element on which it appears
> and the element's children.

Percent-escaping has got to be among the 10 most confusing and confused
subjects in the history of computing. :-)

My sense is that the 2001 XML Base Recommendation [1] is very confused
about how to handle percent-escaping. Of course, it long predated IRIs,
so this isn't so surprising.

There is a Proposed Edited Recommendation [2] which, to my mind, is much
clearer about this. It says, essentially, don't do percent-escaping.
XML is safe for Unicode, so just use Unicode. (As I understand it, this
new draft is waiting to see what happens with HRRIs [3] before
proceeding. HRRIs are one step past IRIs in also allowing the ASCII
characters people use that IRIs don't allow, like " " and "<".)
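
To make the older behaviour concrete, here is a rough Python sketch of
the %-escaping John describes: each non-ASCII character in an xml:base
value is converted to its UTF-8 bytes and each byte is written as %HH.
The function name escape_non_ascii and the example.org address are made
up for illustration. This is a simplification: it only touches
non-ASCII characters, whereas, as I read it, the 2001 text also escapes
some ASCII characters that URIs exclude.

```python
def escape_non_ascii(value: str) -> str:
    """Percent-escape every non-ASCII character as its UTF-8 bytes.

    Hypothetical helper for illustration only; ASCII characters,
    including '%' and '#', are passed through unchanged.
    """
    out = []
    for ch in value:
        if ord(ch) < 128:
            out.append(ch)  # ASCII is left untouched in this sketch
        else:
            # One %HH triplet per UTF-8 byte of the character
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_non_ascii("http://example.org/résumé"))
# prints http://example.org/r%C3%A9sum%C3%A9
```

Under the approach in [2], none of this would be applied to the
attribute value itself; the Unicode string would be kept as written.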
-- Sandro

[1] http://www.w3.org/TR/xmlbase/
[2] http://www.w3.org/TR/2006/PER-xmlbase-20061220
[3] http://www.w3.org/XML/Group/2007/03/xmlresourceid/

Received on Wednesday, 18 April 2007 19:03:32 UTC