Re: xml:base (was Re: IRI meets RDF meets HTTP redirect) from Chris Lilley on 2007-04-19 (semantic-web@w3.org from April 2007)

From: Chris Lilley <chris@w3.org>
Date: Thu, 19 Apr 2007 15:57:17 +0200
To: Jeremy Carroll <jjc@hpl.hp.com>
Cc: Sandro Hawke <sandro@w3.org>, John Cowan <cowan@ccil.org>, <semantic-web@w3.org>, <www-international@w3.org>
Message-ID: <372120749.20070419155717@w3.org>

On Thursday, April 19, 2007, 1:26:36 PM, Jeremy wrote:


JC> Oh good. So a base-uri function, which doesn't do any fetching, also 
JC> doesn't do any %-escaping?

I would need to spec-spelunk to be sure, but that would be my interpretation of the intent of the PER, yes.

Specifically, two IRIs are the same if, after xml:base has been used to resolve relative references to absolute ones, they are the same Unicode strings.

There is no need to hexify (%-escape) both of them first, though IIRC RFC 3987 does talk about doing that in theory.
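A minimal sketch of that comparison, assuming Python's urllib.parse.urljoin as a stand-in for xml:base resolution (it follows RFC 3986 resolution and only manipulates strings, so non-ASCII characters pass through unescaped). The helper names and example.org IRIs are hypothetical, for illustration only:

```python
from urllib.parse import urljoin

def resolve(base: str, ref: str) -> str:
    # Relative-to-absolute resolution; urljoin does not %-escape,
    # so the result is still a Unicode IRI string.
    return urljoin(base, ref)

def same_iri(a: str, b: str) -> bool:
    # Plain Unicode string comparison: no hexification of either side.
    return a == b

base = "http://example.org/café/"
a = resolve(base, "menü.svg")
b = resolve("http://example.org/", "café/menü.svg")
c = resolve(base, "men%C3%BC.svg")

print(a)                 # http://example.org/café/menü.svg
print(same_iri(a, b))    # True: same characters after resolution
print(same_iri(a, c))    # False: the %-escaped spelling is a different string
```

Under this simple test the %-escaped spelling counts as a different IRI; only a comparison that first hexifies both sides, as RFC 3987 discusses, would equate them.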

JC> Jeremy

JC> Chris Lilley wrote:
>> On Wednesday, April 18, 2007, 9:03:19 PM, Sandro wrote:

>>>> The value of an xml:base attribute is not so limited: it can contain
>>>> (almost) arbitrary Unicode, which is %-escaped before being used
>>>> to alter the base URI property of the element on which it appears
>>>> and the element's children.

>> SH> Percent-escaping has got to be among the 10 most confusing and confused
>> SH> subjects in the history of computing.   :-)

>> This is why it's better if computers do it, and humans see the real characters.

>> SH> My sense is that the 2001 XML Base Recommendation [1] is very confused
>> SH> about how to handle percent-escaping.  Of course, it long predated IRIs,
>> SH> so this isn't so surprising.

>> I agree that the newer PER is clearer.

>> SH> There is a Proposed Edited Recommendation [2] which, to my mind, is much
>> SH> clearer about this.  It says, essentially, don't do percent-escaping.
>> SH> XML is safe for Unicode, so just use Unicode.

>> Which is pretty much what

>>   The set of characters allowed in xml:base attributes is the same as
>>   for XML, namely [Unicode]. However, some Unicode characters are
>>   disallowed from URI references, and thus processors must encode and
>>   escape these characters to obtain a valid URI reference from the
>>   attribute value.

>> says. The improvement in the PER is to clarify that the 'processor' is
>> the software which reads the XML attribute value and constructs a URI
>> to fetch; not, as it could be read, the software which creates the XML
>> document.
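To make the "processor" reading concrete: a minimal sketch, assuming the document keeps plain Unicode in xml:base and only the fetching software escapes, after resolution. It uses urllib.parse.quote (which encodes to UTF-8 and writes each octet as %HH) with a safe set that keeps URI delimiters and existing escapes; the helper name iri_to_uri and the example IRIs are hypothetical, and this is a simplification of the RFC 3987 IRI-to-URI mapping, not the PER's normative text:

```python
from urllib.parse import quote, urljoin

# Assumed safe set: RFC 3986 gen-delims and sub-delims, plus "%" so
# that escapes already present in the IRI are left alone.
_KEEP = ":/?#[]@!$&'()*+,;=%"

def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to a URI just before fetching.

    Every character other than ASCII letters, digits, "_.-~" and _KEEP
    (so non-ASCII characters, spaces, "<", ">" and the like) is encoded
    as UTF-8 and each octet written as %HH.
    """
    return quote(iri, safe=_KEEP)

# The XML document carries the real characters; resolution happens first.
base = "http://example.org/café/"
target = urljoin(base, "menü.svg")

print(target)              # http://example.org/café/menü.svg
print(iri_to_uri(target))  # http://example.org/caf%C3%A9/men%C3%BC.svg
```

The escaped form exists only on the wire; the author writing the XML never has to produce it.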

-- 
 Chris Lilley                    mailto:chris@w3.org
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG

Received on Thursday, 19 April 2007 13:57:46 UTC