Re: xml:base (was Re: IRI meets RDF meets HTTP redirect) from Addison Phillips on 2007-04-19 (www-international@w3.org from April to June 2007)

From: Addison Phillips <addison@yahoo-inc.com>
Date: Thu, 19 Apr 2007 15:21:45 +0100
To: Chris Lilley <chris@w3.org>
CC: Jeremy Carroll <jjc@hpl.hp.com>, Sandro Hawke <sandro@w3.org>, John Cowan <cowan@ccil.org>, semantic-web@w3.org, www-international@w3.org
Message-ID: <46277AF9.5030901@yahoo-inc.com>

I believe that your interpretation is pretty much the view that many 
of us have held of what the PER means. I think it's clear that CharMod 
takes the same view (although xml:base is never directly referenced 
there). See:

    http://www.w3.org/TR/charmod-resid/#C059

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.


Chris Lilley wrote:
> On Thursday, April 19, 2007, 1:26:36 PM, Jeremy wrote:
> 
> 
> JC> Oh good. So a base-uri function, which doesn't do any fetching, also 
> JC> doesn't do any %-escaping?
> 
> I would need to spec-spelunk to be sure but that would be my interpretation of the intent of the PER, yes.
> 
> Specifically, two IRIs are the same if (following use of xml:base to do relative-to-absolute) they are the same Unicode strings.
> 
> There is no need to hexify both of them, though IIRC RFC3987 does talk about doing that in theory.
> 
> JC> Jeremy
> 
> JC> Chris Lilley wrote:
>>> On Wednesday, April 18, 2007, 9:03:19 PM, Sandro wrote:
> 
>>>>> The value of an xml:base attribute is not so limited: it can contain
>>>>> (almost) arbitrary Unicode, which is %-escaped before being used
>>>>> to alter the base URI property of the element on which it appears
>>>>> and the element's children.
> 
>>> SH> Percent-escaping has got to be among the 10 most confusing and confused
>>> SH> subjects in the history of computing.   :-)
> 
>>> This is why it's better if computers do it, and humans see the real characters.
> 
>>> SH> My sense is that the 2001 XML Base Recommendation [1] is very confused
>>> SH> about how to handle percent-escaping.  Of course, it long predated IRIs,
>>> SH> so this isn't so surprising.
> 
>>> I agree that the newer PER is clearer.
> 
>>> SH> There is a Proposed Edited Recommendation [2] which, to my mind, is much
>>> SH> clearer about this.  It says, essentially, don't do percent-escaping.
>>> SH> XML is safe for Unicode, so just use Unicode.
> 
>>> Which is pretty much what
> 
>>>   The set of characters allowed in xml:base attributes is the same as
>>>   for XML, namely [Unicode]. However, some Unicode characters are
>>>   disallowed from URI references, and thus processors must encode and
>>>   escape these characters to obtain a valid URI reference from the
>>>   attribute value.
> 
>>> says. The improvement in the PER is to clarify that the 'processor' is
>>> the software which reads the XML attribute value and constructs a URI
>>> to fetch; not, as it could be read, the software which creates the XML
>>> document.
>
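
The two operations discussed in the quoted thread can be sketched as follows: comparing IRIs as plain Unicode strings after relative-to-absolute resolution, and percent-escaping only at the point where a URI is needed for fetching. This is a rough illustration, not the wording of the PER or of RFC 3987. The function names resolve_iri, same_iri and iri_to_uri are hypothetical, Python's generic urljoin stands in for xml:base processing, and only non-ASCII characters are escaped (spaces and other disallowed ASCII characters are left out for brevity).

```python
from urllib.parse import quote, urljoin


def resolve_iri(base: str, reference: str) -> str:
    """Resolve a (possibly relative) IRI reference against a base IRI.

    No percent-escaping is applied; the Unicode characters are kept as-is.
    """
    return urljoin(base, reference)


def same_iri(base_a: str, ref_a: str, base_b: str, ref_b: str) -> bool:
    """Compare two IRIs as Unicode strings after resolution, with no hexifying."""
    return resolve_iri(base_a, ref_a) == resolve_iri(base_b, ref_b)


def iri_to_uri(iri: str) -> str:
    """Percent-escape non-ASCII characters as UTF-8 bytes.

    Assumption: ASCII characters, including existing %-escapes, pass through
    unchanged. This step belongs to the software that fetches the resource,
    not to the software that writes the XML document.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)


base = "http://example.org/donn\u00e9es/"

# Same Unicode string after resolution, so the same IRI.
print(same_iri(base, "\u00e9t\u00e9.xml",
               "http://example.org/", "donn\u00e9es/\u00e9t\u00e9.xml"))

# An already-escaped reference is a different string, so it compares unequal.
print(same_iri(base, "\u00e9t\u00e9.xml", base, "%C3%A9t%C3%A9.xml"))

# Escaping happens only when a URI is needed for retrieval.
print(iri_to_uri(resolve_iri(base, "\u00e9t\u00e9.xml")))
```

Under this sketch a base-uri style function would return the result of resolve_iri unchanged, and only the retrieval step would call iri_to_uri.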

Received on Thursday, 19 April 2007 14:25:43 UTC