Re: Question for TAG about base URIs

From: Chris Lilley <chris@w3.org>
Date: Sat, 23 Apr 2005 00:17:51 +0200
Message-ID: <1323067564.20050423001751@w3.org>
To: Richard Tobin <richard@inf.ed.ac.uk>
Cc: www-tag@w3.org

On Wednesday, April 20, 2005, 6:50:49 PM, Richard wrote:


RT> The XML Core WG requests advice about the use of base URIs in the
RT> XML Infoset, following the publication of RFCs 3986 (URIs) and
RT> 3987 (IRIs).

RT> System identifiers in XML have always allowed characters that need to
RT> be escaped before the identifier is used as a URI.  In fact, XML
RT> allows what are now called IRIs, with the addition that XML requires
RT> implementations to support the (optional in RFC 3987) escaping of
RT> SPACE etc.

RT> The same applies to xml:base attributes and "other XML strings meant
RT> to be used as URI references" - though it is not entirely clear what
RT> the latter means.

The cumbersome phrase can usefully be replaced by "IRI".

RT> As far as using these identifiers to retrieve documents goes, there is
RT> no problem.  The escaping and absolutization rules produce the same
RT> results with the new RFCs as with the old one.  But the XML Infoset
RT> exposes the base URI itself, and there are two issues with that:

RT> (1) Does %-escaping happen before or after the base URI is calculated?
RT>     The XML spec currently says that escaping should happen "as late
RT>     as possible" (because it is not reversible); that seems to imply
RT>     that the base URI should not have escaping done, in which case
RT>     strictly speaking the base URI is not a URI.

Right. The base is an IRI, and it can be combined with a relative IRI to
produce an absolute IRI, which may then need to be escaped (for example,
when dereferencing over a transport that does not directly support IRIs).
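
To make that order of operations concrete, here is a minimal sketch in
Python. It is only an illustration: the function names, the example IRIs
and the set of characters left unescaped are my assumptions, not
anything taken from the XML or Infoset specs, and it assumes an ASCII
host name.

```python
from urllib.parse import quote, urljoin

# Characters left untouched when converting to a URI: the RFC 3986
# reserved set plus "%", so existing escapes are not encoded twice.
# (This choice is an assumption made for the sketch.)
_KEEP = ":/?#[]@!$&'()*+,;=%"


def resolve_iri(base_iri: str, relative_iri: str) -> str:
    """Combine base and relative reference while both are still IRIs."""
    return urljoin(base_iri, relative_iri)


def iri_to_uri(iri: str) -> str:
    """Escape as late as possible: UTF-8 encode, then %-escape.

    Assumes an ASCII host name; internationalised host names would
    need separate treatment and are ignored here.
    """
    return quote(iri, safe=_KEEP, encoding="utf-8")


base = "http://example.org/résumé/doc.xml"   # hypothetical base IRI
absolute_iri = resolve_iri(base, "../café menu.xml")
print(absolute_iri)               # still an IRI, nothing escaped yet
print(iri_to_uri(absolute_iri))   # escaped only for dereferencing
```

The point is that the base stays unescaped throughout; only the final
absolute IRI is escaped, immediately before it is used for retrieval.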


RT> (2) RFC 3986 changes the algorithm: the base URI now has any fragment
RT>     component stripped, whereas it didn't before.  Should we amend our
RT>     specs to require the new behaviour?

Yes. In practice I suspect that

a) there are few base URIs with a fragment in the wild
b) implementations probably differ when faced with
http://example.org/toto/blah.xml#foo
and
"../bar"
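
As a rough sketch of the RFC 3986 behaviour for that pair, using
Python's urllib.parse as a stand-in for an implementation (the helper
name is made up for illustration, and other libraries may well behave
differently):

```python
from urllib.parse import urldefrag, urljoin


def base_without_fragment(base: str) -> str:
    """Return the base with any fragment component stripped, which is
    what RFC 3986 asks for before the base is used for resolution."""
    return urldefrag(base).url


base = "http://example.org/toto/blah.xml#foo"

# The base that would be exposed under the new rules.
print(base_without_fragment(base))

# Resolving the relative reference; the "#foo" plays no part in the result.
print(urljoin(base, "../bar"))
```

For this input the resolved reference does not depend on the fragment;
the visible difference is in the value of the base itself, which is
what the Infoset exposes.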

You may want to have a look at the very brief, but useful,
http://www.w3.org/TR/2004/CR-charmod-resid-20041122/
if you have not already.

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
Received on Friday, 22 April 2005 22:17:56 GMT
