Question for TAG about base URIs from Richard Tobin on 2005-04-20 (www-tag@w3.org from April 2005)

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Wed, 20 Apr 2005 17:50:49 +0100 (BST)
To: www-tag@w3.org
Message-Id: <20050420165049.F0E0129308C@macintosh.inf.ed.ac.uk>

The XML Core WG requests advice about the use of base URIs in the
XML Infoset, following the publication of RFCs 3986 (URIs) and
3987 (IRIs).

System identifiers in XML have always allowed characters that need to
be escaped before the identifier is used as a URI.  In fact, XML
allows what are now called IRIs, with the addition that XML requires
implementations to support the escaping of SPACE and similar
characters, which RFC 3987 makes optional.
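
As a rough illustration (a sketch only, not normative text), the
escaping step might look like this in Python.  The function name and
the exact set of ASCII characters are assumptions, taken from a
reading of section 4.2.2 of the XML spec; "%" and "#" are deliberately
left alone.

```python
# Sketch only: the character set is an assumption based on the list in
# XML 1.0 section 4.2.2, not a normative statement.
DISALLOWED_ASCII = (
    {chr(c) for c in range(0x21)}  # controls #x0-#x1F and SPACE (#x20)
    | {"\x7f"}                     # DEL
    | set('<>"{}|\\^`')            # delimiters not allowed in URIs
)


def escape_system_identifier(identifier: str) -> str:
    """Percent-escape disallowed characters as UTF-8 bytes (%HH)."""
    out = []
    for ch in identifier:
        if ord(ch) > 0x7F or ch in DISALLOWED_ASCII:
            # Convert the character to UTF-8, then escape each byte.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_system_identifier("http://example.org/résumé dir/doc.xml"))
# http://example.org/r%C3%A9sum%C3%A9%20dir/doc.xml

# "%" is not itself escaped, so two different inputs can give the same
# output and the step cannot be undone.
print(escape_system_identifier("a b") == escape_system_identifier("a%20b"))
# True
```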

The same applies to xml:base attributes and "other XML strings meant
to be used as URI references" - though it is not entirely clear what
the latter means.

As far as using these identifiers to retrieve documents goes, there is
no problem.  The escaping and absolutization rules produce the same
results with the new RFCs as with the old one (RFC 2396).  But the XML
Infoset exposes the base URI itself, and there are two issues with that:

(1) Does %-escaping happen before or after the base URI is calculated?
    The XML spec currently says that escaping should happen "as late
    as possible" (because it is not reversible); that seems to imply
    that the base URI should not have escaping done, in which case
    strictly speaking the base URI is not a URI.

(2) RFC 3986 changes the algorithm: the base URI now has any fragment
    component stripped, whereas it didn't before.  Should we amend our
    specs to require the new behaviour?
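
Both issues can be seen in a small Python sketch.  It is only an
illustration under stated assumptions: the example identifiers are
made up, and urllib.parse.quote() with the "safe" set below stands in
for XML's escaping rather than reproducing the spec text exactly.

```python
from urllib.parse import quote, urldefrag, urljoin

# Assumption: leave URI reserved characters and "%" alone, escape the
# rest (SPACE, non-ASCII, etc.) as UTF-8 %HH sequences.
SAFE = ":/?#[]@!$&'()*+,;=%"


def escape(identifier: str) -> str:
    return quote(identifier, safe=SAFE)


# Hypothetical base (e.g. from xml:base) and relative reference.
raw_base = "http://example.org/résumé dir/doc.xml#intro"
reference = "my file.xml"

# Retrieval is unaffected: escaping late or early gives the same URI.
late = escape(urljoin(raw_base, reference))
early = urljoin(escape(raw_base), escape(reference))
print(late == early)  # True

# Issue (1): which of these two strings is the exposed base URI?
print(raw_base)          # unescaped, so strictly not a URI
print(escape(raw_base))  # escaped, but the original is not recoverable

# Issue (2): RFC 3986 would strip the fragment from the base.
stripped, fragment = urldefrag(raw_base)
print(stripped)  # http://example.org/résumé dir/doc.xml

# Resolving a relative reference is the same either way; only the
# exposed value of the base differs.
print(urljoin(raw_base, reference) == urljoin(stripped, reference))  # True
```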

-- Richard

Received on Wednesday, 20 April 2005 16:50:51 UTC