- From: Richard Tobin <richard@inf.ed.ac.uk>
- Date: Wed, 20 Apr 2005 15:15:25 +0100 (BST)
- To: public-xml-core-wg@w3.org
I had an action to draft this, so that we can discuss it and send it to the TAG. I've added the question of when %-escaping happens.

The XML Core WG requests advice about the use of base URIs in the XML Infoset, following the publication of RFCs 3986 (URIs) and 3987 (IRIs).

System identifiers in XML have always allowed characters that need to be escaped before the identifier is used as a URI. In fact, XML allows what are now called IRIs, with the addition that XML requires implementations to support the (optional in RFC 3987) escaping of SPACE etc. The same applies to xml:base attributes and "other XML strings meant to be used as URI references" - though it is not entirely clear what the latter means.

As far as using these identifiers to retrieve documents goes, there is no problem. The escaping and absolutization rules produce the same results with the new RFCs as with the old one. But the XML Infoset exposes the base URI itself, and there are two issues with that:

(1) Does %-escaping happen before or after the base URI is calculated? The XML spec currently says that escaping should happen "as late as possible" (because it is not reversible); that seems to imply that the base URI should not have escaping done, in which case strictly speaking the base URI is not a URI.

(2) RFC 3986 changes the algorithm: the base URI now has any fragment component stripped, whereas it didn't before. Should we amend our specs to require the new behaviour?

-- Richard
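The two issues above can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: `escape_disallowed` is a hypothetical helper that stands in for the XML escaping rule using the standard library's `urllib.parse.quote`, and the `example.org` identifiers are made up for illustration.

```python
from urllib.parse import quote, urldefrag, urljoin


def escape_disallowed(identifier: str) -> str:
    """Hypothetical stand-in for the XML escaping step: %-escape characters
    not allowed in a URI, leaving existing "%" and URI delimiters alone."""
    return quote(identifier, safe="%/:?#[]@!$&'()*+,;=")


# Issue (1): escaping is not reversible. Two different system identifiers
# (made-up examples) escape to the same URI, so the original string cannot
# be recovered once escaping has been applied.
with_space = "http://example.org/my doc.xml"
pre_escaped = "http://example.org/my%20doc.xml"
same_after_escaping = escape_disallowed(with_space) == escape_disallowed(pre_escaped)

# Issue (2): RFC 3986 strips any fragment from the base URI.
declared_base = "http://example.org/a/base.xml#frag"
stripped_base, fragment = urldefrag(declared_base)

# Resolving a relative reference gives the same result with either form of
# the base; only the base URI value exposed to applications differs.
resolved_with_fragment = urljoin(declared_base, "other.xml")
resolved_without_fragment = urljoin(stripped_base, "other.xml")
```

In this sketch retrieval is unaffected by either choice. What changes is the string an Infoset consumer would see as the base URI: escaped or unescaped, with or without `#frag`.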
Received on Wednesday, 20 April 2005 14:15:26 UTC