- From: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
- Date: Fri, 02 Nov 2007 15:47:49 +0100
- To: Richard Tobin <richard@inf.ed.ac.uk>
- CC: Martin Duerst <duerst@it.aoyama.ac.jp>, "Grosso, Paul" <pgrosso@ptc.com>, Richard Ishida <ishida@w3.org>, public-i18n-core@w3.org, public-xml-core-wg@w3.org, public-iri@w3.org
- Message-ID: <472B3895.1020709@iaik.tugraz.at>
Martin Duerst wrote:
>> My understanding was that one of the motivations behind Legacy
>> Extended IRIs was to allow as little escaping as possible e.g. in
>> XPointers.

One could potentially go as far as replacing % by %25 iff the % is not
followed by two hex characters and hence is not a percent-encoding. Or
percent-encoding each # but the first one from the left ...

Well, my understanding would be to percent-encode "as little as
reasonably possible", and that would include square brackets in the
fragment.

Richard Tobin wrote:
> That's an attraction of them, but as far as the XML Core WG is
> concerned are motivation is merely to simplify and clarify our specs,
> several of which describe identifiers of this kind. So we want
> LEIRIs to match those definitions, and they don't currently allow
> square brackets unescaped.

I would argue they do allow them: obviously for IPv6 hosts and, more
interestingly, in the fragment of a URI reference, by referring to
RFC 2732 and its amendment to the grammar of RFC 2396.

My guess would be that many implementations are lenient and accept
non-percent-encoded square brackets in the fragment anyway, so one
might just as well legalize them for LEIRI / HRRI.

> I haven't checked the ancient history of this, but even XML 1.0 2nd
> edition knew about RFC 2732 and excluded square brackets:
> http://www.w3.org/TR/2000/REC-xml-20001006#sec-external-ent

This section about escaping, found in various RECs such as XPointer
Framework, XMLDsig, XLink, XInclude, XML Base, RDF ..., *excludes*
square brackets [] from the list of characters that will be/should be
escaped for some string to *potentially* be/become a URI reference.

http://www.ietf.org/internet-drafts/draft-walsh-tobin-hrri-01.txt :
> ... which allow the use of characters which must be escaped in a
> legal IRI, such as delimiters and a few other ASCII characters.
> Examples include XML System Identifiers[4], the href attribute in
> XLink[5], and XML Base attributes[6].
> These specifications all
> describe, with slightly different wording, the same algorithm for
> converting that string to a URI or IRI.

What "these specifications" usually don't say is that the described
percent-encoding is necessary, but not necessarily sufficient, for
strings to be converted to a valid URI reference (RDF and XLink being
the exceptions).

What "these specifications", from my reading, actually do say however
is that one does not percent-encode %, #, [ and ] at all.

And "these specifications" often do not make clear that, as we know
today, this is to be understood as an algorithm for interpreting the
value rather than a constraint on the value itself.

Although this is not entirely true, as the grammar of RFC 2396 +
RFC 2732 / RFC 3986 still reflects back on the set of allowable values
via the interpretation. Otherwise #xpointer(//*[@atr='#%%%']) would be
valid until it is actually dereferenced, and one would have to
distinguish between allowable and interpretable values, which I would
find very confusing.

What "these specifications" should have said however is that there is
a constraint to percent-encode %, #, [ and ] unless they are used for
percent-encoding itself, separating the fragment or delimiting an IPv6
host respectively.

Or better: %, #, [ and ] are not percent-encoded before the
interpretation, because they are assumed to be used only for
percent-encoding, separating the fragment or delimiting an IPv6 host
respectively, and hence must be escaped if used otherwise as data.

Unfortunately they didn't, and hopefully referring to LEIRI/HRRI will
clarify the situation, assuming a value has the constraint of being a
LEIRI/HRRI and isn't, as now, just interpreted as a LEIRI/HRRI plus yet
again some additional percent-encoding rules.
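
To make the "escaped if used otherwise as data" rule concrete, here is
a minimal Python sketch. It is only an illustration under stated
assumptions: the helper name escape_minimally and its
escape_fragment_brackets flag are hypothetical, not taken from any of
the specifications, and the function does not validate the result
against the RFC grammars. By default it leaves square brackets in the
fragment alone (the lenient reading of the RFC 2732 grammar); the flag
switches to the strict reading in which brackets only delimit an IPv6
host.

```python
import re

# A '%' that does not start a percent-encoded triplet (% HEXDIG HEXDIG).
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_minimally(ref: str, escape_fragment_brackets: bool = False) -> str:
    """Hypothetical helper: escape only characters that are used as data."""
    # A stray '%' is data, so it becomes %25; valid triplets are kept.
    ref = _STRAY_PERCENT.sub("%25", ref)
    # The first '#' from the left separates the fragment; later ones are data.
    head, sep, fragment = ref.partition("#")
    fragment = fragment.replace("#", "%23")
    if escape_fragment_brackets:
        # Strict reading (an assumption): brackets in the fragment are data.
        fragment = fragment.replace("[", "%5B").replace("]", "%5D")
    # The part before the fragment is left alone, so an IPv6 host keeps
    # its brackets.
    return head + sep + fragment


print(escape_minimally("#xpointer(//*[@atr='#%%%'])"))
print(escape_minimally("#xpointer(//*[@atr='#%%%'])", escape_fragment_brackets=True))
```

Applied to the XPointer example above, the lenient call keeps the
brackets and escapes only the inner # and the three stray % signs,
while the strict call additionally turns [ and ] into %5B and %5D.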
The silence of "these specifications" is, from my point of view,
confusing but no big problem for % and #, when one looks at the grammar
of RFC 2396 as amended by RFC 2732 and hence recognizes that % can only
be used for percent-encoding and # only for separating the fragment.

The subtle difference with square brackets [] however is that the
grammar in RFC 2732 actually allows them in fragments, whereas its
prose didn't; it is hence at least to some extent ambiguous, and so are
the referring specifications.

Konrad

--
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at

Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
Received on Friday, 2 November 2007 14:48:24 UTC