Re: Proposed resolution of HRRI/IRI discussion from Richard Tobin on 2007-11-05 (public-xml-core-wg@w3.org from November 2007)

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Mon, 5 Nov 2007 10:44:22 +0000 (GMT)
To: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>, Richard Tobin <richard@inf.ed.ac.uk>
Cc: Martin Duerst <duerst@it.aoyama.ac.jp>, "Grosso, Paul" <pgrosso@ptc.com>, Richard Ishida <ishida@w3.org>, public-i18n-core@w3.org, public-xml-core-wg@w3.org, public-iri@w3.org
Message-Id: <20071105104422.BB75A27E967@macpro.inf.ed.ac.uk>

> Richard Tobin wrote:
> > That's an attraction of them, but as far as the XML Core WG is 
> > concerned our motivation is merely to simplify and clarify our specs,
> >  several of which describe identifiers of this kind.  So we want 
> > LEIRIs to match those definitions, and they don't currently allow 
> > square brackets unescaped.
> 
> I would argue they do allow them: obviously for IPv6 hosts and, more
> interestingly, in the fragment of a URI reference, by referring to
> RFC 2732 and its amendment to the grammar of RFC 2396.

Sorry, I should have said the existing definitions require
implementations to leave square brackets untouched rather than
escape them.

> My guess would be that many implementations are lenient and accept
> non-percent-encoded square brackets in the fragment anyway, so one might
> just as well legalize them for LEIRI / HRRI.

I'd be surprised.  I would expect them to do the %-encoding specified
and then pass the result on to a generic URI-retrieval library
function.

> > I haven't checked the ancient history of this, but even XML 1.0 2nd 
> > edition knew about RFC 2732 and excluded square brackets from escaping: 
> > http://www.w3.org/TR/2000/REC-xml-20001006#sec-external-ent
> 
> This section about escaping, found in various RECs like XPointer
> Framework, XMLDsig, XLink, XInclude, XML Base, RDF ..., *excludes* square
> brackets [] from the list of characters that will be (or should be) escaped
> for a string to *potentially* become a URI reference.

Yes.
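For concreteness, here is a minimal Python sketch of the escaping rule in the XML 1.0 2nd edition section linked above, as I read it: non-ASCII characters, ASCII controls, space, and the other characters RFC 2396 excludes are converted to UTF-8 and each byte is written as %HH, while '#', '%', '[' and ']' are left untouched. The function name `escape_system_identifier` is hypothetical and does not come from any of the specifications; treat this as an illustration, not a conformant implementation.

```python
# RFC 2396 "delims"/"unwise" punctuation that the XML 1.0 (2nd ed.) rule
# escapes; '#', '%', '[' and ']' are deliberately absent from this set.
_ESCAPED_PUNCTUATION = '<>"{}|\\^`'


def escape_system_identifier(value: str) -> str:
    """Percent-encode the characters the XML 1.0 (2nd ed.) rule disallows.

    Disallowed: non-ASCII characters, ASCII controls (incl. DEL), space, and
    the RFC 2396 excluded punctuation other than '#', '%', '[' and ']'.
    Each disallowed character is encoded as UTF-8 and every byte of that
    encoding is written as %HH; all other characters are copied unchanged.
    """
    out = []
    for ch in value:
        code = ord(ch)
        if code <= 0x20 or code >= 0x7F or ch in _ESCAPED_PUNCTUATION:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

With this, "my doc.xml#sec[2]" becomes "my%20doc.xml#sec[2]": only the space is escaped, and an existing "%25" is passed through as-is rather than double-escaped.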

> What "these specifications" usually don't say is that the described
> percent-encoding is necessary but not necessarily sufficient for
> converting strings to valid URI references (except for RDF and XLink).

They say what the implementation should do to make them into URIs.
If after that they aren't URIs, then the user has made an error.
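To illustrate "necessary but not sufficient": the escaping leaves '%', '#', '[' and ']' alone, so values such as "50%zz.xml", "a#b#c" or "doc.xml#sec[2]" pass through it unchanged and are still not URI references. The hypothetical `looks_like_uri_reference` below is a rough, partial check against RFC 3986 syntax (character repertoire, %HH shape, a single '#', brackets only around an IP-literal host); it is not a full parser for the RFC 3986 grammar. It follows RFC 3986, which does not allow literal brackets in a fragment; under RFC 2396 as amended by RFC 2732 the "doc.xml#sec[2]" case is arguable, which is Konrad's point above.

```python
import re
import string

# Characters RFC 3986 permits literally in a URI reference: unreserved,
# gen-delims, sub-delims, and '%' (which must start a %HH escape).
_ALLOWED = set(string.ascii_letters + string.digits
               + "-._~" + ":/?#[]@" + "!$&'()*+,;=" + "%")
# A '%' that is not followed by two hex digits.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")
# Optional scheme, then "//" and the authority component.
_AUTHORITY = re.compile(r"^(?:[A-Za-z][A-Za-z0-9+.-]*:)?//([^/?#]*)")
# Authority whose host is an IP literal: [userinfo@] "[" ... "]" [:port]
_IP_LITERAL_AUTHORITY = re.compile(r"(?:[^@\[\]]*@)?\[[^\[\]]*\](?::[0-9]*)?")


def looks_like_uri_reference(s: str) -> bool:
    """Rough, partial check of RFC 3986 URI-reference syntax."""
    if any(ch not in _ALLOWED for ch in s):
        return False
    if _BAD_PERCENT.search(s):
        return False
    if s.count("#") > 1:  # at most one fragment delimiter
        return False
    m = _AUTHORITY.match(s)
    authority = m.group(1) if m else ""
    rest = s[m.end():] if m else s
    # Square brackets are only legal around an IP-literal host.
    if "[" in rest or "]" in rest:
        return False
    if ("[" in authority or "]" in authority) and \
            not _IP_LITERAL_AUTHORITY.fullmatch(authority):
        return False
    return True
```

So "http://[::1]/doc.xml#sec" is accepted, while "doc.xml#sec[2]" is rejected even though the escaping step never touched it: the error is in the original value.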

> What "these specifications", from my reading, actually do say, however,
> is that one does not percent-encode %, #, [ and ] at all.
> 
> And "these specifications" often do not make clear that, as we know
> today, this is to be understood as an algorithm for interpreting the
> value rather than a constraint on the value itself.

The algorithm specifies what the implementation must do in order to
produce a URI.  If that doesn't produce a URI, it's an error.  Perhaps
this is not explicit enough in all the specifications, but that's
certainly the intention.

> Unfortunately they didn't, and hopefully referring to LEIRI/HRRI will
> clarify the situation, provided a value is constrained to be a
> LEIRI/HRRI rather than, as now, merely interpreted as a LEIRI/HRRI plus
> yet another set of percent-encoding rules.

Yes, the idea is to say that these values must be LEIRIs.

-- Richard

Received on Monday, 5 November 2007 10:45:10 UTC