Re: Proposed resolution of HRRI/IRI discussion from Richard Tobin on 2007-11-05 (public-i18n-core@w3.org from October to December 2007)

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Mon, 5 Nov 2007 15:03:12 +0000 (GMT)
To: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>, Richard Tobin <richard@inf.ed.ac.uk>
Cc: Martin Duerst <duerst@it.aoyama.ac.jp>, "Grosso, Paul" <pgrosso@ptc.com>, Richard Ishida <ishida@w3.org>, public-i18n-core@w3.org, public-xml-core-wg@w3.org, public-iri@w3.org
Message-Id: <20071105150312.84E5827EADC@macpro.inf.ed.ac.uk>

> > Sorry, I should have said the existing definitions require 
> > implementations to leave square brackets untouched, and not escape 
> > them.
> 
> Let's assume this is what the specs currently say (not LEIRI/HRRI as
> they will be referred to in a not yet defined way) and let's call this
> Assumption I (A-I).
> 
> > [...] I would expect them to do the %-encoding specified and then
> > pass the result on to a generic URI-retrieval library function.
> 
> So all the characters specified will be percent-encoded and
> *potentially* result in a valid URI reference or the generic
> URI-retrieval library would throw an error.
> 
> Let's call this Assumption II (A-II).
...
> So if an implementation receives for instance the value
> "#xpointer(//*[@attr='%#true#%25'])" as input from the user it will
> accept it according to A-I and not percent encode it and put it into the
> XML document.

An implementation of what?  I don't understand what you mean by
"put it in the XML document".  The only time an XML parser is concerned
with this is for system identifiers (which incidentally don't allow
fragments).  The usual case will be that it is an attribute value,
and nothing is done until some higher layer tries to use it.

> Implementations would then however on accessing the value *try* to
> escape the value according to Assumption II and throw an error.

I don't understand "try to escape it".  They should escape it
according to the rules (which do nothing in this case) and then pass
it to their URI library (perhaps to parse it into the URI and fragment
parts, before retrieving the URI), which will presumably throw an
error because it's not a legal URI.

> I would assume however that the cost of throwing the error at this stage
> would be higher than to escape square brackets in the fragment based on
> the assumption that percent encoding is considerably cheaper than
> reporting the error back to the original Author of the value.

I wouldn't be surprised if some implementations do more escaping
than they should, but are you suggesting that they get an error
from their URI library and then try fixing it up?

> I do understand however that making LEIRI and HRRI specs more tolerant
> would make these specs more complicated.

And require normative changes to all the specs, which is what we
want to avoid.

> I would assume however that
> this would be minimal if additionally only square brackets would be
> allowed in the fragment.

This seems like a strange thing to do for the benefit of something
that isn't even a recommendation.

> What I'm still not quite sure about is if the intention currently is to
> 
> A) Throw an error when generating the value and before putting the value
> into the actual XML document (or to throw the error on validation) which
> is equal to saying the value is a LEIRI/HRRI

Again, I don't understand "putting the value into the actual XML
document".  Apart from the case of system identifiers, these are
higher-level errors.  It's just like having "select='^*)%*'" in
a stylesheet: it's the XSLT implementation that reports the error.

> B) Throw an error on interpreting/dereferencing/absolutizing/accessing
> the value which conforms to A-I.

Throw an error when using the value in whatever way the higher-level
spec requires.

-- Richard

Received on Monday, 5 November 2007 15:03:51 UTC