Re: Proposed resolution of HRRI/IRI discussion from Konrad Lanz on 2007-11-05 (public-xml-core-wg@w3.org from November 2007)

From: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
Date: Mon, 05 Nov 2007 20:36:26 +0100
To: Richard Tobin <richard@inf.ed.ac.uk>
CC: Martin Duerst <duerst@it.aoyama.ac.jp>, "Grosso, Paul" <pgrosso@ptc.com>, Richard Ishida <ishida@w3.org>, public-i18n-core@w3.org, public-xml-core-wg@w3.org, public-iri@w3.org
Message-ID: <472F70BA.4040400@iaik.tugraz.at>

Hi Richard,

Richard Tobin wrote:
>> So if an implementation receives for instance the value
>> "#xpointer(//*[@attr='%#true#%25'])" as input from the user it will
>> accept it according to A-I and not percent encode it and put it into the
>> XML document.
>>     
>
> An implementation of what?

An implementation of one of the standards that currently specify the
percent-encoding algorithm in question and that may refer to
LEIRI/HRRI in the future.

E.g. implementations of standards that use XMLSchema's xs:anyURI, which
in turn refers to XLink.

XMLDSig does both: it currently specifies the percent-encoding algorithm
in question and refers to XMLSchema's anyURI, but future versions may
refer to XMLSchema or LEIRI/HRRI only.

>   I don't understand what you mean by "put it in the XML document".

I mean setting the value (lexical representation) of an actual attribute
or text node that is to be interpreted as an IRI reference, in some
XML document conforming to one of the specifications in question.

cf. http://www.w3.org/TR/xmlschema-2/#anyURI-lexical-representation

>   The only time an XML parser is concerned
> with this is for system identifiers (which incidentally don't allow
> fragments).  The usual case will be that it is an attribute value,
> and nothing is done until some higher layer tries to use it.
>   
Exactly, but it is precisely these higher layers that may depend on the
grammar of RFC 2732, which is normatively referenced across different
standards.

cf. XMLDSig verifying a signature that contains an XPointer in one of
its <ds:Reference> elements.

>> Implementations would then however on accessing the value *try* to
>> escape the value according to Assumption II and throw an error.
>>     
>
> I don't understand "try to escape it".  They should escape it
> according to the rules (which do nothing in this case) and then pass
> it to their URI library (perhaps to parse it into the URI and fragment
> parts, before retrieving the URI), which will presumably throw an
> error because it's not a legal URI.
>   
Exactly, that's what I said:
> So all the characters specified will be percent-encoded and
> *potentially* result in a valid URI reference or the generic
> URI-retrieval library would throw an error.
>
> Let's call this Assumption II (A-II).

>> I would assume however that the cost of throwing the error at this stage
>> would be higher than to escape square brackets in the fragment based on
>> the assumption that percent encoding is considerably cheaper than
>> reporting the error back to the original Author of the value.
>>     
>
> I wouldn't be surprised if some implementations do more escaping
> than they should, but are you suggesting that they get an error
> from their URI library and then try fixing it up?
>   

No, they should escape square brackets proactively, just like the other
characters, before passing the value to some URI library. A fragment
with square brackets should nevertheless still be a valid LEIRI/HRRI,
just as it is a valid URI reference under the RFC 2732 grammar.
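
To make the idea concrete, here is a minimal sketch in Python. The
function name escape_fragment_brackets is purely hypothetical and is not
taken from any of the specs under discussion; it only assumes that the
fragment starts at the first "#" and that everything before it (for
instance an IPv6 literal in the authority) must be left untouched.

```python
def escape_fragment_brackets(ref: str) -> str:
    """Percent-encode '[' and ']' in the fragment part only.

    Hypothetical helper for illustration: the part before the first '#'
    is returned unchanged, so IPv6 literals such as http://[::1]/ keep
    their square brackets.
    """
    base, sep, fragment = ref.partition("#")
    if not sep:
        # No fragment at all, nothing to escape.
        return ref
    fragment = fragment.replace("[", "%5B").replace("]", "%5D")
    return base + sep + fragment


if __name__ == "__main__":
    print(escape_fragment_brackets("#xpointer(//*[@attr='x'])"))
    print(escape_fragment_brackets("http://[::1]/doc#a[1]"))
```

Note that this handles the square brackets only. A value containing a
raw "#" or a bare "%" inside the fragment would still be passed on as
it is.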

>> I do understand however that making LEIRI and HRRI specs more tolerant
>> would make these specs more complicated.
>>     
>
> And require normative changes to all the specs, which is what we
> want to avoid.
>   

Why would allowing what was allowed by the grammar in RFC 2732 require
changing all the specs?
I think it just requires LEIRI/HRRI to accept square brackets in the
fragment.

>> I would assume however that
>> this would be minimal if additionally only square brackets would be
>> allowed in the fragment
>>     
>
> This seems like a strange thing to do for the benefit of something
> that isn't even a recommendation.
>   
Nevertheless, RFC 2732 was normatively referenced in XMLSchema and
others, and XPointers are mentioned quite often in the current IRI draft.
(XPointers are normatively referenced in some places as well, despite
not being a REC.)

>> What I'm still not quite sure about is whether the intention currently is to
>>
>> A) Throw an error when generating the value and before putting the value
>> into the actual XML document (or to throw the error on validation) which
>> is equal to saying the value is a LEIRI/HRRI
>>     
>
> Again, I don't understand "putting the value into the actual XML
> document".  Apart from the case of system identifiers, these are
> higher-level errors.  It's just like having "select='^*)%*" in
> a stylesheet: it's the XSLT implementation that reports the error.
>   

Please see above.

>> B) Throw an error on interpreting/dereferencing/absolutizing/accessing
>> the value which conforms to A-I.
>>     
>
> Throw an error when using the value in whatever way the higher-level
> spec requires.
>   

Well, a lot of them would then start rejecting XPointers when switching
away from RFC 2732 to LEIRI/HRRI, wouldn't they?
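
As a rough illustration of the concern, and only under the assumption
that the higher layer checks the fragment against the fragment
production of RFC 3986 (unreserved, sub-delims, ":", "@", "/", "?" and
percent-encoded triplets), a sketch in Python could look as follows. The
name is_rfc3986_fragment is hypothetical.

```python
import re

# fragment = *( pchar / "/" / "?" ) as in RFC 3986; '[' and ']' are
# not part of pchar, so they only pass when percent-encoded.
FRAGMENT_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def is_rfc3986_fragment(fragment: str) -> bool:
    """Return True if the string (without the leading '#') matches."""
    return FRAGMENT_RE.fullmatch(fragment) is not None


if __name__ == "__main__":
    # Unescaped square brackets are rejected by this check ...
    print(is_rfc3986_fragment("xpointer(//*[@attr='x'])"))
    # ... while the proactively escaped form is accepted.
    print(is_rfc3986_fragment("xpointer(//*%5B@attr='x'%5D)"))
```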


Konrad

-- 
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at

Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
Received on Monday, 5 November 2007 19:37:32 UTC