RE: Comments on new XML Base draft from Michael Kay on 2006-12-08 (www-xml-linking-comments@w3.org from October to December 2006)

From: Michael Kay <mike@saxonica.com>
Date: Fri, 8 Dec 2006 16:46:54 -0000
To: "'Richard Tobin'" <richard@inf.ed.ac.uk>
Cc: <www-xml-linking-comments@w3.org>
Message-ID: <010201c71ae8$779a8a60$6401a8c0@turtle>

Thank you for this detailed response, which is entirely acceptable.

Michael Kay
 

> -----Original Message-----
> From: Richard Tobin [mailto:richard@inf.ed.ac.uk] 
> Sent: 08 December 2006 16:35
> To: Michael Kay
> Cc: www-xml-linking-comments@w3.org
> Subject: Re: Comments on new XML Base draft
> 
> Thank you for your comments on the public XML Base draft.
> 
> The XML Core WG has considered your comments and come to the following
> conclusions:
> 
> > 1. The rules on returning xml:base unescaped seem to have changed too
> > radically for an erratum: this needs the spec to be versioned.
> 
> We don't regard this as a change, rather as something that 
> was not specified in the original.  As you say, the phrase 
> "processors must encode and escape these characters ..." 
> could be taken as implying that the XML Base processor must 
> do this before returning a value, but the sentence continues 
> "... to obtain a valid URI reference" and this could simply 
> be a statement of what must be done in order to use the value 
> for retrieval.  At least some specifications referring to XML 
> Base already expect to be able to obtain unescaped values, in 
> particular the Infoset, which says:
> 
>  These (i.e. base URI properties) are computed according to [XML Base] ...
>  The value of these properties does not reflect any URI escaping that may
>  be required for retrieval of the resource
> 
> and the XSLT2 family of specs, whose base-uri property "may 
> contain Unicode characters that are not allowed in URIs" 
> (Data Model, 6.1.3).
> 
> Finally, the rule is a "should" rather than a "must" so 
> existing implementations may consider that 
> backward compatibility is a good enough reason to disobey it.
> 
> > 2. There are several deficiencies in the existing spec that aren't
> > addressed:
> > 
> > 2a. When the spec says that the xml:base attribute "may be used", it
> > should make it clear that the attribute has no special status as far
> > as DTD or XML Schema validity checking is concerned: it may be used
> > only if permitted by the DTD or schema.
> 
> We have added a note as follows:
> 
>   This specification does not give the xml:base attribute any special
>   status as far as XML validity is concerned. In a valid document the
>   attribute must be declared in the DTD, and similar considerations
>   apply to other schema languages.
> 
> > 2b. The spec doesn't say which relative URIs in a document are 
> > affected by xml:base. Possible positions on this are
> > 
> > (i) no relative URI is affected by xml:base unless the relevant 
> > specification says it is affected
> > 
> > (ii) relative URIs should be assumed to be affected unless the 
> > relevant specification says otherwise
> > 
> > (iii) relative URIs are affected if and only if they are dereferenced.
> > 
> > (This is a real issue, there have been disagreements for example over
> > whether xml:base should affect the interpretation of schemaLocation in
> > XML Schema 1.0).
> 
> RFC 3986 has a section "5.1.  Establishing a Base URI".  One 
> of the ways a base URI can be established is by a "base URI 
> embedded in content".  XML Base fits into this framework by 
> describing the syntax for embedding a base URI in an XML document.
> 
> In this view, XML Base sets the base URI for a part of the 
> document, and any strings within that part which are defined 
> (by some spec) as URI references should be interpreted 
> according to RFC 3986 which means that they use XML Base when 
> they are resolved.
> 
> That leaves the question of which relative references are 
> resolved, and that seems to be an issue for whichever spec 
> defines them as URI references.
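
The view described above (xml:base sets the base URI for part of the document, and strings that some spec defines as URI references are resolved against it) can be sketched roughly as follows. This is only an illustration, not anything from the draft: the `link` element, its `href` attribute, the `base_uris` helper and the example.org URIs are all hypothetical, and the document base passed in stands for whatever the application has established by other means.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(root, document_base):
    """Map each element to its base URI (hypothetical helper)."""
    bases = {}

    def walk(elem, inherited):
        # A relative xml:base is itself resolved against the inherited base;
        # an element without xml:base simply inherits.
        base = urljoin(inherited, elem.get(XML_BASE, ""))
        bases[elem] = base
        for child in elem:
            walk(child, base)

    walk(root, document_base)
    return bases


doc = ET.fromstring(
    '<doc xml:base="http://example.org/today/">'
    '<list xml:base="hotpicks/"><link href="pick1.xml"/></list>'
    '<link href="new.xml"/>'
    '</doc>'
)
bases = base_uris(doc, "http://example.org/index.xml")

# Assumption: the (hypothetical) vocabulary defines href as a URI reference,
# so it is resolved against the base URI of the element that carries it.
resolved = [urljoin(bases[e], e.get("href")) for e in doc.iter("link")]
```

Whether `href` is resolved at all is the vocabulary's decision; the sketch only shows which base applies once it is.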
> 
> RFC 3986 says that it's the media type of the document that 
> determines the syntax used for embedding base URIs.  The new 
> XML media type draft
> 
>    http://www.w3.org/2006/02/son-of-3023/draft-murata-kohn-lilley-xml-02.html
> 
> does point to XML Base for this, and we have added a sentence 
> to the introduction noting this:
> 
>   It is expected that a future RFC for XML Media Types will specify
>   XML Base as the mechanism for establishing base URIs in the media
>   types it defines.
> 
> > 2c. The spec says nothing about leading and trailing spaces in the 
> > xml:base attribute value.
> 
> We concluded that since XML Base says nothing about this, we 
> must assume that no normalisation is done as part of XML Base 
> processing.  But any normalisation implied by DTD attribute 
> declarations is done as part of parsing, before the attribute 
> is interpreted.
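
As a small illustration of the point about leading and trailing spaces, the sketch below parses a document with no DTD, so the attribute is treated as CDATA and the parser leaves the surrounding spaces in place. The element name and URI are made up for the example, and trimming with `strip()` is shown only as something an application might choose to do itself, not as behaviour XML Base specifies.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# No DTD is present, so no declared-type normalisation applies and the
# leading and trailing spaces survive parsing.
elem = ET.fromstring('<doc xml:base="  http://example.org/dir/  "/>')
raw = elem.get(XML_BASE)

# Assumption: an application that wants a trimmed value does it explicitly.
trimmed = raw.strip()
```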
> 
> There is an issue with schema languages that may 
> normalise attribute values.  For example, if XML Base is used 
> for schemaLocations, it can't take account of any 
> normalisation implied by the type assigned to it in the 
> as-yet unfetched schema.
> 
> We don't propose to make any changes in XML Base on this 
> subject before issuing a PER.
> 
> > 2d. The spec says nothing useful about the situation where the base 
> > URI of the document entity is unknown. (should be OK if xml:base is 
> > absolute)
> 
> According to RFC 3986, if none of the usual mechanisms for 
> determining the base URI apply, the base URI is application 
> dependent (rather than there not being a base URI).  So a 
> relative xml:base is resolved against an 
> application-dependent URI, and the result will of course be 
> application dependent.  I think this follows without XML Base 
> having to say anything about it.
> 
> > 3 (Comment on XLink v1.1 5.4.1) the spec says that to convert an XML
> > resource identifier to an IRI Reference, the character #0 must be
> > escaped. This implies that the character #0 can exist in unescaped
> > form; but it can't.
> 
> We will treat this as a comment on XLink.  It's conceivable 
> that XML Resource Identifiers could come from some source 
> other than the text of an XML document, so we probably won't 
> remove #x0 from the description, but will perhaps add a note about it.
> 
> > 4. It would be useful if we could all converge on the term
> > "percent-encoding" as used in the RFCs, rather than "escaping" which
> > is a much less specific term.
> 
> The paragraph in 3.1 has been changed to read:
> 
>   The value of an xml:base attribute is an XML Resource Identifier,
>   and may contain characters not allowed in URIs. These characters must
>   be escaped by percent-encoding as described in [XLink11] before the
>   value is used for retrieval of a resource. In accordance with the
>   principle that this percent-encoding must occur as late as possible in
>   the processing chain, applications which provide access to the base
>   URI of an element should calculate and return the value without
>   escaping.
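
The "as late as possible" principle in the revised paragraph might look something like this in practice. This is a rough sketch under stated assumptions: `for_retrieval` and the example URI are hypothetical, and the set of characters left alone here (RFC 3986 reserved and unreserved characters plus "%") only approximates the procedure in [XLink11] rather than reproducing its exact character list.

```python
from urllib.parse import quote

# Characters left untouched: RFC 3986 reserved and unreserved punctuation,
# plus "%" so that existing percent-escapes are not escaped a second time.
SAFE = ":/?#[]@!$&'()*+,;=-._~%"


def for_retrieval(base_uri: str) -> str:
    """Percent-encode (as UTF-8) only at the point of retrieval."""
    return quote(base_uri, safe=SAFE)


# The value an application would return as the element's base URI stays
# unescaped ...
base = "http://example.org/résumé/"

# ... and is only percent-encoded when the resource is about to be fetched.
escaped = for_retrieval(base)
```

Keeping "%" in the safe set makes the function idempotent, so a value that was already escaped upstream is passed through unchanged.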
> 
> Please let us know whether you are satisfied with our responses.
> 
> -- Richard Tobin, on behalf of the XML Core WG
Received on Friday, 8 December 2006 16:47:27 UTC