Re: Comments on new XML Base draft

Thank you for your comments on the public XML Base draft.

The XML Core WG has considered your comments and come to the following
conclusions:

> 1. The rules on returning xml:base unescaped seem to have changed too
> radically for an erratum: this needs the spec to be versioned.

We don't regard this as a change, rather as something that was not
specified in the original.  As you say, the phrase "processors must
encode and escape these characters ..." could be taken as implying
that the XML Base processor must do this before returning a value, but
the sentence continues "... to obtain a valid URI reference" and this
could simply be a statement of what must be done in order to use the
value for retrieval.  At least some specifications referring to XML
Base already expect to be able to obtain unescaped values, in
particular the Infoset, which says:

 These (i.e. base URI properties) are computed according to [XML Base] ...
 The value of these properties does not reflect any URI escaping that may
 be required for retrieval of the resource

and the XSLT2 family of specs, whose base-uri property "may contain
Unicode characters that are not allowed in URIs" (Data Model, 6.1.3).

Finally, the rule is a "should" rather than a "must" so existing
implementations may consider that backward-compatibility is a good
enough reason to disobey it.

> 2. There are several deficiencies in the existing spec that aren't
> addressed:
> 
> 2a. When the spec says that the xml:base attribute "may be used", it should
> make it clear that the attribute has no special status as far as DTD or XML
> Schema validity checking is concerned: it may be used only if permitted by
> the DTD or schema.

We have added a note as follows:

  This specification does not give the xml:base attribute any special
  status as far as XML validity is concerned. In a valid document the
  attribute must be declared in the DTD, and similar considerations
  apply to other schema languages.

> 2b. The spec doesn't say which relative URIs in a document are affected by
> xml:base. Possible positions on this are
> 
> (i) no relative URI is affected by xml:base unless the relevant
> specification says it is affected
> 
> (ii) relative URIs should be assumed to be affected unless the relevant
> specification says otherwise
> 
> (iii) relative URIs are affected if and only if they are dereferenced.
> 
> (This is a real issue, there have been disagreements for example over
> whether xml:base should affect the interpretation of schemaLocation in XML
> Schema 1.0).

RFC 3986 has a section "5.1.  Establishing a Base URI".  One of the
ways a base URI can be established is by a "base URI embedded in
content".  XML Base fits into this framework by describing the syntax
for embedding a base URI in an XML document.

In this view, XML Base sets the base URI for a part of the document,
and any strings within that part which are defined (by some spec) as
URI references should be interpreted according to RFC 3986 which means
that they use XML Base when they are resolved.

That leaves the question of which relative references are resolved,
and that seems to be an issue for whichever spec defines them
as URI references.
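
A minimal sketch of this reading, in Python.  The element names, the
"href" attribute (assumed here to be defined as a URI reference by some
other spec) and the example.org URIs are hypothetical; urljoin stands in
for RFC 3986 reference resolution.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Expanded name of xml:base (the xml prefix is bound to this namespace).
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def base_uris(root, document_base):
    """Return a dict mapping each element to its base URI."""
    result = {}

    def walk(elem, inherited):
        # An xml:base value is itself resolved against the inherited base;
        # an element without xml:base simply inherits.
        base = urljoin(inherited, elem.get(XML_BASE, ""))
        result[elem] = base
        for child in elem:
            walk(child, base)

    walk(root, document_base)
    return result

DOC = """<doc xml:base="http://example.org/today/">
  <para xml:base="hotpicks/"><link href="pick1.xml"/></para>
  <link href="new.xml"/>
</doc>"""

root = ET.fromstring(DOC)
bases = base_uris(root, "http://example.org/index.xml")

# Whether and when "href" is resolved is up to the spec that defines it;
# if it is resolved, the base in scope is the one XML Base established.
resolved = [urljoin(bases[link], link.get("href")) for link in root.iter("link")]
print(resolved)
```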

RFC 3986 says that it's the media type of the document that determines
the syntax used for embedding base URIs.  The new XML media type draft

   http://www.w3.org/2006/02/son-of-3023/draft-murata-kohn-lilley-xml-02.html

does point to XML Base for this, and we have added a sentence to the
introduction noting this:

  It is expected that a future RFC for XML Media Types will specify
  XML Base as the mechanism for establishing base URIs in the media
  types it defines.

> 2c. The spec says nothing about leading and trailing spaces in the xml:base
> attribute value.

We concluded that since XML Base says nothing about this, we must
assume that no normalisation is done as part of XML Base processing.
But any normalisation implied by DTD attribute declarations is done as
part of parsing, before the attribute is interpreted.
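
As a small illustration, assuming a document with no DTD (so the
attribute is treated as CDATA and the parser does not trim it), the
value reaches the XML Base layer with its spaces intact.  The helper
name and the URI are hypothetical.

```python
import xml.etree.ElementTree as ET

# Expanded name of xml:base.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def raw_xml_base(xml_text):
    """Return the root element's xml:base value exactly as the parser reports it."""
    return ET.fromstring(xml_text).get(XML_BASE)

# No attribute declaration is in scope, so leading and trailing spaces
# survive parsing; nothing in XML Base removes them afterwards.
value = raw_xml_base('<doc xml:base="  http://example.org/a/ "/>')
print(repr(value))
```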

There is an issue with schema languages that may normalise
attribute values.  For example, if XML Base is used for
schemaLocations, it can't take account of any normalisation implied by
the type assigned to it in the as-yet unfetched schema.

We don't propose to make any changes in XML Base on this subject before
issuing a PER.

> 2d. The spec says nothing useful about the situation where the base URI of
> the document entity is unknown. (should be OK if xml:base is absolute)

According to RFC 3986, if none of the usual mechanisms for determining
the base URI apply, the base URI is application dependent (rather
than there not being a base URI).  So a relative xml:base is
resolved against an application-dependent URI, and the result will
of course be application dependent.  I think this follows without
XML Base having to say anything about it.
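
A short sketch of the consequence, assuming two hypothetical
applications that each pick their own default base URI; urljoin again
stands in for RFC 3986 resolution.

```python
from urllib.parse import urljoin

def effective_base(xml_base, application_base):
    """Resolve an xml:base value against an application-chosen default base."""
    return urljoin(application_base, xml_base)

# Hypothetical defaults chosen by two different applications.
APP_A = "http://app-a.example/docs/"
APP_B = "http://app-b.example/cache/"

# A relative xml:base gives a different result in each application ...
relative_a = effective_base("chapters/", APP_A)
relative_b = effective_base("chapters/", APP_B)

# ... while an absolute xml:base gives the same result in both.
absolute_a = effective_base("http://example.org/book/", APP_A)
absolute_b = effective_base("http://example.org/book/", APP_B)

print(relative_a, relative_b, absolute_a, absolute_b)
```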

> 3 (Comment on XLink v1.1 5.4.1) the spec says that to convert an XML
> resource identifier to an IRI Reference, the character #0 must be escaped.
> This implies that the character #0 can exist in unescaped form; but it
> can't.

We will treat this as a comment on XLink.  It's conceivable that XML
Resource Identifiers could come from some source other than the text
of an XML document, so we probably won't remove #x0 from the
description, but will perhaps add a note about it.

> 4. It would be useful if we could all converge on the term
> "percent-encoding" as used in the RFCs, rather than "escaping" which is a
> much less specific term.

The paragraph in 3.1 has been changed to read:

  The value of an xml:base attribute is an XML Resource Identifier,
  and may contain characters not allowed in URIs. These characters must
  be escaped by percent-encoding as described in [XLink11] before the
  value is used for retrieval of a resource. In accordance with the
  principle that this percent-encoding must occur as late as possible in
  the processing chain, applications which provide access to the base
  URI of an element should calculate and return the value without
  escaping.
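
A rough sketch of "as late as possible", in Python.  The function name
is hypothetical and the set of characters escaped is only an
approximation (non-ASCII, controls, space and a few excluded ASCII
characters); the exact rule is the one given in [XLink11].

```python
def percent_encode_for_retrieval(value):
    """Percent-encode characters not allowed in URIs, using their UTF-8 bytes."""
    out = []
    for ch in value:
        code = ord(ch)
        # Approximation of the characters needing escaping: controls and
        # space, DEL and everything non-ASCII, plus some excluded ASCII.
        if code <= 0x20 or code > 0x7E or ch in '<>"{}|\\^`':
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)

# The base URI is held and returned to callers unescaped ...
base_uri = "http://example.org/r\u00e9sum\u00e9/a b.xml"

# ... and is escaped only at the point where it is used for retrieval.
retrieval_uri = percent_encode_for_retrieval(base_uri)
print(retrieval_uri)
```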

Please let us know whether you are satisfied with our responses.

-- Richard Tobin, on behalf of the XML Core WG

Received on Friday, 8 December 2006 16:33:49 UTC