Response to Michael Kay's XML Base comments from Richard Tobin on 2006-12-05 (public-xml-core-wg@w3.org from December 2006)

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Tue, 5 Dec 2006 16:05:42 +0000 (GMT)
To: <public-xml-core-wg@w3.org>
Message-Id: <20061205160542.D1E4E176B31@macpro.inf.ed.ac.uk>
Here are my suggested responses, interleaved with Michael's comments.

1. The rules on returning xml:base unescaped seem to have changed too
radically for an erratum: this needs the spec to be versioned.

  As we discussed, Norm will try to talk to Michael about this.

2. There are several deficiencies in the existing spec that aren't
addressed:

2a. When the spec says that the xml:base attribute "may be used", it should
make it clear that the attribute has no special status as far as DTD or XML
Schema validity checking is concerned: it may be used only if permitted by
the DTD or schema.

  I don't see any problem with adding a note to this effect.

2b. The spec doesn't say which relative URIs in a document are affected by
xml:base. Possible positions on this are

(i) no relative URI is affected by xml:base unless the relevant
specification says it is affected

(ii) relative URIs should be assumed to be affected unless the relevant
specification says otherwise

(iii) relative URIs are affected if and only if they are dereferenced.

(This is a real issue: there have been disagreements, for example, over
whether xml:base should affect the interpretation of schemaLocation in XML
Schema 1.0).

  RFC 3986 has a section "5.1.  Establishing a Base URI".  One of the
  ways a base URI can be established is by a "base URI embedded in
  content".  XML Base describes the syntax for embedding a base URI
  in an XML document.

  Given that it's working in this general framework, it doesn't seem
  appropriate for XML Base to say which relative URIs it affects.
  Rather it sets the base URI for a part of the document, and any
  strings within that part which are defined (by some spec) as URI
  references should be interpreted according to RFC 3986 which means
  that they use XML Base when they are resolved.

  That leaves the question of which relative references are resolved,
  and that seems to be an issue for whichever spec defines them
  as URI references.

  What I didn't mention in the first paragraph is that RFC 3986 says
  that it's the media type of the document that determines the
  syntax used for embedding base URIs.  The new XML media type draft

   http://www.w3.org/2006/02/son-of-3023/draft-murata-kohn-lilley-xml-02.html

  (is that the latest version?) does point to XML Base for this.
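
  To illustrate the division of labour described above, here is a
  minimal sketch in Python.  It assumes a hypothetical vocabulary whose
  spec defines an "href" attribute as a URI reference; the element
  names, the example URIs and the base_uris helper are all made up for
  illustration.  xml:base only establishes the base URI for each part
  of the document; the decision to resolve "href" belongs to the
  (imagined) vocabulary.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Expanded name ElementTree uses for the xml:base attribute.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(root, document_base):
    """Map each element to its base URI (hypothetical helper).

    An xml:base value is itself a URI reference, so it is resolved
    against the base URI inherited from the parent element.
    """
    result = {}

    def walk(elem, inherited):
        # urljoin returns `inherited` unchanged when xml:base is absent.
        base = urljoin(inherited, elem.get(XML_BASE, ""))
        result[elem] = base
        for child in elem:
            walk(child, base)

    walk(root, document_base)
    return result


doc = ET.fromstring(
    '<doc xml:base="http://example.org/today/">'
    '<para xml:base="hot/"><link href="pick1.xml"/></para>'
    '<link href="new.xml"/>'
    '</doc>'
)

bases = base_uris(doc, "http://example.org/index.xml")

# Only the assumed vocabulary knows that "href" holds a URI reference.
links = [urljoin(bases[link], link.get("href")) for link in doc.iter("link")]
```

  Here links comes out as http://example.org/today/hot/pick1.xml
  followed by http://example.org/today/new.xml.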

2c. The spec says nothing about leading and trailing spaces in the xml:base
attribute value.

  Since it says nothing I think we must assume that no normalisation is
  done as part of XML Base processing (we had agreement on that on the
  last WG telcon I think).  Any normalisation implied by DTD attribute
  declarations is done as part of parsing, before the attribute is
  interpreted.  The question of schema normalisation seems harder:
  clearly if XML Base is used for schemaLocations, it can't take account
  of any normalisation implied by the type assigned to it in the
  as-yet unfetched schema.
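
  As a small check of the "no normalisation" reading, the sketch below
  parses a document with no DTD using Python's ElementTree (one parser
  among many, chosen only for illustration).  The attribute is then
  treated as CDATA and the surrounding spaces reach the application
  intact; any trimming would have to be the application's own choice.

```python
import xml.etree.ElementTree as ET

# Expanded name ElementTree uses for the xml:base attribute.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# No DTD, so no attribute declaration can request token normalisation.
elem = ET.fromstring('<doc xml:base="  http://example.org/a/  "/>')

# The value as delivered by the parser, spaces included.
raw = elem.get(XML_BASE)

# Trimming is an explicit application-level step, not something the
# parser or (on this reading) XML Base processing does.
trimmed = raw.strip()
```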

2d. The spec says nothing useful about the situation where the base URI of
the document entity is unknown. (should be OK if xml:base is absolute)

  According to RFC 3986, if none of the usual mechanisms for determining
  the base URI apply, the base URI is application dependent (rather
  than there not being a base URI).  So a relative xml:base is
  resolved against an application-dependent URI, and the result will
  of course be application dependent.  I think this follows without
  XML Base having to say anything about it.
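
  A sketch of that reasoning in Python, assuming an application that
  picks its own default base when nothing else is known.  The
  resolve_xml_base helper and the APP_DEFAULT_BASE value are
  hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical application-dependent default base URI.
APP_DEFAULT_BASE = "http://app.example/default/"


def resolve_xml_base(xml_base, document_base=None):
    """Resolve an xml:base value against the document entity's base URI,
    falling back to the application's default when that is unknown."""
    return urljoin(document_base or APP_DEFAULT_BASE, xml_base)


# Relative xml:base: the result depends on the application's default.
relative_result = resolve_xml_base("chapters/")

# Absolute xml:base: the result is the same whatever the fallback is.
absolute_result = resolve_xml_base("http://example.org/a/")
```

  The second call shows why an absolute xml:base is unaffected: the
  base it is resolved against never contributes to the result.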

3 (Comment on XLink v1.1 5.4.1) the spec says that to convert an XML
resource identifier to an IRI Reference, the character #0 must be escaped.
This implies that the character #0 can exist in unescaped form; but it
can't.

  I think we agreed that XLink should note this.  It's conceivable
  that XML Resource Identifiers could come from some source other
  than the text of an XML document, so we shouldn't remove #x0 from
  the description.
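
  For illustration, here is a rough Python sketch of the escaping
  step.  The set of characters to escape is written from my reading of
  the XLink 1.1 text (control characters, space, and the delimiter and
  "unwise" characters) and should be checked against the spec; the
  function name is made up.  A string that comes from somewhere other
  than XML text can contain #x0, and it is escaped like any other
  disallowed character.

```python
# Assumed set of printable characters that must be escaped.
DISALLOWED = set('<>"{}|\\^`')


def escape_to_iri_reference(identifier):
    """Percent-encode disallowed characters as their UTF-8 bytes."""
    out = []
    for ch in identifier:
        cp = ord(ch)
        # Controls #x0-#x1F, space #x20, and #x7F-#x9F, plus delimiters.
        if cp <= 0x20 or 0x7F <= cp <= 0x9F or ch in DISALLOWED:
            out.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


# A NUL character can only arrive from a non-XML source such as this
# string literal; it cannot appear in the text of an XML document.
from_api = escape_to_iri_reference("a\x00b")
with_space = escape_to_iri_reference("my file.xml")
```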

4. It would be useful if we could all converge on the term
"percent-encoding" as used in the RFCs, rather than "escaping" which is a
much less specific term.

  XML Namespaces uses the term "%-escaping".  I'm not sure which is
  best but I will change it to one or the other.

-- Richard
Received on Tuesday, 5 December 2006 16:05:02 UTC