Notes on XBase 07-June-2000 from Richard A. O'Keefe on 2000-08-08 (www-xml-linking-comments@w3.org from July to September 2000)

From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
Date: Tue, 8 Aug 2000 14:42:31 +1200 (NZST)
To: www-xml-linking-comments@w3.org
Message-Id: <200008080242.OAA14853@atlas.otago.ac.nz>
Comments on XML Base WD 07-June-2000.

Section 1 states that

    The purpose of XBase is to "provid[e] base URI services to XLink,
    but as a modular specification so that other XML applications
    benefitting from additional control over relative URIs but not
    built upon XLink can also make use of it."

The single new attribute 'xml:base' proposed is not sufficient.

Section 4 states that

    A.  "A relative URI appearing in text content is resolved against
	the base URI described by the xml:base attribute of the
	nearest ancestor element having an xml:base attribute".

	How is an XML processor to discern which portions of text
	are appearances of relative URIs?  If, for argument's sake,
	{x} appearing inside <e> should be so interpreted, what
	about <e>{<![CDATA[x}]]></e> or <e><i>{</i><b>x</b><i>}</i></e>?

    B.	"A relative URI appearing in an attribute value is resolved
	against the base specified in the xml:base attribute appearing
	on the element owning the attribute, if one exists, otherwise
	the xml:base attribute of the nearest ancestor of the owning
	element having an xml:base attribute.  Note that this applies
	to xml:base attributes themselves."

	- The last sentence CANNOT be true; it is a vicious circle.
	  Presumably the xml:base attribute is resolved against the
	  base specified in the xml:base attribute of the nearest
	  PROPER ancestor having such an attribute, not against itself!

	- How is an XML processor to determine WHICH attributes contain
	  relative URIs?  For example, if we have 'temp="98.6"', that
	  has the right form to be a URI, so when EXACTLY is it to be
	  resolved as a URI and when is it to be left alone as a number?

    C.	"A relative URI appearing in the content of a processing
	instruction is resolved against the base URI described by the
	xml:base attribute of the nearest ancestor element having an
	xml:base attribute."

	This does not say
	- what is to be done if there is no such ancestor element
	  (e.g., in a PI occurring before or after the root element).
	- how relative URIs appearing in the content of a processing
	  instruction are to be discerned.  Considered as a string,
	  the relative URI x.pi occurs twice in <?x.pi uri='x.pi'?>;
	  are both affected?
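
To make the discernment problem under B and C concrete, here is a minimal
Python sketch.  The base URI http://example.org/docs/a.xml is a hypothetical
stand-in, not something taken from the draft.  It only shows that a generic
URI library accepts '98.6' as a relative reference, and that plain string
matching finds x.pi twice in the PI; it says nothing about what XBase intends.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URI, assumed for illustration only.
base = "http://example.org/docs/a.xml"

# 'temp="98.6"': the value has no scheme and no authority, so it is
# syntactically a relative reference and resolves without complaint.
value = "98.6"
parts = urlsplit(value)
resolved = urljoin(base, value)
print(parts.scheme == "", parts.netloc == "", resolved)
# prints: True True http://example.org/docs/98.6

# Considered purely as a string, 'x.pi' occurs twice in the PI.
pi = "<?x.pi uri='x.pi'?>"
print(pi.count("x.pi"))  # prints: 2
```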

Each of these cases is misleading, because the rule for determining the
applicable base is NOT the stated rule.

Two of the additional rules that are needed, and that do apply in these
cases, are

    "2.	The base URI is that of the encapsulating entity (message,
	document, or none)."

	What is an "encapsulating entity", exactly?  The term is not
	defined in the XML 1.0 recommendation.  What _is_ the base URI
	of a "none"?

    "3.	The base URI is that of the URI used to retrieve the entity."

	But WHICH entity is "the" entity?  Is it the entity that
	contains the root element?  Is it the external entity that
	directly contains the point in question?

	Suppose we have
	    <!-- This is in file /a.xml -->

	    <?xml version="1.0"?>
	    <!DOCTYPE root [
		<!ENTITY e "<foo my-uri='x.xml'/>">
		<!ENTITY f SYSTEM "b.xml">
		<!ELEMENT root (foo)>
		<!ELEMENT foo EMPTY>
		<!ATTLIST foo my-uri CDATA #REQUIRED>
	    ]>
	    <root>&f;</root>


	    <!-- This is in file /b.xml -->
	    &e;

	When an XML processor is resolving my-uri of <foo>,
	which is "the" entity?  If "the" entity is the innermost
	entity containing the relevant point, it's e, and the
	URI used to retrieve e is /a.xml.  But if it is the
	innermost EXTERNAL entity containing the relevant point,
	it's f, and the URI used to retrieve f is /b.xml.
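
A small Python sketch of the two readings follows.  The host example.org and
the path /ents/b.xml are assumptions added for illustration.  With both files
in the same directory, as in the example above, the two readings select
different base URIs but happen to yield the same resolved URI; the difference
in the result only becomes visible once b.xml lives in another directory.

```python
from urllib.parse import urljoin

value = "x.xml"  # the my-uri attribute of <foo>

# Reading 1: "the" entity is e, declared in (and retrieved with) a.xml.
base_innermost = "http://example.org/a.xml"
# Reading 2: "the" entity is the innermost external entity, f (b.xml).
base_external = "http://example.org/b.xml"
# Hypothetical variant: the same b.xml, but stored in a subdirectory.
base_external_moved = "http://example.org/ents/b.xml"

print(urljoin(base_innermost, value))       # http://example.org/x.xml
print(urljoin(base_external, value))        # http://example.org/x.xml
print(urljoin(base_external_moved, value))  # http://example.org/ents/x.xml
```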
	
One interpretation of the rules for determining the applicable base can
be clarified by stating them as follows:

	Every XML document has an "element" structure and an "entity"
	structure.  Section 4.3.2 "Well-Formed Parsed Entities" of
	the XML 1.0 specification guarantees that these two structures
	are compatible.

	The context of the application determines a default base URI.

	Within the scope of an external entity or a document entity
	that was retrieved using a URI, that URI is the base URI.

	Within the scope of an element having an xml:base attribute,
	the value of that attribute is the base URI.

	Inner scopes of either kind take precedence over outer scopes
	of either kind.

	The URI value of an xml:base attribute is resolved in the
	context just outside its owning element, so does not depend
	on itself.

	The URI value of any other attribute is resolved in the
	context just inside its owning element, so does depend on
	that element's xml:base attribute if it has one.

	A relative URI appearing in a processing instruction is
	resolved in the context immediately containing that PI.
	The means by which such appearances are discerned is
	outside the scope of this recommendation.

	A relative URI appearing in text content is resolved in
	the context immediately containing that text content.
	The means by which such appearances are discerned is
	outside the scope of this recommendation.
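
The scoping rules above can be sketched in Python as a fold over nested
scopes.  This is one possible reading under stated assumptions, not the
draft's algorithm: the names base_uri and resolve_attribute, the
(kind, value) scope representation, and the example.org URIs are all
hypothetical, and a scope's URI is itself resolved against the enclosing
context.

```python
from urllib.parse import urljoin

def base_uri(scopes, default):
    """Return the base URI in force inside the innermost scope.

    scopes is a list, outermost first, of (kind, value) pairs where kind
    is "entity" or "element" and value is the retrieval URI or xml:base
    value, or None when the scope sets no base.
    """
    base = default
    for _kind, value in scopes:
        if value is not None:
            # Inner scopes of either kind take precedence over outer ones.
            base = urljoin(base, value)
    return base

def resolve_attribute(scopes, default, name, value):
    """Resolve an attribute of the element that owns scopes[-1]."""
    # xml:base is resolved just outside its owning element; every other
    # attribute is resolved just inside it.
    context = scopes[:-1] if name == "xml:base" else scopes
    return urljoin(base_uri(context, default), value)

scopes = [
    ("entity", "http://example.org/docs/a.xml"),  # document entity
    ("element", None),                            # root, no xml:base
    ("element", "chapters/"),                     # child with xml:base
]
default = "http://example.org/app/"

print(resolve_attribute(scopes, default, "xml:base", "chapters/"))
# http://example.org/docs/chapters/
print(resolve_attribute(scopes, default, "href", "fig.png"))
# http://example.org/docs/chapters/fig.png
```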


Section 4.1 appears to mean that a URI as notated in an XML document
may use disallowed characters, but that an XML processor must convert
URI values to the proper form.  But when, exactly?  Before the URI is
used as a URI, or before any other code, including the application,
sees the text?
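
The timing matters because the two forms differ, as this Python sketch shows.
The set of characters left unescaped (SAFE), the function name
escape_disallowed, and the example URIs are assumptions for illustration; the
sketch does not claim to reproduce the exact character set of Section 4.1.

```python
from urllib.parse import quote, urljoin

# Assumed set of characters to leave alone: reserved characters, '%', '#'.
SAFE = "/:?#[]@!$&'()*+,;=%"

def escape_disallowed(raw):
    """Percent-escape the UTF-8 bytes of characters outside SAFE
    (letters, digits and a few marks are always left unescaped)."""
    return quote(raw, safe=SAFE)

raw = "notes/r\u00e9sum\u00e9 2000.xml"   # text as notated in the document
escaped = escape_disallowed(raw)
resolved = urljoin("http://example.org/docs/a.xml", escaped)

print(escaped)   # notes/r%C3%A9sum%C3%A9%202000.xml
print(resolved)  # http://example.org/docs/notes/r%C3%A9sum%C3%A9%202000.xml
```

An application handed only the escaped form can no longer see the text as the
author wrote it, which is why the "when" needs an answer.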




The major unsolved problem in this draft of XBase is
"How does an XML processor know which strings are URIs?"
In particular, how do XML processors that do not support
XSchema or XLink know which strings are URIs?

I propose the following solution for attributes only.

    2.5 xml:uri Attribute.

	The attribute xml:uri may be inserted in XML documents
	to specify which attributes of an element are to be
	interpreted as URIs and so resolved according to the
	rules in section 3.

	The value of an xml:uri attribute must match the
	Names production in the XML recommendation.  Each
	attribute of an element whose name appears in the
	value of an xml:uri attribute owned by the same
	element is to be processed as a URI.

	Example.
	<nav xml:uri='first last prev next'
             first='slide001.xml' last='slide024.xml'
             prev='slide023.xml'/>

	As the example shows, the presence of a name in the value
	of an xml:uri attribute does not mean that such an attribute
	MUST appear, only that IF it does, it has a URI as value.

	Example:
	    <!ELEMENT nav EMPTY>
	    <!ATTLIST nav
		xml:uri NMTOKENS #FIXED 'first last prev next'
		first CDATA #REQUIRED
		last  CDATA #REQUIRED
		prev  CDATA #IMPLIED
		next  CDATA #IMPLIED>
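
A processor could act on the proposed attribute roughly as in the following
Python sketch.  It is only an illustration of the proposal: xml:uri is not
part of any recommendation, and the function name resolve_uri_attributes and
the base URI http://example.org/talk/ are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:-prefixed attributes under this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

def resolve_uri_attributes(element, base):
    """Resolve every attribute named in the element's xml:uri value.

    Names listed in xml:uri but absent from the element are skipped:
    the listing says IF present, the value is a URI.
    """
    names = element.get("{%s}uri" % XML_NS, "").split()
    return {
        name: urljoin(base, element.get(name))
        for name in names
        if element.get(name) is not None
    }

nav = ET.fromstring(
    "<nav xml:uri='first last prev next' "
    "first='slide001.xml' last='slide024.xml' prev='slide023.xml'/>"
)
resolved = resolve_uri_attributes(nav, "http://example.org/talk/")
for name, uri in resolved.items():
    print(name, uri)
```

Attributes not named in xml:uri (a 'temp' attribute, say) are never touched,
which is the point of the proposal.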


Section C leaves another major question open.

    Does an application get a resolved URI *as well as* the text
    it would have got without XBase, or *instead of* that text?

This has a major effect on the XML Infoset and Document Object Model.


The whole specification leaves it unclear just which component of an
XML-aware application is responsible for applying the XBase rules.
Suppose we have a parser communicating with an application using
something like SAX.   When the XBase draft says that "These URI
references [in HTML beyond those expressible in XLink] might be
resolved BY AN APPLICATION relative to the base URI defined by XML
Base", is that a hint that URI resolution in general is the
responsibility of an application, and that an XBase-conforming
parser need only provide the information from which resolution could
be done, rather than doing such resolution itself?
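
One possible division of labour is sketched below in Python, using the
standard xml.sax module.  It assumes the second reading: the parser-side
handler only tracks the base URI in force and reports it next to the raw
attribute text, and the application decides what to resolve.  The class name
BaseTrackingHandler, the recorded tuple layout, and the example.org URI are
hypothetical.

```python
import xml.sax
from urllib.parse import urljoin

class BaseTrackingHandler(xml.sax.ContentHandler):
    """Track xml:base scopes; report raw attribute text plus base URI."""

    def __init__(self, document_uri):
        super().__init__()
        self.bases = [document_uri]   # stack of base URIs in force
        self.events = []              # (element, attribute, raw, base)

    def startElement(self, name, attrs):
        outer = self.bases[-1]
        own = attrs.get("xml:base")
        # xml:base itself is resolved against the enclosing base.
        inner = urljoin(outer, own) if own is not None else outer
        self.bases.append(inner)
        for attr in attrs.getNames():
            if attr != "xml:base":
                self.events.append((name, attr, attrs.getValue(attr), inner))

    def endElement(self, name):
        self.bases.pop()

doc = b"<doc xml:base='chapters/'><fig src='one.png'/></doc>"
handler = BaseTrackingHandler("http://example.org/docs/a.xml")
xml.sax.parseString(doc, handler)

# The application, not the parser, chooses which values are URIs.
for element, attr, raw, base in handler.events:
    print(element, attr, raw, urljoin(base, raw))
# prints: fig src one.png http://example.org/docs/chapters/one.png
```

Under this split the application still receives the original text, so the
"as well as" versus "instead of" question does not arise at the parser.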
Received on Monday, 7 August 2000 22:42:37 UTC