- From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
- Date: Tue, 8 Aug 2000 14:42:31 +1200 (NZST)
- To: www-xml-linking-comments@w3.org
Comments on XML Base WD 07-June-2000.
Section 1 states that
The purpose of XBase is to "provid[e] base URI services to XLink,
but as a modular specification so that other XML applications
benefitting from additional control over relative URIs but not
built upon XLink can also make use of it."
The single new attribute 'xml:base' proposed is not sufficient.
Section 4 states that
A. "A relative URI appearing in text content is resolved against
the base URI described by the xml:base attribute of the
nearest ancestor element having an xml:base attribute".
How is an XML processor to discern which portions of text
are appearances of relative URIs? If, for argument's sake,
{x} appearing inside <e> should be so interpreted, what
about <e>{<![CDATA[x}]]></e> or <e><i>{</i><b>x</b><i>}</i></e>?
B. "A relative URI appearing in an attribute value is resolved
against the base specified in the xml:base attribute appearing
on the element owning the attribute, if one exists, otherwise
the xml:base attribute of the nearest ancestor of the owning
element having an xml:base attribute. Note that this applies
to xml:base attributes themselves."
- The last sentence CANNOT be true; it is a vicious circle.
Presumably the xml:base attribute is resolved against the
base specified in the xml:base attribute of the nearest
PROPER ancestor having such an attribute, not against itself!
- How is an XML processor to determine WHICH attributes contain
relative URIs? For example, if we have 'temp="98.6"', that
has the right form to be a URI, so when EXACTLY is it to be
resolved as a URI and when is it to be left alone as a number?
C. "A relative URI appearing in the content of a processing
instruction is resolved against the base URI described by the
xml:base attribute of the nearest ancestor element having an
xml:base attribute."
This does not say
- what is to be done if there is no such ancestor element
(e.g., in a PI occurring before or after the root element).
- how relative URIs appearing in the content of a processing
instruction are to be discerned. Considered as a string,
the relative URI x.pi occurs twice in <?x.pi uri='x.pi'?>;
are both affected?
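The 'temp="98.6"' question can be made concrete with a small sketch. It
assumes Python's standard urllib.parse module and a hypothetical base URI
(http://example.org/docs/a.xml is not from the draft). It shows only that
a generic URI resolver accepts the string without complaint, so the form
of a value cannot tell a processor whether it is a URI.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URI, chosen only for illustration.
base = "http://example.org/docs/a.xml"
# The attribute value from temp="98.6".
value = "98.6"

# A string with no scheme and no authority has the form of a relative URI reference.
parts = urlsplit(value)
looks_relative = parts.scheme == "" and parts.netloc == ""

# A generic resolver treats it as a relative path and resolves it.
resolved = urljoin(base, value)

print(looks_relative)
print(resolved)
```

Nothing in the resolver distinguishes a temperature from a file name; that
decision has to come from somewhere else.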
Each of these cases is misleading, because the rule for determining the
applicable base is NOT the stated rule.
Two of the additional rules that are needed, and that do apply in these
cases, are
"2. The base URI is that of the encapsulating entity (message,
document, or none)."
What is an "encapsulating entity", exactly? The term is not
defined in the XML 1.0 recommendation. What _is_ the base URI
of a "none"?
"3. The base URI is that of the URI used to retrieve the entity."
But WHICH entity is "the" entity? Is it the entity that
contains the root element? Is it the external entity that
directly contains the point in question?
Suppose we have
<!-- This is in file /a.xml -->
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY e "<foo my-uri='x.xml'/>">
<!ENTITY f SYSTEM "b.xml">
<!ELEMENT root (foo)>
<!ELEMENT foo EMPTY>
<!ATTLIST foo my-uri CDATA #REQUIRED>
]>
<root>&f;</root>
<!-- This is in file /b.xml -->
&e;
When an XML processor is resolving my-uri of <foo>,
which is "the" entity? If "the" entity is the innermost
entity containing the relevant point, it's e, and the
URI used to retrieve e is /a.xml. But if it is the
innermost EXTERNAL entity containing the relevant point,
it's f, and the URI used to retrieve f is /b.xml.
One interpretation of the rules for determining the applicable base can
be clarified by stating them as follows:
Every XML document has an "element" structure and an "entity"
structure. Section 4.3.2 "Well-Formed Parsed Entities" of
the XML 1.0 specification guarantees that these two structures
are compatible.
The context of the application determines a default base URI.
Within the scope of an external entity or a document entity
that was retrieved using a URI, that URI is the base URI.
Within the scope of an element having an xml:base attribute,
the value of that attribute is the base URI.
Inner scopes of either kind take precedence over outer scopes
of either kind.
The URI value of an xml:base attribute is resolved in the
context just outside its owning element, so does not depend
on itself.
The URI value of any other attribute is resolved in the
context just inside its owning element, so does depend on
that element's xml:base attribute if it has one.
A relative URI appearing in a processing instruction is
resolved in the context immediately containing that PI.
The means by which such appearances are discerned is
outside the scope of this recommendation.
A relative URI appearing in text content is resolved in
the context immediately containing that text content.
The means by which such appearances are discerned is
outside the scope of this recommendation.
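The element-scope part of the rules above can be sketched as follows. This
is a minimal illustration, not a conforming implementation: it assumes
Python's xml.etree.ElementTree and urllib.parse, a hypothetical default
base URI, and a hypothetical attribute name 'href' that the caller declares
to hold URIs. ElementTree does not expose entity boundaries, so the
external-entity scopes are not modelled here.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree reports xml:base under the predeclared XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def resolve_attributes(element, outer_base, uri_attrs, out=None):
    """Collect (tag, attribute, absolute URI) triples under the scoping rules."""
    if out is None:
        out = []
    # xml:base is resolved in the context just OUTSIDE its owning element,
    # so it never depends on itself.
    inner_base = outer_base
    if XML_BASE in element.attrib:
        inner_base = urljoin(outer_base, element.attrib[XML_BASE])
    # Every other URI attribute is resolved in the context just INSIDE.
    for name, value in element.attrib.items():
        if name in uri_attrs:
            out.append((element.tag, name, urljoin(inner_base, value)))
    # Inner scopes take precedence: children inherit the inner base.
    for child in element:
        resolve_attributes(child, inner_base, uri_attrs, out)
    return out

doc = ET.fromstring(
    "<root xml:base='sub/' href='r.xml'>"
    "<item xml:base='deeper/' href='i.xml'/>"
    "<item href='j.xml'/>"
    "</root>"
)
# Hypothetical default base supplied by the application context.
results = resolve_attributes(doc, "http://example.org/docs/a.xml", {"href"})
for triple in results:
    print(triple)
```

Note that the caller has to pass in the set of URI-valued attribute names;
the sketch has no way to discover them, which is the unsolved problem
discussed next.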
Section 4.1 appears to mean that a URI as notated in an XML document
may use disallowed characters, but that an XML processor must convert
URI values to the proper form. But when, exactly? Before the URI is
used as a URI, or before any other code, including the application,
sees the text?
The major unsolved problem in this draft of XBase is
"How does an XML processor know which strings are URIs?"
In particular, how do XML processors that do not support
XSchema or XLink know which strings are URIs?
I propose the following solution for attributes only.
2.5 xml:uri Attribute.
The attribute xml:uri may be inserted in XML documents
to specify which attributes of an element are to be
interpreted as URIs and so resolved according to the
rules in section 3.
The value of an xml:uri attribute must match the
Names production in the XML recommendation. Each
attribute of an element whose name appears in the
value of an xml:uri attribute owned by the same
element is to be processed as a URI.
Example.
<nav xml:uri='first last prev next'
first='slide001.xml' last='slide024.xml'
prev='slide023.xml'/>
As the example shows, the presence of a name in the value
of an xml:uri attribute does not mean that such an attribute
MUST appear, only that IF it does, it has a URI as value.
Example:
<!ELEMENT nav EMPTY>
<!ATTLIST nav
xml:uri NMTOKENS #FIXED 'first last prev next'
first CDATA #REQUIRED
last CDATA #REQUIRED
prev CDATA #IMPLIED
next CDATA #IMPLIED>
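A processor could act on the proposed attribute roughly as sketched below.
This assumes Python's xml.etree.ElementTree and urllib.parse; xml:uri is
the attribute proposed in this message and is not part of any existing
specification, and both the base URI and the function name
resolve_declared_uris are hypothetical.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
# The proposed (non-standard) xml:uri attribute, as ElementTree would name it.
XML_URI = "{" + XML_NS + "}uri"

def resolve_declared_uris(element, base):
    """Resolve only the attributes that the element's xml:uri value names."""
    names = element.attrib.get(XML_URI, "").split()
    # A listed name need not be present; resolve it only if it is.
    return {n: urljoin(base, element.attrib[n])
            for n in names if n in element.attrib}

nav = ET.fromstring(
    "<nav xml:uri='first last prev next' "
    "first='slide001.xml' last='slide024.xml' prev='slide023.xml'/>"
)
# Hypothetical base URI for the document containing <nav>.
resolved = resolve_declared_uris(nav, "http://example.org/talk/slide024.xml")
for name, uri in resolved.items():
    print(name, uri)
```

The absent 'next' attribute is simply skipped, matching the rule that a
name in xml:uri does not require the attribute to appear.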
Section C leaves another major question open.
Does an application get a resolved URI *as well as* the text
it would have got without XBase, or *instead of* that text?
This has a major effect on the XML Infoset and Document Object Model.
The whole specification leaves it unclear just which component of an
XML-aware application is responsible for applying the XBase rules.
Suppose we have a parser communicating with an application using
something like SAX. When the XBase draft says that "These URI
references [in HTML beyond those expressible in XLink] might be
resolved BY AN APPLICATION relative to the base URI defined by XML
Base", is that a hint that URI resolution in general is the
responsibility of an application, and that an XBase-conforming
parser need only provide the information from which resolution could
be done, rather than doing such resolution itself?
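One possible division of labour is sketched below, assuming Python's
xml.sax module: the handler only tracks which base URI is in scope at each
element and hands over the attribute text unchanged, leaving the decision
to resolve to the application. The class name BaseTrackingHandler, the
document URI, and the choice of my-uri as a URI-valued attribute are
assumptions made for illustration.

```python
import xml.sax
from urllib.parse import urljoin

class BaseTrackingHandler(xml.sax.ContentHandler):
    """Reports each element with the base URI in scope; resolves nothing else."""

    def __init__(self, document_uri):
        super().__init__()
        self.bases = [document_uri]   # stack of base URIs, innermost last
        self.events = []              # (element name, base in scope, raw attributes)

    def startElement(self, name, attrs):
        base = self.bases[-1]
        if "xml:base" in attrs:
            # Resolved against the base just outside this element.
            base = urljoin(base, attrs["xml:base"])
        self.bases.append(base)
        self.events.append((name, base, dict(attrs.items())))

    def endElement(self, name):
        self.bases.pop()

# Hypothetical URI used to retrieve the document.
handler = BaseTrackingHandler("http://example.org/docs/a.xml")
xml.sax.parseString(b"<root xml:base='sub/'><foo my-uri='x.xml'/></root>", handler)

# The application, not the parser, decides that my-uri holds a URI.
name, base, attrs = handler.events[1]
absolute = urljoin(base, attrs["my-uri"])
print(name, absolute)
```

Under this arrangement the application receives the original text as well
as enough information to compute the resolved URI, which is the "as well
as" reading of the question above.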
Received on Monday, 7 August 2000 22:42:37 UTC