Re: XML Base for relative URIs: Interpretation of 5.1 of RFC 2396

On Tue, Jul 11, 2000 at 02:17:51PM +0900, Martin J. Duerst wrote:
> An issue has recently come up in the resolution of last call
> comments to XML Base http://www.w3.org/TR/2000/WD-xmlbase-20000607
> (don't hesitate to read this, it's really, really short).
> 
> First a bit of background:
> 
> XML Base defines an attribute xml:base for XML documents
> that sets the base for relative URI resolution.
> The xml:base attribute can be set not only at the root element
> (i.e. for the whole document), but also on any other element.
> In that case, the base applies for the other attributes on
> the element, and for everything within the element.
> If the value of xml:base itself is relative, it is in turn
> resolved based on an xml:base higher up in the element
> hierarchy.
> 
> XML 1.0 also defines entities http://www.w3.org/TR/REC-xml#sec-physical-struct.
> There are various kinds of entities, but relevant for this
> discussion are both internal general entities and external
> parsed general entities. An external parsed general entity
> is declared e.g. as follows:
> 
> <!ENTITY  entityName  SYSTEM "http://www.example.com/example.xml">
> 
> An internal general entity is declared as follows:
> 
> <!ENTITY  entityName  "entity Content">

Which means that an internal general entity doesn't care what
the Base is, since it is defined by an explicit value rather than
by an identifier that has to be resolved, right?

> In both cases, the entity is invoked by &entityName;.
> [Of course, it's not possible to use the same name for two
> different entities.]
> 
> In some way, entities are similar to C preprocessor instructions,
> internal entities correspond to #define, and external entities
> to #include. But of course it's not exactly the same.
> 
> The core of the current problem comes from the following
> sentence [http://www.w3.org/TR/xmlbase#IDwkAq1]:
> 
> The scope of xml:base does not extend into external entities,
> but it does extend into internal entities.

Why do internal entities care what the Base is?

> The alternative proposal currently under discussion would be
> to say that xml:base extends into external and internal entities.

I would suggest an alternative: require external entities to use an
absoluteURI as their system identifier... but I know how popular that
probably is...
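
To make that suggestion concrete, here is a minimal sketch in Python of
what such a check could look like. It assumes that "absoluteURI" in the
RFC 2396 sense can be approximated by "has a scheme"; the function name
is_absolute_uri is hypothetical and not part of any spec or parser.

```python
from urllib.parse import urlsplit


def is_absolute_uri(system_id: str) -> bool:
    """Rough approximation of RFC 2396 'absoluteURI': a scheme is present.

    Assumption: checking for a scheme is enough for illustration; a real
    parser would validate the full grammar.
    """
    return urlsplit(system_id).scheme != ""


# System identifiers taken from the declarations quoted in this thread.
print(is_absolute_uri("http://www.example.com/example.xml"))  # absolute
print(is_absolute_uri("/include/entity1.xml"))                # relative
```

Under this rule the first declaration would be accepted and the second
rejected, so the question of which base applies would never come up.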

> The following is an example:
> 
> File /example/a.xml:
> 
> <!DOCTYPE example
> [
> <!ENTITY entity1 SYSTEM "/include/entity1.xml">
> ]>
> <example xml:base='subdir1'>
> &entity1;
> </example>
> 
> 
> File /include/entity1.xml:
> 
> <a href='link.xml'>That's the question!</a>
> 
> 
> Assuming that the href attribute in the example document
> is governed by the XML Base specification, what should it
> refer to?

IMHO, nothing. It should be considered a parsing error. 
Situations like this (at least to me) seem artificial and
dangerous. To have a spec like XMLBase tear itself apart trying
to fix something like this instead of declaring it a syntax
error is the wrong thing to do. 

> If xml:base extends into external entities, it would
> refer to /example/subdir1/link.xml. If xml:base doesn't
> extend into external entities, it would refer to
> /include/link.xml.
> 
> What to do in the absence of xml:base would have to be
> aligned with the decision. This means that in the above
> example, just without xml:base, href would refer to
> /example/link.xml in the case things extend into external
> entities, and would refer to /include/link.xml in the case
> things don't extend into external entities.

My suggestion of requiring absoluteURIs for external entities
sounds much cleaner and less error-prone. If people are
worrying about 'document portability' then might I suggest
they look at location (network node) independent URIs.
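
For anyone who wants to see the two readings of the quoted example side
by side, here is a minimal sketch using Python's urllib.parse.urljoin,
which follows the usual relative-resolution rules. The function name
resolve_href and the constants are hypothetical. One assumption to flag
loudly: the sketch writes the xml:base value as 'subdir1/' with a
trailing slash, because resolving against 'subdir1' without the slash
drops that last path segment and gives /example/link.xml instead.

```python
from typing import Optional
from urllib.parse import urljoin

DOCUMENT_URI = "/example/a.xml"            # the including document
ENTITY_SYSTEM_ID = "/include/entity1.xml"  # the external entity
HREF = "link.xml"                          # relative reference inside the entity


def resolve_href(scope_extends: bool, xml_base: Optional[str] = "subdir1/") -> str:
    """Resolve HREF under one of the two proposed scoping rules.

    scope_extends=True:  the base in effect at the entity reference applies.
    scope_extends=False: the entity's own URI is the base.
    """
    if scope_extends:
        # xml:base is itself relative, so resolve it against the document.
        base = urljoin(DOCUMENT_URI, xml_base) if xml_base else DOCUMENT_URI
    else:
        base = urljoin(DOCUMENT_URI, ENTITY_SYSTEM_ID)
    return urljoin(base, HREF)


print(resolve_href(True))        # xml:base extends into the entity
print(resolve_href(False))       # xml:base stops at the entity boundary
print(resolve_href(True, None))  # no xml:base, scope extends
```

The three calls give /example/subdir1/link.xml, /include/link.xml and
/example/link.xml respectively, matching the outcomes listed in the
quoted message.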

> Various attempts at interpreting Section 5.1 of RFC 2396
> (see e.g. http://www.innosoft.com/rfc/rfc2396.html#sec-5.1)
> have been undertaken, with no clear results.
> 
> First, Section 5.1 speaks about a single base per document,
> having multiple bases in different areas of a document
> doesn't seem to have been a concern, or maybe was explicitly
> rejected.

We didn't consider that case because it really wasn't an issue
for URIs but for applications that are defining complex
documents. I.e. it's not a URI problem, it's an XML problem.
If XML wants to define something where Bases are
defined on a tag by tag basis then fine. If it then wants
to allow nested documents then XML must figure out
whether or not definitions pass into such nested blocks.

> Second, Section 5.1 doesn't seem to consider the case of
> inclusion in the way this happens with entities or similar
> cases.

2396 doesn't and should never define a document model (IMHO
it should have never defined relative URIs and fragments but
I lost that debate). 

> Third, the words 'entity' and 'document' are used both
> in XML and in RFC 2396, but it is not clear how to relate
> these together. 

If it did then that's probably a mistake in 2396...

> A document in the XML sense includes
> all the entities (including external ones). In RFC 2396,
> the only case that seems to have been considered is that
> documents can be encapsulated in entities. Nevertheless,
> an XML external entity has its own URI, and therefore
> in many ways behaves like an entity as described in
> RFC 2396.

I think it's a mistake to read 2396 as though it has some type of
document model in mind. No one ever thought in those terms and,
at least in my mind, the mention of documents and such was
for pedagogical reasons only. If I remember correctly, the
use of the term 'entity' in 2396 is in no way related to how
XML or SGML use the term. By 'entity' we simply meant "some generic
thing in the metaphysical sense", not some syntactic element somewhere.

> Any opinions and any help on this issue are greatly appreciated.

You got 'em. I doubt if they were what you wanted though...

-MM

-- 
--------------------------------------------------------------------------------
Michael Mealling	|      Vote Libertarian!       | www.rwhois.net/michael
Sr. Research Engineer   |   www.ga.lp.org/gwinnett     | ICQ#:         14198821
Network Solutions	|          www.lp.org          |  michaelm@netsol.com

Received on Tuesday, 11 July 2000 09:24:40 UTC