Re: XML Base for relative URIs: Interpretation of 5.1 of RFC 2396

From: by way of <duerst@w3.org>
Date: Wed, 12 Jul 2000 10:47:42 +0900
To: uri@w3.org
Message-Id: <4.2.0.58.J.20000712104725.0352de00@sh.w3.mag.keio.ac.jp>
On Tue, Jul 11, 2000 at 02:17:51PM +0900, Martin J. Duerst wrote:
 > Dear Members of the URI mailing list,
 >
 > An issue has recently come up in the resolution of last call
 > comments to XML Base http://www.w3.org/TR/2000/WD-xmlbase-20000607
 > (don't hesitate to read this, it's really, really short).
 >
 > First a bit of background:
 >
 > XML Base defines an attribute xml:base for XML documents
 > to allow setting the base for relative URI resolution.
 > The xml:base attribute can be set not only at the root element
 > (i.e. for the whole document), but also on any other element.
 > In that case, the base applies to the other attributes on
 > the element, and to everything within the element.
 > If the value of xml:base itself is relative, it is in turn
 > resolved based on an xml:base higher up in the element
 > hierarchy.

In contrast to Michael Mealling, who doesn't seem to like relative URIs,
I think they are great.  Everything is relative anyway, even things that
appear to be absolute.

One of the problems I've had with relative URIs in HTML documents
is that one cannot specify multiple bases.  This makes it difficult
to use any server-side mechanism to include documents that might
have their own base.  With only one BASE provided to clients (whether
explicitly provided in the document or assumed from the context), all
relative URIs must be interpreted relative to that one base, so any
server-side inclusion mechanism must rewrite every relative URI in an
included document to be relative to the base it was intended for.  With
a client-side inclusion mechanism, such as IFRAMEs, the client can be
smart enough to interpret relative URIs in the included document
relative to the base of the included document.
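
To make that translation step concrete, here is a minimal sketch in
Python of what a server-side inclusion mechanism would have to do with
an included XML fragment: rewrite each relative URI against the included
document's own base before splicing the fragment into the including
document.  It assumes, for illustration only, that URI references live
in attributes named href and src and that the fragment is well-formed;
the function name absolutize and the example URIs are hypothetical.
Note that urljoin in Python 3.5 and later follows RFC 3986, the
successor to RFC 2396.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical: the attributes assumed to carry URI references.
URI_ATTRS = ("href", "src")


def absolutize(fragment_xml: str, fragment_base: str) -> str:
    """Rewrite relative URI references in an included fragment so they
    resolve against the fragment's own base, not the including document's."""
    root = ET.fromstring(fragment_xml)
    for elem in root.iter():  # iter() visits the root and all descendants
        for attr in URI_ATTRS:
            value = elem.get(attr)
            if value is not None:
                # urljoin leaves an already-absolute URI as it is.
                elem.set(attr, urljoin(fragment_base, value))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    included = '<div><a href="notes.html">Notes</a><img src="../img/logo.png"/></div>'
    # Base URI of the included document (hypothetical).
    print(absolutize(included, "http://www.example.com/docs/chapter1/index.html"))
```

The point of the sketch is that every URI must be touched; a nested
base attribute would let the server leave the fragment alone and just
record its base once.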

So I am glad to see that XML might have a nested base feature.
It is important to note that every document would still have an implicit
base that applies if no explicit base is specified in the document.
WD-xmlbase-20000607 says:

   The attribute xml:base may be inserted in XML documents to
   specify a base URI other than the base URI of the document
   or external entity.
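
A minimal sketch in Python of how such nested bases might be computed,
assuming that an element's xml:base, if relative, is resolved against
the base in effect for its parent, and that the document's own base URI
applies wherever no xml:base is given.  The function name
effective_bases and the example.org URIs are hypothetical, and the
sketch ignores entities entirely, so it does not model the rule about
external entities discussed further down.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def effective_bases(doc_xml: str, document_base: str) -> dict:
    """Map each element to the base URI in effect for it."""
    root = ET.fromstring(doc_xml)
    bases = {}

    def walk(elem, inherited):
        own = elem.get(XML_BASE)
        # A (possibly relative) xml:base is resolved against the parent's base;
        # without one, the element simply inherits the parent's base.
        base = urljoin(inherited, own) if own is not None else inherited
        bases[elem] = base
        for child in elem:
            walk(child, base)

    # The document's own base URI is the starting point (implicit base).
    walk(root, document_base)
    return bases


if __name__ == "__main__":
    doc = ('<doc xml:base="http://example.org/today/">'
           '<p><link href="new.xml"/></p>'
           '<olist xml:base="/hotpicks/"><item><link href="pick1.xml"/></item></olist>'
           '</doc>')
    bases = effective_bases(doc, "http://www.example.com/source.xml")
    for elem, base in bases.items():
        if elem.get("href") is not None:
            # prints http://example.org/today/new.xml
            # then   http://example.org/hotpicks/pick1.xml
            print(urljoin(base, elem.get("href")))
```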

Also notice that "the base URI of the document or external entity" is
not derived from the base of any other document that merely *uses* the
URI of that document or entity.  I believe this can be concluded from a
careful reading of RFC 2396.

   http://www.innosoft.com/rfc/rfc2396.html#sec-5.1

The "Base URI of the encapsulating entity", which seems to be
the bone of contention, is about an entity (as seen by a client,
not a server) that contains the document that contains the relative
URI.  The language is admittedly confusing, but a reading of 5.1.1,
5.1.2, and 5.1.3 should make it clear that the encapsulation is
specified on the client side before considering how the document
or its encapsulation was retrieved or obtained using the URI.

Perhaps the problem is really about whether the encapsulation
of one document by another happens on the client side or the server side.
Moreover, this distinction itself becomes problematic as the boundary
between client and server melts away; for example, the encapsulation of
XML could be done on either the client or the server.

A solution to this problem might be to distinguish not by client-side
vs. server-side resolution but by whether the encapsulation is
by reference or by immediate inclusion.  Any reference needs to be
interpreted relative to some context, but an immediate inclusion requires
no interpretation to find the content, i.e. "here it is".

But I think this is not what is meant when considering a multi-part MIME
package of a single top-level document and several parts that are
"encapsulated" by the top-level document.  Each part is referenced, but
it is a special kind of local reference.  Instead of "here it is" we
get a local reference meaning "there it is", and "it" is contained in
the same package somehow, so no other remote resolution is required.


 > XML 1.0 also defines entities
 > http://www.w3.org/TR/REC-xml#sec-physical-struct.
 > There are various kinds of entities, but relevant for this
 > discussion are both internal general entities and external
 > parsed general entities. An external parsed general entity
 > is declared e.g. as follows:
 >
 > <!ENTITY  entityName  SYSTEM "http://www.example.com/example.xml">
 >
 > An internal general entity is declared as follows:
 >
 > <!ENTITY  entityName  "entity Content">

Michael asks why the internal general entity is a problem regarding
relative URIs and bases.  I'm not sure I understand either.
A new entityName is specified here, so perhaps the question is what
name space this name is defined in, and a BASE specifies a new name
space both for resolution of name uses and for new name definitions.


 > The core of the current problem comes from the following
 > sentence [http://www.w3.org/TR/xmlbase#IDwkAq1]:
 >
 > The scope of xml:base does not extend into external entities,
 > but it does extend into internal entities.

Again, I don't know what it means to have xml:base extend to
internal entities.  It is clear what it means to have xml:base
extend to external entities, but I don't believe it should.

An important question is "When is it useful to have the
document that references an external entity specify what
the base is for relative URIs in that external entity?"
Interpreting relative URIs based on how the document containing
them (e.g. the external entity) is used turns them into something
like "your local weather station".

I can imagine that this kind of "local" identifier would be useful,
but we don't have anything else like it now.  Even a 'news' URI
with no domain name specifies a particular newsgroup or
message, though it doesn't specify which news server to use.

Actually, there are (at least) two kinds of "local" identifier
that might be useful.  One is local to the user; the other is local
to the context of the document that uses the identifier.  The second
is the kind that is being considered here.

 > Various attempts at interpreting Section 5.1 of RFC 2396
 > (see e.g. http://www.innosoft.com/rfc/rfc2396.html#sec-5.1)
 > have been undertaken, with no clear results.
 >
 > First, Section 5.1 speaks about a single base per document;
 > having multiple bases in different areas of a document
 > doesn't seem to have been a concern, or maybe was explicitly
 > rejected.

You are right that a single base is assumed.  I don't know whether
multiple bases were considered, but the "encapsulating entity",
whatever that is, starts to get at allowing multiple bases.

Look at RFC 2557 (MIME Encapsulation of Aggregate Documents, such as
HTML (MHTML)) for an example of how encapsulation is used.

   http://www.innosoft.com/rfc/rfc2557.html



 > Second, Section 5.1 doesn't seem to consider the case of
 > inclusion in the way this happens with entities or similar
 > cases.

It's not terribly clear what kind of inclusion they had in mind.

 > Third, the words 'entity' and 'document' are used both
 > in XML and in RFC 2396, but it is not clear how to relate
 > these together. A document in the XML sense includes
 > all the entities (including external ones). In RFC 2396,
 > the only case that seems to have been considered is that
 > documents can be encapsulated in entities. Nevertheless,
 > an XML external entity has its own URI, and therefore
 > in many ways behaves like an entity as described in
 > RFC 2396.

It is clear to me that we need definitions of and more consistent use
of the terms "entity", "document", "resource", "identifier" and others.
We need a better model for encapsulation and multiple contexts, and
resolution of identifiers relative to contexts.   We have many
conflicting models now, each addressing a part of the problem,
but it seems doubtful that anyone has a complete consistent model
at this time.  Building this model is the kind of thing I was hoping
a W3C URI activity would take on, but I don't care so much where
this work is done; it needs to be done anyway.


--
Daniel LaLiberte
liberte@crystaliz.com
liberte@holonexus.org
Received on Wednesday, 12 July 2000 04:20:07 UTC