- From: Michael Mealling <michael@bailey.dscga.com>
- Date: Tue, 11 Jul 2000 09:14:04 -0400
- To: "Martin J. Duerst" <duerst@w3.org>
- Cc: uri@w3.org, www-xml-linking-comments@w3.org
On Tue, Jul 11, 2000 at 02:17:51PM +0900, Martin J. Duerst wrote: > An issue has recently come up in the resolution of last call > comments to XML Base http://www.w3.org/TR/2000/WD-xmlbase-20000607 > (don't hesitate to read this, it's really, really short). > > First a bit of background: > > XML Base defines an attribute xml:base for XML documents > to allow to set the base for relative URI resolution. > The xml:base attribute cannot only be set at the root element > (i.e. for the whole document), but also on any other element. > In that case, the base applies for the other attributes on > the element, and for everything within the element. > If the value of xml:base itself is relative, it is in turn > resolved based on an xml:base higher up in the element > hierarchy. > > XML 1.0 also defines entities http://www.w3.org/TR/REC-xml#sec-physical-struct. > There are various kinds of entities, but relevant for this > discussion are both internal general entities and external > parsed general entities. An external parsed general entity > is declared e.g. as follows: > > <!ENTITY entityName SYSTEM "http://www.example.com/example.xml"> > > An internal general entity is declared as follows: > > <!ENTITY entityName "entity Content"> Which means that an internal general entity doesn't care what the Base is since it doesn't define by identifying but by explicit value, right? > In both cases, the entity is invoked by &entityName;. > [Of course, it's not possible to use the same name for two > different entities.] > > In some way, entities are similar to C preprocessor instructions, > internal entities correspond to #define, and external entities > to #include. But of course it's not exactly the same. > > The core of the current problem comes from the following > sentence [http://www.w3.org/TR/xmlbase#IDwkAq1]: > > The scope of xml:base does not extend into external entities, > but it does extend into internal entities. Why do internal entities care what the Base is? > The alternative proposal currently under discussion would be > to say that xml:base extends into external and internal entities. I would suggest an alternate: external entities require an absoluteURI as their system identifier... but I know how popular that probably is... > The following is an example: > > File /example/a.xml: > > <!DOCTYPE example > [ > <!ENTITY entity1 SYSTEM "/include/entity1.xml"> > ] > <example xml:base='subdir1'> > &entity1; > </example> > > > File /include/entity1.xml: > > <a href='link.xml'>That's the question!</a> > > > Assuming that the href attribute in the example document > is governed by the XML Base specification, what should it > refer to? IMHO, nothing. It should be considered a parsing error. Situations like this (at least to me) seem artificial and dangerous. To have a spec like XMLBase tear itself apart trying to fix something like this instead of declaring it a syntax error is the wrong thing to do. > If xml:base extends into external entities, it would > refer to /example/subdir1/link.xml. If xml:base doesn't > extend into external entities, it would refer to > /include/link.xml. > > What to do in the absence of xml:base would have to be > aligned with the decision. This means that in the above just > example without xml:base, href would refer to /example/link.xml > in the case things extend into external entities, and would > refer to /include/link.xml in the case things don't extend > into external entities. My suggestion of external entities requiring absoluteURIs sounds much cleaner and less error prone. If people are worrying about 'document portability' then might I suggest they look at location (network node) independent URIs. > Various attempts of interpreting Section 5.1 of RFC 2396 > (see e.g. http://www.innosoft.com/rfc/rfc2396.html#sec-5.1) > have been undertaken, with no clear results. > > First, Section 5.1 speaks about a single base per document, > having multiple bases in different areas of a document > doesn't seem to have been a concern, or maybe was explicitly > rejected. We didn't consider that case because it really wasn't an issue for URIs but for applications that are defining complex documents. I.e. its not URI problem, its an XML problem. If XML wants to define something where Bases are defined on a tag by tag basis then fine. If it then wants to allow nested documents then XML must figure out whether or not definitions pass into such nested blocks. > Second, Section 5.1 doesn't seem to consider the case of > inclusion in the way this happens with entities or similar > cases. 2396 doesn't and should never define a document model (IMHO it should have never defined relative URIs and fragments but I lost that debate). > Third, the words 'entity' and 'document' are used both > in XML and in RFC 2396, but it is not clear how to relate > these together. If it did then that's probably a mistake in 2396... > A document in the XML sense includes > all the entities (including external ones). In RFC 2396, > the only case that seems to have been considered is that > documents can be encapsulated in entities. Nevertheless, > an XML external entity has it's own URI, and therefore > in many ways behaves like an entity as described in > RFC 2396. I think its a mistake to read 2396 as though it has some type of document model in mind. No one ever thought in those terms and, at least in my mind, the mention of documents and such were for pedagogical reasons only. If I remember correctly the use of the term 'entity' in 2396 is in no way related to how XML or SGML use the term. By 'entity' we simply meant "some generic thing in the metaphysical sense", not some syntactic element somewhere. > Any opinions and any help on this issue is greatly appreciated. You got 'em. I doubt if they were what you wanted though... -MM -- -------------------------------------------------------------------------------- Michael Mealling | Vote Libertarian! | www.rwhois.net/michael Sr. Research Engineer | www.ga.lp.org/gwinnett | ICQ#: 14198821 Network Solutions | www.lp.org | michaelm@netsol.com
Received on Tuesday, 11 July 2000 09:24:39 UTC