XML Base issue

Back in the dim dark mists of time, before the glaciers came and
receded...

>     Norm raised an issue about xml:base at
>     http://lists.w3.org/Archives/Public/public-xml-core-wg/2014Sep/0004
>     wherein he concluded:
>
>      We could say that xml:base and xml:lang aren't
>      copied because there are separate controls for them.
>
>      Another option is to say that if xml:base is copied,
>      the absolute base URI is used as the attribute's value.
>
>     There was some follow up discussion on the list.
>
> Norm points out this whole problem only occurs when there
> is an explicit xml:base attribute on the xi:include element.
>
> Two other options:
>
> We could say not to copy anything in the xml space.
>
> [People were not enthusiastic about this one at first,
> though it did briefly resurface later.]
>
> We could say that having xml:base copied when base URI
> fixup is turned off may give you undesired results, so
> don't do that.
>
> We could just add a note to this effect.
>
> But this would make something that is valid with an XInclude 1.0
> processor invalid with a 1.1 processor, and we want to
> avoid backwards incompatible behavior.
>
> Norm leans toward saying that one has to make the xml:base
> attribute absolute before copying it.
>
> Henry asks how this differs from just making base URI fixup
> mandatory.
>
> Then Henry points out that we have base URIs sort of going
> both ways: from the includer to the included and vice versa.
>
> We need to be clearer about how xml:id, xml:lang, and xml:base
> are handled when they occur on the xi:include element, including
> how they get their semantics.
>
> ACTION to Norm:  Write a proposal for how to address this problem.

Some observations:

0. The problems we're discussing here occur *only* if the document
   author has explicitly put an xml:id, xml:base, or xml:lang
   attribute *on* the xi:include element:

   <xi:include xml:id="myid" xml:base="foo/bar/" xml:lang="en-GB"
               parse="xml" href="path/to/my/document.xml"/>

   There is no common, practical reason to put any of those xml:*
   attributes on the xi:include element except attribute copying.

1. The ability to copy xml:id down is handy. It allows the user to
   easily reuse content without generating xml:id conflicts.

   1.a. This is limited to the case where the only (significant)
        xml:id is on the root element of the document being included.

2. Saying that xml:id attributes are copied but xml:base (or xml:lang)
   attributes are not seems like it would be very confusing.

3. Copying the literal value of xml:id and xml:lang works, but copying
   the literal value of xml:base does not. (The relative location of
   the content actually included may differ from the relative location
   of the xi:include element, so in the new context the xml:base
   resolves in a very surprising way.)

4. We must consider how xml:base and xml:lang fixup come into play.

Consider:

   http://example.com/a/b/c/doc.xml:

   <doc xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include xml:id="foo" xml:base="d2/" href="d1/chap.xml"/>
   </doc>

   http://example.com/a/b/c/d1/chap.xml:

   <chap>
     <image href="picture.png"/>
   </chap>
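To make the URIs in this example concrete, here is a minimal Python sketch using the standard library's urllib.parse.urljoin. It follows the example's layout (chap.xml sits in d1/ next to doc.xml) and deliberately leaves the xml:base on xi:include aside. relative_reference is a hypothetical helper, not part of any XInclude API; it only handles the same-scheme, same-host case, and it computes the kind of relative xml:base value that base URI fixup inserts in option A.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# URIs from the example; layout assumed as given (chap.xml in d1/ beside doc.xml).
DOC = "http://example.com/a/b/c/doc.xml"
HREF = "d1/chap.xml"  # href on the xi:include element

# Where the included document actually comes from.
chap_uri = urljoin(DOC, HREF)

# With no xml:base on chap, image/@href resolves against chap.xml's own URI.
image_uri = urljoin(chap_uri, "picture.png")


def relative_reference(base: str, target: str) -> str:
    """Hypothetical helper: a relative reference from base to target.

    Only handles URIs that share scheme and authority and have no query
    or fragment, which is all this example needs.
    """
    b, t = urlsplit(base), urlsplit(target)
    if (b.scheme, b.netloc) != (t.scheme, t.netloc):
        return target  # different host: fall back to the absolute URI
    return posixpath.relpath(t.path, posixpath.dirname(b.path))


# The relative xml:base that fixup would put on chap in option A.
fixup_value = relative_reference(DOC, chap_uri)

print(chap_uri)     # http://example.com/a/b/c/d1/chap.xml
print(image_uri)    # http://example.com/a/b/c/d1/picture.png
print(fixup_value)  # d1/chap.xml
```

Resolving fixup_value against DOC gives back chap_uri, which is exactly why the xml:base="d1/chap.xml" that fixup inserts in option A keeps picture.png pointing at d1/.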

Here are a couple of options that feel coherent to me:

A. xml:id, xml:base, and xml:lang attributes are not copied. If you
   want to change the value of any of those attributes on the included
   content, you will have to use my:id, my:base, and my:lang
   attributes on the xi:include element and then have a subsequent
   step in your processing pipeline coerce the my:* values to xml:*
   values. Fixup is unaffected.

   <doc xmlns:xi="http://www.w3.org/2001/XInclude">
     <chap xml:base="d1/chap.xml">
       <image href="picture.png"/>
     </chap>
   </doc>

   The "intrinsic" base URI of chap is different from the base URI of
   the xi:include element so base URI fixup has inserted an xml:base
   attribute to fix that.

B. xml:id, xml:base, and xml:lang attributes are copied.

   B.1. xml:id values are copied literally.

   B.2. xml:lang values are copied literally. I don't actually think
        language fixup matters.

   B.3. xml:base values are copied:

        B.3.i.  If base URI fixup is enabled, the value of the xml:base
                attribute on the xi:include element is made absolute
                relative to the xi:include element and that (absolutized)
                value is used as the value of the xml:base attribute on
                top-level included items.

                <doc xmlns:xi="http://www.w3.org/2001/XInclude">
                  <chap xml:id="foo" xml:base="http://example.com/a/b/c/d2/">
                    <image href="picture.png"/>
                  </chap>
                </doc>

                Now picture.png will be resolved relative to d2/, which I
                assume is the least surprising thing given the original
                markup.

                For completeness, let's consider what it would mean to
                copy the values literally.

                <doc xmlns:xi="http://www.w3.org/2001/XInclude">
                  <chap xml:id="foo" xml:base="d2/">
                    <image href="picture.png"/>
                  </chap>
                </doc>

                The chap element would get an xml:base attribute of
                "d2/". But that relative xml:base value would (have to)
                be made absolute with respect to the base URI of the
                chap element which is
                "http://example.com/a/b/c/d1/chap.xml" so the effective
                base URI of chap, and its contained image, becomes
                "http://example.com/a/b/c/d1/d2/" which I think is very
                unlikely to be useful.

        B.3.ii. If base URI fixup is suppressed, the value of the xml:base
                attribute on the xi:include element is copied literally.

                <doc xmlns:xi="http://www.w3.org/2001/XInclude">
                  <chap xml:id="foo" xml:base="d2/">
                    <image href="picture.png"/>
                  </chap>
                </doc>

        In short: if you're asking for base URI fixup, then the value
        of the xml:base attribute on the included content will be the
        least surprising thing. If you're not asking for fixup, you
        get the literal values and have to figure out what they mean.
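The B.3 rule can be sketched in a few lines of Python. copied_xml_base and effective_base are hypothetical names, not part of any XInclude API. The sketch assumes that the xml:base on xi:include is absolutized against the document URI (as in B.3.i), and that a literally copied value is resolved against chap.xml's own URI (as in the "for completeness" discussion above).

```python
from urllib.parse import urljoin

# Locations from the example.
INCLUDE_BASE = "http://example.com/a/b/c/doc.xml"  # base the xi:include's xml:base resolves against
CHAP_URI = "http://example.com/a/b/c/d1/chap.xml"  # where chap.xml was loaded from


def copied_xml_base(value: str, include_base: str, fixup: bool) -> str:
    """Hypothetical: the xml:base value placed on top-level included items."""
    if fixup:
        # B.3.i: make the value absolute relative to the xi:include element.
        return urljoin(include_base, value)
    # B.3.ii: copy the value literally.
    return value


def effective_base(xml_base: str, element_uri: str) -> str:
    # A relative xml:base is resolved against the element's own base URI;
    # an absolute one is returned unchanged by urljoin.
    return urljoin(element_uri, xml_base)


results = {}
for fixup in (True, False):
    value = copied_xml_base("d2/", INCLUDE_BASE, fixup)
    base = effective_base(value, CHAP_URI)
    results[fixup] = (value, urljoin(base, "picture.png"))
    print(f"fixup={fixup}: xml:base={value!r} -> {results[fixup][1]}")

# fixup=True: xml:base='http://example.com/a/b/c/d2/' -> http://example.com/a/b/c/d2/picture.png
# fixup=False: xml:base='d2/' -> http://example.com/a/b/c/d1/d2/picture.png
```

The absolutized value is immune to where the content lands, while the literal "d2/" picks up the d1/ of the included document, which is the surprising d1/d2/ result described above.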

On the whole, I think B is the more useful proposal, but A is *so*
much simpler.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676
www.marklogic.com

Received on Wednesday, 12 November 2014 13:27:15 UTC