Re: Liaison statement on fragment identifiers from Linking WG from Bill Smith on 1999-05-17 (www-html-editor@w3.org from April to June 1999)

From: Bill Smith <bill.smith@sun.com>
Date: Mon, 17 May 1999 06:06:30 -0700
To: Steven Pemberton <Steven.Pemberton@cwi.nl>
Cc: w3c-xml-cg@w3.org, w3c-html-wg@w3.org, www-html-editor@w3.org, w3c-xml-linking-wg@w3.org
Message-Id: <4.0.1.19990514103713.00e79530@jurassic.eng.sun.com>
At 12:44 AM 5/14/99 +0200, Steven Pemberton wrote:
> > The simple, but hard fact is that HTML 4.0 is not an XML application. In
> > particular, HTML's reliance on the CDATA type NAME attribute as the
> > referent of a fragment identifier makes the transition to XML difficult at
> > best. As was described in our liaison document, XPointer intends to apply
> > the semantic that "naked" fragment identifiers refer to elements that have
> > a type ID attribute whose value matches that specified in the fragment
> > identifier. I do not believe it possible to have structured fragment
> > identifiers for XML if the referent attribute is of type CDATA.
>
>No, indeed, and that is why in XHTML 1.0 (when served as text/xml) that
>is the intention, that naked fragment identifiers indeed refer to an
>element with a matching ID. That is why we also give the guideline
>that when served as text/html (to old browsers in other words) authors
>had better add a NAME attribute to the <a> element, so that you get
>the same effect.

This is a reasonably persuasive argument for new content, but I'm concerned
about the XHTML (XML) content that will be "created" through simpleminded
transformation of HTML to XHTML. No matter how good HTML Tidy is, not
everyone will use it, or will be happy with the results.

I'm not familiar with HTML Tidy but do know that converting anchors from
HTML 4.0 NAME syntax to XML ID is problematic. Remember, NAME is of type
CDATA and ID is of type ID. CDATA permits a much broader range of
characters than ID, and it will therefore be necessary to "escape" the
disallowed CDATA characters when converting to an ID attribute. At best,
fragment identifiers become ugly. At worst (as pointed out in our liaison
document) the conversion causes massive breakage when one considers the
web as a whole (and all the URLs with fragment identifiers pointing
into distant resources).
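
To make the escaping problem concrete, here is a minimal sketch in Python. It assumes a simplified, ASCII-only approximation of the XML 1.0 Name production (the real production admits many more Unicode characters), and the "_xHHHH_" escape scheme and the function name name_to_id are hypothetical, chosen only for illustration; no specification mandates them.

```python
import re

# Assumption: ASCII-only approximation of the XML 1.0 Name production.
_NAME_START = re.compile(r"[A-Za-z_:]")
_NAME_CHAR = re.compile(r"[A-Za-z0-9._:\-]")


def name_to_id(name: str) -> str:
    """Map an HTML NAME (CDATA) value to a string usable as an XML ID.

    Characters that the simplified Name production rejects are replaced
    by a hypothetical "_xHHHH_" escape (HHHH = hex code point).
    """
    out = []
    for i, ch in enumerate(name):
        # The first character has a stricter rule than the rest.
        pattern = _NAME_START if i == 0 else _NAME_CHAR
        out.append(ch if pattern.fullmatch(ch) else "_x%04X_" % ord(ch))
    return "".join(out)


print(name_to_id("section1"))   # section1
print(name_to_id("2.1 Intro"))  # _x0032_.1_x0020_Intro
```

An anchor named "2.1 Intro" ends up with the ID "_x0032_.1_x0020_Intro", so every existing URL ending in "#2.1 Intro" stops matching unless the linking documents are rewritten as well. That is both the "ugly" case and the "breakage" case.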

If the perception is that conversion to XML is difficult and causes things
to break, the web community may decide that it is not worth the effort. A
better course of action, in my mind, is to declare that XHTML is an
application of XML just as HTML is an application of SGML.

This provides the HTML WG with the freedom to use HTML application syntax
and semantics as necessary/appropriate *and* to incorporate generic XML
mechanisms when appropriate. Further, it enables the XML activity to
proceed with defining generic mechanisms that can build on the HTML legacy
while having some freedom to deprecate/abandon application specific
features without fear of "breaking the web".

>I was under the impression that the fact that the "space before /"
>quirk in empty elements (like <br />) works on old browsers was one of
>the reasons for choosing the <br/> syntax for XML. That was one of the
>motivations for creating the guidelines.

It may have been. I don't recall that it was but the WG (I wasn't a member)
may have discussed this. My understanding is that the syntax was selected
because it was "legal" SGML and it enabled detection of empty elements
without a DTD. If the syntax happens to work in downlevel HTML browsers,
that would be interesting (and good) but not necessarily a reason to have
selected the syntax.
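
As a rough illustration of the "without a DTD" point, the sketch below uses Python's standard xml.etree.ElementTree parser, which reads no DTD here. The helper name parses_without_dtd is my own. The self-closing form parses on its own, while a bare <br> does not, because only a DTD could tell the parser that the element never has an end tag.

```python
import xml.etree.ElementTree as ET


def parses_without_dtd(markup: str) -> bool:
    """Return True if the markup is well-formed XML with no DTD supplied."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_without_dtd("<p>one<br/>two</p>"))   # True
print(parses_without_dtd("<p>one<br />two</p>"))  # True
print(parses_without_dtd("<p>one<br>two</p>"))    # False
```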

>Well, we have moved the syntax to XML; we are awaiting an XLink
>recommendation to formalise the linking semantics, and possibly a CSS3
>recommendation will allow us to formalise (with that notation) the
>form semantics. But I don't actually see how expressing HTML in XML
>actually prevents us from doing anything semantically. Can you
>elaborate on the problem?

Expressing HTML in XML doesn't prevent the HTML WG from doing anything
semantically. The problem runs in the opposite direction. The semantics of
XHTML, if served simply as text/xml, may force de facto semantic
meaning onto XML by virtue of its (likely) widespread adoption and use. As
a consequence, XML loses its status as an application-neutral meta language
and becomes HTML with user tags sprinkled about.

>Actually adopting the Linking group's suggestion would have broken the
>use of CSS with XHTML, and we wanted to avoid that.

Frankly I'm less concerned with that. I know others see this *very*
differently and even I would like to see CSS work with XML. However, the
application-neutral status of XML has primacy in my mind.

>No! There is absolutely no need or requirement for an XHTML agent to
>understand or process the quirks. They are only there to allow old
>agents to process the documents, but they are not required to be in
>documents, nor do they change the syntax or semantics of the documents.

But the old agents have a way of becoming the new agents - that's been our
history. My concern is that the "new" agents will behave exactly like the
old agents when presented "new" content.

>
> > As a consequence, the semantics of a "naked" XML fragment identifier will
> > be to refer to an element that has either a type ID attribute that matches
> > the fragment identifier or to an element that has a type CDATA attribute
> > NAME whose value matches the fragment identifier.
>
>The semantics will be to match an ID. There may be a NAME attribute
>with identical value, and the XML processor will ignore it.
>
> > This effectively
> > precludes the use of the attribute name NAME in any XML vocabulary unless
> > the user intention is for the specific semantic specified in HTML 4.0.
>
>As I hope is now clear, this is not the case.

I recognize that the HTML WG does not intend for the NAME attribute to be
usurped for all of XML. My concern is that this will become the case, de
facto, no matter how hard we try, de jure, to say otherwise.
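
The difference between the two behaviours can be sketched as follows. This is an illustration, not a description of any actual agent. The functions resolve_strict and resolve_legacy and the sample "order" vocabulary are hypothetical. Since no DTD is read, the sketch simply assumes that an attribute named "id" is of type ID.

```python
import xml.etree.ElementTree as ET


def resolve_strict(root, fragment):
    """Intended semantics: a naked fragment matches an ID only."""
    for el in root.iter():
        if el.get("id") == fragment:  # assumption: "id" is of type ID
            return el
    return None


def resolve_legacy(root, fragment):
    """Feared de facto semantics: fall back to any NAME attribute."""
    hit = resolve_strict(root, fragment)
    if hit is not None:
        return hit
    for el in root.iter():
        if el.get("name") == fragment:
            return el
    return None


# A non-HTML vocabulary that uses "name" for its own purposes.
doc = ET.fromstring('<order><item name="widget"/><note id="n1"/></order>')

print(resolve_strict(doc, "widget"))      # None
print(resolve_legacy(doc, "widget").tag)  # item
```

Under the legacy behaviour, "#widget" lands on an element whose author never meant "name" to be a link target. That is the sense in which NAME would be usurped.
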
Received on Monday, 17 May 1999 09:43:57 UTC