Re: Liaison statement on fragment identifiers from Linking WG from Steven Pemberton on 1999-05-13 (www-html-editor@w3.org from April to June 1999)

From: Steven Pemberton <Steven.Pemberton@cwi.nl>
Date: Fri, 14 May 1999 00:44:10 +0200 (MET DST)
To: Bill Smith <bill.smith@sun.com>
Cc: w3c-xml-cg@w3.org, w3c-html-wg@w3.org, www-html-editor@w3.org, w3c-xml-linking-wg@w3.org
Message-Id: <UTC199905132244.AAA17514.steven@schoener.cwi.nl>
 > The simple, but hard fact is that HTML 4.0 is not an XML application. In
 > particular, HTML's reliance on the CDATA type NAME attribute as the
 > referent of a fragment identifier makes the transition to XML difficult at
 > best. As was described in our liaison document, XPointer intends to apply
 > the semantic that "naked" fragment identifiers refer to elements that have
 > a type ID attribute whose value matches that specified in the fragment
 > identifier. I do not believe it possible to have structured fragment
 > identifiers for XML if the referent attribute is of type CDATA.

No, indeed, and that is exactly the intention in XHTML 1.0 (when
served as text/xml): naked fragment identifiers refer to an element
with a matching ID. That is why we also give the guideline that when
served as text/html (to old browsers, in other words) authors had
better add a NAME attribute to the <a> element, so that you get the
same effect.

 > However, if XHTML is in some way "abnormal" with regard to XML, the HTML WG
 > will have done a disservice to those responsible for the development of XML
 > and the larger web community.

Exactly our point, which is why we chose to use ID as the target
attribute.

 > The Strawman 1 proposal requires less work, requiring only an adjustment to
 > a single Working Draft (XPointer). However, it represents a sacrifice of
 > simplicity and consistency in the general case in order to address a very
 > specific problem. This is serious, since XML is simple and flexible enough
 > that there is a real possibility of it serving as the basis for distributed
 > hypertext for a very long time indeed.
 > 
 > Strawman 2 requires co-ordinated updates to XHTML and to XML 1.0, and
 > increases the conversion cost for those who want to serve legacy HTML as
 > XML. However, it preserves the simplicity of the general case, and better
 > meets the goal of "leading the Web to its full potential."

The solution we chose was to use ID and give no meaning to NAME except
when serving to old browsers. We think that this is the easiest and
most effective solution: we don't need to change XML 1.0 or XPointer,
and we move HTML to being a real XML application. The HTML 4.0 rec
started this process off by allowing the use of ID, and we are
finishing it by requiring it.

I can only assume, since we are having this discussion, that we have
failed to make this sufficiently clear.

 > In brief, our goal is to develop standards and recommendations for use that
 > have broad applicability well into the future. We are well-aware of
 > backwards compatibility issues and the need to provide transition
 > strategies. However, mechanisms and recommendations that rely on quirks or
 > application-specific semantics should be avoided.

I was under the impression that the fact that the "space before /"
quirk in empty elements (like <br />) works on old browsers was one of
the reasons for choosing the <br/> syntax for XML. That was one of the
motivations for creating the guidelines.

 > As I mentioned above, HTML 4.0 is not an XML application. Further the
 > application specific syntactic and semantic mechanisms employed in HTML 4.0
 > are at times at odds with the general purpose syntactic and semantic
 > mechanisms being developed as companions to XML. As a consequence, the
 > transition to XHTML 1.0 (as XML) may not be as simple as some predicted.

Well, we have moved the syntax to XML; we are awaiting an XLink
recommendation to formalise the linking semantics, and possibly a CSS3
recommendation will allow us to formalise (with that notation) the
form semantics. But I don't see how expressing HTML in XML prevents us
from doing anything semantically. Can you elaborate on the problem?


 > >In many ways this is a hack (and probably introduced to allow a
 > >transition to using ID). We regard the NAME attribute of A elements as
 > >a historical anomaly that should be phased out.
 > 
 > One possibility is to phase it out with the transition to XML.

That is exactly our plan. I'm sorry that wasn't clear.

 > It is
 > possible (if not likely) that other things will break with this transition
 > and now might be the best time to incur the pain of change.

Actually, adopting the Linking group's suggestion would have broken
the use of CSS with XHTML, and we wanted to avoid that.

 > While I have significant concern for new documents served to old user
 > agents, I have an equal concern for new documents served to new user
 > agents. In particular, I am quite concerned that a new class of (XML) user
 > agents will be developed that "understand" XHTML but these new user agents
 > will support the syntax and semantics of HTML 4.0 in order to "aid the
 > transition process". This "quirk" will have long-lasting and far-reaching
 > impact.

No! There is absolutely no need or requirement for an XHTML agent to
understand or process the quirks. They are only there to allow old
agents to process the documents, but they are not required to be in
documents, nor do they change the syntax or semantics of the documents.

 > As a consequence, the semantics of a "naked" XML fragment identifier will
 > be to refer to an element that has either a type ID attribute that matches
 > the fragment identifier or to an element that has a type CDATA attribute
 > NAME whose value matches the fragment identifier.

The semantics will be to match an ID. There may be a NAME attribute
with identical value, and the XML processor will ignore it.
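
A minimal sketch of that rule, in Python, may make it concrete. It
assumes a hypothetical XHTML fragment and a hypothetical helper named
resolve_fragment; because ElementTree does not read the DTD, the
attribute called "id" is simply assumed to be the one of type ID. The
lookup consults only id and never looks at name.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment. The first anchor follows the guideline and
# carries both id and name with the same value; the second has only name.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p><a id="intro" name="intro">Introduction</a></p>
    <p><a name="legacy">Only a NAME, no ID</a></p>
  </body>
</html>"""


def resolve_fragment(root, fragment):
    """Return the first element whose id equals the fragment, else None.

    Assumption: the attribute named "id" is the ID-typed attribute.
    The name attribute is deliberately never consulted.
    """
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


root = ET.fromstring(DOC)
by_id = resolve_fragment(root, "intro")     # matched through id
by_name = resolve_fragment(root, "legacy")  # only a name exists: no match
```

Here "#intro" resolves through the id, while "#legacy" resolves to
nothing, since a NAME on its own carries no meaning to the XML
processor.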

 > This effectively
 > precludes the use of the attribute name NAME in any XML vocabulary unless
 > the user intention is for the specific semantic specified in HTML 4.0.

As I hope is now clear, this is not the case.

 > Deprecation is always best when done over the long-term as opposed to the
 > short-term. However, this is not simply deprecating an attribute of
 > historical anomaly. Moving the syntax/semantics associated with the HTML
 > 4.0 attribute NAME into general XML usage (whether explicit or implied)
 > will have long-lasting and far-reaching consequences.

As I said, we have moved the syntax, but not the semantics.

 > 
 > It is my belief that the guideline/admonition will be generally ignored by
 > users and as a consequence, application developers will ascribe the HTML
 > 4.0 semantic to the NAME attribute for all XML documents. This would be
 > most unfortunate and I urge the HTML WG to reconsider this decision.
 > 
 > An option might be to register XHTML as its own MIME type rather than as
 > generic XML.
 > 
 > >
 > >Steven Pemberton
 > >May 1999
 > > 
 >
Received on Thursday, 13 May 1999 18:44:15 UTC