Re: Liaison statement on fragment identifiers from Linking WG from Bill Smith on 1999-05-13 (www-html-editor@w3.org from April to June 1999)

From: Bill Smith <bill.smith@sun.com>
Date: Thu, 13 May 1999 11:50:37 -0700
To: Steven Pemberton <Steven.Pemberton@cwi.nl>, w3c-xml-cg@w3.org
Cc: w3c-html-wg@w3.org, www-html-editor@w3.org, w3c-xml-linking-wg@w3.org
Message-Id: <199905131852.LAA921312@jurassic.eng.sun.com>
Comments below are my personal views and may not be shared by other members
of the XML Linking WG. This message has been copied to the XML Linking WG
where I will encourage further discussion since it appears the HTML WG's
decision to not adopt our proposal may have significant impact on XML
linking specifically and the XML activity in general.

At 10:30 PM 5/12/99 +0200, Steven Pemberton wrote:
>HTML WG Comments on "Liaison statement on fragment identifiers from
>Linking WG" Tue, 20 April 1999
>
>http://lists.w3.org/Archives/Member/w3c-xml-cg/1999Apr/0061.html
>
>The HTML WG appreciates the input from the Linking WG, and discussed
>it fully at our recent FtF. We entered the meeting believing we would
>implement the suggested changes.
>
>However after some discussion we came to the realisation that the
>suggestions were based on a misunderstanding of the role of XHTML 1.0.
>
>To summarise:
>
>	The intention of XHTML 1.0 is that it is primarily an XML
>	application, and should be served as an XML application.
>
>	However, we believe it is desirable that in the short term to
>	have a transition period, to ease the transition from HTML to
>	XHTML, and to leverage on the preponderance of old-HTML user
>	agents being used in the world. Thanks to a quirk in most old
>	user agents, if you follow a small number of guidelines you
>	can serve the XML version of HTML as (old) HTML to non XML
>	user agents.

Based on the statements above, I doubt that there was any misunderstanding
on the part of the XML Linking WG as to the HTML WG's intended role for
XHTML. We are well aware that XHTML 1.0 is intended to be an XML
application. Further, we are aware that it is desirable to ease the
transition from HTML to XHTML specifically, or to XML generally. We are
also aware that other users of XML (RDF in particular) have adopted
syntax/semantics for XML fragment identifiers that conflict with XHTML's
intended use.

Several members of the XML Linking WG and IG participated in the original
formulation of XML and in the lengthy discussions on its transition and
market acceptance. If memory serves me, all Linking WG/IG members who
participated in those discussions argued strongly, at one time or another,
in favor of making accommodations in XML that would facilitate its
acceptance by HTML users. We continue to endeavor to ease
transition/acceptance, but we have an equal responsibility to ensure that
XML and its related standards are general purpose and well suited to a
broad range of application usage scenarios.

The simple but hard fact is that HTML 4.0 is not an XML application. In
particular, HTML's reliance on the NAME attribute (of type CDATA) as the
referent of a fragment identifier makes the transition to XML difficult at
best. As was described in our liaison document, XPointer intends to apply
the semantic that "naked" fragment identifiers refer to elements that have
an attribute of type ID whose value matches the fragment identifier. I do
not believe it is possible to have structured fragment identifiers for XML
if the referent attribute is of type CDATA.

>Since XHTML is principally an XML application, we want to act as far
>as possible like a normal XML application. (This is the reason we
>thought we would be adopting the Linking WG's suggestions). This
>includes using attributes of type ID as targets of fragment
>identifiers.

It is highly desirable, if not imperative, that XHTML behave like a normal
XML application. HTML is arguably the most widely used markup language and
its use is growing rapidly. If a transition from HTML to XHTML occurs, and
I see no reason that it should not, XHTML could become the most widely used
markup language based on XML. This would be an important step in the broad
adoption of XML and would provide the basis for the development of a broad
class of web-based applications.

However, if XHTML is in some way "abnormal" with regard to XML, the HTML WG
will have done a disservice to those responsible for the development of XML
and the larger web community. XML is general purpose, and we are
endeavoring to develop similarly general-purpose companion standards such
as XPointer and XLink to enable the broadest possible class of applications
while allowing
users the widest choice of language/vocabulary. I quote two paragraphs from
our liaison document below:

_____________

The Strawman 1 proposal requires less work, requiring only an adjustment to
a single Working Draft (XPointer). However, it represents a sacrifice of
simplicity and consistency in the general case in order to address a very
specific problem. This is serious, since XML is simple and flexible enough
that there is a real possibility of it serving as the basis for distributed
hypertext for a very long time indeed.

Strawman 2 requires co-ordinated updates to XHTML and to XML 1.0, and
increases the conversion cost for those who want to serve legacy HTML as
XML. However, it preserves the simplicity of the general case, and better
meets the goal of "leading the Web to its full potential."
_____________

In brief, our goal is to develop standards and recommendations that have
broad applicability well into the future. We are well aware of
backwards compatibility issues and the need to provide transition
strategies. However, mechanisms and recommendations that rely on quirks or
application-specific semantics should be avoided.

>Now, XHTML 1.0 is just an XMLised version of HTML 4.0, as close as
>possible to HTML 4.0 taking into account the requirements of XML.

As I mentioned above, HTML 4.0 is not an XML application. Further, the
application-specific syntactic and semantic mechanisms employed in HTML 4.0
are at times at odds with the general-purpose syntactic and semantic
mechanisms being developed as companions to XML. As a consequence, the
transition to XHTML 1.0 (as XML) may not be as simple as some predicted.

This dissonance is quite normal, and I have seen it many times before
whenever I have tried to generalize an otherwise specific
solution/application. Those things that were natural, easy, or obvious in
the specific case become unnatural, difficult, or convoluted in the general
case. Frequently, it is necessary to abandon the application-specific
shortcuts in favor of the general-purpose mechanisms.

>HTML 4.0 allows the use of both NAME (on A elements) and ID (on any
>element) as the target for fragment identifiers. Here are some
>extracts from the HTML 4 recommendation (http://www.w3.org/TR/REC-html40/):
>
>    7.5.2 Element identifiers: the id and class attributes
>
>    The id attribute has several roles in HTML: 
>
>	As a style sheet selector. 
>	As a target anchor for hypertext links. 
>	...
>
>    12.2 The A element 
>
>    name = cdata [CS] 
>	This attribute names the current anchor so that it may be the
>	destination of another link. The value of this attribute must be
>	a unique anchor name. The scope of this name is the current
>	document. Note that this attribute shares the same name space as
>	the id attribute.
>
>    12.2.1 Syntax of anchor names
>
>    An anchor name is the value of either the name or id attribute when
>    used in the context of anchors. Anchor names must observe the
>    following rules: ...
>
>    12.2.3 Anchors with the id attribute
>
>    The id attribute may be used to create an anchor at the start tag of
>    any element (including the A element).
>
>In many ways this is a hack (and probably introduced to allow a
>transition to using ID). We regard the NAME attribute of A elements as
>a historical anomaly that should be phased out.

One possibility is to phase it out with the transition to XML. It is
possible (if not likely) that other things will break with this transition,
and now might be the best time to incur the pain of change. Another
possibility would be to adopt Strawman 2 from the liaison document and
accept the offer to help coordinate the requisite changes to the various
recommendations.

>Since HTML documents have to be rewritten (or transformed) to put them
>in XHTML form anyway, we decided to adhere to XML Linking suggestions
>of using ID as targets, and not NAME anymore. However, to aid the
>transition process mentioned above, we left NAME in the DTD for
>documents that need to be served to old user agents.

While I have significant concern for new documents served to old user
agents, I have an equal concern for new documents served to new user
agents. In particular, I am quite concerned that a new class of (XML) user
agents will be developed that "understand" XHTML, and that these new user
agents will support the syntax and semantics of HTML 4.0 in order to "aid
the transition process". This "quirk" would have long-lasting and
far-reaching impact.

As a consequence, the semantics of a "naked" XML fragment identifier will
be to refer either to an element that has a type ID attribute matching the
fragment identifier or to an element that has a type CDATA attribute NAME
whose value matches it. This effectively precludes the use of the attribute
name NAME in any XML vocabulary unless the user intends the specific
semantic specified in HTML 4.0. I can imagine many other intended uses for
an attribute named NAME, and it would be unfortunate to limit XML's
otherwise open vocabulary policy.
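
A small Python sketch of the collision described above. The resolver is
hypothetical: it models a user agent that applies the HTML 4.0 anchor rule
to every XML vocabulary by accepting either the ID-typed attribute or any
attribute called `name` (lowercase, since XML is case-sensitive and XHTML
uses lowercase attribute names). The `staff`/`person` vocabulary and the
`ids` mapping (standing in for DTD attribute types) are invented for
illustration.

```python
import xml.etree.ElementTree as ET


def resolve_lenient(root, fragment, id_attributes):
    """Hypothetical resolver: match an ID-typed attribute OR any `name`
    attribute, in document order, returning the first hit or None."""
    for elem in root.iter():
        id_attr = id_attributes.get(elem.tag)
        if id_attr is not None and elem.get(id_attr) == fragment:
            return elem
        # The HTML-specific fallback, applied to every XML vocabulary.
        if elem.get("name") == fragment:
            return elem
    return None


# A non-HTML vocabulary in which `name` simply holds a person's name.
staff = ET.fromstring(
    '<staff>'
    '<person id="p1" name="chris"/>'
    '<person id="chris" name="pat"/>'
    '</staff>'
)
ids = {"person": "id"}

# The link author meant the element whose ID is "chris", but the first
# element in document order with name="chris" is returned instead.
print(resolve_lenient(staff, "chris", ids).get("id"))  # p1
```

In this vocabulary `name` was never meant to identify anything, yet the
fallback lets it shadow a genuine ID.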

>Our chief reasons therefore for not adopting the Linking WG's
>suggestions are:
>
>	We want to phase NAME out in the longer term. We think that
>	fragment identifiers should refer to the ID attribute.
>
>	If something is called ID, it should be of type ID.
>
>	Making NAME to be of type ID, and ID to be of another type
>	would break the use of CSS on XHTML documents, and create a
>	semantic difference in how CSS works on a document served as
>	HTML and XHTML.
>
>This is also the reason we provide a guideline to use both NAME= and
>ID= on A elements when you wish to serve XHTML as HTML, to ensure
>consistent behaviour across user agents.

Deprecation is always best when done over the long term rather than the
short term. However, this is not simply the deprecation of an attribute
that is a historical anomaly. Moving the syntax/semantics associated with
the HTML 4.0 attribute NAME into general XML usage (whether explicit or
implied) will have long-lasting and far-reaching consequences.

It is my belief that the guideline/admonition will be generally ignored by
users and that, as a consequence, application developers will ascribe the
HTML 4.0 semantic to the NAME attribute in all XML documents. This would be
most unfortunate, and I urge the HTML WG to reconsider this decision.
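
For reference, the HTML WG's guideline (use both name= and id= on A
elements) is mechanical enough to automate, as in this Python sketch. The
function name and its behavior of copying `name` into a missing `id` and
reporting conflicting pairs are assumptions for illustration, not part of
any WG recommendation.

```python
import xml.etree.ElementTree as ET


def add_matching_ids(root):
    """Give every <a name="..."> an identical id="..." so the anchor is
    reachable by old HTML user agents (via name) and by XML processors
    (via the ID-typed id). Returns anchors whose existing id differs."""
    mismatches = []
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments/processing instructions, if any
        local = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" part
        if local != "a" or elem.get("name") is None:
            continue
        if elem.get("id") is None:
            elem.set("id", elem.get("name"))
        elif elem.get("id") != elem.get("name"):
            mismatches.append(elem)
    return mismatches


page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<a name="top"/><a name="end" id="bottom"/>'
    '</body></html>'
)
bad = add_matching_ids(page)
ns = "{http://www.w3.org/1999/xhtml}"
print([a.get("id") for a in page.iter(ns + "a")])  # ['top', 'bottom']
print(len(bad))  # 1
```

The tool can only flag the second anchor; whether "end" or "bottom" is the
intended target is something only the author knows, which is why a
guideline that relies on authors keeping the two in step is fragile.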

An option might be to register XHTML as its own MIME type rather than as
generic XML.
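
As a sketch of how a separate media type would contain the problem: a user
agent could apply the HTML 4.0 NAME fallback only to documents served as
XHTML, leaving every other XML vocabulary with pure ID semantics. The type
string used here, application/xhtml+xml, is the one later registered for
XHTML; it did not exist when this was written. The resolver, its
two-pass design (ID match first, then A/name), and the `id_attributes`
parameter standing in for DTD attribute types are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed dedicated XHTML media type (later registered; not yet in 1999).
XHTML_TYPE = "application/xhtml+xml"


def resolve_fragment(root, fragment, media_type, id_attributes):
    """Resolve a bare-name fragment identifier.

    The ID-typed attribute is always consulted first. The HTML 4.0 rule
    (name on A elements) is applied only to documents served as XHTML.
    """
    for elem in root.iter():
        id_attr = id_attributes.get(elem.tag)
        if id_attr is not None and elem.get(id_attr) == fragment:
            return elem
    if media_type == XHTML_TYPE:
        for elem in root.iter():
            local = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}"
            if local == "a" and elem.get("name") == fragment:
                return elem
    return None


doc = ET.fromstring('<doc><a name="x"/></doc>')
print(resolve_fragment(doc, "x", "text/xml", {}))       # None
print(resolve_fragment(doc, "x", XHTML_TYPE, {}).tag)   # a
```

Checking IDs in a separate first pass means a genuine ID always wins over a
NAME with the same value, even when served as XHTML.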

>
>Steven Pemberton
>May 1999
>
Received on Thursday, 13 May 1999 14:52:29 UTC