Re: Liaison statement on fragment identifiers from Linking WG from Dan Connolly on 1999-05-20 (www-html-editor@w3.org from April to June 1999)

From: Dan Connolly <connolly@w3.org>
Date: Wed, 19 May 1999 23:12:32 -0500
To: shane@themacs.com
CC: Bill Smith <bill.smith@sun.com>, Tim Bray <tbray@textuality.com>, Steven Pemberton <Steven.Pemberton@cwi.nl>, w3c-xml-cg@w3.org, w3c-html-wg@w3.org, www-html-editor@w3.org, w3c-xml-linking-wg@w3.org
Message-ID: <37438BB0.6BFAF21A@w3.org>
"Shane P. McCarron" wrote:
> 
> Bill Smith wrote:
> >   In HTML 4.0 the following is legal:
> >     <A NAME="bill's-address">

absolutely (unfortunately).

> Let me be clear:
> 
> HTML 4.0 requires that the NAME and ID attributes share a namespace.

meaning: the value of each NAME attribute must be distinct
from the values of all ID attributes, and vice versa.
Nothing more.
(cf.
http://www.w3.org/TR/1998/REC-html40-19980424/struct/links.html#h-12.2.1)

> It
> further defines the ID attribute as type ID. While the DTD does not
> explicitly define the type ID, I infer it to mean the SGML/XML
> definition of that type. This is backed up by the fact that the HEADERS
> attribute of the TD element and the FOR attribute of the LABEL element
> are both of type IDREF.

yes...

> Therefore, the allowable set of identifiers in HTML 4.0 for the ID
> attribute should, necessarily, be the same as the allowable set of
> identifiers for the NAME attribute in HTML 4.0.

Huh? I don't see how this follows at all.

> Even though NAME is
> declared as CDATA, its set needs to be constrained by the set for ID or
> fragment references from LABEL and TD would not work as expected.

LABEL and TD don't point to <A NAME=xxx>; just to IDs:

	"When present, the value of this attribute must be the
	same as the value of the id attribute of some other
	control in the same document."
http://www.w3.org/TR/1998/REC-html40-19980424/interact/forms.html#adef-for


> So, even though your example was likely legal in HTML 3.2, and might
> even work in some implementations, I do not believe it is legal usage in
> HTML 4.0.

I disagree.

> In any event, I am happy to assert that it is NOT legal usage
> through prose to that effect, or through a redeclaration of the NAME
> attribute as some other type that makes it clear it has a restricted
> portable character set (if such a type can be found).

We should have made A NAME an ID attribute in HTML 4.0 strict --
heck, we saw this coming back in the HTML 2.0 discussions --
and deprecated usage such as Bill's example. But we didn't.

So I think the right thing to do is to say:
	if your A NAME attribute values are XML names, you win,
		i.e. you can translate to XHTML without pain.
	else, you lose. Sorry. (perhaps we could recommend a kludge
		for consistency).
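
A rough sketch of that "win/lose" test, for illustration only. It
assumes an ASCII-only approximation of the XML Name production (the
real production also admits many non-ASCII letters), and the function
name is_ascii_xml_name is made up for this example:

```python
import re

# ASSUMPTION: ASCII-only approximation of the XML "Name" production.
# The real production also allows many non-ASCII characters; this
# sketch deliberately rejects them, so it can report false negatives.
_ASCII_XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_ascii_xml_name(value: str) -> bool:
    """Return True if value could be used unchanged as an ID value."""
    return _ASCII_XML_NAME.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["bill's-address", "bill-s-address", "1st-section"]:
        verdict = "win" if is_ascii_xml_name(candidate) else "lose"
        print(f"{candidate!r}: you {verdict}")
```

Under this approximation, Bill's original value loses because of the
apostrophe, and a value starting with a digit loses as well.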


> > I've used "-" as an escape character in this example. It's a valid
> > character in attribute values of type ID and should allow us to manually
> > translate CDATA NAME attribute values to ID ID attribute values. If I've
> > thought about this correctly, I now have a document instance that can be
> > served as HTML or XHTML.
> 
> Sort of.  HTML and XHTML require that the NAME and ID namespaces be
> shared. To me this means (also) that the NAME and ID attributes of a
> single element must necessarily be the same.

That's er... creative thinking.

>  However, this is not
> explicit in HTML 4.0.

It's quite explicit to the contrary:

	An anchor name is the value of either the name or id
	attribute when used in the context of
	anchors. Anchor names must observe the following rules:

	    Uniqueness: Anchor names must be unique within a
		document. Anchor names that differ only in case
		may not appear in the same document.
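
The uniqueness rule quoted above can be checked mechanically. A
minimal sketch, assuming the anchor names (NAME and ID values alike)
have already been collected into one list, and assuming that simple
lowercasing is an adequate stand-in for "differ only in case"; the
function name anchor_name_conflicts is hypothetical:

```python
from collections import defaultdict


def anchor_name_conflicts(anchor_names):
    """Group anchor names that collide when case is ignored.

    anchor_names: every NAME and ID value used as an anchor in one
    document, in document order. Returns a list of groups, each
    holding two or more names that violate the uniqueness rule.
    """
    groups = defaultdict(list)
    for name in anchor_names:
        # ASSUMPTION: str.lower() approximates "differ only in case".
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(anchor_name_conflicts(["intro", "Intro", "summary"]))
```

Note that the rule says nothing about which characters may appear in
the names; it only forbids collisions.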


> We could certainly make it explicit in XHTML 1.0
> if that would assist with translation.

I don't think it would help.

>  Anyway, I don't think that your
> example meets this requirement. It is not a Strictly Conforming XHTML
> 1.0 Document, as we have it defined. Therefore, the behavior of user
> agents that process it is unspecified.
> 
> > But all of the documents that refer to this document instance will still
> > have fragment IDs of the form "bill's-address"

i.e. #bill%XXs-address
(where XX is the ASCII code for ' in hex)
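
To make the escaping concrete, here is a small sketch using Python's
standard urllib.parse.quote. The helper name fragment_for is made up
for this example, and passing safe="" (escape everything outside the
unreserved set) is an assumption about how strictly one wants to
escape:

```python
from urllib.parse import quote


def fragment_for(anchor_name: str) -> str:
    """Build a '#fragment' reference with unsafe characters escaped.

    ASSUMPTION: safe="" escapes every character except letters,
    digits, and "_.-~", which quote() never escapes.
    """
    return "#" + quote(anchor_name, safe="")


if __name__ == "__main__":
    # The hex ASCII code of the apostrophe, i.e. the XX above.
    print(format(ord("'"), "X"))
    print(fragment_for("bill's-address"))
```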

> > and these fragment IDs will
> > fail when the resource retrieved is of type XML - unless some form of
> > fallback to HTML 4.0 behavior is specified for all XML.
> >
> > Basically, the transition (for fragments IDs) from an attribute of type
> > CDATA to one of type ID will be problematic unless XML-generic processing
> > of these fragment IDs follows HTML 4.0 specific semantics. Webs of
> > documents will possibly cease to function properly when the conversion
> > occurs or at some point in the future after everyone working on the
> > conversion has moved on.
> 
> Well - there may be that danger. However, I don't at this time see an
> easy way around it other than making it clear that the set of characters
> that can be used in NAME are restricted to the set that can be used in
> an ID.  I believe this is what was meant by HTML 4.0,

I disagree.

> and I know this is
> what we mean in XHTML 1.0.

That makes sense.

> Would making such a restriction explicit help assuage your concerns?


-- 
Dan Connolly, W3C
http://www.w3.org/People/Connolly/
tel:+1-512-310-2971 (office, mobile)
mailto:connolly.pager@w3.org (put your tel# in the Subject:)
Received on Thursday, 20 May 1999 00:12:53 UTC