Re: Liaison statement on fragment identifiers from Linking WG

Bill Smith wrote:
> It may be that I misunderstand how this technology works but I fail to see
> how HTML tidy, when run over a single document instance, will cause all
> referring URLs (from other documents) to be properly updated. A simple
> example:
> 
>   In HTML 4.0 the following is legal:
>     <A NAME="bill's-address">
> 
>   In XHTML 1.0 this becomes (with the help of a tool like HTML tidy)
>     <A NAME="bill's-address" ID="bill-0039s--address">

Okay, good example.  I get it now.  And I see how this might be an
issue if implementors and users use bad fragment identifiers. However,
we cannot define behavioral requirements for incorrect implementations
or incorrect usage.

Let me be clear:

HTML 4.0 requires that the NAME and ID attributes share a namespace. It
further defines the ID attribute as type ID. While the DTD does not
explicitly define the type ID, I infer it to mean the SGML/XML
definition of that type. This is backed up by the fact that the HEADERS
attribute of the TD element and the FOR attribute of the LABEL element
are declared as IDREFS and IDREF respectively.

Therefore, the allowable set of identifiers in HTML 4.0 for the ID
attribute should, necessarily, be the same as the allowable set of
identifiers for the NAME attribute in HTML 4.0. Even though NAME is
declared as CDATA, its set needs to be constrained by the set for ID or
fragment references from LABEL and TD would not work as expected.
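The following is a minimal Python sketch of the IDREF point. It uses only the standard library's html.parser. It assumes the attribute types described above: LABEL's FOR holds a single IDREF, and HEADERS on TD (and TH, which shares its attribute list) holds a whitespace-separated list of them. The class and function names are hypothetical. References are matched only against ID attribute values, so an anchor declared with NAME alone is not a valid target.

```python
from html.parser import HTMLParser

class IdrefChecker(HTMLParser):
    """Collect ID values and the IDREF(S) values that point at them."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.refs = []  # (tag, attribute, referenced value)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.add(value)
            elif tag == "label" and name == "for":
                # FOR holds a single IDREF.
                self.refs.append((tag, name, value))
            elif tag in ("td", "th") and name == "headers":
                # HEADERS holds a whitespace-separated list of IDREFs.
                for token in value.split():
                    self.refs.append((tag, name, token))

def dangling_idrefs(html_text):
    """Return references that match no ID value (NAME values are ignored)."""
    checker = IdrefChecker()
    checker.feed(html_text)
    checker.close()
    return [ref for ref in checker.refs if ref[2] not in checker.ids]
```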

So, even though your example was likely legal in HTML 3.2, and might
even work in some implementations, I do not believe it is legal usage in
HTML 4.0. In any event, I am happy to assert that it is NOT legal usage
through prose to that effect, or through a redeclaration of the NAME
attribute as some other type that makes it clear it is restricted to a
portable character set (if such a type can be found).
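One way to state the restriction mechanically is to check values against the ID token syntax in the HTML 4 specification. That syntax is a letter followed by any number of letters, digits, hyphens, underscores, colons, and periods. The Python sketch below is ASCII-only, since XML 1.0 Names allow a broader repertoire. The name is_portable_name is hypothetical.

```python
import re

# ID token syntax as described in the HTML 4 specification (ASCII only):
# one letter, then letters, digits, "_", ":", "." or "-".
ID_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")

def is_portable_name(value):
    """Return True if a NAME value could also be used as an ID value."""
    return ID_TOKEN.fullmatch(value) is not None
```

You can run a check like this over every NAME value in a document set to find the ones that would break under a stricter reading.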

> I've used "-" as an escape character in this example. It's a valid
> character in attribute values of type ID and should allow us to manually
> translate NAME attribute values (type CDATA) into ID attribute values (type ID). If I've
> thought about this correctly, I now have a document instance that can be
> served as HTML or XHTML.
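The escaping rule in the example above can be reconstructed as follows. This rule is inferred from that single example and is not spelled out in the message. A literal "-" is doubled. Characters legal in an ID pass through unchanged. Any other character becomes "-" followed by its code point as four decimal digits (the apostrophe, U+0027, is 39). Here is a Python sketch with hypothetical function names:

```python
import re

# Characters passed through unchanged; "-" is excluded because this
# (assumed) scheme reserves it as the escape character.
_SAFE = re.compile(r"[A-Za-z0-9_:.]")

def name_to_id(name):
    """Escape a CDATA NAME value into an ID-safe string (assumed scheme)."""
    out = []
    for ch in name:
        if ch == "-":
            out.append("--")                  # literal hyphen is doubled
        elif _SAFE.fullmatch(ch):
            out.append(ch)                    # safe character passes through
        elif ord(ch) > 9999:
            raise ValueError("code point does not fit in four digits")
        else:
            out.append("-%04d" % ord(ch))     # e.g. "'" (39) -> "-0039"
    return "".join(out)

def id_to_name(ident):
    """Invert name_to_id(); assumes well-formed input."""
    out = []
    i = 0
    while i < len(ident):
        if ident[i] != "-":
            out.append(ident[i])
            i += 1
        elif ident[i + 1] == "-":
            out.append("-")                   # "--" -> "-"
            i += 2
        else:
            out.append(chr(int(ident[i + 1:i + 5])))  # "-dddd" -> character
            i += 5
    return "".join(out)
```

An escape always starts with "-" followed by either "-" or a digit, so the two forms cannot be confused and the mapping is reversible. The sketch does not repair names that begin with a non-letter.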

Sort of.  HTML and XHTML require that the NAME and ID namespaces be
shared. To me this means (also) that the NAME and ID attributes of a
single element must necessarily be the same.  However, this is not
explicit in HTML 4.0. We could certainly make it explicit in XHTML 1.0
if that would assist with translation.  Anyway, I don't think that your
example meets this requirement. It is not a Strictly Conforming XHTML
1.0 Document, as we have it defined. Therefore, the behavior of user
agents that process it is unspecified.
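The requirement described above can be checked with a short Python sketch, again using html.parser. It flags A elements whose NAME and ID differ. It also flags any value declared more than once across the shared NAME/ID namespace. Treating only A as carrying a NAME in that namespace is a simplifying assumption. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class AnchorNameChecker(HTMLParser):
    """Flag A elements whose NAME and ID differ, and identifiers declared
    more than once in the shared NAME/ID namespace."""

    def __init__(self):
        super().__init__()
        self.counts = {}      # identifier -> number of elements declaring it
        self.mismatches = []  # (name, id) pairs found on the same A element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ident = attrs.get("id")
        # Assumption: only A's NAME is treated as sharing the ID namespace.
        name = attrs.get("name") if tag == "a" else None
        if name is not None and ident is not None and name != ident:
            self.mismatches.append((name, ident))
        # An element whose NAME equals its ID declares that value once.
        for value in {v for v in (name, ident) if v is not None}:
            self.counts[value] = self.counts.get(value, 0) + 1

    def duplicates(self):
        return sorted(v for v, n in self.counts.items() if n > 1)

def check_anchor_names(html_text):
    checker = AnchorNameChecker()
    checker.feed(html_text)
    checker.close()
    return checker
```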

> But all of the documents that refer to this document instance will still
> have fragment IDs of the form "bill's-address" and these fragment IDs will
> fail when the resource retrieved is of type XML - unless some form of
> fallback to HTML 4.0 behavior is specified for all XML.
> 
> Basically, the transition (for fragment IDs) from an attribute of type
> CDATA to one of type ID will be problematic unless XML-generic processing
> of these fragment IDs follows HTML 4.0 specific semantics. Webs of
> documents will possibly cease to function properly when the conversion
> occurs, or at some point in the future after everyone working on the
> conversion has moved on.
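A small Python model of fragment lookup illustrates the failure mode described above. The function is hypothetical and reduces real user agent behavior to two modes. HTML 4.0-style lookup accepts a match on either an A element's NAME or an ID. XML-generic lookup consults ID values only.

```python
from urllib.parse import urlsplit, unquote

def resolve_fragment(url, ids, anchor_names, html_semantics):
    """Return True if the URL's fragment identifies something in the target.

    ids            -- values of ID attributes in the target document
    anchor_names   -- values of A NAME attributes in the target document
    html_semantics -- True: HTML 4.0-style lookup (NAME or ID)
                      False: XML-generic lookup (ID only)
    """
    fragment = unquote(urlsplit(url).fragment)
    if fragment in ids:
        return True
    return html_semantics and fragment in anchor_names
```

Take the tidied document from the earlier example, with NAME "bill's-address" and ID "bill-0039s--address". A link from another document ending in #bill's-address resolves in the first mode and fails in the second.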

Well - there may be that danger. However, I don't at this time see an
easy way around it other than making it clear that the set of characters
that can be used in NAME is restricted to the set that can be used in
an ID.  I believe this is what was meant by HTML 4.0, and I know this is
what we mean in XHTML 1.0.

Would making such a restriction explicit help assuage your concerns?

--
Shane P. McCarron                              phone: +1 612 434-4431
Testing Research Manager                         fax: +1 612 434-4318
                                              mobile: +1 612 799-6942
                                              e-mail: shane@themacs.com

Received on Tuesday, 18 May 1999 08:37:18 UTC