How to encode special characters in <A NAME="...">?

[If this is the wrong mailing list for this kind of question, please
point me to a better place for asking]

Hello,

I've read the HTML 4 specification, and am now searching for
the right way to encode the NAME attribute of an A tag, e.g.

    <a name="function(arg1, arg2)">funcdef</a>

How should parentheses, whitespace and commas (and potentially other
characters) be encoded in the "name" attribute's value?

(a) In chapter 12.2 ("The A element") the "name" attribute is listed
    as type "cdata" (case-sensitive), which would allow for the use
    of character entities (e.g. "&nbsp;") within the attribute's value.
    [http://www.w3.org/TR/html4/struct/links.html#h-12.2]

(b) Chapter 12.2.1 ("Syntax of anchor names") states that anchor names
    "...should be restricted to ASCII characters. Please consult the
    appendix for more information about non-ASCII characters in URI
    attribute values."
        [http://www.w3.org/TR/html4/struct/links.html#h-12.2.1]

    The appendix then points out how values for "href" should encode
    special characters using UTF-8, and then use the "urlencoding" scheme
    on the result (e.g. representing '*' as "%2A").
    [http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars]

(c) 6.2 ("SGML basic types") limits the tokens used to "...[A-Za-z]
    [...] followed by any number of letters, digits ([0-9]),
    hyphens ("-"), underscores ("_"), colons (":"), and periods (".")."
        [http://www.w3.org/TR/html4/types.html#type-cdata]

Unless I've missed something, we now have three different descriptions
of how the value of a "name" attribute is to be encoded:

    (a) only character entities ("&something;") are allowed
    (b) "urlencoding" is allowed (e.g. "%2A")
    (c) after an initial character, the following chars are allowed:
        [A-Za-z0-9_:.-]
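
To make the three readings concrete, here is a rough Python sketch that
applies each of them to the example name above. It is only an illustration
of the string transformations, not something taken from the specification.
The helper names are made up, and using numeric character references for
reading (a) is my own assumption about what that reading would look like
in practice.

```python
import re
from urllib.parse import quote

NAME = "function(arg1, arg2)"

# Pattern for reading (c): a letter, then letters, digits, "_", ":", ".", "-".
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def to_char_refs(value: str) -> str:
    """Reading (a): write every non-alphanumeric character as a numeric
    character reference (hypothetical helper, assumption for illustration)."""
    return "".join(ch if ch.isalnum() else f"&#{ord(ch)};" for ch in value)


def to_percent_encoding(value: str) -> str:
    """Reading (b): encode as UTF-8, then percent-encode everything except
    letters, digits and "_", ".", "-", "~"."""
    return quote(value, safe="")


def is_name_token(value: str) -> bool:
    """Reading (c): check whether the value is already a valid token."""
    return NAME_TOKEN.fullmatch(value) is not None


print(to_char_refs(NAME))         # function&#40;arg1&#44;&#32;arg2&#41;
print(to_percent_encoding(NAME))  # function%28arg1%2C%20arg2%29
print(is_name_token(NAME))        # False
print(is_name_token("intro"))     # True
```

Under reading (c) the example name cannot be used at all without some
additional mapping, which is why the choice between (a) and (b) matters.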

I think part of the confusion here is caused by the different
uses of the NAME attribute. In some places it's used as an id for
a particular element (it even shares the same name space as "id" does).
In other places it's used as a name that will become part of the fragment
identifier of a URL:

    <a name="intro"></a>

could be used in a URL like this:

    <a href="http://host.com/doc.html#intro">...</a>

What is the right way to encode the "name" attribute value
if it is used in a case like this?

For simplicity and flexibility I'd prefer the "urlencoding" scheme,
which makes the encoding of the "fragment identifier" consistent with
the encoding of other parts of the URL.
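
As a small sketch of what I mean by consistency (Python, hypothetical
helper names, and assuming the link target is compared after
percent-decoding the fragment, which is exactly the part I am unsure
about):

```python
from urllib.parse import quote, unquote, urlsplit


def href_for(doc_url: str, anchor_name: str) -> str:
    """Build a link whose fragment is the percent-encoded anchor name."""
    return f"{doc_url}#{quote(anchor_name, safe='')}"


def anchor_from(href: str) -> str:
    """Recover the anchor name by percent-decoding the fragment."""
    return unquote(urlsplit(href).fragment)


href = href_for("http://host.com/doc.html", "function(arg1, arg2)")
print(href)               # http://host.com/doc.html#function%28arg1%2C%20arg2%29
print(anchor_from(href))  # function(arg1, arg2)
```

This only shows that the round trip is lossless at the string level. It
says nothing about how a user agent actually matches the fragment against
the "name" value.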

From reading the specification, however, (a) (character entities) seems
to be the appropriate encoding scheme.

Any advice or insights on this?

Heiner

Received on Tuesday, 18 November 2003 04:51:27 UTC