- From: Heiner Steven <heiner.steven@sun.com>
- Date: Tue, 18 Nov 2003 10:35:30 +0100
- To: www-html-editor@w3.org
[If this is the wrong mailing list for this kind of question, please point me to a better place to ask.]

Hello,

I've read the HTML 4 specification and am now searching for the right way to encode the NAME attribute of an A element, e.g.

    <a name="function(arg1, arg2)">funcdef</a>

How should parentheses, whitespace and commas (and potentially other characters) be encoded in the "name" attribute's value?

(a) In chapter 12.2 ("The A element") the "name" attribute is listed as type "cdata" (case-sensitive), which would allow for the use of character entities (e.g. " ") within the attribute's value. [http://www.w3.org/TR/html4/struct/links.html#h-12.2]

(b) Chapter 12.2.1 ("Syntax of anchor names") states that anchor names "...should be restricted to ASCII characters. Please consult the appendix for more information about non-ASCII characters in URI attribute values." [http://www.w3.org/TR/html4/struct/links.html#h-12.2.1] The appendix then describes how values for "href" should encode special characters: convert them to UTF-8, then apply the "urlencoding" scheme to the result (e.g. representing '*' as "%2A"). [http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars]

(c) Chapter 6.2 ("SGML basic types") limits the tokens used to "...[A-Za-z] [...] followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".")." [http://www.w3.org/TR/html4/types.html#type-cdata]

Unless I've missed something, we now have three different descriptions of how the value of a "name" attribute is to be encoded:

(a) only character entities ("&something;") are allowed
(b) "urlencoding" is allowed (e.g. "%2A")
(c) after an initial letter, only the following characters are allowed: [A-Za-z0-9_:.-]

I think part of the confusion here is caused by the different uses of the "name" attribute. In some places it's used as an id for a particular element (it even shares the same name space as "id" does). In other places it's used as a name that will become part of the fragment identifier of a URL:

    <a name="intro"></a>

could be used in a URL like this:

    <a href="http://host.com/doc.html#intro">...</a>

What is the right way to encode the "name" attribute value if it is used in a case like this?

For simplicity and flexibility I'd prefer the "urlencoding" scheme, which makes the encoding of the fragment identifier consistent with the encoding of other parts of the URL. From reading the specification, however, (a) (character entities) seems to be the appropriate encoding scheme.

Any advice or insights on this?

Heiner
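To make the three readings (a), (b) and (c) concrete, here is a minimal Python sketch that applies each of them to the example value from the message. It is only an illustration and does not settle which reading the specification intends. The helper names (as_character_entities, as_urlencoded, is_name_token) are made up for this example, and it assumes Python's standard html.escape and urllib.parse.quote are acceptable stand-ins for "character entities" and "urlencoding".

```python
import html
import re
from urllib.parse import quote

# Reading (c): a letter followed by letters, digits, "-", "_", ":" or ".".
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def as_character_entities(value: str) -> str:
    """Reading (a): escape markup-significant characters as character entities."""
    return html.escape(value, quote=True)


def as_urlencoded(value: str) -> str:
    """Reading (b): convert to UTF-8, then percent-encode every reserved character."""
    return quote(value, safe="", encoding="utf-8")


def is_name_token(value: str) -> bool:
    """Reading (c): check whether the value already fits the restricted token syntax."""
    return NAME_TOKEN.fullmatch(value) is not None


example = "function(arg1, arg2)"
print(as_character_entities(example))  # parentheses, comma and space need no entity
print(as_urlencoded(example))          # parentheses, comma and space are percent-encoded
print(is_name_token(example))          # the raw value does not fit reading (c)
```

Under these assumptions, reading (a) leaves the example value untouched, reading (b) rewrites every parenthesis, comma and space, and reading (c) rejects the value outright, which is the inconsistency the question is about.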
Received on Tuesday, 18 November 2003 04:51:27 UTC