Re: id attribute in HTML 4.01

Masayasu Ishikawa wrote:
 
> "Russell O'Connor" <roconnor@Math.Berkeley.EDU> wrote:
> 
>> The id attribute has type ID, and as far as I can tell is case
>> insensitive.
>> 
>> This means that fragment identifiers should be matched
>> case insensitively.
> 
> See section 12.2.1 "Syntax of anchor names", at:
> 
>    http://www.w3.org/TR/html401/struct/links.html#h-12.2.1

That section states, "Comparisons between fragment identifiers 
and anchor names must be done by exact (case-sensitive) match."

What is left unwritten is that
    * HTML4 requires folding non-entity *
    * names to upper case during parsing. *
This requirement covers, among other things, ID attribute values.  
Inspect the HTML4 SGML declaration [1]:
         NAMING   LCNMSTRT ""
                  UCNMSTRT ""
                  LCNMCHAR ".-_:"    
                  UCNMCHAR ".-_:"
                  NAMECASE GENERAL YES
                           ENTITY  NO

"NAMECASE GENERAL YES" turns on upper-case substitution for 
every name other than an entity name.  So, the HTML4 tag keyed as
    <div id="a-to-z">
represents the start of a 'DIV' element with an 'ID' attribute 
whose value is 'A-TO-Z'.  SGML offers many conveniences for 
the direct keying of a document.  But, just as the lack of an 
end-tag on a 'P' element does not mean that the 'P' element has 
no end, the case of non-entity name characters as keyed does 
not influence the resulting name.
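
To make the folding concrete, here is a minimal Python sketch, not 
a conforming SGML parser.  It assumes that the letters a-z fold to 
A-Z, as in the reference concrete syntax, and it pairs LCNMCHAR 
with UCNMCHAR position by position, as the declaration above does.  
The function names are hypothetical; they appear in neither 
specification.

```python
import string

# Additional name characters, copied from the HTML4 SGML declaration.
LCNMCHAR = ".-_:"
UCNMCHAR = ".-_:"

# Assumption: the basic letters a-z fold to A-Z; each LCNMCHAR
# character folds to the UCNMCHAR character at the same position.
FOLD = str.maketrans(
    string.ascii_lowercase + LCNMCHAR,
    string.ascii_uppercase + UCNMCHAR,
)


def fold_general_name(keyed: str) -> str:
    """Return the name that results from a non-entity name as keyed."""
    return keyed.translate(FOLD)


def same_id(keyed_a: str, keyed_b: str) -> bool:
    """Compare two keyed ID values after upper-case substitution."""
    return fold_general_name(keyed_a) == fold_general_name(keyed_b)


print(fold_general_name("a-to-z"))
print(same_id("a-to-z", "A-To-Z"))
```

Under this reading, "a-to-z" and "A-To-Z" as keyed yield the same 
ID, although an exact comparison of the keyed strings would treat 
them as distinct.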

At least, this is the case if we take seriously the claims [2, 3, 4] 
that HTML4 is an application [5] of SGML.  Making that claim was 
foolish: it requires conformant implementations of HTML4 to obey 
the details of ISO 8879, which is a complex specification.  
It would have been perfectly reasonable to include in the 
HTML4 specification a non-normative reference to ISO 8879 and 
to create rules that would ensure that every valid 
HTML4 document be a valid SGML document.

[1]  HTML4 section 20.1, "SGML Declaration".
  <http://www.w3.org/TR/html4/sgml/sgmldecl.html>

[2]  HTML4 cover page.
  <http://www.w3.org/TR/html4/>
"HTML 4.01 is an SGML application conforming to International
Standard ISO 8879 -- Standard Generalized Markup Language"

[3]  HTML4 section 5.1, "Document Character Set".
  <http://www.w3.org/TR/html4/charset.html>
"To promote interoperability, SGML requires that each application
(including HTML) specify its document character set."

[4]  HTML4 section 4.2, "SGML".
  <http://www.w3.org/TR/html4/conform.html>
"HTML 4.01 is an SGML application conforming to International
Standard ISO 8879 -- Standard Generalized Markup Language"

[5]  ISO 8879:1986 sub-clause 15.2, "Conforming 
SGML Application".
"A conforming SGML application's conventions can affect only 
areas that are left open to specification by applications."

-- 
Etan Wexler

Received on Friday, 25 January 2002 07:50:25 UTC