[whatwg] Custom elements and attributes from Henri Sivonen on 2006-10-23 (public-whatwg-archive@w3.org from October 2006)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 23 Oct 2006 14:43:40 +0300
Message-ID: <12AD0564-1932-4151-BA26-924CCF9F26C9@iki.fi>
On Oct 18, 2006, at 01:27, Øistein E. Andersen wrote:

> I just tried to check out how custom element and attribute names  
> work in current browsers and how they are supposed to work in  
> HTML5, and some issues seem unclear to me.
...
> 4) Sivonen's HTML5 validator (http://hsivonen.iki.fi/validator/html5/
> as opposed to http://hsivonen.iki.fi/validator/) says:
>> Attribute name must not start with “xml”.
> I fail to find any mention of this in the HTML5 draft. Has it been  
> borrowed from X(HT)ML?
> Such a limitation would make it more difficult to create conformant  
> legacy-compatible documents.

Any attribute or element not specifically allowed in the spec is
non-conforming. Therefore, all "custom attributes" and "custom elements"
are non-conforming. Some non-conforming attributes are caught in the  
parser. Others are caught on the RELAX NG level. This is an  
implementation detail.

The implementation detail becomes an issue only if you want to use  
the conformance checker machinery with a custom schema. Using custom  
schemas with the HTML parser is for experts only and produces very  
wrong results unless the schema is suitable. Hence, I have not  
optimized for that use case.

Please note that the parser is not a conforming HTML5 parser but a  
special-purpose parser that is designed to work together with  
particular RELAX NG schemas for the specific purpose of conformance  
checking.

> 5) The same validator does not allow : or ? in either element or  
> attribute names, whereas the current HTML5 draft seems to allow all  
> Unicode characters except whitespace, <, >, = and /. Would someone  
> please clarify this?

*Conforming* element and attribute names happen to consist of
ASCII-only name tokens without a colon. As an implementation detail, names
that do not have such a form are caught early by the special-purpose
parser.

This is done in order to
  1) prevent colonified names from entering into the namespace-aware  
SAX pipeline
  2) deal with case folding efficiently and in a way that prevents
accidentally folding e.g. İNPUT to input
  3) prevent names that are not well-formed XML names from entering  
into the SAX pipeline
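
As a rough illustration of the three points above, an early name check of
this kind could look something like the following Python sketch. This is not
the validator's actual code, which is not shown in this message; the names
`ascii_lower` and `is_acceptable_name` and the exact regular expression are
assumptions made up for the example.

```python
import re

# Hypothetical pattern (an assumption for this sketch): an ASCII letter
# followed by ASCII letters, digits, periods, underscores or hyphens.
# It admits no colon and no non-ASCII character, and every string it
# matches is also a well-formed XML name.
_ASCII_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*\Z")


def ascii_lower(name: str) -> str:
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch for ch in name
    )


def is_acceptable_name(name: str) -> bool:
    """Return True if the name may be handed on to the XML pipeline."""
    return _ASCII_NAME.match(name) is not None


# U+0130 is LATIN CAPITAL LETTER I WITH DOT ABOVE.
dotted = "\u0130NPUT"
print(ascii_lower("INPUT"))               # plain ASCII is folded
print(ascii_lower(dotted) == "input")     # the dotted I is never folded to "i"
print(is_acceptable_name(dotted))         # rejected: non-ASCII
print(is_acceptable_name("xlink:href"))   # rejected: contains a colon
```

Folding only A-Z avoids any dependence on locale-sensitive case mapping,
and rejecting everything outside the pattern up front means later stages
never see a colon or a character that cannot appear in an XML name.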

> 6) According to the current draft, authors seem to have the  
> possibility to use custom element and attribute names of their choice.

Could you please cite the part of the spec that says so?

Such usage wasn't *conforming* when I last checked (a few months  
ago). Has the spec changed in a dramatic way when I wasn't looking?  
Note that not everything that results in a DOM according to the  
parsing algorithm is conforming.
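
To illustrate the distinction between "parses" and "conforms", here is a
minimal Python sketch. It uses the standard library's html.parser, which is
not an implementation of the HTML5 parsing algorithm, and a tiny made-up
allow-list (`ALLOWED`) that merely stands in for the spec; the class and
function names are likewise hypothetical.

```python
from html.parser import HTMLParser

# Tiny, hypothetical allow-list standing in for the spec. The real set of
# conforming elements and attributes is defined by the HTML5 draft.
ALLOWED = {
    "p": {"class", "id"},
    "a": {"href", "class", "id"},
}


class Checker(HTMLParser):
    """Collects an error for every name that is not on the allow-list."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over tag and attribute names already lowercased.
        if tag not in ALLOWED:
            self.errors.append(f"element {tag!r} not allowed")
            return
        for name, _value in attrs:
            if name not in ALLOWED[tag]:
                self.errors.append(
                    f"attribute {name!r} not allowed on {tag!r}"
                )


def check(markup: str) -> list:
    """Parse the markup and return the list of conformance errors."""
    checker = Checker()
    checker.feed(markup)
    checker.close()
    return checker.errors


# The parser accepts both inputs without complaint; only the first conforms
# to the allow-list.
print(check('<p class="x">hi</p>'))
print(check('<widget>hi</widget>'))
```

The parser produces events for the custom element just as it does for the
allowed one; whether the result is conforming is a separate question that
the parser alone does not answer.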

The conformance checker is foremost for checking conformance.  
Supporting custom schemas for privately extended HTML5-like languages  
is a nice feature to have, but personally I am not at all sympathetic  
to extending HTML5 with names that contain non-ASCII characters (due to
case folding issues), non-XML characters (due to XML serializability
issues) or the colon (due to Namespaces in XML compatibility issues).

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Monday, 23 October 2006 04:43:40 UTC