[whatwg] Custom elements and attributes from Henri Sivonen on 2006-10-31 (public-whatwg-archive@w3.org from October 2006)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 31 Oct 2006 13:46:15 +0200
Message-ID: <96817B1E-2290-4ABB-96FF-B4A1CC0CC4D6@iki.fi>
On Oct 31, 2006, at 01:03, Øistein E. Andersen wrote:

> On 23 Oct 2006, at 12:43PM, Henri Sivonen wrote:
>
>> Using custom schemas with the HTML parser is for experts only
>> and produces very wrong results unless the schema is suitable.
>
> Indeed so, but then any tool can potentially be misused.
> Still, I do realise that this is not a priority, of course.

It isn't about me being worried about misuse. Rather, I have not  
taken steps to prevent users of custom schemas from shooting  
themselves in the foot. (Taking those steps would involve a non- 
trivial amount of work.)

There are no gotchas with using a custom schema with the XML parser.  
There are also no gotchas in making a copy of one of the schemas that  
the service offers for use with the HTML parser and adding custom  
*attributes*, except the attributes have to be legal in XML also,  
constrained to ASCII, written in the schema in lower case and must  
not collide with case-folded or boolean attributes on other HTML  
elements.
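
As a rough illustration of the constraints above, the following Python
sketch tests a candidate attribute name for the ASCII, lower-case and
collision conditions. It is not code from the service: the function name
and the list of existing attributes are hypothetical, the list is
deliberately incomplete, and XML legality of the name is not checked here.

```python
# Hypothetical, incomplete sample of attribute names that HTML already uses;
# a real list would have to come from the schema being extended.
EXISTING_HTML_ATTRIBUTES = frozenset(
    {"checked", "class", "disabled", "id", "readonly", "selected"}
)


def passes_basic_attribute_rules(name, existing=EXISTING_HTML_ATTRIBUTES):
    """Check the ASCII, lower-case and collision conditions only."""
    if not name or not name.isascii():
        return False  # empty, or contains non-ASCII characters
    if name != name.lower():
        return False  # must be written in lower case in the schema
    return name not in existing  # must not collide with a known attribute
```

With this sample list, "tooltip-text" passes, while "Tooltip" (upper case)
and "checked" (collision with a boolean attribute) do not.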

If you add custom *elements* and use the HTML parser, the system does
not check whether the custom elements interact badly with tag
inference or error handling in browsers. That is, the schema might
validate a tree, but there's no guarantee that you'd get the same
tree in a browser. If you add custom elements, you just have to know
what you are doing in order to keep the results useful for the
purpose of authoring for browsers.

But in any case, using a custom schema is no longer checking HTML5  
conformance but checking your private dialect.

>> personally I am not at all sympathetic to extending HTML5 with  
>> names that
>> contain non-ASCII (due to case folding issues),
>
> It might be interesting to see how current browsers handle element  
> names
> containing such characters:

> The current draft seems to describe Firefox's behaviour on this point.

Which is good for security, since Unicode case folding involves  
security issues similar to non-shortest forms in UTF-8.
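
One way to see the kind of problem meant here, assuming the concern is
that a non-ASCII character can fold to an ASCII one: in Python, full
Unicode lower-casing maps U+212A KELVIN SIGN to the ASCII letter "k",
so two distinct names become equal after folding. The helper
ascii_lower below is a hypothetical name for an ASCII-only alternative.

```python
def ascii_lower(name):
    """Lower-case only A-Z and leave every other character untouched."""
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in name)


KELVIN_SIGN = "\u212a"  # looks like "K" but is a different code point

# Unicode-aware lower-casing turns the Kelvin sign into plain ASCII "k".
unicode_folded = KELVIN_SIGN.lower()

# ASCII-only lower-casing keeps it distinct from "k".
ascii_folded = ascii_lower(KELVIN_SIGN)
```

With ASCII-only folding, a name containing the Kelvin sign can never
be mistaken for one containing "k".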

>> non-XML characters (due to XML serializability issues)
>
> Which are those characters? Do you mean <, >, ", ' and &?

I mean characters that do not match the production named Char in XML  
1.0.
http://www.w3.org/TR/REC-xml/#NT-Char
For example, \0, form feed and U+FFFF are non-XML characters.
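
A minimal sketch of that test in Python, transcribing the ranges of the
Char production; the function name is_xml_char is made up for the
example.

```python
def is_xml_char(ch):
    """Return True if the single character ch matches Char in XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # excludes the surrogate block
        or 0xE000 <= cp <= 0xFFFD      # excludes U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )
```

Under this check, "\0", form feed ("\f") and "\uffff" are rejected,
while tab and ordinary letters are accepted.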

Of course, the production is rather arbitrary, but XML 1.0 is written  
in stone.

Actually, I should have said that the minimum condition that I think  
is necessary for a name of a custom attribute or element to be  
reasonable is that the name matches the NCName production from  
Namespaces in XML 1.0 and only contains characters from the Basic  
Latin (ASCII) block.
http://www.w3.org/TR/REC-xml-names/#NT-NCName

The NCName production is arbitrary, too, but, again, Namespaces in  
XML 1.0 is written in stone.
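
Restricted to ASCII, the NCName production reduces to a letter or
underscore followed by letters, digits, ".", "-" or "_" (the combining
characters and extenders it also allows are all non-ASCII). A sketch
in Python, with is_ascii_ncname as a made-up name:

```python
import re

# ASCII subset of NCName: (Letter | '_') (Letter | Digit | '.' | '-' | '_')*
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(name):
    """Return True if name is an NCName made of ASCII characters only."""
    return ASCII_NCNAME.fullmatch(name) is not None
```

This accepts "x-foo" and "data_1" and rejects "1abc" (starts with a
digit), "a:b" (contains a colon) and any name with a non-ASCII
character.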

>> Any attribute or element not specifically allowed in the spec is  
>> non-conforming.
>> Therefore, all "custom attributes" and "custom elements" are non- 
>> conforming.
>
> Custom attributes are (I believe, though I do not have any  
> statistics to support this) quite common in the wild

I don't know how common they are.

> and can certainly be useful in combination with
> scripting. Furthermore, current browsers handle custom attributes  
> effortlessly.

On these points, I agree.

> I therefore find it unfortunate that custom attributes are not  
> allowed in a
> conforming HTML5 document.

It does not necessarily follow that custom attributes have to be
conforming. The alternative is that advanced scripters make an
informed decision to deviate from the spec, harmlessly, at a
particular point.

Not that I like designing specs to be violated in an informed way,  
but the alternative is not that elegant, either.

> Still, allowing /any/ attribute name would of course
> make it impossible to add new attributes later on (HTML6?);

Another problem is that making a conformance checker silently pass  
unknown attributes would also make it useless in catching typos in  
attribute names.

> that is why I
> propose explicitly to reserve attribute names starting with  
> "x-" (inspired by
> codes for custom languages, but any prefix would be fine) for use by
> authors and to make documents containing custom attributes of this  
> form fully
> conforming.

That could work. In my case, I could put a filter between the parser  
and the rest of the conformance checking back end and drop "x-"  
attributes. It would probably cause the addition of one more checkbox  
in the UI, though.
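
Such a filter could look roughly like the Python sketch below. It is
only an illustration, not the checker's code: the function name, the
dict representation of attributes and the enabled flag (standing in
for the checkbox) are all assumptions.

```python
def drop_x_attributes(attributes, enabled=True):
    """Return a copy of attributes without names starting with "x-".

    attributes maps attribute names to values for one element; enabled
    plays the role of the hypothetical UI checkbox.
    """
    if not enabled:
        return dict(attributes)
    return {
        name: value
        for name, value in attributes.items()
        if not name.startswith("x-")
    }


parsed = {"class": "note", "x-tooltip": "hello", "clas": "typo"}
filtered = drop_x_attributes(parsed)
```

Here "x-tooltip" is dropped before checking, while the misspelled
"clas" still reaches the back end and can be reported as an error.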

However, I'd expect XML folks to scream, because their wildcard  
tooling is tuned for unknown namespaces rather than magic prefixes  
within the local name.

> Ideally, I would like the same principle to apply for element  
> names; such
> elements should probably be parsed as phrasing elements and be  
> allowed to
> contain strictly inline-level content only to be conforming.

Given the off-the-shelf technologies that I have chosen for the  
conformance checker, I don't see an *elegant* way to implement that.  
I do see an inelegant way, though, but it would produce confusing  
error messages unless fixed with even more inelegance. (See point  
about XML tooling above.) Of course, it doesn't follow that the spec  
couldn't go there.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 31 October 2006 03:46:15 UTC