- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Tue, 31 Oct 2006 13:46:15 +0200
On Oct 31, 2006, at 01:03, Øistein E. Andersen wrote:

> On 23 Oct 2006, at 12:43PM, Henri Sivonen wrote:
>
>> Using custom schemas with the HTML parser is for experts only
>> and produces very wrong results unless the schema is suitable.
>
> Indeed so, but then any tool can potentially be misused.
> Still, I do realise that this is not a priority, of course.

It isn't about me being worried about misuse. Rather, I have not taken steps to prevent users of custom schemas from shooting themselves in the foot. (Taking those steps would involve a non-trivial amount of work.)

There are no gotchas with using a custom schema with the XML parser.

There are also no gotchas in making a copy of one of the schemas that the service offers for use with the HTML parser and adding custom *attributes*, except the attributes have to be legal in XML also, constrained to ASCII, written in the schema in lower case and must not collide with case-folded or boolean attributes on other HTML elements.

If you add custom *elements* and use the HTML parser, the system does not ensure that the custom elements would not adversely interact with tag inference or error handling in browsers. That is, the schema might validate a tree, but there's no guarantee that you'd get the same tree in a browser. If you add custom elements, you just have to know what you are doing in order to keep the results useful for the purpose of authoring for browsers.

But in any case, using a custom schema is no longer checking HTML5 conformance but checking your private dialect.

>> personally I am not at all sympathetic to extending HTML5 with
>> names that contain non-ASCII (due to case folding issues),
>
> It might be interesting to see how current browsers handle element
> names containing such characters:
> The current draft seems to describe Firefox's behaviour on this point.

Which is good for security, since Unicode case folding involves security issues similar to non-shortest forms in UTF-8.
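To illustrate the kind of case folding issue meant here, the following is a minimal Python sketch. It is an illustration only, not taken from the draft or from any browser. The helper name `ascii_lower` is hypothetical, and the sketch assumes that names are folded over the ASCII letters A-Z only. It shows that Unicode-aware lowercasing can map a non-ASCII character onto an ASCII letter, so two visibly different byte sequences end up comparing equal, much as a non-shortest UTF-8 form can smuggle in an ASCII character.

```python
def ascii_lower(name: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave every other character untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in name
    )


KELVIN_SIGN = "\u212a"  # U+212A KELVIN SIGN, rendered like a capital K
LONG_S = "\u017f"       # U+017F LATIN SMALL LETTER LONG S

# Unicode-aware case mapping collapses these onto plain ASCII letters.
kelvin_unicode_lower = KELVIN_SIGN.lower()
long_s_unicode_upper = LONG_S.upper()

# ASCII-only folding leaves the non-ASCII character alone, so it can
# never be mistaken for an ASCII name after folding.
kelvin_ascii_lower = ascii_lower(KELVIN_SIGN)
```

With ASCII-only folding, a name containing U+212A stays distinct from one containing "k", whereas a Unicode-aware fold would make them match.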
>> non-XML characters (due to XML serializability issues)
>
> Which are those characters? Do you mean <, >, ", ' and &?

I mean characters that do not match the production named Char in XML 1.0.
http://www.w3.org/TR/REC-xml/#NT-Char
For example, \0, form feed and U+FFFF are non-XML characters. Of course, the production is rather arbitrary, but XML 1.0 is written in stone.

Actually, I should have said that the minimum condition that I think is necessary for a name of a custom attribute or element to be reasonable is that the name matches the NCName production from Namespaces in XML 1.0 and only contains characters from the Basic Latin (ASCII) block.
http://www.w3.org/TR/REC-xml-names/#NT-NCName
The NCName production is arbitrary, too, but, again, Namespaces in XML 1.0 is written in stone.

>> Any attribute or element not specifically allowed in the spec is
>> non-conforming.
>> Therefore, all "custom attributes" and "custom elements" are
>> non-conforming.
>
> Custom attributes are (I believe, though I do not have any
> statistics to support this) quite common in the wild

I don't know how common they are.

> and can certainly be useful in combination with
> scripting. Furthermore, current browsers handle custom attributes
> effortlessly.

On these points, I agree.

> I therefore find it unfortunate that custom attributes are not
> allowed in a conforming HTML5 document.

It does not necessarily follow that custom attributes have to be conforming. The alternative is that advanced scripters make an informed decision not to conform in a harmless way at a particular point. Not that I like designing specs to be violated in an informed way, but the alternative is not that elegant, either.

> Still, allowing /any/ attribute name would of course
> make it impossible to add new attributes later on (HTML6?);

Another problem is that making a conformance checker silently pass unknown attributes would also make it useless in catching typos in attribute names.
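The naming conditions given earlier in this message (the Char production from XML 1.0, and NCName restricted to Basic Latin) can be sketched in Python roughly as follows. This is an illustrative sketch, not code from the conformance checker. The function names `is_xml_char` and `is_reasonable_custom_name` are hypothetical. The regular expression is the ASCII-only subset of NCName as I read the XML 1.0 and Namespaces in XML 1.0 productions: a letter or underscore, followed by letters, digits, '.', '-' or '_', with no colon.

```python
import re

# ASCII-only subset of NCName (assumption: derived from the Name production
# minus ':' and minus every non-ASCII letter, digit, combining char, extender).
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_xml_char(ch: str) -> bool:
    """True if the single character ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_reasonable_custom_name(name: str) -> bool:
    """True if name is an NCName made of Basic Latin (ASCII) characters only."""
    return ASCII_NCNAME.fullmatch(name) is not None
```

Under this sketch, a name such as "x-foo" passes, while "x:foo" (colon), "1abc" (leading digit) and any name containing a non-ASCII letter are rejected.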
> that is why I propose explicitly to reserve attribute names starting
> with "x-" (inspired by codes for custom languages, but any prefix
> would be fine) for use by authors and to make documents containing
> custom attributes of this form fully conforming.

That could work. In my case, I could put a filter between the parser and the rest of the conformance checking back end and drop "x-" attributes. It would probably cause the addition of one more checkbox in the UI, though.

However, I'd expect XML folks to scream, because their wildcard tooling is tuned for unknown namespaces rather than magic prefixes within the local name.

> Ideally, I would like the same principle to apply for element names;
> such elements should probably be parsed as phrasing elements and be
> allowed to contain strictly inline-level content only to be
> conforming.

Given the off-the-shelf technologies that I have chosen for the conformance checker, I don't see an *elegant* way to implement that. I do see an inelegant way, though, but it would produce confusing error messages unless fixed with even more inelegance. (See point about XML tooling above.) Of course, it doesn't follow that the spec couldn't go there.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 31 October 2006 03:46:15 UTC