- From: Michael[tm] Smith <mike@w3.org>
- Date: Thu, 3 Oct 2013 16:05:07 +0900
- To: Robin Berjon <robin@w3.org>
- Cc: "HTML WG (public-html@w3.org)" <public-html@w3.org>
Robin Berjon <robin@w3.org>, 2013-09-19 17:18 +0200:

> I opened a bug about this[0] but I'd like it to see broader discussion.
>
> As per Web Components today[1], elements that contain a hyphen in their
> names are clearly laid open for third-party extensibility (a few exceptions
> are listed, grandfathered from MathML and SVG).
>
> I'd be more comfortable if HTML were the one to make that promise, and also
> if it made it more explicitly. Furthermore, I think that the logical
> conclusion from that is that where validators are concerned, elements
> containing hyphens ought to always just be considered valid.

I don't think that would be a good idea.

The HTML spec defines the known element and attribute names recognized as being standard names in the HTML language and that are "portable" in the sense that among other things we know that UAs and other tools recognize them as such. And additional specifications such as the existing ITS2 and ARIA specs define other standard names for other specific purposes.

Custom element names and attribute names that someone privately mints and uses in their own documents are not at all portable in the same sense and not standard names that are core to the HTML language or any other standard.

So it's right that non-standard names be handled differently than standard names, and appropriate that conformance checkers should treat them differently than they do standard names. Validators shouldn't just silently ignore them or drop them on the floor.

Conformance checkers should report non-standard names so a person checking a document can actually be aware it contains non-standard names. Then that person can decide what they choose to do about it (e.g., choose to ignore the reports because they're the ones who created the document and are already aware it contains custom names).

To be clear, the person checking a document with a validator is not always the same person who created the document.

See [A] at the end of this message for more details about validator implementation of what you're proposing.

> Additionally, I believe we should make the same commitment for attributes.

Why? What's the use case? It seems like you're scope-creeping your own proposal... Maybe I'm missing something but at least as far as I know, Custom Elements in particular and Web Components more broadly don't have any special need for attributes with hyphens.

Anyway, unlike for the element-name case, for the attribute-name case we already have standards that define a bunch of hyphen-containing attribute names -- specifically, the set of aria-* attributes defined in the ARIA spec, and the set of its-* attributes defined in the ITS2 spec.

So if we allow any attribute containing a hyphen, we're going to make it a lot harder for authors to catch mistakes like ist-allowed-characters, aira-describedby, aria-describesby, aria-described-by, its-annotaters-ref, etc., because it'll be a lot more difficult for tools to distinguish those typoed names from arbitrary custom hyphen-containing attribute names.

See [B] at the end of this message for more details about how this relates to data-* attributes and validator implementation of support for those.

> not usable for general purpose extensions, and that's a good thing).
>
> Making hyphen attributes valid in all cases (minus existing ones such as
> aria-* and those from SVG/MathML) would allow third parties to produce their
> own using weak (but sufficient) namespacing (e.g. epub-*).

Third parties can already do that. The ITS2 and ARIA specifications are examples of exactly that. We already provide just fine for that to happen, without needing to change the HTML spec to say that any arbitrary name containing a hyphen is valid.

If for example the epub WG wants to have attributes that use the prefix epub-*, then they could produce a spec that defines the specific name for each new attribute, in the same way the ITS2 spec defines specific its-* attributes or the ARIA spec defines specific aria-* attributes. Then we can add a schema to the validator to check for those specific attribute names.

> It won't hurt HTML since we're already committed to not use hyphens in
> names anyway.
>
> Naturally, it would be useful to advise such third parties that they would
> be better off getting any generally useful such attribute standardised. But
> it provides a valuable escape hatch

A really big escape hatch. Punching "a huge anything-goes hole in all of HTML".

> for when there is no agreement to introduce such features into HTML.
> Right now the alternatives are:
>
> 1) Use namespaces; bad idea, forces XHTML.
> 2) Use data-$prefix-*; bad idea, not meant for that.

There's a third alternative, which is that they just be treated as unknown names that are not part of the HTML language and not part of any other known specification. Which is exactly what they are. Then the validator will keep reporting them as unknown the same way it does now.

> The impact on user agents is zero, only validators are impacted.

The validation impact is large. But even larger is the impact on authors.

I don't think allowing unknown non-standard element names (or attribute names) to be treated as valid simply because they have hyphens in their names would be a win for authors.

We don't want to make it harder for authors to know when they have documents that contain names which aren't part of any standard, and we don't want to make it harder for authors to catch misspelled attribute names, and we don't want authors to end up being even further limited in the choice of tools they can use -- limited to only using tools that are complex enough to understand all the magic we're introducing.

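To illustrate the misspelling concern above, here is a minimal sketch in Python. It is not how validator.nu or any other checker is implemented; the names KNOWN_ATTRIBUTES, check_strict and check_hyphen_is_valid are hypothetical, the attribute set is a tiny illustrative subset of what the ARIA and ITS2 specs define, and the 0.8 similarity cutoff is an arbitrary assumption.

```python
from difflib import get_close_matches

# Illustrative subset only (assumption): the ARIA and ITS2 specs
# define many more attribute names than these four.
KNOWN_ATTRIBUTES = {
    "aria-describedby",
    "aria-labelledby",
    "its-allowed-characters",
    "its-annotators-ref",
}


def check_strict(names):
    """Current behaviour, roughly: report every name that is not a known name.

    Returns (name, suggestion) pairs; suggestion is the closest known
    name, or None when nothing is similar enough.
    """
    reports = []
    for name in names:
        if name in KNOWN_ATTRIBUTES:
            continue
        # 0.8 is an arbitrary cutoff chosen for this sketch.
        close = get_close_matches(name, sorted(KNOWN_ATTRIBUTES), n=1, cutoff=0.8)
        reports.append((name, close[0] if close else None))
    return reports


def check_hyphen_is_valid(names):
    """Hypothetical proposed policy: any name containing a hyphen is valid."""
    return [
        (name, None)
        for name in names
        if name not in KNOWN_ATTRIBUTES and "-" not in name
    ]


if __name__ == "__main__":
    attrs = [
        "aria-describedby",    # correct
        "aira-describedby",    # typo
        "aria-described-by",   # typo
        "its-annotaters-ref",  # typo
        "epub-type",           # privately minted name
    ]
    print(check_strict(attrs))
    print(check_hyphen_is_valid(attrs))
```

With the strict policy, the three misspelled names are reported along with the standard names they most resemble, and epub-type is reported as unknown with no suggestion. With the hyphen-is-valid policy nothing is reported at all, so the typos and the custom name become indistinguishable.
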
--Mike

> [0] https://www.w3.org/Bugs/Public/show_bug.cgi?id=23254
> [1] https://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/custom/index.html#dfn-custom-element-type

[A] With regard to implementation of names-containing-hyphens in conformance checkers, all the normal schema formalisms I'm aware of that are in wide use and that have any tool support -- RelaxNG, W3C XML Schema, DTDs -- don't provide any way to express "If an element name contains a hyphen, consider it valid."

Most schema validation languages are grammar-based and the basic mechanism they work from is that they treat any element name as invalid by default (compare to "Deny: *" or whatever) and require you to explicitly state in the schema what element names are actually allowed, and exactly where they are allowed. And they require the element names to be stated explicitly; they don't provide for expressing patterns over the characters of an element name. The wildcards that RelaxNG and W3C XML Schema do offer match any name, or any name in a given namespace, not "any name containing a hyphen".

So if you were to have the spec say that any element name containing a hyphen is valid, you would be introducing a condition that's not expressible in current schema languages, and so not checkable with most current off-the-shelf validation tools.

[B] It's true we have sort of a precedent in the spec for wildcard names for attributes, in the form of data-* attributes. But that's actually a really different case than what you're proposing. Authors aren't going to get caught up with consequences of data-* attributes not being distinguishable from misspelled aria-* or its-* attributes. And some tools are capable of dealing with data-* attributes.

I say some tools because it's true that at least the validator.nu code and W3C validator do treat data-* attributes as valid -- but that's possible only because we have custom Java code in the validator that causes the data-* attributes to actually be dropped from the document before the document is exposed to the core validation mechanism (a RelaxNG grammar that gets evaluated using James Clark's Jing tool).
And because we don't actually ever check the values of data-* attributes.

But there are many other contexts other than the validator in which it would be useful to have some level of HTML conformance checking. For example, you might want to have HTML conformance checking in an interactive editing application -- anything from a text editor like Emacs to some WYSIWYG Web-authoring application. And it's unreasonable to expect that all those tools will implement custom attribute-name-filtering mechanisms like the one I described above that we're using for data-* attributes.

--
Michael[tm] Smith http://people.w3.org/mike
Received on Thursday, 3 October 2013 07:05:15 UTC