Re: Making all elements and attributes that contain hyphens valid from Michael[tm] Smith on 2013-10-03 (public-html@w3.org from October 2013)

From: Michael[tm] Smith <mike@w3.org>
Date: Thu, 3 Oct 2013 16:05:07 +0900
To: Robin Berjon <robin@w3.org>
Cc: "HTML WG (public-html@w3.org)" <public-html@w3.org>
Message-ID: <20131003070506.GB2302@sideshowbarker>

Robin Berjon <robin@w3.org>, 2013-09-19 17:18 +0200:

> I opened a bug about this[0] but I'd like it to see broader discussion.
> 
> As per Web Components today[1], elements that contain a hyphen in their
> names are clearly laid open for third-party extensibility (a few exceptions
> are listed, grandfathered from MathML and SVG).
> 
> I'd be more comfortable if HTML were the one to make that promise, and also
> if it made it more explicitly. Furthermore, I think that the logical
> conclusion from that is that where validators are concerned, elements
> containing hyphens ought to always just be considered valid.

I don't think that would be a good idea.

The HTML spec defines the known element and attribute names recognized as
being standard names in the HTML language and that are "portable" in the
sense that among other things we know that UAs and other tools recognize
them as such. And additional specifications such as the existing ITS2 and
ARIA specs define other standard names for other specific purposes.

Custom element names and attribute names that someone privately mints and
uses in their own documents are not at all portable in the same sense and
not standard names that are core to the HTML language or any other standard.

So it's right that non-standard names be handled differently than standard
names, and appropriate that conformance checkers should treat them
differently than they do standard names. Validators shouldn't just silently
ignore them or drop them on the floor.

Conformance checkers should report non-standard names so a person checking
a document can actually be aware it contains non-standard names. Then that
person can decide what they choose to do about it (e.g., choose to ignore
the reports because they're the ones who created the document and are already
aware it contains custom names). To be clear, the person checking a document
with a validator is not always the same person who created the document.

See [A] at the end of this message for more details about validator
implementation of what you're proposing.

> Additionally, I believe we should make the same commitment for attributes.

Why? What's the use case?

It seems like you're scope-creeping your own proposal... Maybe I'm missing
something but at least as far as I know, Custom Elements in particular and Web
Components more broadly don't have any special need for attributes with hyphens.

Anyway, unlike for the element-name case, for the attribute-name cases we
already have standards that define a bunch of hyphen-containing attribute
names -- specifically, the set of aria-* attributes defined in the ARIA
spec, and the set of its-* attributes defined in the ITS2 spec.

So if we allow any attribute containing a hyphen, we're going to make it a
lot harder for authors to catch mistakes like ist-allowed-characters,
aira-describedby, aria-describesby, aria-described-by, its-annotaters-ref,
etc. Because it'll be a lot more difficult for tools to distinguish those
typoed names from arbitrary custom hyphen-containing attribute names.
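
To make the concern concrete, here is a minimal Python sketch. The
abbreviated name list and both function names are hypothetical and are not
taken from any real validator; they only contrast "report every unknown
name" with "accept any name containing a hyphen".

```python
# Hypothetical, abbreviated list of standard hyphenated attribute names.
KNOWN_HYPHENATED = {
    "aria-describedby",
    "its-allowed-characters",
    "its-annotators-ref",
}

def unknown_strict(names):
    """Current behavior: report every name that is not explicitly known."""
    return [n for n in names if n not in KNOWN_HYPHENATED]

def unknown_if_hyphens_valid(names):
    """Proposed behavior: any name containing a hyphen is accepted."""
    return [n for n in names if "-" not in n and n not in KNOWN_HYPHENATED]

typos = ["ist-allowed-characters", "aira-describedby", "aria-describesby",
         "aria-described-by", "its-annotaters-ref"]

print(unknown_strict(typos))            # all five typos are reported
print(unknown_if_hyphens_valid(typos))  # prints []: nothing is reported
```

Under the hyphen rule every one of the misspellings above passes silently,
because each one still contains a hyphen.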

See [B] at the end of this message for more details about how this relates
to data-* attributes and validator implementation of support for those.

> not usable for general purpose extensions, and that's a good thing).
> 
> Making hyphen attributes valid in all cases (minus existing ones such as
> aria-* and those from SVG/MathML) would allow third parties to produce their
> own using weak (but sufficient) namespacing (e.g. epub-*).

Third parties can already do that. The ITS2 specification and ARIA
specifications are examples of exactly that. We already provide just fine
for that to happen, without needing to change the HTML spec to say that any
arbitrary name containing a hyphen is valid.

If for example the epub WG wants to have attributes that use the prefix
epub-*, then they could produce a spec that defines the specific name for
each new attribute, in the same way the ITS2 spec defines specific its-*
attributes or the ARIA spec defines specific aria-* attributes. Then we can
add a schema to the validator to check for those specific attribute names.

> It won't hurt HTML since we're already committed to not use hyphens in
> names anyway.
> 
> Naturally, it would be useful to advise such third parties that they would
> be better off getting any generally useful such attribute standardised. But
> it provides a valuable escape hatch

A really big escape hatch. Punching "a huge anything-goes hole in all of HTML".

> for when there is no agreement to introduce such features into HTML.
> Right now the alternatives are:
> 
>     1) Use namespaces; bad idea, forces XHTML.
>     2) Use data-$prefix-*; bad idea, not meant for that.

There's a third alternative, which is that they just be treated as unknown
names that are not part of the HTML language and not part of any other
known specification. Which is exactly what they are. Then the validator
will keep reporting them as unknown the same way it does now.

> The impact on user agents is zero, only validators are impacted.

The validation impact is large. But even larger is the impact on authors.
I don't think allowing unknown non-standard element names (or attribute
names) to be treated as valid simply because they have hyphens in their
names would be a win for authors.

We don't want to make it harder for authors to know when they have
documents that contain names which aren't part of any standard, and we
don't want to make it harder for authors to catch misspelled attribute
names, and we don't want authors to end up being even further limited in
the choice of tools they can use -- limited to only using tools that are
complex enough to understand all the magic we're introducing.

  --Mike

> [0] https://www.w3.org/Bugs/Public/show_bug.cgi?id=23254
> [1] https://dvcs.w3.org/hg/webcomponents/raw-file/tip/spec/custom/index.html#dfn-custom-element-type

[A] With regard to implementation of names-containing-hyphens in
conformance checkers, all the normal schema formalisms I'm aware of that
are in wide use and that have any tool support -- RelaxNG, W3C XML Schema,
DTDs -- don't provide any way to express "If an element name contains a
hyphen, consider it valid."

Most schema validation languages are grammar-based and the basic mechanism
they work from is, they treat any element name as invalid by default
(compare to "Deny: *" or whatever) and require you to explicitly state in
the schema what element names are actually allowed, and exactly where they
are allowed. And they require the element names to be stated explicitly;
they don't provide for expressing wildcards or patterns for element names.

So if you were to have the spec say that any element name containing a
hyphen is valid, you would be introducing a condition that's not
expressible in current schema languages, and so not checkable with most
current off-the-shelf validation tools.
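
The deny-by-default behavior described in [A] can be sketched roughly as
follows in Python. The tiny content model and the function name are
hypothetical stand-ins for a real grammar, and the sketch checks only
child elements, not the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical, tiny content model: parent name -> child names allowed there.
# Every allowed name has to be listed explicitly; nothing is allowed by default.
CONTENT_MODEL = {
    "body": {"p", "div"},
    "div": {"p", "div"},
    "p": {"em"},
    "em": set(),
}

def unknown_elements(root):
    """Return names of child elements the enumerated model does not allow."""
    errors = []

    def walk(parent):
        allowed = CONTENT_MODEL.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append(child.tag)
            walk(child)

    walk(root)
    return errors

doc = ET.fromstring("<body><p>Hi <em>there</em></p><x-widget/></body>")
print(unknown_elements(doc))  # ['x-widget']
```

Accepting x-widget here would need a predicate over the name itself (such
as "contains a hyphen") rather than one more entry in the enumerated model,
which is the part an enumerated grammar cannot state.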

[B] It's true we have sort of a precedent in the spec for wildcard names
for attributes, in the form of data-* attributes. But that's actually a
really different case than what you're proposing. Authors aren't going to
get caught up with consequences of data-* attributes not being
distinguishable from misspelled aria-* or its-* attributes. And some tools
are capable of dealing with data-* attributes.

I say some tools because it's true that at least the validator.nu code and
W3C validator do treat data-* attributes as valid -- but that's possible
only because we have custom Java code in the validator that causes the
data-* attributes to actually be dropped from the document before the
document is exposed to the core validation mechanism (a RelaxNG grammar
that gets evaluated using James Clark's Jing tool). And because we don't
actually ever check the values of data-* attributes.
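
A rough Python sketch of that kind of pre-filtering step follows. The real
validator does this in Java; the function names and the small attribute set
below are hypothetical and only illustrate the idea of dropping data-*
attributes before a strict, enumerated check runs.

```python
# Hypothetical stand-in for the names an enumerated schema would allow.
SCHEMA_ATTRIBUTES = {"id", "class", "title", "aria-describedby"}

def drop_data_attributes(attrs):
    """Pre-filter: remove data-* attributes; their values are never checked."""
    return {
        name: value
        for name, value in attrs.items()
        if not (name.startswith("data-") and len(name) > len("data-"))
    }

def schema_errors(attrs):
    """Strict check: any attribute name not listed in the schema is an error."""
    return sorted(name for name in attrs if name not in SCHEMA_ATTRIBUTES)

attrs = {"id": "a", "data-user-id": "42", "aira-describedby": "b"}
print(schema_errors(attrs))                        # ['aira-describedby', 'data-user-id']
print(schema_errors(drop_data_attributes(attrs)))  # ['aira-describedby']
```

The filter works for data-* because the prefix is fixed and cannot collide
with a misspelled aria-* or its-* name, so the typo is still reported after
filtering.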

But there are many contexts other than the validator in which it
would be useful to have some level of HTML conformance checking. For
example, you might want to have HTML conformance checking in an interactive
editing application -- anything from a text editor like Emacs to some
WYSIWYG Web-authoring application. And it's unreasonable to expect that
all those tools will implement custom attribute-name-filtering mechanisms
like the one I described above that we're using for data-* attributes.

-- 
Michael[tm] Smith http://people.w3.org/mike
Received on Thursday, 3 October 2013 07:05:15 UTC