Re: Making all elements and attributes that contain hyphens valid from Michael[tm] Smith on 2013-10-04 (public-html@w3.org from October 2013)

From: Michael[tm] Smith <mike@w3.org>
Date: Fri, 4 Oct 2013 17:21:50 +0900
To: Jirka Kosek <jirka@kosek.cz>
Cc: Robin Berjon <robin@w3.org>, "HTML WG (public-html@w3.org)" <public-html@w3.org>
Message-ID: <20131004082146.GO7778@sideshowbarker>

Hi Jirka,

Jirka Kosek <jirka@kosek.cz>, 2013-10-03 12:40 +0200:

> On 3.10.2013 9:05, Michael[tm] Smith wrote:
> > The HTML spec defines the known element and attribute names recognized as
> > being standard names in the HTML language and that are "portable" in the
> > sense that among other things we know that UAs and other tools recognize
> > them as such. And additional specifications such as the existing ITS2 and
> > ARIA specs define other standard names for other specific purposes.
> 
> OTOH, Web Components can attach behaviour to such elements and then
> elements become portable even if there is no additional specification
> like ARIA/ITS2 which defines behaviour of such elements.

Sure, but having a standard cross-browser way to attach behavior to custom
elements with arbitrary names doesn't make them any less custom and doesn't
make their arbitrary names any more standard.

And it also doesn't make them portable in the same sense as standard HTML
elements unless along with our browsers all our editing apps and all other
HTML tools also implement support for Web Components and Custom Elements.

> Definition is in code only. I understand your concerns, but I think that
> web platform needs more open extensibility story, something pioneered by
> XBL1/2 and now proposed by Web Components.

Of course it does. But it doesn't follow at all that in order to have that,
we must also have the HTML spec require that all arbitrary element names
containing hyphens be treated as valid.

> > Conformance checkers should report non-standard names so a person checking
> > a document can actually be aware it contains non-standard names. Then that
> > person can decide what they choose to do about it (e.g., choose to ignore
> > the reports because they're the ones who created the document and are already
> > aware it contains custom names). To be clear, the person checking a document
> > with a validator is not always the same person who created the document.
> 
> We can solve this by having several conformance levels. Strict level
> will allow only HTML (+ other well recognized vocabularies like
> MathML/SVG/...) elements and extended level could allow
> these-dash-based-extensions.

That seems to me like a cure that's worse than the disease, or an "and now
you have two problems" kind of solution, at least as far as what the spec
has to say about conformance levels.

What tools provide is another story.

For example, by default the validator checks documents against the HTML
standard and other standards (SVG, MathML, ARIA, ITS2) that define markup
that can be included in HTML documents. But that doesn't prevent us from
providing an additional non-default option for also checking documents that
contain non-standard markup that's nevertheless in wide use (e.g.,
documents containing AngularJS directives). We don't need for the HTML spec
to change in order to provide that option.
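
A rough sketch of what such an opt-in could look like in a tool. This is
not the validator's actual code: the function name, the "angularjs" profile
name, and the abbreviated attribute sets are assumptions made up purely for
illustration.

```python
# Hypothetical sketch: the default check uses only standard names;
# extra vocabularies are enabled only when the user asks for them.

# Tiny illustrative subset of standard attribute names (far from complete).
STANDARD_ATTRIBUTES = {
    "id", "class", "title", "lang",
    "aria-describedby", "its-allowed-characters",
}

# Opt-in profiles for non-standard markup in wide use (subset only).
OPTIONAL_PROFILES = {
    "angularjs": {"ng-app", "ng-model", "ng-repeat", "ng-click"},
}


def unknown_attributes(names, profiles=()):
    """Return the attribute names not recognized under the chosen profiles."""
    known = set(STANDARD_ATTRIBUTES)
    for profile in profiles:
        known |= OPTIONAL_PROFILES[profile]
    return [name for name in names if name not in known]
```

By default, unknown_attributes(["id", "ng-model"]) reports "ng-model";
passing profiles=["angularjs"] makes the same call report nothing. The
spec's definition of what is standard stays untouched either way.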

> > It seems like you're scope-creeping your own proposal... Maybe I'm missing
> > something but at least as far as I know, Custom Elements in particular and Web
> > Components more broadly don't have any special need for attributes with hyphens.
> 
> Such attributes are already used in wild, for example see AngularJS
> (http://angularjs.org/) and its ng-* attributes.
> 
> It would be nice if editing an AngularJS-injected HTML template didn't
> put a red underline under every second attribute inside the code editor.

I don't have any brilliant ideas for solving that problem but I can say
that ignoring all attributes containing hyphens is also not a brilliant
solution to it.

If you were authoring a document containing ITS2 markup and you had a
misspelled attribute like "ist-allowed-characters", I think you'd want that
to show up with a red underline so you could correct it.
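
As a sketch of why a list of known names matters for that case: a tool that
knows the defined names can not only flag the misspelling but also suggest
the intended one. The helper name, the short list of known attributes, and
the 0.8 similarity cutoff below are all assumptions for the example, not
how any particular editor or validator does it.

```python
import difflib

# Small illustrative subset of attribute names defined by ITS2 and ARIA.
KNOWN_PREFIXED = [
    "its-allowed-characters",
    "its-annotators-ref",
    "aria-describedby",
    "aria-labelledby",
]


def suggest(name, cutoff=0.8):
    """Return the closest known name for an unrecognized attribute, or None."""
    if name in KNOWN_PREFIXED:
        return None  # already a known name, nothing to suggest
    matches = difflib.get_close_matches(name, KNOWN_PREFIXED, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Under these assumptions, suggest("ist-allowed-characters") returns
"its-allowed-characters", while an unrelated custom name such as "ng-model"
gets no suggestion. None of that is possible if every hyphenated name is
simply accepted.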

(Also I think if you're editing files in your HTML editor for use with some
other existing client-side UI-data-binding/templating/MVC libraries out
there, your HTML editor is going to choke on other parts of their template
syntax long before it gets around to underlining attribute names in red.)

> > So if we allow any attribute containing a hyphen, we're going to make it a
> > lot harder for authors to catch mistakes like ist-allowed-characters,
> > aira-describedby, aria-describesby, aria-described-by, its-annotaters-ref,
> > etc. Because it'll be a lot more difficult for tools to distinguish those
> > typoed names from arbitrary custom hyphen-containing attribute names.
> 
> This could be at least partially solved by having several conformance
> checking levels or by a requirement to declare somehow which extensions
> a page tries to use -- for example having a <link> or <meta> element which
> says that the page wants to use a particular prefix. (Welcome back, namespace
> prefix declarations :-)

That doesn't sound pretty or elegant. IMHO it's far easier and better to
just provide some option in tools for those who want it. We don't have to
solve every problem through adding more markup.

> > If for example the epub WG wants to have attributes that use the prefix
> > epub-*, then they could produce a spec that defines the specific name for
> > each new attribute, in the same way the ITS2 spec defines specific its-*
> > attributes or the ARIA spec defines specific aria-* attributes. Then we can
> > add a schema to the validator to check for those specific attribute names.
> 
> My impression is that with Web Components many libraries similar to
> AngularJS will emerge,

I'm not sure that Web Components getting implemented is necessarily going
to cause such an explosion. If anything I'd think it'd result in libraries
introducing more custom element names -- not more custom attribute names --
than we already have in libraries now.

Clearly at least the AngularJS developers don't need for Web Components to
be implemented in order for them to be minting new attribute names.

And I'm not familiar with a lot of libraries in this space, but it seems to
me that AngularJS is exceptional in the degree to which it makes use of
custom attribute names. I'm not sure other libraries use them as much.

> each with its own set of element/attributes.
> Given potential number of such libraries and their development cycle it
> would be quite heroic effort to keep validator schemas updated.

I think there are always only a small number of such libraries in wide
use. We wouldn't need to support every single library that somebody wanted
to come up with that used tons of custom attributes.

Right now it seems to me we have three relevant libraries in this space
that are in widest use: AngularJS, Backbone.js, and Ember.js.

I think AngularJS defines about 50 directives that can take the form of
ng-* attributes. I'm not familiar enough with Backbone or Ember to know if
they do something similar but my impression is that they don't. But even if
they did, I'd expect it wouldn't be more than AngularJS defines.

So, thus far the problem of providing a way to recognize names of custom
attributes out there that are actually widely used seems pretty manageable.

I could for example do it in the validator by adding a non-default option
that users can choose if they have docs containing AngularJS attributes or
attributes from any other common libraries that mint custom attributes.
Behind the scenes it would rely on a schema that defines the attributes.
Other tool makers like BlueGriffon could reuse that schema or add some
other way to provide specific support for AngularJS attribute names.

That approach seems to me a way better solution than punching "a huge
anything-goes hole in all of HTML" to say that all attribute names
containing hyphens should just be ignored.
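
To make the difference concrete, here is a minimal comparison of the two
rules. The function names and the three-entry list of known attributes are
hypothetical and exist only for this example.

```python
# Illustrative subset: names defined by a spec or by an opt-in schema.
KNOWN_ATTRIBUTES = {"aria-describedby", "its-allowed-characters", "ng-model"}


def accepted_by_blanket_rule(name):
    """The 'anything-goes' rule: any name containing a hyphen is accepted."""
    return "-" in name


def accepted_by_explicit_list(name):
    """The schema-style rule: only specifically defined names are accepted."""
    return name in KNOWN_ATTRIBUTES


TYPOS = ["aira-describedby", "aria-described-by", "ist-allowed-characters"]

# Typos the blanket rule silently accepts.
slipped_through = [n for n in TYPOS if accepted_by_blanket_rule(n)]
# Typos the explicit list still reports.
caught = [n for n in TYPOS if not accepted_by_explicit_list(n)]
```

Every one of the typoed names passes the blanket rule, and every one of
them is still reported when checking against an explicit list.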

> > There's a third alternative, which is that they just be treated as unknown
> > names that are not part of the HTML language and not part of any other
> > known specification. Which is exactly what they are. Then the validator
> > will keep reporting them as unknown the same way it does now.
> 
> I have one concern. Look back to 2000. XHTML was promoted as an
> *eXtensible* version of HTML. New elements could be used (in their own
> namespace) and that should lead to controlled innovation which has some
> bounds preventing clashes with existing and future standard HTML element
> names. Unfortunately XHTML spec was actually disallowing such usage, DTD
> validation prevented using such extensions. The outcome is well known --
> XHTML provided more strict syntax, no new features, just problems in
> legacy browsers.

You're making a big leap there. That XHTML approach didn't fail because DTD
validation prevented it. That failed because it was a bad fit for the Web
and for Web browsers, and because Namespaces in XML is itself a bad fit for
the Web and for Web browsers. And XHTML failed overall because it was
driven by an obsession with syntax and with things like modularization
instead of with solving actual real problems.

Unlike that effort, we're all here trying to solve real problems. And to be
clear, I'm not suggesting we should restrict the features we add to HTML to
only those that are expressible in some existing formalism like RELAX NG.
But I am saying that if we're going to add a feature that isn't expressible
that way, we should make sure we have a really compelling reason for doing it.

> I think that we should try to seek balance between strict approach (only
> elements baked with W3C spec are recognized) and lenient approach
> (any-dashed-markup-is-allowed).

We can do that in tools without baking it into the spec.

> > So if you were to have the spec say that any element name containing a
> > hyphen is valid, you would be introducing a condition that's not
> > expressible in current schema languages, and so not checkable with most
> > current off-the-shelf validation tools.
> 
> In a long term I can imagine that RELAX NG 2.0 can support this if there
> is strong enough use-case.

I'm not going to hold my breath on RELAX NG 2.0 appearing, nor on any other
kind of general formalism being the solution.

> There are already plans to lift existing
> restriction on name classes to be full patterns in RELAX NG
> (https://lists.oasis-open.org/archives/relax-ng/200611/msg00025.html).
> Such change would actually simplify adding new extensions like ITS2/ARIA
> into existing HTML schemas a lot.

Yeah, it would definitely be nice to have for a lot of cases.

> But as there is no big demand for such new features
> (https://wiki.oasis-open.org/relax-ng/FutureRequirements) development of
> RELAX NG 2.0 is stalled now.
> 
> > But there are many other contexts other than the validator in which it
> > would be useful to have some level of HTML conformance checking. For
> > example, you might want to have HTML conformance checking in an interactive
> > editing application -- anything from a text editor like Emacs to some
> > WYSIWYG Web-authoring application. And it's unreasonable to expect that
> > all those tools will implement custom attribute-name-filtering mechanisms
> > like the one I described above that we're using for data-* attributes.
> 
> Although currently this is not supported by tools, I have no doubts that
> wide usage of such extension elements/attributes in HTML5 will lead to
> change in tools. It will take some time of course.

Right. My argument is, let's please try not to do things that make it take
even more time for tool vendors to catch up with changes we've been adding
in the language, unless it's something that really seems necessary.

  --Mike

-- 
Michael[tm] Smith http://people.w3.org/mike
Received on Friday, 4 October 2013 08:22:05 UTC