Re: Making all elements and attributes that contain hyphens valid from Smylers on 2013-10-16 (public-html@w3.org from October 2013)

From: Smylers <Smylers@stripey.com>
Date: Wed, 16 Oct 2013 17:30:51 +0100
To: public-html@w3.org
Message-ID: <20131016163051.GB2199@stripey.com>
Michael[tm] Smith writes:

> Smylers <Smylers@stripey.com>, 2013-10-16 14:10 +0100:
> 
> > Michael[tm] Smith writes:
> >
> > > If you look at it from the perspective of authors using the
> > > attributes from your library, I think the question should be, why
> > > would authors want to use data-xxx-foo rather than xxx-foo.
> > 
> > It gives them the assurance that those attributes won't clash with
> > future additions to HTML, nor with any other applicable standard.
> 
> I think in practice the library authors don't need to use the data-
> prefix to provide that kind of assurance, and authors using the
> libraries don't place much value on that assurance. Consider the case
> of Angular's ng- prefixed attributes. I've not seen suggestions that
> things would be better for authors if Angular were using a data-ng-
> prefix instead.

If HTML in the future introduces ng-foo attributes with conflicting
meaning, then authors will be inconvenienced (and sites written now but
no longer actively maintained will break). It may be that many authors
today haven't considered that situation, or are busy with immediate
issues so aren't prioritizing avoiding future inconvenience.

But whether authors currently appreciate this protection or not doesn't
change the fact that they are protected by it, and doesn't mean that
they won't be irritated if things break because libraries didn't follow
the guidance that would've protected them.

> > They can safely use them, knowing that no user agent is going to
> > interpret the attribute at all (so there's no risk of it
> > interpreting it in an unintended way).
> 
> I think in practice authors can also safely use, e.g., Angular's ng-
> attributes while knowing that no user agent is going to interpret the
> attribute at all.

Yes, AngularJS is sufficiently well known and widespread that HTML
wouldn't ever create new attributes that clash with it. So in reality
this makes ng-* reserved. And in some ways, the AngularJS documentation
has effectively become an ‘applicable specification’ for many web
developers. In which case ng-* attributes are valid for anybody who
recognizes that.

But for a less-well-known library, that wouldn't necessarily be the
case. It isn't realistic for HTML to avoid any attribute name that may
have been used by any library anywhere. And there are many more
lesser-known libraries than popular ones.

So it still seems safer for a library to choose to use data-foo-* and
get the clash-protection that provides.

> > The data-* namespace is safe for the site (including any libraries
> > it uses) to do whatever it wants with, in the way that arbitrary
> > attributes can't be.
> 
> As far as I can see, the only way in which the data-* namespace is
> guaranteed to be safe is in the sense that UAs are prohibited from
> implementing any native support for particular data- prefixed
> attributes.

Yes. And future additions to the HTML spec won't use attributes called
data-*, because those are already allocated to authors.

> But I can't see that data-foo-* attributes are any more safe than
> foo-* attributes in any other ways. For example, using a data-foo-*
> prefixed name in your site content doesn't ensure that you'll never
> run into a name-collision problem of some other library coming along
> later that you want to use in your site too but that has data-foo-*
> names for different purposes.

Indeed not. But in that case the clash is within code that you are
deploying and is fully under your control. Even if neither library
provides the ability to set a different prefix, you could locally fork
one of them to do so.

But suppose instead that the library you're using with the foo-*
attributes remains a very niche technology, with hardly anybody using
it. Meanwhile, another library, which also uses foo-* attributes,
becomes very well known. So well known, in fact, that it's decided to
add its features directly to HTML, so that user agents can provide the
behaviour directly, without requiring a JavaScript library to be
loaded. And various foo-* attributes are added to the HTML spec.

When browsers are updated and start interpreting those foo-* attributes
in accordance with the HTML spec, that will break your site, which was
using them for a conflicting purpose.

If your site used data-foo-* attributes instead, that couldn't happen.
The attributes added to HTML would have different names, so your site
would continue to work fine.
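
To make that scenario concrete, here is a minimal Python sketch. The
function names and the foo-toggle attribute are hypothetical, invented
purely for illustration; the only rule taken from HTML is that data-*
names are allocated to authors, so a future spec won't define them.

```python
def is_author_reserved(name: str) -> bool:
    """True for data-* names, which HTML allocates to authors."""
    lowered = name.lower()
    # At least one character must follow the "data-" prefix.
    return lowered.startswith("data-") and len(lowered) > len("data-")


def broken_by_spec_change(site_attributes, newly_standardised):
    """Return the site's attributes that a spec update would reinterpret."""
    standardised = {n.lower() for n in newly_standardised}
    return {
        name
        for name in site_attributes
        if not is_author_reserved(name) and name.lower() in standardised
    }


# Hypothetical scenario: HTML adopts foo-toggle from a popular library.
print(broken_by_spec_change({"foo-toggle", "data-foo-toggle"}, {"foo-toggle"}))
```

Only the unprefixed attribute is reported; the data-foo-toggle one is
out of reach of any future addition to HTML.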

> > (Or, if all attributes with hyphens in were deemed safe for authors
> > to use like this,
> 
> What is "like this"? If you mean, if all attributes with hyphens were
> deemed safe from ever having UAs implement any native support for them
> (the way that data-* attributes are),

Yes.

> then I don't think anybody so far has proposed that.

Well the subject line of this thread is proposing making them all be
valid. It would be weird to have a spec that says “these are all valid
for you to do what you want with them, but we might change our minds on
that in future and retrospectively assign standard behaviours to
attributes which we said you could use for your own purposes”.

‘Valid but no guarantee of being usable’ doesn't seem like an
improvement over the current ‘invalid’.

> > that would prevent any future applicable spec from defining any new
> > attributes with hyphens in them: they may clash with attributes
> > authors or libraries are already using. That'd either be a massive
> > restriction on future applicable specs, preventing any of them from
> > using their own foo-* prefix on attributes.)
> 
> I think in practice the way we'd avoid such naming collisions is by
> trying to use names that aren't known to have any conflicts with names
> that are already in use.

Any name used by any library, really?

What constitutes a library, anyway? If I stick some JavaScript functions
I use on a site in a separate file, does that make it a library? If I
use that on multiple sites, is it a library then? If I let a friend use
it on a site, is that a library?

> And to help with that we could even provide some kind of registry for
> prefixes. I think such a registry has already been suggested.

If such a registry existed, it would effectively be a specification
extending HTML with various new attributes. They would cease to be
unknown attributes.

There's a big difference between ‘here is a finite list of attribute
prefixes; prefixes are only valid if they are on the list’ and ‘all
attribute names with hyphens in them are fair game’.

> > > Along with that, by providing authors with data-xxx-foo attributes
> > > to use instead of xxx-foo attributes, you're making things harder
> > > for authors to catch some kinds of authoring mistakes they might
> > > make. The only way they're likely to be able to catch
> > > syntax/datatype errors with data-xxx-foo is by testing the code by
> > > running and hoping that it will fail in some obvious way if they
> > > have a syntax error in a data-xxx-foo value.
> > 
> > I don't see how that follows. In order for a validator to assist
> > authors using libraries, it needs to know specifically about each
> > library and what values are valid for its attributes. For each
> > library that a validator knows about, it could provide options for
> > users to indicate that the site is using, say, the Kapow library,
> > and that the prefix in use for that library's attributes is
> > data-kapow-.
> > 
> > Then the validator can check the values, just as it could if the
> > attributes were called kapow-*.
> 
> That becomes more difficult if by default the validator is essentially
> ignoring data-* attributes altogether.

I presume you mean more difficult for the validator developers. In
which case, that's possibly true, but one-off work for a small number
of validator developers is surely less of a priority than the ongoing
inconvenience to authors and breakage of sites if attributes used by
their libraries end up clashing with future HTML attributes.

> Which is a reasonable thing for the validator to do, given that the
> spec says, "These attributes are not intended for use by software that
> is independent of the site that uses the attributes."

That's a little bit circular: I mentioned a potential change in
validator behaviour, possibly with a tweak to the rules on validator
conformance. Obviously it's reasonable for validators not to be already
doing what I mentioned — if they were, then there wouldn't've been any
point in my bringing it up!

> So to do what you describe, we'd need to tell the validator, "Ignore
> all data-* attributes except ones that start with data-kapow-,
> data-ng-, etc."

Yes.

Or possibly: ‘When you encounter a data-* attribute, add it to a
separate list of them. At the end, iterate through the list of data-*
attributes to see if any of them are ones you've been told to validate,
and if so check their syntax. Don't report on any of the others.’

Whether you ignore them initially or avoid reporting on their presence
at the end doesn't make any difference.
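
The ‘separate list’ approach could be sketched roughly as follows, in
Python. This is only an illustration under stated assumptions: the
function name, the shape of the configuration (a prefix mapped to a
value checker), and the rule that data-kapow-* values must be digits
are all hypothetical, not taken from any real validator or library.

```python
import re


def validate_data_attributes(attributes, checkers):
    """Check only the data-* attributes the author asked to have checked.

    attributes: iterable of (name, value) pairs found in the document.
    checkers:   dict mapping a prefix such as "data-kapow-" to a function
                that returns True when a value is acceptable (hypothetical
                author-supplied configuration).
    """
    # First pass: set every data-* attribute aside without judging it.
    deferred = [(n, v) for n, v in attributes if n.startswith("data-")]

    # Second pass: check the ones we've been told about; stay silent
    # about all the others.
    errors = []
    for name, value in deferred:
        for prefix, is_valid in checkers.items():
            if name.startswith(prefix):
                if not is_valid(value):
                    errors.append(f"{name}: invalid value {value!r}")
                break
    return errors


# Invented rule for illustration: Kapow values must be decimal digits.
kapow_rules = {"data-kapow-": lambda v: re.fullmatch(r"[0-9]+", v) is not None}
page = [
    ("class", "sidebar"),
    ("data-kapow-delay", "fast"),
    ("data-kapow-count", "3"),
    ("data-other", "anything goes"),
]
print(validate_data_attributes(page, kapow_rules))
```

Attributes outside the configured prefixes, such as data-other here,
are never reported on, whichever pass drops them.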

> And regardless I think having validator support for checking values of
> particular data-* attribute values goes against the semantics and
> authoring expectations for data-* attributes in the same way that
> providing support for checking values of the class attribute would.

I don't see how that would still be true in the situation where an
author had specifically requested checking for a particular set of
data-foo-* attributes, as you appeared to agree:

> > If an author is running a validator on her own site, and has
> > configured that validator to know that particular data-* attributes
> > should obey particular syntax, I think it could be claimed that that
> > run of the validator is effectively part of the author's site: the
> > meaning of the attributes is only known because the author has baked
> > this into the validator in an out-of-band way; the validator
> > wouldn't be deriving any meaning from them based purely on their
> > names.
> 
> Sure, in that hypothetical case.

Good.

> > then we could change HTML to make an explicit affordance for the
> > case of a validator looking at particular data-* attributes when
> > instructed to do so by authors.
> 
> I think it'd make as much sense to do that for data-* attributes as
> they are defined now as it would to add some specific affordance in
> the spec for looking at particular class values when instructed to do
> so by authors.

Yes.

Most class names aren't meaningful, but some libraries use particular
class names, or a particular structure of class names, to function. It
seems reasonable to me for a validator, when told to check the page for
use of a particular library, to check class names.
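
As a rough Python sketch of what such a check might look like: the
function name, the kapow- prefix and the set of known class names are
all hypothetical, made up for illustration rather than describing any
existing library or validator.

```python
def unknown_library_classes(class_value, prefix, known_classes):
    """List class tokens that use the library's prefix but aren't known."""
    unknown = []
    # The class attribute holds whitespace-separated tokens.
    for token in class_value.split():
        if token.startswith(prefix) and token not in known_classes:
            unknown.append(token)
    return unknown


# Hypothetical library whose class names all start with "kapow-".
known = {"kapow-panel", "kapow-button"}
print(unknown_library_classes("kapow-panel kapow-pannel sidebar", "kapow-", known))
```

Class names without the library's prefix, like sidebar above, are left
alone, just as unconfigured data-* attributes would be.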

Cheers

Smylers
-- 
Stop drug companies hiding negative research results.
Sign the AllTrials petition to get all clinical research results published.
Read more: http://www.alltrials.net/blog/the-alltrials-campaign/
Received on Wednesday, 16 October 2013 17:16:25 UTC