&foo= in attribute values (and why defining conformance matters)

On Tue, 2 Jun 2009, Simon Pieters wrote:
>
> On Mon, 01 Jun 2009 21:33:33 +0200, Ian Hickson <ian@hixie.ch> wrote:
> 
> > The reason to do this change is that authors make this mistake all the
> > time and yet it is not harmful. By making this change the only practical
> > effect is that authors will get fewer useless annoying errors out of
> > conformance checkers.
> > 
> > 
> > > > Supporting both '&' and ';' seems like an exercise in bug creation.
> > > > Parsing URIs is hard enough to do right as it is without making things
> > > > even more complicated and adding even more edge cases.
> > > 
> > > But that's exactly what you are doing, except here it applies to parsing
> > > href attributes, not URIs.
> > 
> > No, no change to the parsing rules was involved here.
> 
> Writing HTML documents seems to make this valid:
> 
>    <a href="&copy=">
> 
> and claims that the attribute value contains just text and no character 
> references (since character references end with ";").
> 
> Yet, Parsing HTML documents interprets the above the same as <a 
> href="©=">, as far as I can tell.

Oops, I forgot about that case. Ok, reverted the change.
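
To illustrate the case Simon describes, here is a minimal sketch using 
Python's standard html.unescape(). It applies the HTML5 rules for 
character references in text, so it is only a stand-in for a browser's 
attribute-value parsing (an assumption; real parsers may treat attribute 
values differently). The sample strings are the ones from this thread.

```python
import html

# Sample values from the thread, plus the properly escaped form.
samples = ["&foo=", "&copy=", "&amp;copy="]

# html.unescape() decodes legacy named references (such as "copy")
# even when the terminating ";" is missing.
decoded = {s: html.unescape(s) for s in samples}

for source, result in decoded.items():
    print(f"{source!r} -> {result!r}")
```

Under these rules "&foo=" is left alone because "foo" is not a known 
name, "&copy=" turns into the copyright sign followed by "=", and only 
"&amp;copy=" yields the literal text "&copy=".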


On Tue, 2 Jun 2009, John Foliot wrote:
> 
> I pose a serious question: what is the real benefit of making unescaped 
> ampersands non-conformant? (Of making anything "non-conformant"?)

It defines what QA tools like conformance checkers should highlight as 
problems, as an aid to authors who wish to catch mistakes they did not 
intend. That's it.


> What, in practical terms, will it achieve - how will it modify author 
> behavior?

It's not intended to modify author behaviour; it's intended to help 
authors stay within safe boundaries.


> If there is not a significant penalty attached to non-conformant code, 
> why bother?

By sticking only to conforming content, authors get the following 
benefits (to name but a few):

 * More likely to have their content be accessible. For example, authors 
   will get notified when they use features like <font color=""> instead 
   of features like <h1>.

 * More likely to avoid unfortunate behaviour in tools. For example, by 
   making <i>p<b>q</i>r</b> non-conforming, we help authors who check 
   conformance avoid the cloning parsing behaviour that this triggers, 
   thus helping authors write pages that use less memory.

 * More likely to avoid making authoring mistakes that result in different 
   behaviour than they intended. For example, by making "&foo=" 
   non-conforming, authors who care about conformance are less likely to 
   accidentally write "&copy=" at some future point (which has a very 
   different meaning).

 * More likely to avoid hitting areas of the language that will change 
   meaning in future versions. For example, by making <color> 
   non-conforming, more authors will avoid using that element, thus if we 
   later introduce such an element, we will break fewer pages.

 * More likely to catch flat-out errors, such as having overlapping cells 
   in tables.
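
As a rough sketch of the kind of check a QA tool could run for the 
"&foo=" case above: the function below flags every ampersand that starts 
a name without a terminating ";", and reports what it would decode to if 
the name begins with a legacy entity. The function names and the return 
shape are hypothetical, and this is not how any particular conformance 
checker is implemented.

```python
import re
from html.entities import html5

# "&" followed by an ASCII name, with an optional terminating ";".
AMP_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def legacy_prefix(name):
    """Return the longest prefix of name that is a legacy entity name.

    Keys in html5 without a trailing ";" are the names that are
    recognised even when the semicolon is missing (e.g. "copy", "amp").
    """
    for end in range(len(name), 1, -1):
        if name[:end] in html5:
            return name[:end]
    return None


def check_ampersands(value):
    """Return (offset, name, decoded) for each unterminated "&name".

    decoded is the replacement text if the name starts with a legacy
    entity, or None if the ampersand is harmless but still unescaped.
    """
    problems = []
    for match in AMP_RE.finditer(value):
        name, semicolon = match.group(1), match.group(2)
        if semicolon:
            continue  # properly terminated reference, e.g. "&amp;"
        prefix = legacy_prefix(name)
        decoded = html5[prefix] if prefix else None
        problems.append((match.start(), name, decoded))
    return problems


for url in ["?a=1&foo=2", "?a=1&copy=2", "?a=1&amp;copy=2"]:
    print(url, check_ampersands(url))
```

An author who fixes the harmless "&foo=" report by writing "&amp;foo=" 
gets into the habit that also protects against "&copy=".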

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 12 June 2009 22:22:52 UTC