
[Bug 9352] New: Make unescaped & conforming in attribute values in some cases

From: <bugzilla@wiggum.w3.org>
Date: Sat, 27 Mar 2010 22:41:36 +0000
To: public-html-bugzilla@w3.org
Message-ID: <bug-9352-2486@http.www.w3.org/Bugs/Public/>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9352

           Summary: Make unescaped & conforming in attribute values in some
                    cases
           Product: HTML WG
           Version: unspecified
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec bugs
        AssignedTo: dave.null@w3.org
        ReportedBy: mjs@apple.com
         QAContact: public-html-bugzilla@w3.org
                CC: ian@hixie.ch, mike@w3.org, public-html@w3.org


HTML syntax and URL syntax have an unfortunate conflict. HTML interprets & as
the start of an entity reference, while in URLs it has special meaning as a
separator in the query portion of a URL.

HTML5 disallows the & character in attribute values unless it is actually the
start of an entity reference. That means markup like this is nonconforming:

<a href="http://images.google.com/imghp?hl=en&tab=wi">

In this specific case, there is no chance that &tab= could be mistaken for an
entity reference, and parsing will proceed exactly as the author expects.
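A rough sketch of the current rule as summarized above, in Python (chosen only
for illustration; the function name unescaped_ampersands is hypothetical, and
this is not the conformance checker's actual algorithm). It assumes the rule
means that every & must begin a complete named or numeric reference, and it
checks names against the standard library's html.entities.html5 table:

```python
import re
from html.entities import html5  # HTML5 named character references, keys like "amp;"

# A complete reference: named (&name;), decimal (&#38;) or hexadecimal (&#x26;).
REFERENCE = re.compile(r"&(?:([A-Za-z0-9]+);|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")

def unescaped_ampersands(value: str) -> list[int]:
    """Return offsets of '&' characters that do not begin a complete reference.

    Hypothetical sketch of the rule as described in this bug, not a real validator.
    """
    bad = []
    for i, ch in enumerate(value):
        if ch != "&":
            continue
        m = REFERENCE.match(value, i)
        # Unmatched, or a well-formed "&name;" whose name is not a known reference.
        if m is None or (m.group(1) is not None and m.group(1) + ";" not in html5):
            bad.append(i)
    return bad
```

Applied to the Google link above, this reports the single & before "tab=",
even though that ampersand cannot change how the URL is parsed.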

The spec explains that the reason for this syntax error is markup fragility:

"For example, the parsing of certain named character references in attributes
happens even with the closing semicolon being omitted. It is safe to include an
ampersand followed by letters that do not form a named character reference, but
if the letters are changed to a string that does form a named character
reference, they will be interpreted as that character instead."

http://dev.w3.org/html5/spec/Overview.html#conformance-requirements-for-authors

However, for an author to become aware of this kind of error at all, they must
regularly use a conformance checker (or, equivalently, a tool that ensures
conformance at the output stage). Such a checker can then tell them when they
have used a construct that actually will be interpreted as an entity
reference, rather than one that merely might be if edited.

Because this is reported as an error, authors who want the full benefits of
conformance checking must write in a more awkward style, and must bloat their
markup by replacing every instance of "&" with "&amp;".
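For comparison, a small Python sketch (Python is chosen only for illustration)
that produces the conforming spelling of the Google link above with the
standard library's html.escape, and shows the cost of each escaped ampersand:

```python
from html import escape

url = "http://images.google.com/imghp?hl=en&tab=wi"

# With quote=True, escape() also rewrites < > " and ', none of which occur in
# this URL, so here the only change is "&" -> "&amp;".
conforming = escape(url, quote=True)

print(conforming)                  # http://images.google.com/imghp?hl=en&amp;tab=wi
print(len(conforming) - len(url))  # 4: each escaped "&" adds four characters
```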

Seven of the Alexa top 15 sites have this error:
http://www.w3.org/html/wg/wiki/index.php?title=HTML5_Authoring_Conformance_Study

In many cases it appears an inordinate number of times, close to 100, and is
the single most frequent error on the site.

It seems that many authors, even on prominent sites, have not found the markup
bloat and awkward syntax of consistently using &amp; to be a cost worth paying
for the benefit of speculatively avoiding future errors.

Thus, I think HTML5 should reconsider and only make href="&foo=" an error in
the case where foo is an entity name, since that is the only case where author
expectations will actually be defeated.
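A minimal sketch of the proposed rule, in Python for illustration only. The
function name flagged_under_proposal is hypothetical, and it assumes "foo is
an entity name" means that "foo;" appears in the HTML5 named-reference table
(html.entities.html5). A bare &name is flagged only in that case, while
explicit references such as &amp; are left alone:

```python
import re
from html.entities import html5  # HTML5 named character references, keys like "copy;"

# '&' followed by a run of ASCII alphanumerics and an optional ';'.
AMP_NAME = re.compile(r"&([A-Za-z0-9]+)(;?)")

def flagged_under_proposal(value: str) -> list[str]:
    """Return names after a bare '&' that the proposed rule would report.

    Hypothetical sketch of the proposal in this bug, not a real validator.
    """
    flagged = []
    for m in AMP_NAME.finditer(value):
        name, semicolon = m.groups()
        # Only an unterminated "&name" whose name is a known reference is an error.
        if not semicolon and name + ";" in html5:
            flagged.append(name)
    return flagged
```

Under this sketch, href="http://images.google.com/imghp?hl=en&tab=wi" is no
longer reported, while "/search?q=x&copy=1" is, because "copy" is a named
character reference.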


Received on Saturday, 27 March 2010 22:41:38 GMT
