[Bug 9351] New: Do not interpret & followed by an entity name followed by = as an entity reference in attribute values

From: <bugzilla@wiggum.w3.org>
Date: Sat, 27 Mar 2010 22:41:00 +0000
To: public-html-bugzilla@w3.org
Message-ID: <bug-9351-2486@http.www.w3.org/Bugs/Public/>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9351

           Summary: Do not interpret & followed by an entity name followed
                    by = as an entity reference in attribute values
           Product: HTML WG
           Version: unspecified
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec bugs
        AssignedTo: dave.null@w3.org
        ReportedBy: mjs@apple.com
         QAContact: public-html-bugzilla@w3.org
                CC: ian@hixie.ch, mike@w3.org, public-html@w3.org


It's been suggested that, in an attribute value, an unterminated entity (one
not followed by a semicolon) that is followed by an equals sign should not be
treated as an entity reference.
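
The difference between the existing and proposed handling can be illustrated
with a small decoder for named references in attribute values. This is a
minimal Python sketch under stated assumptions: `decode_attribute_value` and
its `stop_at_equals` flag are hypothetical names. It uses the standard
library's `html.entities.html5` table, in which names without a trailing
semicolon are the legacy ones that may appear unterminated. It ignores numeric
references, and it applies the longest-match rule and the letter/digit
exception found in current HTML specifications, which may differ in detail
from the draft in effect when this bug was filed.

```python
from html.entities import html5  # WHATWG named character reference table

# Longest name in the table, used to bound the longest-match search.
MAX_NAME_LEN = max(len(name) for name in html5)


def decode_attribute_value(value: str, stop_at_equals: bool) -> str:
    """Decode named character references in an attribute value (sketch).

    stop_at_equals=False models the existing rule; True models the proposed
    rule, under which an unterminated reference followed by '=' stays literal.
    Numeric references (&#...;) are not handled here.
    """
    out = []
    i, n = 0, len(value)
    while i < n:
        if value[i] != "&":
            out.append(value[i])
            i += 1
            continue
        # Longest match: try the longest candidate name first.
        match = None
        for end in range(min(n, i + 1 + MAX_NAME_LEN), i + 1, -1):
            if value[i + 1:end] in html5:
                match = value[i + 1:end]
                break
        if match is None:
            out.append("&")  # no known name: the ampersand is literal
            i += 1
            continue
        after = i + 1 + len(match)
        next_ch = value[after] if after < n else ""
        if not match.endswith(";"):
            # Unterminated (legacy) reference inside an attribute value.
            followed_by_alnum = next_ch.isascii() and next_ch.isalnum()
            if followed_by_alnum or (stop_at_equals and next_ch == "="):
                out.append("&")  # leave "&name" as literal text
                i += 1
                continue
        out.append(html5[match])
        i = after
    return "".join(out)
```

With `stop_at_equals=False`, the value `a=1&amp=2` decodes to `a=1&=2`; with
`stop_at_equals=True` it is left as `a=1&amp=2`, while a terminated reference
such as `x&amp;y` decodes to `x&y` under both settings.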

It seems that relatively few pages would be affected by this change: one study
found 50 occurrences in approximately 425k pages (about one per 8,500 pages):
http://lists.w3.org/Archives/Public/public-html/2009Jun/0463.html

It was also reported that most of these occurrences appeared to be cases where
the author did not expect their text to be interpreted as an entity reference,
and a review of these 50 instances seems to confirm that impression.
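
A survey like the one cited could be approximated by scanning attribute values
for an unterminated named reference immediately followed by "=". The report
does not describe the study's actual method; the following is a hypothetical
Python sketch (the function name `find_equals_terminated_refs` is
illustrative) that relies on `html.entities.html5`, where entries lacking a
trailing semicolon are the legacy names allowed to appear unterminated.

```python
import re
from html.entities import html5

# Legacy names: the only ones that may appear without a terminating ';'.
LEGACY_NAMES = sorted((n for n in html5 if not n.endswith(";")),
                      key=len, reverse=True)

# "&" + legacy name + "=": the pattern whose interpretation the proposal changes.
# No name contains "=", so a match here is also the tokenizer's longest match.
EQUALS_AFTER_REF = re.compile(
    "&(" + "|".join(map(re.escape, LEGACY_NAMES)) + ")=")


def find_equals_terminated_refs(attr_value: str) -> list[str]:
    """Return the reference names in attr_value followed directly by '='."""
    return EQUALS_AFTER_REF.findall(attr_value)
```

For example, `find_equals_terminated_refs("a=1&amp=2&copy;x&lt=3")` returns
`['amp', 'lt']`; the terminated `&copy;` is not reported.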

There does, however, appear to be at least some content that would be broken
by changing the interpretation:
http://lists.w3.org/Archives/Public/public-html/2009Jul/0421.html

Specifically, it seems that &amp= may occasionally be intended as "&amp;"
rather than as "&amp;amp=" or "&amp;=".

On the whole, it still seems that the proposed change would fix more content
than it breaks.

One possible variant is to exclude &amp= from this change. However, if authors
write that when they mean "&amp;", their content will not work as intended
under either the existing parsing rule or the proposed new one. On the other
hand, content that writes "&amp=" when it means "&amp;amp=" would be fixed by
the proposed rule. There were instances of both in the data set.

In addition to the direct benefit of fixing content, this change would also
make it safer to change the parsing rules to allow unescaped & in attribute
values.