Re: Bug 7034 from Sam Ruby on 2010-03-22 (public-html@w3.org from March 2010)

From: Sam Ruby <rubys@intertwingly.net>
Date: Mon, 22 Mar 2010 15:07:41 -0400
To: Philip Taylor <pjt47@cam.ac.uk>
CC: HTMLwg WG <public-html@w3.org>
Message-ID: <4BA7BFFD.7010804@intertwingly.net>

On 03/22/2010 02:37 PM, Philip Taylor wrote:
> Sam Ruby wrote:
>> I happen to believe that [...some other thing...]
>> would be far more useful to authors of content intended to be served
>> as text/html than flagging the use of unescaped ampersands in URIs is.
>
> I think one of the criteria for determining conformance rules is that it
> should be possible to give an exact definition of how to write
> conforming HTML documents, and the definition should be possible to
> understand and follow (e.g. it shouldn't be necessary to
> reverse-engineer the parsing algorithm).
>
> Another criterion is that markup which very likely indicates an authoring
> mistake and will result in unexpected behaviour, should be flagged as an
> error by syntax-checking tools in order to help authors write markup
> that works (and therefore it should be a document conformance error
> since that's the mechanism the spec uses to specify the behaviour of
> syntax-checking tools).
>
> Because of the second one, markup like
> <a href="create-file.php?name=a.txt&copy=b.txt">
> should be a conformance error (the author probably didn't intend
> "name=a.txt©=b.txt", and if they really did then they could have used
> "&copy;" instead).
>
> Currently the spec says (http://whatwg.org/html#syntax-attribute-value):
>
> "Attribute values are a mixture of text and character references, except
> with the additional restriction that the text cannot contain an
> ambiguous ampersand."
>
> "An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is
> followed by some text other than a space character, a U+003C LESS-THAN
> SIGN character (<), or another U+0026 AMPERSAND character (&)."
>
> If you do want to allow "...dfclick?db=sina&bid=8...", but don't want to
> allow "...&copy=...", then this description would need to be changed to
> something like:
>
> "An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is
> followed by one of the thousands of named character references in this
> table, unless it is one of the hundreds that don't end with ';' and it
> is subsequently followed by an alphanumeric character (unless it is
> "not" and it is followed by "in;")."
>
> which is much harder for authors to follow because they'll have to
> remember the list of thousands of strings to avoid.
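
To make the two definitions quoted above concrete, here is a rough Python
sketch of each. It is an illustration only, not the spec's algorithm or any
validator's code. The function names (is_char_ref, simple_rule, narrow_rule)
are made up for this example, the name table is taken from Python's
html.entities.html5 rather than from the spec, and the narrow rule leaves out
the special case for "not" followed by "in;".

```python
import re
from html.entities import html5  # keys are names such as "copy;" and legacy "copy"

SPACE_CHARS = " \t\n\f\r"
# A semicolon-terminated numeric or named character reference.
CHAR_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")
# Legacy names that the table lists without a trailing ';', longest first.
LEGACY_NAMES = sorted((n for n in html5 if not n.endswith(";")),
                      key=len, reverse=True)


def is_char_ref(value, i):
    """True if a well-formed, ';'-terminated reference starts at index i."""
    m = CHAR_REF.match(value, i)
    if not m:
        return False
    body = m.group(1)
    return body.startswith("#") or (body + ";") in html5


def simple_rule(value):
    """Quoted definition: '&' followed by text other than space, '<' or '&'."""
    hits = []
    for i, ch in enumerate(value):
        if ch != "&" or is_char_ref(value, i):
            continue
        nxt = value[i + 1:i + 2]  # empty string at end of input
        if nxt and nxt not in SPACE_CHARS + "<&":
            hits.append(i)
    return hits


def narrow_rule(value):
    """Simplified narrower rule: flag '&' only when a legacy name follows it
    and that name is not itself followed by an ASCII alphanumeric."""
    hits = []
    for i, ch in enumerate(value):
        if ch != "&" or is_char_ref(value, i):
            continue
        rest = value[i + 1:]
        for name in LEGACY_NAMES:
            if rest.startswith(name):
                after = rest[len(name):len(name) + 1]
                if not (after.isascii() and after.isalnum()):
                    hits.append(i)
                break  # only the longest matching legacy name is considered
    return hits
```

Under these assumptions, simple_rule flags the ampersand in both
"create-file.php?name=a.txt&copy=b.txt" and "dfclick?db=sina&bid=8", while
narrow_rule flags only the first. The simple rule needs no table at all,
whereas the narrow one depends on the list of legacy names.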

By this reasoning, the bug report needed would be one to add [...some other
thing...] as a conformance criterion.

http://intertwingly.net/stories/2010/03/21/www.google.cn#optional
http://intertwingly.net/stories/2010/03/21/www.nytimes.com#unmatched_close

- Sam Ruby

Received on Monday, 22 March 2010 19:08:16 UTC