- From: Sam Ruby <rubys@intertwingly.net>
- Date: Mon, 22 Mar 2010 15:07:41 -0400
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: HTMLwg WG <public-html@w3.org>
On 03/22/2010 02:37 PM, Philip Taylor wrote:
> Sam Ruby wrote:
>> I happen to believe that [...some other thing...]
>> would be far more useful to authors of content intended to be served
>> as text/html than flagging the use of unescaped ampersands in URIs is.
>
> I think one of the criteria for determining conformance rules is that it
> should be possible to give an exact definition of how to write
> conforming HTML documents, and the definition should be possible to
> understand and follow (e.g. it shouldn't be necessary to
> reverse-engineer the parsing algorithm).
>
> Another criteria is that markup which very likely indicates an authoring
> mistake and will result in unexpected behaviour, should be flagged as an
> error by syntax-checking tools in order to help authors write markup
> that works (and therefore it should be a document conformance error
> since that's the mechanism the spec uses to specify the behaviour of
> syntax-checking tools).
>
> Because of the second one, markup like
> <a href="create-file.php?name=a.txt&copy=b.txt">
> should be a conformance error (the author probably didn't intend
> "name=a.txt©=b.txt", and if they really did then they could have used
> "&copy;" instead).
>
> Currently the spec says (http://whatwg.org/html#syntax-attribute-value):
>
> "Attribute values are a mixture of text and character references, except
> with the additional restriction that the text cannot contain an
> ambiguous ampersand."
>
> "An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is
> followed by some text other than a space character, a U+003C LESS-THAN
> SIGN character (<), or another U+0026 AMPERSAND character (&)."
>
> If you do want to allow "...dfclick?db=sina&bid=8...", but don't want to
> allow "...&copy=...", then this description would need to be changed to
> something like:
>
> "An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is
> followed by one of the thousands of named character references in this
> table, unless it is one of the hundreds that don't end with ';' and it
> is subsequently followed by an alphanumeric character (unless it is
> "not" and it is followed by "in;")."
>
> which is much harder for authors to follow because they'll have to
> remember the list of thousands of strings to avoid.

By this reasoning the bug report needed would be to add [...some other
thing...] as a conformance criteria.

http://intertwingly.net/stories/2010/03/21/www.google.cn#optional
http://intertwingly.net/stories/2010/03/21/www.nytimes.com#unmatched_close

- Sam Ruby
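
For illustration only, the "ambiguous ampersand" definition quoted above from the spec can be sketched as a small checker. This is a rough reading of the quoted wording, not the spec's algorithm and not how any validator is known to implement it. The function name `ambiguous_ampersands`, the simplified character-reference pattern (any alphanumeric name ending in ";" is accepted without consulting the named-reference table), and the set of space characters are all assumptions made for the sketch.

```python
import re

# Assumption: a character reference is "&name;", "&#123;" or "&#x1F;".
# The name is not checked against the real table of named references.
CHAR_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Assumption: what counts as a "space character" for this sketch.
SPACE_CHARS = " \t\n\f\r"


def ambiguous_ampersands(value):
    """Return the indexes of ampersands that the quoted rule would flag."""
    flagged = []
    i = 0
    while i < len(value):
        if value[i] != "&":
            i += 1
            continue
        ref = CHAR_REF.match(value, i)
        if ref:
            # A complete character reference is allowed; skip past it.
            i = ref.end()
            continue
        nxt = value[i + 1 : i + 2]  # empty string at end of value
        # Flag "&" followed by anything other than a space, "<" or "&".
        if nxt and nxt not in SPACE_CHARS and nxt not in "<&":
            flagged.append(i)
        i += 1
    return flagged


if __name__ == "__main__":
    for url in (
        "create-file.php?name=a.txt&copy=b.txt",
        "create-file.php?name=a.txt&amp;copy=b.txt",
        "dfclick?db=sina&bid=8",
    ):
        print(url, ambiguous_ampersands(url))
```

Under this reading both "&copy=" and "&bid=" are flagged, since the rule looks only at the character after the ampersand. Telling the two apart would need the table lookup described in the alternative wording quoted above.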
Received on Monday, 22 March 2010 19:08:16 UTC