- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Mon, 22 Mar 2010 18:37:09 +0000
- To: Sam Ruby <rubys@intertwingly.net>
- CC: HTMLwg WG <public-html@w3.org>
Sam Ruby wrote:
> I happen to believe that [...some other thing...]
> would be far more useful to authors of
> content intended to be served as text/html than flagging the use of
> unescaped ampersands in URIs is.

I think one of the criteria for determining conformance rules is that it should be possible to give an exact definition of how to write conforming HTML documents, and the definition should be possible to understand and follow (e.g. it shouldn't be necessary to reverse-engineer the parsing algorithm).

Another criterion is that markup which very likely indicates an authoring mistake and will result in unexpected behaviour should be flagged as an error by syntax-checking tools, in order to help authors write markup that works (and therefore it should be a document conformance error, since that's the mechanism the spec uses to specify the behaviour of syntax-checking tools).

Because of the second one, markup like <a href="create-file.php?name=a.txt&copy=b.txt"> should be a conformance error (the author probably didn't intend "name=a.txt©=b.txt", and if they really did then they could have used "&copy;" instead).

Currently the spec says (http://whatwg.org/html#syntax-attribute-value):

  "Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand."

  "An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by some text other than a space character, a U+003C LESS-THAN SIGN character (<), or another U+0026 AMPERSAND character (&)."
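As a rough illustration of how simple the quoted rule is to check (this is a sketch, not part of the spec or of any real validator), it can be written in a few lines of Python. The function name ambiguous_ampersands is hypothetical. The sketch assumes that "&name;" counts as a character reference only when "name;" appears in Python's html.entities.html5 table, that numeric references look like "&#38;" or "&#x26;", and that an ampersand at the very end of the value is not ambiguous because no text follows it.

```python
import html
import re
from html.entities import html5  # keys look like "amp;", "copy;", "copy", ...

# Assumed shapes of well-formed character references.
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9A-Fa-f]+);")
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")

# Characters that may follow "&" without making it ambiguous:
# the HTML space characters, "<", and another "&".
SAFE_FOLLOWERS = " \t\n\f\r<&"


def ambiguous_ampersands(value):
    """Return the offsets of ambiguous ampersands in an attribute value."""
    offsets = []
    i = 0
    while i < len(value):
        if value[i] != "&":
            i += 1
            continue
        # Skip over anything that is a complete character reference.
        ref = NUMERIC_REF.match(value, i)
        if ref is None:
            named = NAMED_REF.match(value, i)
            if named is not None and named.group(1) in html5:
                ref = named
        if ref is not None:
            i = ref.end()
            continue
        # Otherwise the "&" is text; apply the quoted definition.
        following = value[i + 1 : i + 2]
        if following and following not in SAFE_FOLLOWERS:
            offsets.append(i)
        i += 1
    return offsets


if __name__ == "__main__":
    for url in (
        "create-file.php?name=a.txt&copy=b.txt",
        "dfclick?db=sina&bid=8",
        "create-file.php?name=a.txt&amp;copy=b.txt",
    ):
        print(url, ambiguous_ampersands(url))
    # Text-level decoding only, shown to illustrate what "&copy" becomes.
    print(html.unescape("create-file.php?name=a.txt&copy=b.txt"))
```

Under this rule both unescaped URLs are flagged and the "&amp;" form is not, so the only thing an author has to remember is to escape the ampersand. The final html.unescape call applies Python's text-level decoding rather than the attribute-value parsing rules, so it is only meant to show how "&copy" without a semicolon can end up as "©".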
If you do want to allow "...dfclick?db=sina&bid=8...", but don't want to allow "...&copy=...", then this description would need to be changed to something like:

  "An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one of the thousands of named character references in this table, unless it is one of the hundreds that don't end with ';' and it is subsequently followed by an alphanumeric character (unless it is "not" and it is followed by "in;")."

which is much harder for authors to follow because they'll have to remember the list of thousands of strings to avoid.

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Monday, 22 March 2010 18:37:39 UTC