- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Wed, 20 Apr 2005 07:58:26 +0300 (EEST)
- To: www-validator@w3.org
On Tue, 19 Apr 2005, David Dorward wrote:

> I suggest the following be appended to the first paragraph of the
> message:
>
> Session handling code in PHP is a common perpetrator of this error
> as explained in <a
> href="http://dorward.me.uk/www/php-sessions/ampersand/">Ampersands,
> PHP Sessions and Valid HTML</a>, a document which also proposes a
> number of solutions.

As a general idea, it seems quite useful to add references to documents that discuss such technical problems.

> I'd also welcome feedback on the document itself.

In the first paragraph, you say:

"Such characters cannot be simply typed into a document if you wish them to display - how could the user agent tell the difference between < (meaning start a new tag) and < (meaning a literal less than character)."

The first part is incorrect for HTML: there are many situations where I can type "<" and have it displayed. The rhetoric part fails for the same reason: it could be a genuine question in an exam on SGML, and the correct answer is _not_ "you can't, ever". I would suggest something like the following:

"Such characters cannot always be simply typed into a document if you wish them to display. For example, if you would like to show the mathematical expression b<a, you cannot type it as such, since browsers would take the <a part as starting a tag."

Along the same lines, the following paragraph, too, says too much:

"Ampersand characters used as argument separators pose no problem in plain old URLs, however in URLs encoded in HTML they still mean start of character reference."

I would suggest saying "they might still start" instead of "they still mean start". In HTML, the ampersand may appear as such unless followed by a name character or by "#", and in a context like   the ampersand does not start a character reference.
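To make the "followed by a name character" case concrete, here is a minimal sketch in Python. It is an illustration only: the URL is hypothetical, and Python's `html.unescape` follows HTML5 text-content rules, which I assume here to be close enough to how browsers read an unescaped ampersand in a link.

```python
import html

# Hypothetical URL using "&" as the argument separator.
url = "page.php?id=42&copy=1"

# Copied unescaped into HTML, "&copy" can be read as a character
# reference even though no semicolon follows it.
decoded_raw = html.unescape(url)

# Writing "&" as "&amp;" in the HTML source lets the URL survive decoding.
escaped = html.escape(url)
decoded_escaped = html.unescape(escaped)

print(decoded_raw)
print(escaped)
print(decoded_escaped)
```

Only the escaped form decodes back to the URL that was intended; the unescaped one loses its second parameter name.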
There's a common objection to "escaping" ampersands: they usually appear in URLs resulting from a form submission with method="get" or a similar operation, and in that case they contain things like id=42&copy=1 so that although the "&" is followed by a name character, the name is not terminated by a semicolon. Further confusion is caused by the XML rule that makes the semicolon mandatory. Maybe you could find a way to address this issue in a manner that does not confuse people too much. The point is that e.g. id=42&copy=1 is treated as id=42©=1 (with &copy replaced by the copyright sign) by most browsers, and this is the correct processing by HTML rules. And you might mention that there is a large number of predefined entity names in HTML and nobody wants to remember them by heart.

At the end of the document you say "You appear to be using Internet Explorer or a browser based on its underlying engine. Please read my browser support page." I know this is not relevant to the topic at hand, but I still comment on it, since _you_ should know that too. It is foolish to ask a user to read your browser support page, even if you ask politely, since it is quite irrelevant. How does it benefit me, or someone who tried to validate a page, checked the error message, and consulted your document, to learn - after reading the document - that there are some completely unspecified "Many rendering glitches" on the browser I use?

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Wednesday, 20 April 2005 04:58:29 UTC