- From: Daniel W. Connolly <connolly@hal.com>
- Date: Thu, 09 Feb 1995 12:54:24 -0600
- To: html-wg@oclc.org, uri@bunyip.com
- Cc: www-talk@info.cern.ch
There's an unfortunate interaction between the x-www-urlencoded syntax
for form data submission and SGML attribute value literal syntax. This
came up shortly after I started running the validation service, and I
thought we had discussed the problem, but it seems to be getting worse,
not better.

An example of the problem. Given this document:

===============================================
<!doctype html public "-//IETF//DTD HTML//EN">
<title>testing & in HREF</title>
<p>Here we go:
<a href="http://foo.org/cgi-bin/do-something.pl?x=a&y=b">link</a>
===============================================

Trying to validate it yields:

===============================================
connolly@ulua ../connolly[1114] html-validate test.html
sgmls: SGML error at test.html, line 5 at "y":
       No declaration for entity "y"; reference ignored
===============================================

Section 7.9.3 "Attribute Value Specification" of the SGML standard says:

    An attribute value literal is interpreted as an attribute value
    by replacing references within it, ignoring Ee and RS, and
    replacing an RE or SEPCHAR with a SPACE.

So the attribute value literal:

    "http://foo.org/cgi-bin/do-something.pl?x=a&y=b"

has an error in it: &y references an undeclared entity.

This should definitely go in as a NOTE: or something in the HTML spec,
and perhaps it's worth mentioning in the URL spec (though that's
stretching it).

There are a couple of ways to represent the string:

    http://foo.org/cgi-bin/do-something.pl?x=a&y=b

as an attribute value literal:

    "http://foo.org/cgi-bin/do-something.pl?x=a&amp;y=b"
    "http://foo.org/cgi-bin/do-something.pl?x=a&#38;y=b"

but neither of those is interpreted correctly by existing browsers.

In the interest of interoperability, I'd like to move toward using ';'
rather than (or in addition to) '&' to separate form name/value pairs.
That way, the URL for this query can be:

    http://foo.org/cgi-bin/do-something.pl?x=a;y=b

You can put this in an HTML document by writing:

    HREF="http://foo.org/cgi-bin/do-something.pl?x=a;y=b"

A quick check through the Mosaic 2.4 source code shows that a ';'
character in an input field _will_ be %xx-ified, so this doesn't
introduce any ambiguity.

The way to start the transition is to enhance CGI scripts to support
separating form values by ';' as well as '&'. Then folks that want to
validate their HTML can change '&' to ';' in their HREF attributes.

But folks will continue to copy-and-paste these form query URLs into
their HTML without quoting the '&' chars. So eventually, browsers should
start using ';' in the form encoding process in the first place (as well
as supporting &#38; inside attribute values!), and then the issue will
go away.

There's something of a chicken-and-egg problem here: who will support
the first browser to use ';' rather than '&' to encode form stuff? That
won't happen until the vast majority of CGI scripts have been enhanced
to support it. And that might not happen until folks that want to
validate their HTML start complaining. But it's a really cheap fix on
the CGI side, no?

I keep seeing more and more use of '&' to separate stuff in URLs, and
while this is really just a bug in the attribute value parsing in the
browsers, it can be avoided by using ';' instead (or in addition).

Dan
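
The CGI-side change described above can be sketched roughly as follows. This is only an illustration in Python, not code from any existing CGI library: the helper names parse_form_data and href_attribute are hypothetical. The first accepts either ';' or '&' as the pair separator, which is safe under the assumption stated above that a literal ';' typed into a field arrives %xx-encoded. The second shows the other workaround, writing '&' as &amp; when a query URL is placed in an attribute value.

```python
import re
from html import escape
from typing import List, Tuple
from urllib.parse import unquote_plus


def parse_form_data(query: str) -> List[Tuple[str, str]]:
    """Decode x-www-urlencoded data, accepting ';' or '&' between pairs.

    Hypothetical helper: assumes any literal ';' or '&' inside a name or
    value was %xx-encoded by the browser, so splitting is unambiguous.
    """
    pairs = []
    for field in re.split(r"[&;]", query):
        if not field:
            continue  # tolerate empty segments such as a trailing separator
        name, _, value = field.partition("=")
        # unquote_plus undoes both %xx escapes and '+' for spaces.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


def href_attribute(url: str) -> str:
    """Write a URL as an HREF attribute value literal, escaping '&'."""
    return 'HREF="' + escape(url, quote=True) + '"'


if __name__ == "__main__":
    print(parse_form_data("x=a&y=b"))
    print(parse_form_data("x=a;y=b"))
    print(href_attribute("http://foo.org/cgi-bin/do-something.pl?x=a&y=b"))
```

Both query forms decode to the same name/value pairs, so a script changed this way keeps working with browsers that still send '&'.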
Received on Thursday, 9 February 1995 14:06:24 UTC