Re: How should URL attribute values be parsed?

From: "Christopher R. Maden" <crm@ebt.com>
Date: Mon, 26 Aug 1996 14:37:53 GMT
Message-Id: <199608261437.OAA15878@phaser.EBT.COM>
To: www-html@w3.org
In-reply-to: <01I8O4W5BZB6000R4L@SCI.WFBR.EDU> (message from Foteos Macrides on Sat, 24 Aug 1996 22:28:19 -0500 (EST))
Subject: Re: How should URL attribute values be parsed?

Foteos Macrides writes about URL attributes.

There is a great deal of confusion about the difference between an
"attribute value specification" (AVS, for this mail message), and an
"attribute value" (AV).

A marked-up document contains only AVSs.  Only after the AVS has been
parsed is an AV arrived at and passed by the parser to the
application.

All AVSs are replaceable character data (RCD), in which entity
references should be resolved.

In RCD, any ampersand followed by a name start character is an entity
reference.  If it is not a reference to a defined entity, it is an
error.

After resolution of any entities, including the handling of any
errors, an AV is arrived at, and passed to the application.

Practical application:

In an HTML document, the string in quotes following the string "href="
is an AVS.  Any entity references should be recognized and resolved.
Any string of &[a-zA-Z] that is not a known entity reference is an
error.  Preferably, the unknown reference should be kept as data.
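
As a rough sketch of that parsing step (in Python; the tiny entity
table and the name resolve_avs are illustrative assumptions of mine,
not taken from any real parser), an AVS can be turned into an AV like
this, keeping unknown references as data and reporting them:

```python
import re

# Hypothetical, deliberately tiny entity table; a real HTML parser
# knows the full set of entities declared in the DTD.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# An ampersand followed by a name start character opens an entity
# reference; the name runs to the first non-name character, and a
# trailing semicolon, if present, is consumed with it.
_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);?")


def resolve_avs(avs):
    """Return (attribute value, list of unknown entity names)."""
    errors = []

    def replace(match):
        name = match.group(1)
        if name in ENTITIES:
            return ENTITIES[name]
        # Unknown reference: record the error, keep the text as data.
        errors.append(name)
        return match.group(0)

    return _REF.sub(replace, avs), errors


print(resolve_avs("x?guitar=fender&amp;amp=g-k"))
# ('x?guitar=fender&amp=g-k', [])
print(resolve_avs("x?guitar=fender&amp=g-k"))
# ('x?guitar=fender&=g-k', [])
```

The second call shows what goes wrong when the ampersand is left
unescaped in the AVS: "&amp" is recognized as a reference, and the AV
that reaches the application is no longer the intended URL.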

After the resolution of entities, the AV is passed to the application,
e.g., Lynx.  This AV should be a valid URL, in the case of the
<a href=""> attribute.  The AVS does *not* have to be a valid URL.

In a URL, i.e., in the resolved AV, URL-significant characters should
be hex-escaped.

In the href= attribute, i.e., in the AVS, SGML-significant characters
should be entity-escaped.

If I have a script called moe&larry on a DOS-based server, I need to
hex-escape the ampersand, because it is a literal, not a semantic
character:

<URL:http://www.mycom.com/cgi-bin/moe%26larry>

If I want to pass that script arguments of guitar=fender and amp=g-k,
I do *not* escape the ampersand, because it is a semantic character in
the URL:

<URL:http://www.mycom.com/cgi-bin/moe%26larry?guitar=fender&amp=g-k>
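
The two cases can be sketched with Python's standard urllib.parse
module (the variable names are mine; the host, script name, and
arguments are the ones used in this message).  The literal ampersand
in the script name is hex-escaped, while the ampersand that separates
the arguments is left alone:

```python
from urllib.parse import quote, urlencode

script = "moe&larry"

# safe="" makes quote() hex-escape every reserved character,
# including the literal ampersand in the script name.
path = "/cgi-bin/" + quote(script, safe="")

# urlencode() joins name=value pairs with a bare, semantic ampersand.
query = urlencode({"guitar": "fender", "amp": "g-k"})

url = "http://www.mycom.com" + path + "?" + query
print(url)
# http://www.mycom.com/cgi-bin/moe%26larry?guitar=fender&amp=g-k
```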

If I want to encode this URL as a CDATA attribute in an SGML document,
say, as the href attribute of an <a> element in an HTML document, I
must turn the ampersand into an entity reference, so that the AVS,
*after* parsing and resolution to an AV, will correspond to the same
URL.

<a
href="http://www.mycom.com/cgi-bin/moe%26larry?guitar=fender&amp;amp=g-k">

A proper HTML parser will turn this AVS into an AV - a URL:

http://www.mycom.com/cgi-bin/moe%26larry?guitar=fender&amp=g-k
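
For anyone who wants to check the round trip, here is a small sketch
using Python's standard html module.  Note the assumption: html.escape
and html.unescape follow HTML rules rather than being a full SGML
parser, but for this URL they should agree with the description above.

```python
from html import escape, unescape

url = "http://www.mycom.com/cgi-bin/moe%26larry?guitar=fender&amp=g-k"

# URL (the AV we want) -> AVS: entity-escape SGML-significant characters.
avs = escape(url, quote=True)
tag = '<a href="' + avs + '">'
print(tag)

# AVS -> AV: resolving the entity reference gives back the same URL.
av = unescape(avs)
print(av == url)
```

The %26 passes through both steps untouched, because it means nothing
to the SGML parser; only the URL-handling code downstream cares about
it.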

When the link is selected, an HTTP connection will be established with
the server "www.mycom.com", and the HTTP request sent:

GET /cgi-bin/moe%26larry?guitar=fender&amp=g-k HTTP/1.0

The server will run the script "moe&larry", and pass it the parameter
string "guitar=fender&amp=g-k".  Most CGI scripts will use the & to
separate parameters.
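
What the server side does with that request can be sketched as follows
(Python's urllib.parse again; this is only an illustration of the
order of operations, not how any particular server or CGI library is
written):

```python
from urllib.parse import urlsplit, unquote, parse_qsl

target = "/cgi-bin/moe%26larry?guitar=fender&amp=g-k"

# Split on the semantic characters first, while %26 is still escaped.
parts = urlsplit(target)

# Only then undo the hex-escaping to recover the literal script name.
script = unquote(parts.path).rsplit("/", 1)[-1]

# The bare ampersand in the query separates the parameters.
params = parse_qsl(parts.query)

print(script)   # moe&larry
print(params)   # [('guitar', 'fender'), ('amp', 'g-k')]
```

Splitting before unescaping is what keeps the ampersand in the script
name from being mistaken for a parameter separator.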

I know this explanation was pedantic, but I hope it helped at least
one person understand the relationship between URLs and the href
attribute.

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//GCA//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//EBT//NONSGML Christopher R. Maden//EN" SYSTEM
"<URL>http://www.ebt.com <TEL>+1.401.421.9550 <FAX>+1.401.521.2030
<USMAIL>One Richmond Square, Providence, RI 02906 USA" NDATA SGML.Geek>