RE: 'Unknown entity' messages within href= URLs containing query data

> -----Original Message-----
> From: www-amaya-request@w3.org [mailto:www-amaya-request@w3.org] On
> Behalf Of Chris Beall
> Sent: Wednesday, March 01, 2006 11:58 AM
> To: Amaya users
> Subject: 'Unknown entity' messages within href= URLs containing query
> data
>
>
>
> When fed a page containing something like:
> <a href="http://finance.yahoo.com/q/bc?s=IBM&t=5y">
> Amaya places into the PARSING.ERR file the message "line nnn, char nn:
> Unknown entity".
>
> This seems to be because it interprets the '&' within the query data to
> represent the beginning of an entity reference.  When it hits the "=", it
> deems the entity reference to have been terminated and further deems
> "&t" to be unknown as an entity reference.
>
> I believe this behavior is incorrect.  Referring to
> http://www.gbiv.com/protocols/uri/rfc/rfc3986.html
>
> I see:
> query       = *( pchar / "/" / "?" )
>
> where pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
> and  sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
>                / "*" / "+" / "," / ";" / "="
>
> Since the '&' (ampersand) is amongst the sub-delims, I deduce that it
> does NOT need to be percent-encoded when used within the query portion
> of a URI.
> It seems to be common practice not to encode it.
>
> In addition, between the quotes of an href=, we are no longer dealing with
> HTML, where character entity references live, but with a URI.
>
> It therefore appears to me that Amaya should not look for entity
> references within URIs and should not issue the error message cited above.
>
> Chris Beall

Thanks to Dave Woolley for setting me straight on this.

In spite of the fact that the syntax I provided is VERY common in the wild,
Amaya is correct to flag it as an error.

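See http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2.  The only
trick here is that the HTML spec discusses this in terms of FORM submission,
and it may not be obvious that whenever you put a URI containing query data
into HTML, you are, in effect, submitting a form, from the perspective of the
server.  Inside an attribute value the '&' has to be written as &amp; (or
&#38;), so my example should have been:
<a href="http://finance.yahoo.com/q/bc?s=IBM&amp;t=5y">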
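
For anyone generating such links from a program rather than by hand, here
is a minimal sketch of the escaping step. It assumes Python and its standard
html module, neither of which has anything to do with Amaya; the helper name
href_attribute is hypothetical and exists only for illustration.

```python
from html import escape, unescape


def href_attribute(url: str) -> str:
    """Build an href="..." attribute, escaping '&' (and quotes) for HTML.

    Hypothetical helper for illustration only.
    """
    # escape() turns '&' into '&amp;'; quote=True also escapes quote characters
    # so the value cannot break out of the surrounding double quotes.
    return 'href="' + escape(url, quote=True) + '"'


url = "http://finance.yahoo.com/q/bc?s=IBM&t=5y"
print(href_attribute(url))

# An HTML parser reverses the escaping, so the URI that is actually
# requested still contains a bare '&' between the query fields.
print(unescape("http://finance.yahoo.com/q/bc?s=IBM&amp;t=5y") == url)
```

The escaping affects only how the URI is written inside the HTML source. The
URI itself, as RFC 3986 defines it, is unchanged.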

Chris Beall

Received on Wednesday, 1 March 2006 23:16:37 UTC