'Unknown entity' messages within href= URLs containing query data

When fed a page containing something like:
<a href="http://finance.yahoo.com/q/bc?s=IBM&t=5y">
Amaya places into the PARSING.ERR file the message "line nnn, char nn:
Unknown entity".

This seems to be because it interprets the '&' within the query data to
represent the beginning of an entity reference.  When it hits the "=", it
deems the entity reference to have been terminated and further deems "&t" to
be unknown as an entity reference.
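
To illustrate what I think is happening, here is a minimal Python sketch of
that kind of scan.  I have not read Amaya's source, so this is only my guess
at the logic; the names KNOWN_ENTITIES and scan_entities are hypothetical,
and the entity table is deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny table; a real parser carries the full HTML set.
KNOWN_ENTITIES = {"amp", "lt", "gt", "quot"}

def scan_entities(text):
    """Return (name, terminator, known) for each '&name' run found in text."""
    findings = []
    # '&' followed by a name, then whatever single character ends the name.
    for match in re.finditer(r"&([A-Za-z][A-Za-z0-9]*)(.?)", text):
        name, terminator = match.group(1), match.group(2)
        findings.append((name, terminator, name in KNOWN_ENTITIES))
    return findings

href = "http://finance.yahoo.com/q/bc?s=IBM&t=5y"
for name, terminator, known in scan_entities(href):
    if not known:
        print(f"Unknown entity: &{name} (ended by {terminator!r})")
```

Run against the href above, a scan like this treats "&t" as an entity name
ended by "=", which matches the message I am seeing.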

I believe this behavior is incorrect.  RFC 3986
(http://www.gbiv.com/protocols/uri/rfc/rfc3986.html) defines:
query       = *( pchar / "/" / "?" )

where pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
and  sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
               / "*" / "+" / "," / ";" / "="

Since the '&' (ampersand) is among the sub-delims, I deduce that it does
NOT need to be percent-encoded when used within the query portion of a URI.
It also seems to be common practice not to encode it.
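
As a rough check of that reading, the grammar quoted above can be turned into
a regular expression and applied to the query from my example.  This is only
a sketch: QUERY_RE and is_valid_query are names I made up, and the pattern
covers just the "query" production, not a whole URI.

```python
import re
from urllib.parse import urlsplit

# Character sets transcribed from the RFC 3986 ABNF.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="

# query = *( pchar / "/" / "?" ), where pchar adds ":" and "@" and pct-encoded.
QUERY_RE = re.compile(rf"(?:[{UNRESERVED}{SUB_DELIMS}:@/?]|%[0-9A-Fa-f]{{2}})*")

def is_valid_query(query):
    """True if the whole string matches the 'query' production."""
    return QUERY_RE.fullmatch(query) is not None

query = urlsplit("http://finance.yahoo.com/q/bc?s=IBM&t=5y").query
print(query, is_valid_query(query))
```

The unencoded '&' is accepted here, while a character outside the grammar,
such as a space, is rejected.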

In addition, between the quotes of an href=, we are no longer dealing with
HTML, where character entity references live, but with a URI.

It therefore appears to me that Amaya should not look for entity references
within URIs and should not issue the error message cited above.

Chris Beall

Received on Wednesday, 1 March 2006 17:01:00 UTC