When fed a page containing something like:

    <a href="http://finance.yahoo.com/q/bc?s=IBM&t=5y">

Amaya writes the message "line nnn, char nn: Unknown entity" to the PARSING.ERR file.

This seems to be because it interprets the '&' within the query data as the beginning of an entity reference. When it hits the '=', it deems the entity reference to have been terminated, and further deems "&t" to be an unknown entity reference.

I believe this behavior is incorrect. Referring to http://www.gbiv.com/protocols/uri/rfc/rfc3986.html I see:

    query = *( pchar / "/" / "?" )

where

    pchar = unreserved / pct-encoded / sub-delims / ":" / "@"

and

    sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Since the '&' (ampersand) is among the sub-delims, I deduce that it does NOT need to be percent-encoded when used within the query portion of a URI. It seems to be common practice not to encode it.

In addition, between the quotes of an href=, we are no longer dealing with HTML, where character entity references live, but with a URI. It therefore appears to me that Amaya should not look for entity references within URIs and should not issue the error message cited above.

Chris Beall

Received on Wednesday, 1 March 2006 17:01:00 UTC
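As an illustration of the grammar argument in the message above, here is a minimal Python sketch that transcribes the quoted RFC 3986 rules into a regular expression and checks the query part of the example URL against it. The names `QUERY_RE` and `is_valid_query` are hypothetical helpers introduced for this sketch, and the character set for `unreserved` is taken from RFC 3986 itself rather than from the message. It only tests the URI-level claim (that a bare '&' is permitted in a query); it says nothing about how Amaya or an HTML parser treats attribute values.

```python
import re
from urllib.parse import urlsplit

# Character classes transcribed from the RFC 3986 rules quoted in the message.
# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"  (from RFC 3986)
UNRESERVED = r"A-Za-z0-9\-._~"
# sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
SUB_DELIMS = r"!$&'()*+,;="
# pct-encoded = "%" HEXDIG HEXDIG
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# query = *( pchar / "/" / "?" )
QUERY_RE = re.compile(rf"(?:{PCHAR}|[/?])*")


def is_valid_query(query: str) -> bool:
    """Return True if `query` matches the RFC 3986 `query` production."""
    return QUERY_RE.fullmatch(query) is not None


# The URL from the message; urlsplit separates out the query component.
url = "http://finance.yahoo.com/q/bc?s=IBM&t=5y"
query = urlsplit(url).query

print(query)
print(is_valid_query(query))
```

Under these assumptions the unencoded ampersand is accepted, while a character outside the grammar, such as a literal space, is rejected by `is_valid_query`.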