
Re: Feed Validator : Parsing error in Atom [entity preceding closing tag]

From: Sam Ruby <rubys@intertwingly.net>
Date: Thu, 25 May 2006 07:39:00 -0400
Message-ID: <44759754.9010208@intertwingly.net>
To: www-validator@w3.org
CC: Neil Smith <Neil_Smith@hargreaveslansdown.co.uk>

David Dorward wrote:
> On Thu, May 25, 2006 at 11:15:27AM +0100, Neil Smith wrote:
>>When submitting a document in Atom format to the feed validator service,
>>inclusion of an &amp; entity followed by a single character in the
>>range a-zA-Z only, before the closing </title> element tag, causes
>>the feed validator to report "EOF in middle of entity":
> I'm not an expert on ATOM, but I believe this is what is happening:
> Your title element has a type attribute that specifies it contains
> HTML and so the text must have special characters represented by
> character references.
> This HTML is being represented in XML, so any special characters in
> the HTML source must also be represented as character entities.
> Thus: foo&bar in text becomes
>       foo&amp;bar in HTML and
>       foo&amp;amp;bar in XML encoded HTML
> You've only encoded the ampersand once, so are getting a warning.
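
The two layers of escaping described above can be checked with the
standard library. This is only a minimal sketch, assuming Python 3 and
its html.escape and xml.sax.saxutils helpers; the variable names are
made up for illustration:

```python
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape, unescape as xml_unescape

text = "foo&bar"

# First layer: plain text written as HTML
as_html = html_escape(text, quote=False)

# Second layer: that HTML carried as the text content of an XML element
as_xml = xml_escape(as_html)

# An XML parser undoes exactly one layer, handing the HTML back
round_trip = xml_unescape(as_xml)

print(as_html)     # foo&amp;bar
print(as_xml)      # foo&amp;amp;bar
print(round_trip)  # foo&amp;bar
```

If only one layer is applied, the consumer that unescapes the XML is
left with a bare "&" in what it then treats as HTML.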


>>Use of more than one alpha character after the &amp; entity does not
>>cause this error in the validator.  It should of course be
>>reasonable to end a title element in for example E&amp;O, or in our
>>case the abbreviation for a company, i.e. A&amp;L
> I'm now entering the realm of guesswork, but I suspect that you can't
> have named entities with only one letter, so the parser knows that &O;
> isn't a real entity, but that &Ox; could be.

It seems that the parser doesn't like unclosed entities at the end of the
string.  If you have access to Python, you can experiment with the
following code:


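# Python 2: in Python 3 the module is html.parser and HTMLParseError is gone
from HTMLParser import HTMLParser, HTMLParseError
from xml.sax.saxutils import unescape

text = "Viridian results higher on Irish businessE&amp;O"

try:
  # Undo the XML-level escaping, then parse what is left as HTML
  parser = HTMLParser()
  parser.feed(unescape(text))
  parser.close()
  print 'ok'
except HTMLParseError, error:
  print error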


> (I read the mailing list, please address responses there and do not CC
> me.)

OK ;-)

- Sam Ruby
Received on Thursday, 25 May 2006 11:39:43 UTC
