- From: Laurent Carcone <Laurent.Carcone@inrialpes.fr>
- Date: Mon, 10 Dec 2001 16:53:56 +0100
- To: www-amaya@w3.org
- In-Reply-To: <20011030113041.EE974C@maiana.inrialpes.fr>

> I've come up with two related fixes for the expat parser, following your
> suggestion, Irene, to trace the function EndOfAttributeValue () in the
> module Amaya/amaya/Xml2thot.c. One issue appears to be a bug in expat,
> so I'll present it first. The symptom is that multiple spaces are
> incorrectly preserved in attribute values.
>
> This line of code is clearly intended to suppress multiple spaces, in
> libwww\modules\expat\xmlparse\xmlparse.c line 2815, in function
> appendAttributeValue(). The logic gets short-circuited as currently
> written:
>
>     if (!isCdata && (poolLength(pool) == 0 || poolLastChar(pool) == 0x20))
>
> The second test above is skipped due to the expansion of the poolLength()
> and poolLastChar() preprocessor #defines. So two additional sets of
> parentheses are needed, as follows:
>
>     if ((!isCdata && (poolLength(pool) == 0) || (poolLastChar(pool) == 0x20)))
>
> By adding the parentheses, multiple spaces are suppressed, as intended, when
> parsing the following sample input:
>
>     <meta name="description" content="Software design &
>     consulting for workstations, servers, & embedded firmware.
>     Systems programming, quality-crafted applications and
>     enhancements for Internet, Linux, Windows." />
>
> If you would please test and verify this fix, then I'll post the related
> fix for preserving linefeeds. Thank you.
>
> Regards,
> Marc

Hello Marc,

Thanks for your contribution.

I think that Expat is right according to the XML specification.
Section 3.3.3 (Attribute-Value Normalization) says:

  "Before the value of an attribute is passed to the application or
  checked for validity, the XML processor must normalize the attribute
  value by applying the algorithm below, or by using some other method
  such that the value passed to the application is the same as that
  produced by the algorithm.

  ... (algorithm) ...

  If the attribute type is not CDATA, then the XML processor must further
  process the normalized attribute value by discarding any leading and
  trailing space (#x20) characters, and by replacing sequences of space
  (#x20) characters by a single space (#x20) character.

  All attributes for which no declaration has been read should be treated
  by a non-validating processor as if declared CDATA."

As Expat is a non-validating processor, it treats attributes as if they
were declared CDATA, so isCdata is true, the test quoted above is never
entered, and multiple spaces are preserved.

On the other hand, the XHTML 1.0 specification says (section 4.7,
Whitespace handling in attribute values):

  "In attribute values, user agents will strip leading and trailing
  whitespace from attribute values and map sequences of one or more
  whitespace characters (including line breaks) to a single inter-word
  space (an ASCII space character for western scripts). See Section 3.3.3
  of [XML]."

That means it is the user agent (Amaya), not the parser, that must
collapse multiple whitespace characters in attribute values, which it
does not do today.

I will modify the Amaya code to handle this as soon as possible, after
the next release, which is planned for next week.

Regards,
Laurent Carcone
Amaya team
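
For illustration only, the whitespace handling that XHTML 1.0 section 4.7 asks of the user agent could be sketched as below. This is not the actual Amaya change: the names is_xml_space() and collapse_attr_whitespace() are hypothetical, and the sketch assumes the attribute value arrives as a NUL-terminated, single-byte-compatible string that may be modified in place.

```c
/* Hypothetical helper, not part of Amaya or expat:
   true for the four XML whitespace characters. */
static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical helper, not part of Amaya or expat:
   strip leading and trailing whitespace and map each inner run of
   whitespace (including line breaks) to a single space, in place. */
static void collapse_attr_whitespace(char *value)
{
    char *src = value;          /* read position */
    char *dst = value;          /* write position, never ahead of src */
    int pending_space = 0;      /* a run of whitespace is waiting to be emitted */

    while (*src != '\0') {
        if (is_xml_space(*src)) {
            /* Leading whitespace is dropped: nothing has been written yet. */
            pending_space = (dst != value);
        } else {
            if (pending_space) {
                *dst++ = ' ';   /* one space for the whole run */
                pending_space = 0;
            }
            *dst++ = *src;
        }
        src++;
    }
    /* A pending run at the end is trailing whitespace and is discarded. */
    *dst = '\0';
}
```

Calling collapse_attr_whitespace() on a writable copy of the content value from the sample above would leave single spaces between words regardless of how the source was wrapped. Doing this in the user agent rather than in expat keeps the parser's CDATA behaviour unchanged for other XML vocabularies.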
Received on Monday, 10 December 2001 10:54:08 UTC